Classification metrics
When evaluating classification models, we use different metrics than those used for regression models. These metrics help us understand how well the model performs, especially at correctly predicting the different classes. Let's look at what they are: 2. Precision (best for binary classification, i.e., with only two classes): Also known as positive predictive […]
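As a minimal sketch of the precision calculation described above, assuming a small made-up set of binary labels (1 = positive class, 0 = negative class), precision can be computed directly as true positives over all positive predictions:

```python
# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP): of everything the model called positive,
# how much actually was positive?
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
precision = tp / (tp + fp)  # 3 TP, 1 FP -> 0.75
```

scikit-learn's `sklearn.metrics.precision_score(y_true, y_pred)` computes the same quantity.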
Understanding decision trees – Predictions Don’t Grow on Trees, or Do They?
Decision trees are supervised models that can perform either regression or classification. They have a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label (for classification) or a value (for regression). One […]
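The node/branch/leaf structure described above can be seen in a quick sketch using scikit-learn's `DecisionTreeClassifier` on the built-in Iris dataset (the dataset choice here is illustrative, not from the text):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a shallow tree: each internal node tests one feature against a
# threshold, each branch is an outcome, and each leaf predicts a class
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

train_accuracy = tree.score(X, y)
```

`sklearn.tree.plot_tree(tree)` will draw the fitted flowchart so you can read off each node's test and each leaf's class.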
Exploring the Titanic dataset
The Titanic dataset is truly a classic in the field of data science, often used to illustrate the fundamentals of ML. It details the tragic sinking of the RMS Titanic, one of the most infamous shipwrecks in history. This dataset serves as a rich source of demographic and travel information about […]
Dummy variables
Dummy variables are used when we want to convert a categorical feature into a quantitative one. Remember that we have two types of categorical features: nominal and ordinal. Ordinal features have a natural order among their values, while nominal features do not. Encoding qualitative (nominal) data using separate columns is called making dummy variables, […]
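As a small sketch of dummy encoding with pandas, assuming a hypothetical nominal column (the `embarked` name here is only an example, echoing a typical Titanic-style feature):

```python
import pandas as pd

# A nominal feature: the categories have no natural order
df = pd.DataFrame({"embarked": ["S", "C", "Q", "S"]})

# get_dummies creates one column per category, with a truthy value
# wherever the row belongs to that category
dummies = pd.get_dummies(df["embarked"], prefix="embarked")
```

Each original category becomes its own `embarked_*` column, turning the qualitative feature into separate quantitative indicators.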
Diving deep into UL
It’s time to see some examples of UL, given that we’ve spent some time on SL algorithms.
When to use UL
There are many times when UL can be appropriate. Some very common examples include the following: The first tends to be the most common reason that data scientists choose to […]
An illustrative example of clustering
Imagine that we have data points in a two-dimensional space, as seen in Figure 11.11:
Figure 11.11 – A mock dataset to be clustered
Each dot is colored gray, denoting that there is no prior grouping before applying the k-means algorithm. The goal here is to eventually color in each dot and […]
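A minimal sketch of this setup, assuming two made-up blobs of 2-D points in place of the figure's gray dots, shows k-means assigning each point a cluster "color":

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs standing in for the unlabeled dots
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(20, 2)),
    rng.normal(loc=[5, 5], scale=0.3, size=(20, 2)),
])

# k-means partitions the points into k groups; labels_ is the
# "coloring" of each dot after the algorithm runs
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
```

Scattering `points` colored by `labels` reproduces the before/after picture the figure describes.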
An illustrative example – beer!
Let’s run a cluster analysis on a new dataset outlining different beers and their characteristics. We know that there are many types of beer, but I wonder whether we could group beers into categories based on quantitative features. Let’s try! Let’s import a dataset of just a […]
For example, imagine we have an email with three words: send cash now. We’ll use naïve Bayes to classify the email as either spam or ham: We are concerned with the difference between these two numbers. We can use the following criteria to classify any single text sample: Because both equations have P(send […]
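The comparison described above can be sketched in a few lines. All the probabilities below are made up for illustration; only the naïve Bayes scoring rule itself (prior times the product of per-word conditionals, compared across the two classes) comes from the text:

```python
# Hypothetical per-word conditional probabilities from a training corpus
spam_word_prob = {"send": 0.2, "cash": 0.3, "now": 0.1}
ham_word_prob = {"send": 0.05, "cash": 0.01, "now": 0.1}
p_spam, p_ham = 0.4, 0.6  # hypothetical class priors

words = ["send", "cash", "now"]

# Naive Bayes: multiply the class prior by each word's conditional
# probability, then compare the two unnormalized scores
spam_score = p_spam
ham_score = p_ham
for w in words:
    spam_score *= spam_word_prob[w]
    ham_score *= ham_word_prob[w]

label = "spam" if spam_score > ham_score else "ham"
```

Because both scores share the same denominator P(send cash now), it can be dropped; only the comparison of the two unnormalized scores matters.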
The Silhouette Coefficient
The Silhouette Coefficient is a common metric for evaluating clustering performance in situations where the true cluster assignments are not known. It assesses the quality of the clusters created by a clustering algorithm by quantifying how similar an object is to its own cluster (cohesion) compared […]
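As a quick sketch, assuming two made-up, well-separated blobs (so we expect high cohesion within clusters and high separation between them), scikit-learn's `silhouette_score` returns a value near 1:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic, clearly separated clusters: cohesion high, separation high
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.2, size=(25, 2)),
    rng.normal(loc=[4, 4], scale=0.2, size=(25, 2)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)

# Ranges from -1 (wrong clustering) to +1 (tight, well-separated clusters)
score = silhouette_score(X, labels)
```

Note that no true labels are needed: the score is computed purely from the data and the cluster assignments, which is exactly why it is useful when ground truth is unknown.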