Classification metrics
When evaluating classification models, we use different metrics than those used for regression models. These metrics help us understand how well the model performs, especially at correctly predicting the different classes. Let's look at what they are: 2. Precision (best for binary classification, i.e., with only two classes): Also known as positive predictive […]
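As a minimal sketch of the precision calculation described above, assuming a small made-up set of binary labels (1 = positive class, 0 = negative class), precision can be computed directly as true positives over all positive predictions:

```python
# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision = TP / (TP + FP): of everything the model called positive,
# how much actually was positive?
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
precision = tp / (tp + fp)  # 3 TP, 1 FP -> 0.75
```

scikit-learn's `sklearn.metrics.precision_score(y_true, y_pred)` computes the same quantity.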
Understanding decision trees – Predictions Don’t Grow on Trees, or Do They?
Decision trees are supervised models that can perform either regression or classification. They have a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label (for classification) or a value (for regression). One […]
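The node/branch/leaf structure described above can be seen in a quick sketch using scikit-learn's `DecisionTreeClassifier` on the built-in Iris dataset (the dataset choice here is illustrative, not from the text):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a shallow tree: each internal node tests one feature against a
# threshold, each branch is an outcome, and each leaf predicts a class
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

train_accuracy = tree.score(X, y)
```

`sklearn.tree.plot_tree(tree)` will draw the fitted flowchart so you can read off each node's test and each leaf's class.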
Exploring the Titanic dataset
The Titanic dataset is truly a classic in the field of data science, often used to illustrate the fundamentals of ML. It details the tragic sinking of the RMS Titanic, one of the most infamous shipwrecks in history. This dataset serves as a rich source of demographic and travel information about […]
Dummy variables
Dummy variables are used when we want to convert a categorical feature into a quantitative one. Remember that we have two types of categorical features: nominal and ordinal. Ordinal features have a natural order among their values, while nominal features do not. Encoding qualitative (nominal) data using separate columns is called making dummy variables, […]
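As a small sketch of dummy encoding with pandas, assuming a hypothetical nominal column (the `embarked` name here is only an example, echoing a typical Titanic-style feature):

```python
import pandas as pd

# A nominal feature: the categories have no natural order
df = pd.DataFrame({"embarked": ["S", "C", "Q", "S"]})

# get_dummies creates one column per category, with a truthy value
# wherever the row belongs to that category
dummies = pd.get_dummies(df["embarked"], prefix="embarked")
```

Each original category becomes its own `embarked_*` column, turning the qualitative feature into separate quantitative indicators.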
Diving deep into UL
It’s time to see some examples of UL, given that we’ve spent some time on SL algorithms.
When to use UL
There are many times when UL can be appropriate. Some very common examples include the following: The first tends to be the most common reason that data scientists choose to […]
An illustrative example of clustering
Imagine that we have data points in a two-dimensional space, as seen in Figure 11.11:
Figure 11.11 – A mock dataset to be clustered
Each dot is colored gray, denoting that there is no prior grouping before applying the k-means algorithm. The goal here is to eventually color in each dot and […]
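A minimal sketch of this setup, assuming two made-up blobs of 2-D points in place of the figure's gray dots, shows k-means assigning each point a cluster "color":

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs standing in for the unlabeled dots
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(20, 2)),
    rng.normal(loc=[5, 5], scale=0.3, size=(20, 2)),
])

# k-means partitions the points into k groups; labels_ is the
# "coloring" of each dot after the algorithm runs
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
```

Scattering `points` colored by `labels` reproduces the before/after picture the figure describes.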
An illustrative example – beer!
Let’s run a cluster analysis on a new dataset outlining different beers and their characteristics. We know that there are many types of beer, but I wonder whether we could group beers into categories based on quantitative features. Let’s try! Let’s import a dataset of just a […]
For example, imagine we have an email with three words: send cash now. We’ll use naïve Bayes to classify the email as either spam or ham: We are concerned with the difference between these two numbers. We can use the following criteria to classify any single text sample: Because both equations have P(send […]
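The comparison described above can be sketched in a few lines. All the probabilities below are made up for illustration; only the naïve Bayes scoring rule itself (prior times the product of per-word conditionals, compared across the two classes) comes from the text:

```python
# Hypothetical per-word conditional probabilities from a training corpus
spam_word_prob = {"send": 0.2, "cash": 0.3, "now": 0.1}
ham_word_prob = {"send": 0.05, "cash": 0.01, "now": 0.1}
p_spam, p_ham = 0.4, 0.6  # hypothetical class priors

words = ["send", "cash", "now"]

# Naive Bayes: multiply the class prior by each word's conditional
# probability, then compare the two unnormalized scores
spam_score = p_spam
ham_score = p_ham
for w in words:
    spam_score *= spam_word_prob[w]
    ham_score *= ham_word_prob[w]

label = "spam" if spam_score > ham_score else "ham"
```

Because both scores share the same denominator P(send cash now), it can be dropped; only the comparison of the two unnormalized scores matters.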
The Silhouette Coefficient
The Silhouette Coefficient is a common metric for evaluating clustering performance in situations where the true cluster assignments are not known. It assesses the quality of the clusters created by a clustering algorithm by quantifying how similar an object is to its own cluster (cohesion) compared […]
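As a quick sketch, assuming two made-up, well-separated blobs (so we expect high cohesion within clusters and high separation between them), scikit-learn's `silhouette_score` returns a value near 1:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic, clearly separated clusters: cohesion high, separation high
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.2, size=(25, 2)),
    rng.normal(loc=[4, 4], scale=0.2, size=(25, 2)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)

# Ranges from -1 (wrong clustering) to +1 (tight, well-separated clusters)
score = silhouette_score(X, labels)
```

Note that no true labels are needed: the score is computed purely from the data and the cluster assignments, which is exactly why it is useful when ground truth is unknown.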