Introducing ML
In Chapter 1, Data Science Terminology, we defined ML as giving computers the ability to learn from data without being given explicit rules by a programmer. This definition still holds true. ML is concerned with discerning patterns (signal) in data, even when that data contains inherent errors (noise).
ML models are able to learn from data without the explicit direction of a human. That is the main difference between ML models and classical non-ML algorithms.
With a classical algorithm, a human specifies exactly how to find the best answer to a complex problem, and the algorithm then carries out those instructions, often faster and more efficiently than a human could. The bottleneck, however, is that the human must first devise the best solution in order to tell the algorithm what to do. In ML, the model is not handed the best solution; instead, it is given several examples of the problem and told to figure out the best solution for itself.
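The following is a minimal sketch of this division of labor, using made-up message lengths and a hypothetical spam label. The non-ML function relies on a threshold a programmer chose in advance, while the ML model is shown only examples and learns its own rule:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Non-ML: a human hard-codes the decision rule up front.
def rule_based_classifier(message_length):
    # The "best" threshold must be known in advance by the programmer.
    return 1 if message_length > 100 else 0

# ML: the model is only shown labeled examples and works out a rule itself.
X = np.array([[20], [35], [150], [200], [40], [180]])  # message lengths
y = np.array([0, 0, 1, 1, 0, 1])                       # 0 = not spam, 1 = spam
model = DecisionTreeClassifier().fit(X, y)

print(rule_based_classifier(120))   # answer from the human-written rule
print(model.predict([[120]])[0])    # answer from the rule learned from examples
```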
ML is just another tool in the belt of a data scientist. It sits on the same level as statistical tests (such as chi-square or t-tests) or the use of basic probability and statistics to estimate population parameters. ML is often regarded as the only thing data scientists know how to do, and this is simply untrue. A true data scientist is able to recognize when ML is applicable and, more importantly, when it is not.
ML is a game of correlations and relationships. Most ML algorithms in existence are concerned with finding and/or exploiting relationships within datasets (often between columns of a pandas DataFrame). Once an ML algorithm can pinpoint certain correlations, the model can either use those relationships to predict future observations or generalize the data to reveal interesting patterns.
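As a minimal sketch with made-up columns, the relationship between two columns of a pandas DataFrame can be quantified with a correlation matrix, which is often a first step before handing the data to an ML model:

```python
import pandas as pd

# Hypothetical data: hours studied versus exam score.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score":    [52, 55, 61, 70, 74, 80],
})

# Pearson correlation between every pair of numeric columns.
print(df.corr())
```

A strong correlation here (close to 1) suggests a relationship that a model could exploit to predict exam_score from hours_studied.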
Perhaps a great way to explain ML is to offer an example of a problem coupled with two possible solutions: one using an ML algorithm and the other utilizing a non-ML algorithm.