There are a Growing Number of Resources for People Interested in Machine Learning

Stephen DeAngelis

October 23, 2019

Every now and then I run across articles listing resources for people interested in learning more about artificial intelligence (AI), cognitive computing, and machine learning. And, on occasion, people contact me about resources they provide in which they think readers of Enterra Insights might be interested. I thought I would pull a few of these articles and resources together in one article. The list of resources is not exhaustive. The point I wish to make is that anyone interested in AI, cognitive computing, and/or machine learning can find numerous resources to satisfy their curiosity or get them started on a new career path. One site that provides a stream of articles on these subjects is KDnuggets. I will reference several recent articles from that source. Another useful site is Kite, which focuses on the use of Python. In an email, Tara Smith, from the Kite editorial team, stated, “Kite has a huge repository of Python documentation and our code completion plugin is free to download and try.” Another person who contacted me was Claire Whittaker (@ElizabetClaire), who runs a blog called Artificially Intelligent Claire. Claire wrote, “I help inquisitive millennials who love to learn about tech and AI by blogging learning to code and innovations in the industry.”

 

It’s all about the algorithms

 

Rahul Agarwal (@MLWhiz), a self-described Nerd, Geek, and Data Guy at WalmartLabs, writes, “Algorithms are at the core of data science.”[1] In his article, he discusses five sampling algorithms he believes every data scientist should know. He explains, “Sampling is a critical [technique] that can make or break a project. Learn more about the most common sampling techniques used, so you can select the best approach while working with your data.” James Le, a staff writer for Built In, discusses 10 of the top algorithms currently being used. He writes, “In machine learning, there’s something called the ‘No Free Lunch’ theorem. In a nutshell, it states that no one algorithm works best for every problem.”[2] His top 10 algorithms are:

 

1. Linear Regression. Le writes, “Linear regression is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning.”

 

2. Logistic Regression. “Logistic regression is another technique borrowed by machine learning from the field of statistics. It is the go-to method for binary classification problems (problems with two class values).”

 

3. Linear Discriminant Analysis. “Logistic Regression is a classification algorithm traditionally limited to only two-class classification problems. If you have more than two classes then the Linear Discriminant Analysis algorithm is the preferred linear classification technique.”

 

4. Classification and Regression Trees. “Decision Trees are an important type of algorithm for predictive modeling machine learning. The representation of the decision tree model is a binary tree.”

 

5. Naive Bayes. “Naive Bayes is a simple but surprisingly powerful algorithm for predictive modeling. The model is comprised of two types of probabilities that can be calculated directly from your training data: 1) The probability of each class; and 2) The conditional probability for each class given each x value.”

 

6. K-Nearest Neighbors. “The KNN algorithm is very simple and very effective. The model representation for KNN is the entire training dataset. … Predictions are made for a new data point by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances.”

 

7. Learning Vector Quantization. “A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset. The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that allows you to choose how many training instances to hang onto and learns exactly what those instances should look like.”

 

8. Support Vector Machines. “Support Vector Machines are perhaps one of the most popular and talked about machine learning algorithms. A hyperplane is a line that splits the input variable space. In SVM, a hyperplane is selected to best separate the points in the input variable space by their class, either class 0 or class 1.”

 

9. Bagging and Random Forest. “Random Forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation or bagging. The bootstrap is a powerful statistical method for estimating a quantity from a data sample.”

 

10. Boosting and Adaboost. “Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors from the first model. Models are added until the training set is predicted perfectly or a maximum number of models are added. AdaBoost was the first really successful boosting algorithm developed for binary classification. It is the best starting point for understanding boosting.”
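To make one of these descriptions concrete, here is a minimal sketch of the K-Nearest Neighbors idea as Le describes it: keep the whole training set, find the K closest instances to a new point, and summarize their labels. This is not code from Le's article. The function name, the toy data, and the choice of Euclidean distance with a majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` from `train`, a list of (features, label) pairs.

    Illustrative sketch only: Euclidean distance and a majority vote are
    assumed; other distance measures and summaries are equally valid.
    """
    # The "model" is simply the training set, searched in full for each query.
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # Summarize the output variable of the K neighbors with a majority vote.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((2.0, 1.5), "a"),
    ((6.0, 6.0), "b"),
    ((6.5, 7.0), "b"),
    ((7.0, 6.5), "b"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.6)))
```

Because every prediction scans the full training set, the cost grows with the amount of data kept, which is the downside Le mentions when he introduces Learning Vector Quantization.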

 

Le’s article goes into greater detail and provides examples of each algorithm’s output. He observes there are many more algorithms and he cautions, “Even an experienced data scientist cannot tell which algorithm will perform the best before trying different algorithms.” As Agarwal noted above, among other available algorithms are sampling algorithms. He discusses five of them: Stratified Sampling; Reservoir Sampling; Random Undersampling and Oversampling; Undersampling and Oversampling using imbalanced-learn; and Oversampling using SMOTE. He concludes, “Sampling is an important topic in data science, and we really don’t talk about it as much as we should. A good sampling strategy sometimes could pull the whole project forward. A bad sampling strategy could give us incorrect results. So one should be careful while selecting a sampling strategy.” Data science is all about algorithms and knowing how to select the best one for the task at hand. One of Whittaker’s articles explains how you can create powerful object detection algorithms. She writes, “One of the most compelling examples of machine learning in action is object detection.”[3] You’ll find lots of other interesting articles as well.
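Of the sampling techniques Agarwal lists, reservoir sampling is perhaps the easiest to show in a few lines. The sketch below follows the classic textbook formulation (often called Algorithm R), which draws k items uniformly at random from a stream of unknown length in a single pass. It is not taken from Agarwal's article, and the function and parameter names are assumptions made for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Illustrative sketch of classic reservoir sampling (Algorithm R).
    `rng` is an optional random.Random instance, useful for repeatable runs.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in 0..i inclusive; the new item is kept only if
            # that slot is inside the reservoir, i.e. with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Sample 5 values from a stream of 1,000 without holding the stream in memory.
sample = reservoir_sample(iter(range(1000)), 5, random.Random(0))
print(sample)
```

The appeal of this approach is that the stream never has to fit in memory and its length never has to be known in advance. Only the k-item reservoir is stored.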

 

The importance of Python

 

Daniel Pyrathon, a Software Engineer at 0x, provides a primer on practical machine learning with Python and Keras on the Kite site. He writes, “Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to ‘learn’ (e.g., progressively improve performance on a specific task) from data, without being explicitly programmed.”[4] Gregory Piatetsky-Shapiro (@kdnuggets), President of KDnuggets, writes, “Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share.”[5] Daniel Bourke, Machine Learning Engineer, suggests five friendly steps individuals can take to learn machine learning and data science with Python.[6] He writes, “For someone who’s completely new [to machine learning]. Spend a few months learning Python code at the same time [you are learning] different machine learning concepts. You’ll need them both.”

 

If you are really interested in learning to code with Python, go to the Kite site. The site notes, “[You can learn to] code faster in Python with intelligent snippets.” To help, they provide Kite, “a plugin for your integrated development environment (IDE) that uses machine learning to give you useful code completions for Python.” You can download it for free. Matthew Mayo (@mattmayo13), a Machine Learning Researcher and the Editor of KDnuggets, provides “a collection of 10 interesting resources in the form of articles and tutorials for the aspiring data scientist new to Python, meant to provide both insight and practical instruction when starting on your journey.”[7] He notes, “Python is one of the most widely used languages in data science, and an incredibly popular general programming language on its own.” There are a number of programming languages available and in use; however, Python is so widely used that it is a good place to start, and Mayo’s list will prove helpful.

 

Concluding thoughts

 

As I noted at the beginning of this article, the list of articles and resources provided above is not exhaustive. It’s a sampler. The bottom line is that there are numerous resources an individual can find to help them hone their machine learning skills.

 

Footnotes
[1] Rahul Agarwal, “The 5 Sampling Algorithms every Data Scientist need to know,” KDnuggets, September 2019.
[2] James Le, “A Tour of the Top 10 Algorithms for Machine Learning Newbies,” Built In, 19 June 2019.
[3] Claire Whittaker, “How to Create Powerful Object Detection Algorithms,” Artificially Intelligent Claire,
[4] Daniel Pyrathon, “Practical Machine Learning with Python and Keras,” Kite Blog, 30 January 2019.
[5] Gregory Piatetsky, “Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis,” KDnuggets, May 2019.
[6] Daniel Bourke, “5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python,” KDnuggets, September 2019.
[7] Matthew Mayo, “10 Great Python Resources for Aspiring Data Scientists,” KDnuggets, September 2019.