As the field of machine learning has become ever more popular, a litany of online courses has emerged claiming to teach the skills necessary to “build a career in AI”. But before signing up for such a course, you should ask whether the skills it teaches will actually help you apply machine learning better. This question is not limited to online courses: it applies equally to the machine learning classes that have begun to fill lecture halls at many universities. Are the classes that students flock to actually helping them achieve their practical goals?
Having taken the main slate of seminal machine learning courses at one of the top universities for AI, I have found a general pattern that most classes follow. First, they tend to start with linear classifiers, introducing both regression and classification along with the concepts of loss functions and optimization. A week or two is then spent honing the skill of backpropagation, after which they dive fully into neural networks. If the course focuses on deep learning, it tends to spend the majority of the remaining time covering the different forms of neural networks (RNNs, LSTMs, CNNs, etc.) and recently published seminal architectures (ResNet, BERT, etc.). If the course instead focuses on more general machine learning principles, it introduces other avenues such as unsupervised and reinforcement learning.
Thus we see that the key topics covered in these courses can be distilled into the following: an overview of supervised learning, a brief introduction to the mathematical foundations underlying supervised learning and neural networks, and then either an introduction to deep learning methodologies or to other areas of machine learning.
Additionally, looking at the topics covered in the assignments of these courses helps us ascertain the main learning goals. Assignments are often structured as follows: 1) students are provided with a well-structured dataset; 2) a model or core machine learning idea is introduced, and students work through the underpinnings of the concept; 3) students implement the concept; 4) students run the implemented model on the given dataset and do some light hyperparameter tuning; 5) students plot the results to see how the idea performs.
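To make this concrete, here is a minimal sketch of what such an assignment pipeline might look like, assuming a from-scratch logistic regression on a synthetic binary classification dataset (all data and names here are illustrative, not taken from any particular course):

```python
import numpy as np

# 1) A well-structured (here, synthetic) binary classification dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# 2-3) Implement the core idea: logistic regression trained by gradient descent.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.zeros(2), 0.0
lr = 0.1  # 4) a hyperparameter a student might lightly tune
for _ in range(500):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)  # gradient of the average log loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# 5) Report how the idea performs (a real assignment would plot this).
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Note how every step that involves judgment about real data — collection, cleaning, feature design — is absent, which is precisely the gap discussed below.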
Having examined both the content covered in courses and that present in assignments, we have a basis for understanding what students are expected to learn. Machine learning courses aim to impart knowledge about the key models used in the area the class focuses on. They do this by briefly covering the theoretical underpinnings of those models and having students implement their key features in assignments.
Talking with peers who have worked in machine-learning-related industry positions, I have found that there are a couple of key skills necessary to be successful. The first is understanding how to properly clean and analyze data. A classmate related to me that a recent internship required him to spend his first eight weeks collecting and preprocessing data before he could even begin to apply a model to the dataset. As machine learning models are extremely data dependent, mastering the skills needed to take advantage of the key features of a dataset is extremely important.
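A toy sketch of the kind of cleaning work this involves, assuming pandas and a hypothetical table with missing values and mismatched feature scales (the column names are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with two common problems: missing entries
# and features on very different scales.
raw = pd.DataFrame({
    "age": [25, np.nan, 41, 33, np.nan],
    "income": [40_000, 85_000, np.nan, 52_000, 61_000],
})

# Impute missing values with each column's median ...
clean = raw.fillna(raw.median())

# ... then standardize each feature to zero mean and unit variance,
# so that scale differences don't dominate a downstream model.
clean = (clean - clean.mean()) / clean.std()

print(clean)
```

Real pipelines add many more steps (deduplication, outlier handling, leakage checks), but even this much is rarely practiced in coursework.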
Next, at the industry level, large datasets are not available for most tasks. Because of this, many deep learning techniques cannot be applied, due to the risk of overfitting and poor generalization. As a result, simpler models such as random forests or logistic regression, which don't require large amounts of data, are often used instead. Thus, being able to properly apply such models using appropriate libraries like scikit-learn is a valuable skill. In fact, a friend told me that his machine learning internship at Microsoft over the summer involved only different variations of logistic regression. Additionally, with the advent of large pre-trained models in both computer vision and NLP, deep learning can be incorporated in certain scenarios via fine-tuning, which further increases the importance of familiarity with seminal models.
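Applying these simpler models with scikit-learn takes only a few lines. The sketch below fits the two baselines mentioned above on a small built-in tabular dataset (chosen here purely for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A small tabular dataset, the regime where simple models shine.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Two data-efficient baselines, each fit in one line.
logreg = LogisticRegression(max_iter=5000).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("logistic regression accuracy:", logreg.score(X_test, y_test))
print("random forest accuracy:      ", forest.score(X_test, y_test))
```

The skill being tested in industry is less about writing these lines and more about knowing when such models are the right choice.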
At the research level, however, where larger datasets are often easily accessible and time constraints are less pressing, we can train larger deep learning models. Consider, for instance, OpenAI's GPT-3 model with 175 billion parameters. To create such large architectures, the key skill is knowing how to engineer a large-scale deep learning system, which requires intimate familiarity with PyTorch or TensorFlow. This allows a researcher to quickly and effectively implement theorized models.
While being able to implement needed architectures is important, most models do not perform well without hyperparameter tuning. Thus, when creating applied machine learning systems, it is crucial not only to perform hyperparameter tuning but to have intuition about how certain design decisions can be helpful or harmful. Take, for example, a friend of mine who recently interned at Nvidia. He was having trouble tuning the hyperparameters of his model before realizing that the initialization region he was considering caused the majority of the ReLU activation functions in the architecture to die, stalling learning.
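The failure mode in that anecdote is easy to demonstrate in a toy setting. In the sketch below (a deliberately contrived single layer, with invented sizes), a strongly negative bias initialization pushes every pre-activation below zero, so every ReLU outputs zero and passes back zero gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))          # a batch of inputs
W = rng.normal(scale=0.1, size=(10, 32))  # a reasonable weight scale

# A poorly chosen initialization region: biases far below zero mean
# nearly every pre-activation X @ W + b is negative.
b_bad = -5.0 * np.ones(32)
h = np.maximum(0, X @ W + b_bad)        # ReLU layer output

dead_fraction = np.mean(h == 0)
print(f"dead activations: {dead_fraction:.0%}")

# The gradient of ReLU is 0 wherever its input is negative, so dead
# units receive no learning signal and training stagnates.
relu_grad_mask = (X @ W + b_bad > 0).astype(float)
print(f"fraction of units receiving gradient: {relu_grad_mask.mean():.3f}")
```

Spotting this kind of interaction between initialization and activation choice is exactly the intuition that tuning experience builds.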
Having analyzed both the current state of machine learning education and the skills needed to build applied machine learning systems, we can now comment on the gap between the two. Based on what classes cover and what applications require, it is clear that students are not taught enough about how to properly manage the data they are working with. Not only do classes provide students with datasets that have already been neatly cleaned and pre-processed, they don't promote much exploration beyond visualizing a couple of data points. This lack of hands-on experience with normalizing and exploring datasets is detrimental to a student's practical ability to conduct ML.
Additionally, while classes provide basic intuition about the mathematical background of key frameworks, not enough is done to expose students to the theory behind why a given model performs well on a certain task while others don't. Although students become familiar with a variety of models, they cannot discern which would be best for a given dataset and task. Without understanding the mathematical underpinnings of key models and techniques in full detail, students aren't able to quickly choose the right model for a given scenario.
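In practice, one partial substitute for that missing theory is empirical model selection. A sketch of the idea, assuming scikit-learn and using two arbitrary candidate models on a built-in dataset (the choices here are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# When theory alone doesn't say which model suits this dataset and
# task, compare candidates with cross-validation.
candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Cross-validation tells you *which* model wins on this data, but only the mathematical understanding the article calls for explains *why* — and lets you choose well before running every experiment.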
Read the full story on The Gradient