By Prasoon Goyal – PhD student in AI at UT Austin
I. THE BIG PICTURE:
Problem we are trying to solve: Given some data, the goal of machine learning is to find patterns in the data. There are various settings, such as supervised learning, unsupervised learning and reinforcement learning. The most common one is supervised learning, so we’re going to focus only on that in the big picture. Here, you are given labelled data [called the “training data”], and you want to infer labels on new data [called the “test data”]. For instance, consider self-driving cars. Labelled data would include the image of the road ahead at a particular instant as seen from the car, and the corresponding label would be the steering angle [let’s assume the speed is controlled manually, for simplicity]. The goal of a self-driving car is that, given a new image of the road ahead, the system should be able to figure out the optimal steering angle.
How to solve: Most of supervised machine learning can be viewed through the following framework: you are given training data points $(x_1,y_1),\ldots,(x_n,y_n)$, where $x_i$ is the data [e.g. the road image in the example above] and $y_i$ is the corresponding label. You want to find a function $f$ that fits the data well, that is, given $x_i$, it outputs something close enough to $y_i$. Now where do you get this function $f$ from?
One way, which is the most common in ML, is to define a class of functions $F$, and search in this class for the function that best fits the data.
For example, if you want to predict the price of an apartment based on features like number of bedrooms, number of bathrooms, covered area, etc., you can reasonably assume that the price is a linear combination of all these features, in which case the function class $F$ is defined to be the class of all linear functions. For self-driving cars, the function class $F$ you need will be much more complex.
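As a rough illustration of searching a linear function class for the best fit, here is a minimal sketch using least squares. The apartment features, prices and the helper names fit_linear and predict_linear are all made up for this example; the prices are generated from a known linear rule so that the fit can be checked.

```python
import numpy as np

def fit_linear(X, y):
    """Return weights w (last entry is the intercept) minimizing ||[X, 1] w - y||^2."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # constant column for the intercept
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w

def predict_linear(w, X):
    """Evaluate the linear function with weights w on the rows of X."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ w

# Hypothetical apartments: [bedrooms, bathrooms, covered area]
X_train = np.array([
    [1, 1, 42.0],
    [2, 1, 60.0],
    [2, 2, 75.0],
    [3, 2, 100.0],
    [4, 3, 130.0],
    [3, 1, 85.0],
])

# Hypothetical "true" rule used only to generate labels for this sketch
true_w = np.array([10.0, 5.0, 2.0, 20.0])
y_train = predict_linear(true_w, X_train)

# Search the class of linear functions for the one that best fits the data
w = fit_linear(X_train, y_train)

# Predict the price of a new, unseen apartment
new_price = predict_linear(w, np.array([[3, 1, 80.0]]))[0]
print(w, new_price)
```

Because the labels here are exactly linear in the features, the search recovers the generating rule; with real, noisy prices it would only return the closest linear function.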
How to evaluate: Note that just fitting the training data is not enough. Data are noisy: for instance, not all apartments with the same number of bedrooms, the same number of bathrooms and the same covered area are priced equally. Similarly, if you label data for self-driving cars, you can expect some randomness due to the human driver. What you need is for your framework to extract the pattern and ignore the random noise. In other words, it should do well on unseen data. Therefore, the way to evaluate models is to hold out a part of the training data [called the “validation set”], and predict on this held-out data to measure how good your model is.
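The hold-out idea can be sketched as follows. This is only an illustration on synthetic data: the linear pattern, the noise level, the 25% split and the two polynomial degrees are assumptions chosen for the example, not a recommended recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a linear pattern plus random noise (all values are made up)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = 3.0 * x + 1.0 + rng.normal(scale=0.3, size=n)

# Hold out 25% of the labelled data as a validation set
perm = rng.permutation(n)
n_val = n // 4
val_idx, train_idx = perm[:n_val], perm[n_val:]

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Fit two function classes on the training part only, then score both parts
errors = {}
for degree in (1, 9):
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    errors[degree] = (
        mse(coeffs, x[train_idx], y[train_idx]),  # training error
        mse(coeffs, x[val_idx], y[val_idx]),      # validation error
    )
    print(f"degree {degree}: train MSE = {errors[degree][0]:.4f}, "
          f"validation MSE = {errors[degree][1]:.4f}")
```

The larger function class always fits the training part at least as well, so the training error alone cannot tell you whether it found the pattern or the noise; the validation error is the number to compare.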
Now whatever you study in machine learning, you should try to relate the topics to the above big picture. For instance, in linear regression, the function class is linear and the evaluation method is square loss; in linear SVM, the function class is linear and the evaluation method is hinge loss; and so on. First understand these algorithms at a high level. Then, go into the technical details. You will see that finding the best function $f$ in the function class $F$ often results in an optimization problem, which is commonly solved with stochastic gradient descent.
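To make the comparison concrete, here is a small sketch of the two losses mentioned above applied to the same linear function class. The data, weights and function names are hypothetical and chosen only so the values are easy to check by hand.

```python
import numpy as np

def linear_score(w, b, X):
    """Output of the linear function f(x) = w . x + b for each row of X."""
    return X @ w + b

def square_loss(w, b, X, y):
    """Mean squared error, the loss used by linear regression."""
    return float(np.mean((linear_score(w, b, X) - y) ** 2))

def hinge_loss(w, b, X, y):
    """Mean hinge loss, the loss used by a linear SVM; labels y must be -1 or +1."""
    margins = y * linear_score(w, b, X)
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))

# Made-up inputs and a made-up linear function
X = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
w = np.array([1.0, -1.0])
b = 0.5

y_reg = np.array([-0.5, -1.0, 2.5])   # real-valued labels for regression
y_cls = np.array([-1.0, -1.0, 1.0])   # class labels for classification

sq = square_loss(w, b, X, y_reg)
hg = hinge_loss(w, b, X, y_cls)
print(sq, hg)
```

The function class is identical in both cases; only the way a prediction is scored against its label changes.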
II. ROADMAP FOR LEARNING MACHINE LEARNING:
To have a basic mathematical background, you need to have some knowledge of the following mathematical concepts:
– Probability and statistics
– Linear algebra
– Optimization
– Multivariable calculus
– Functional analysis (not essential)
– First-order logic (not essential)
You can find some reasonable material on most of these by searching for “<topic> lecture notes” on Google. Usually, you’ll find good lecture notes compiled by some professor teaching that course. The first few results should give you a good set to choose from.
See Prasoon Goyal’s answer to “How should I start learning the maths for machine learning and from where?”
Skim through these. You don’t need to go through them in a lot of detail. You can come back to studying the math as and when required while learning ML.
Then, for a quick overview of ML, you can follow the roadmap below.
Day 1:
 Basic terminology:

 Most common settings: Supervised setting, Unsupervised setting, Semi-supervised setting, Reinforcement learning.
 Most common problems: Classification (binary & multiclass), Regression, Clustering.
 Preprocessing of data: Data normalization.
 Concepts of hypothesis sets, empirical error, true error, complexity of hypothesis sets, regularization, bias-variance tradeoff, loss functions, cross-validation.
Day 2:
 Optimization basics:

 Terminology & Basic concepts: Convex optimization, Lagrangian, Primal-dual problems, Gradients & subgradients, $\ell_1$- and $\ell_2$-regularized objective functions.
 Algorithms: Batch gradient descent & stochastic gradient descent, Coordinate gradient descent.
 Implementation: Write code for stochastic gradient descent for a simple objective function, tune the step size, and get an intuition of the algorithm.
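One possible starting point for the Day 2 implementation exercise is sketched below. It assumes a least-squares objective on synthetic, noiseless data; the function name sgd_least_squares, the default step size and the number of epochs are illustrative choices, not prescribed values.

```python
import numpy as np

def sgd_least_squares(X, y, step_size=0.05, epochs=50, seed=0):
    """Minimize the average of 0.5 * (x_i . w - y_i)^2 with stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Visit the training points in a fresh random order each epoch
        for i in rng.permutation(n):
            residual = X[i] @ w - y[i]
            grad = residual * X[i]      # gradient of 0.5 * residual**2 w.r.t. w
            w -= step_size * grad       # small step along the negative gradient
    return w

# Synthetic problem: labels come from a known weight vector, without noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

w_hat = sgd_least_squares(X, y)
print(w_hat)
```

To get an intuition for the step size, try rerunning with much smaller and much larger values of step_size and watch how the recovered weights change.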
Day 3:
 Classification:

 Logistic Regression
 Support vector machines: Geometric intuition, primal-dual formulations, notion of support vectors, kernel trick, understanding of hyperparameters, grid search.
 Online tool for SVM: Play with this online SVM tool (scroll down to “Graphic Interface”) to get some intuition of the algorithm.
Day 4:
 Regression:

 Ridge regression
 Clustering:

 k-means & Expectation-Maximization algorithm.
 Top-down and bottom-up hierarchical clustering.
Day 5:
 Bayesian methods:

 Basic terminology: Priors, posteriors, likelihood, maximum likelihood estimation and maximum-a-posteriori inference.
 Gaussian Mixture Models
 Latent Dirichlet Allocation: The generative model and basic idea of parameter estimation.
Day 6:
 Graphical models:

 Basic terminology: Bayesian networks, Markov networks / Markov random fields.
 Inference algorithms: Variable elimination, Belief propagation.
 Simple examples: Hidden Markov Models. Ising model.
Days 7–8:
 Neural networks:

 Basic terminology: Neuron, Activation function, Hidden layer.
 Convolutional neural networks: Convolutional layer, pooling layer, Backpropagation.
 Memory-based neural networks: Recurrent Neural Networks, Long short-term memory.
 Tutorials: I’m familiar with this Torch tutorial (you’ll want to look at the 1_supervised directory). There might be other tutorials in other deep learning frameworks.
Day 9:
 Miscellaneous topics:

 Ensemble methods
 Decision trees
 Recommender systems
 Markov decision processes
 Multi-armed bandits
Day 10: (Budget day)
 You can use the last day to catch up on anything left from previous days, or learn more about whatever topic you found most interesting / useful for your future work.
Once you’ve gone through the above, you’ll want to start going through some standard online course or ML text. Andrew Ng’s course on Coursera is a good starting point. An advanced version of the course is available on The Open Academy (Machine Learning - The Open Academy). The popular books that I have some experience with are the following:
 Pattern Recognition and Machine Learning: Christopher Bishop
 Machine Learning: A Probabilistic Perspective: Kevin P. Murphy
While Murphy’s book is more current and is more elaborate, I find Bishop’s to be more accessible for beginners. You can choose one of them according to your level.
At this point, you should have a working knowledge of machine learning. Beyond this, if you’re interested in a particular topic, look for specific online resources on the topic, read seminal papers in the subfield, try finding some simpler problems and implement them.
For deep learning, here’s a tutorial from Yoshua Bengio’s lab that was written in the initial days of deep learning: Deep Learning Tutorials. This explains the central ideas in deep learning, without going into a lot of detail.
Because deep learning is a field that is more empirical than theoretical, it is important to code and experiment with models. Here is a tutorial in TensorFlow that gives implementations of many different deep learning tasks: aymericdamien/TensorFlow-Examples. Try running the algorithms, and play with the code to understand the underlying concepts better.
Finally, you can refer to the Deep Learning book, which explains deep learning in a much more systematic and detailed manner. For the latest algorithms that are not in the book, you’ll have to refer to the original papers.
III. TIPS ON IMPLEMENTATION:
There are different levels at which you can understand an algorithm.
At the highest level, you know what an algorithm is trying to do and how. So for instance, gradient descent finds a local minimum by taking small steps along the negative gradient.
Going slightly deeper, you will delve into the math. Again taking gradient descent as the example, you will learn how to take gradients with respect to vector quantities, how norms are used, etc. At about the same level of depth, you’ll also meet other variants of the algorithm, like handling constraints in gradient descent. This is also the level at which you learn how to use libraries to run your specific algorithm.
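As one example of a constrained variant, here is a minimal sketch of projected gradient descent: take the usual step along the negative gradient, then project back onto the feasible set. The objective, the unit-ball constraint, the step size and the function names are assumptions made for illustration.

```python
import numpy as np

def project_onto_unit_ball(w):
    """Return the closest point to w that satisfies ||w||_2 <= 1."""
    norm = np.linalg.norm(w)
    return w if norm <= 1.0 else w / norm

def projected_gradient_descent(grad_fn, w0, step_size=0.1, num_steps=200):
    """Gradient descent where every iterate is projected back onto the unit ball."""
    w = project_onto_unit_ball(np.asarray(w0, dtype=float))
    for _ in range(num_steps):
        w = project_onto_unit_ball(w - step_size * grad_fn(w))
    return w

# Hypothetical objective: f(w) = ||w - c||^2, whose gradient is 2 * (w - c)
c = np.array([3.0, 4.0])

def grad_fn(w):
    return 2.0 * (w - c)

w_star = projected_gradient_descent(grad_fn, w0=np.zeros(2))
print(w_star)
```

Since c lies outside the unit ball, the constrained minimizer is the point of the ball closest to c, namely c divided by its norm, which is what the iterates settle on.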
Deeper still, you implement the algorithm from scratch, with minor optimization tricks. For instance, in Python, you will want to use vectorization. Consider the following two code snippets:
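# Version 1: explicit Python loop
import numpy as np
N = 10000000
a = np.random.rand(N, 1)
b = np.random.rand(N, 1)
s = 0.0  # the running sum must be initialized before the loop
for i in range(N):
    s = s + a[i, 0] * b[i, 0]
print(s)
# Version 2: vectorized
import numpy as np
N = 10000000
a = np.random.rand(N, 1)
b = np.random.rand(N, 1)
s = np.sum(a * b)  # elementwise product, then sum, without a Python loop
print(s)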
They both have the same functionality, but the second one is about 20 times faster. Similarly, you will learn some other important implementation techniques, such as parallelizing code, profiling, etc. You will also learn some algorithm-specific details, like how to initialize your model for faster convergence, how to set the termination condition to trade off accuracy and training time, how to handle corner cases [like saddle points in gradient descent], etc. Finally, you will learn techniques to debug machine learning code, which is often tricky for beginners.
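If you want to measure the difference yourself, a sketch along the following lines may help. It uses a smaller N than the snippets above so the loop finishes quickly; the function names and the number of repetitions are arbitrary, and the measured ratio will depend on your machine.

```python
import timeit
import numpy as np

N = 100_000  # deliberately small so the Python loop runs in reasonable time
rng = np.random.default_rng(0)
a = rng.random(N)
b = rng.random(N)

def dot_loop(a, b):
    """Sum of elementwise products, computed with an explicit Python loop."""
    s = 0.0
    for i in range(len(a)):
        s += a[i] * b[i]
    return s

def dot_vectorized(a, b):
    """The same quantity computed with vectorized NumPy operations."""
    return float(np.sum(a * b))

# Time a few repetitions of each version
loop_time = timeit.timeit(lambda: dot_loop(a, b), number=3)
vec_time = timeit.timeit(lambda: dot_vectorized(a, b), number=3)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s, "
      f"ratio: {loop_time / vec_time:.1f}x")
```

Checking that both versions return the same value before comparing their timings is a cheap way to catch bugs introduced while optimizing.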
Finally comes the depth at which libraries are written. This requires far more systems knowledge than the previous steps: knowing how to handle very large data, computational efficiency, effective memory management, writing GPU code, effective multithreading, etc.
Now, in how much detail do you need to know the algorithms? For the most part, you don’t need to know the algorithms at the depth of library implementation, unless you are into systems programming. For the most important algorithms in ML, like gradient descent, SVM, logistic regression and neural networks, you need to understand the math and how to use libraries to run them. This would be sufficient if you are not an ML engineer, and only use ML as a black box in your daily work.
However, if you are going to be working as an ML engineer / data scientist / research scientist, you need to also implement some algorithms from scratch. Usually the ones covered in online courses are enough. This helps you learn many more nuances of different tools and algorithms. Also, this will help you with new algorithms that you might need to implement.