One of my students recently asked me for advice on learning ML. Here’s what I wrote. It’s biased toward my own experience, but it should generalize.
My current favorite introduction is Kevin Murphy’s book (Machine Learning: A Probabilistic Perspective). You might also want to look at books by Chris Bishop (Pattern Recognition and Machine Learning), Daphne Koller and Nir Friedman (Probabilistic Graphical Models), and David MacKay (Information Theory, Inference, and Learning Algorithms).
Anything you can learn about linear algebra and probability/statistics will be useful. Some of my favorite books are Strang’s Introduction to Linear Algebra; Gelman, Carlin, Stern and Rubin’s Bayesian Data Analysis; and Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models.
Don’t expect to get anything the first time. Read descriptions of the same thing from several different sources.
There’s nothing like trying something yourself. Pick a model and implement it. Then work through open-source implementations and compare them with yours. Are there computational or mathematical tricks that make things work?
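As one small example of the kind of trick you may run into, here is the log-sum-exp identity, which often shows up in implementations of softmax and mixture models. It is a sketch in plain Python using only the standard library, and the function names are hypothetical, chosen for illustration. Production code would usually call a library routine such as SciPy’s `scipy.special.logsumexp`. The identity it relies on is log Σ exp(xᵢ) = m + log Σ exp(xᵢ − m), where m = max xᵢ.

```python
import math

def naive_logsumexp(xs):
    # Hypothetical helper: the direct formula log(sum(exp(x))).
    # math.exp raises OverflowError once x is large (e.g. x = 1000.0).
    return math.log(sum(math.exp(x) for x in xs))

def stable_logsumexp(xs):
    # Hypothetical helper: shift by the maximum before exponentiating.
    # log(sum(exp(x))) == m + log(sum(exp(x - m))), and every x - m <= 0,
    # so each exp() term lies in (0, 1] and cannot overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

On moderate inputs such as `[1.0, 2.0, 3.0]` the two functions agree up to rounding. On `[1000.0, 1000.0]`, the naive version raises `OverflowError` because `math.exp(1000.0)` exceeds the range of a double. The stable version returns 1000 + log 2. The trick is easy to miss if you only read the math, which is one reason comparing your code against existing implementations pays off.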
Read a lot of papers. When I was a grad student, I had a 20-minute bus ride each morning and evening, and I always tried to have an interesting paper in my bag. The bus isn’t the important part. What was useful was having about half an hour every day devoted to reading.
Pick a paper you like and “live inside it” for a week. Think about it all the time. Memorize the form of each equation. Take long walks and try to figure out how each variable affects the output, and how different variables interact. Think about how you get from Eq. 6 to Eq. 7: authors often gloss over the algebraic details, so fill them in.
Be patient and persistent. Remember von Neumann: “in mathematics you don’t understand things, you just get used to them.”