Author: Ajit Jaokar
In various formats, one of the most frequent questions I am asked is the equivalent of:
“Can you recommend a free self-paced learning path for #machinelearning and #deeplearning?”
In this post, I attempt an answer.
This is based on my work teaching students, primarily at Oxford University, but I have chosen only free, publicly available resources here.
The usual disclaimers apply, i.e. the views are my own.
Also, I would encourage you to support the authors by buying paid versions of their books if you can (I do so).
The challenges in creating such a learning path are:
- It needs to be selective, because there is a lot of excellent content on the web, and from a learning standpoint that can be overwhelming.
- You need to know a sequence. I provide one below, based on my experience of teaching.
- You need an end-point; otherwise you will not be motivated to stay with it and you will drop out.
So, my suggestion is: use this learning pathway as a guide, but shorten it as you want.
Try to go on a series of small journeys, each of which you will complete.
But overall, try to maintain the sequence and these resources (trust me, between them I don’t think you will miss anything!).
So, the first resource is a book: Python Data Science Handbook by Jake VanderPlas.
The whole book is free on GitHub, and it’s a relatively easy book to read.
It covers the following topics:
1. IPython: Beyond Normal Python
- Help and Documentation in IPython
- Keyboard Shortcuts in the IPython Shell
- IPython Magic Commands
- Input and Output History
- IPython and Shell Commands
- Errors and Debugging
- Profiling and Timing Code
- More IPython Resources
2. Introduction to NumPy
- Understanding Data Types in Python
- The Basics of NumPy Arrays
- Computation on NumPy Arrays: Universal Functions
- Aggregations: Min, Max, and Everything In Between
- Computation on Arrays: Broadcasting
- Comparisons, Masks, and Boolean Logic
- Fancy Indexing
- Sorting Arrays
- Structured Data: NumPy’s Structured Arrays
3. Data Manipulation with Pandas
- Introducing Pandas Objects
- Data Indexing and Selection
- Operating on Data in Pandas
- Handling Missing Data
- Hierarchical Indexing
- Combining Datasets: Concat and Append
- Combining Datasets: Merge and Join
- Aggregation and Grouping
- Pivot Tables
- Vectorized String Operations
- Working with Time Series
- High-Performance Pandas: eval() and query()
- Further Resources
4. Visualization with Matplotlib
- Simple Line Plots
- Simple Scatter Plots
- Visualizing Errors
- Density and Contour Plots
- Histograms, Binnings, and Density
- Customizing Plot Legends
- Customizing Colorbars
- Multiple Subplots
- Text and Annotation
- Customizing Ticks
- Customizing Matplotlib: Configurations and Stylesheets
- Three-Dimensional Plotting in Matplotlib
- Geographic Data with Basemap
- Visualization with Seaborn
- Further Resources
5. Machine Learning
- What Is Machine Learning?
- Introducing Scikit-Learn
- Hyperparameters and Model Validation
- Feature Engineering
- In Depth: Naive Bayes Classification
- In Depth: Linear Regression
- In-Depth: Support Vector Machines
- In-Depth: Decision Trees and Random Forests
- In Depth: Principal Component Analysis
- In-Depth: Manifold Learning
- In Depth: k-Means Clustering
- In Depth: Gaussian Mixture Models
- In-Depth: Kernel Density Estimation
- Application: A Face Detection Pipeline
- Further Machine Learning Resources
Once you have gone through this book, you will know machine learning (but not deep learning).
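To give a feel for where the final chapter of the book ends up, here is a minimal sketch of the usual scikit-learn workflow: load data, split it, fit a model, and score it on held-out samples. The choice of the Iris dataset, the Gaussian naive Bayes model, the 25% test split and the `random_state` value are my own assumptions for illustration, not something prescribed by the book.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the Iris data as a feature matrix X and a target vector y
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for testing (random_state is an arbitrary choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Choose a model class, fit it on the training data, predict on unseen data
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of held-out samples classified correctly
accuracy = accuracy_score(y_test, y_pred)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The same fit / predict / score pattern carries over to almost every estimator covered later in this learning path, which is why it is worth practising early.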
The second resource is not a book as such: the book itself is paid (and I recommend you buy it), but the author’s website has extensive code that you can run in small ‘cookbook’ formats.
The book is Machine Learning with Python Cookbook by Chris Albon.
The website is chrisalbon.com, and the sequence of code I recommend is listed further below.
I like this format because it fits the deliberate practice approach to learning, i.e. lots of small things practised individually.
Finally, two more resources:
- A set of Keras examples, recently released, and
- A free reference book on Python itself (not machine learning, but the core language). The website is Python for Everybody (with multilingual translations).
So, coming back to the details of the second resource: the sequence of code I recommend from chrisalbon.com is as below.
Machine Learning
Basics
- Loading Features From Dictionaries
- Loading scikit-learn’s Boston Housing Dataset
- Loading scikit-learn’s Digits Dataset
- Loading scikit-learn’s Iris Dataset
- Make Simulated Data For Classification
- Make Simulated Data For Clustering
- Make Simulated Data For Regression
- Perceptron In Scikit
- Saving Machine Learning Models
Vectors, Matrices, And Arrays
- Transpose A Vector Or Matrix
- Selecting Elements In An Array
- Reshape An Array
- Invert A Matrix
- Getting The Diagonal Of A Matrix
- Flatten A Matrix
- Find The Rank Of A Matrix
- Find The Maximum And Minimum
- Describe An Array
- Create A Vector
- Create A Sparse Matrix
- Create A Matrix
- Converting A Dictionary Into A Matrix
- Calculate The Trace Of A Matrix
- Calculate The Determinant Of A Matrix
- Calculate The Average, Variance, And Standard Deviation
- Calculate Dot Product Of Two Vectors
- Apply Operations To Elements
- Adding And Subtracting Matrices
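Several of the recipes in this group come down to a single NumPy call each. The following is a small sketch, with a matrix and vectors I made up for illustration, showing a handful of them side by side.

```python
import numpy as np

# Illustrative values only
matrix = np.array([[4.0, 7.0],
                   [2.0, 6.0]])
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

transposed = matrix.T                      # transpose a matrix
inverse = np.linalg.inv(matrix)            # invert a (non-singular) matrix
determinant = np.linalg.det(matrix)        # determinant
trace = matrix.trace()                     # sum of the diagonal
rank = np.linalg.matrix_rank(matrix)       # number of independent rows/columns
flattened = matrix.flatten()               # 2-D matrix to 1-D array
dot_product = np.dot(vector_a, vector_b)   # 1*4 + 2*5 + 3*6

# A matrix multiplied by its inverse should give the identity (up to rounding)
identity_check = np.allclose(matrix @ inverse, np.eye(2))
```

Working through these one at a time, and checking results by hand on a 2x2 example, fits the deliberate practice idea well.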
Preprocessing Structured Data
- Convert Pandas Categorical Data For Scikit-Learn
- Delete Observations With Missing Values
- Deleting Missing Values
- Detecting Outliers
- Discretize Features
- Encoding Ordinal Categorical Features
- Handling Imbalanced Classes With Downsampling
- Handling Imbalanced Classes With Upsampling
- Handling Outliers
- Impute Missing Values With Means
- Imputing Missing Class Labels
- Imputing Missing Class Labels Using k-Nearest Neighbors
- Normalizing Observations
- One-Hot Encode Features With Multiple Labels
- One-Hot Encode Nominal Categorical Features
- Preprocessing Categorical Features
- Preprocessing Iris Data
- Rescale A Feature
- Standardize A Feature
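As an illustration of the last two recipes in this group, here is a minimal sketch of rescaling and standardizing a single feature with scikit-learn. The feature values are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature, five observations (illustrative values)
feature = np.array([[-500.5], [-100.1], [0.0], [100.1], [900.9]])

# Rescale: map the smallest value to 0 and the largest to 1
rescaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(feature)

# Standardize: subtract the mean and divide by the standard deviation
standardized = StandardScaler().fit_transform(feature)

print(rescaled.ravel())
print(standardized.ravel())
```

Rescaling keeps the values inside a fixed range, while standardizing centres them on zero with unit variance; which one is appropriate depends on the model that comes next.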
Preprocessing Images
- Binarize Images
- Blurring Images
- Cropping Images
- Detect Edges
- Enhance Contrast Of Color Image
- Enhance Contrast Of Greyscale Image
- Harris Corner Detector
- Installing OpenCV
- Isolate Colors
- Load Images
- Remove Backgrounds
- Save Images
- Sharpen Images
- Shi-Tomasi Corner Detector
- Using Mean Color As A Feature
Preprocessing Text
- Bag Of Words
- Parse HTML
- Remove Punctuation
- Remove Stop Words
- Replace Characters
- Stemming Words
- Strip Whitespace
- Tag Parts Of Speech
- Term Frequency Inverse Document Frequency
- Tokenize Text
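The simpler text recipes above can be chained into one small cleaning function. This is a sketch using only the Python standard library; the sample sentences and the tiny stop-word list are hypothetical, and real projects would more likely take stop words and tokenizers from a library such as NLTK.

```python
import string

# Hypothetical raw documents, chosen only for illustration
raw_documents = [
    "   Hello, World!  This is a test.   ",
    "Machine learning:   it's fun, isn't it?",
]

# Toy stop-word list (an assumption for this example, not a standard list)
STOP_WORDS = {"this", "is", "a", "it", "its", "isnt"}

# Translation table that deletes every ASCII punctuation character
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Strip whitespace, remove punctuation, lowercase, tokenize, drop stop words."""
    stripped = text.strip()
    no_punctuation = stripped.translate(PUNCTUATION_TABLE)
    tokens = no_punctuation.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


cleaned = [preprocess(doc) for doc in raw_documents]
print(cleaned)
```

Splitting on whitespace is the crudest possible tokenizer; it is used here only to keep the example free of extra dependencies.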
Preprocessing Dates And Times
- Break Up Dates And Times Into Multiple Features
- Calculate Difference Between Dates And Times
- Convert Strings To Dates
- Convert pandas Columns Time Zone
- Encode Days Of The Week
- Handling Missing Values In Time Series
- Handling Time Zones
- Lag A Time Feature
- Rolling Time Window
- Select Date And Time Ranges
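A few of the date and time recipes can be sketched together in pandas. The data frame below is invented for illustration, and the column names are my own.

```python
import pandas as pd

# Hypothetical daily observations, used only for illustration
frame = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "sales": [10.0, 20.0, 30.0, 40.0],
})

# Convert strings to datetimes
frame["date"] = pd.to_datetime(frame["date"], format="%Y-%m-%d")

# Break the date up into multiple features
frame["year"] = frame["date"].dt.year
frame["month"] = frame["date"].dt.month
frame["weekday"] = frame["date"].dt.day_name()

# Lag the feature by one row (the first row has no previous value, so it is NaN)
frame["sales_lag_1"] = frame["sales"].shift(1)

# Rolling mean over a two-row window (also NaN for the first row)
frame["sales_rolling_2"] = frame["sales"].rolling(window=2).mean()

# Difference between the last and first dates
elapsed = frame["date"].iloc[-1] - frame["date"].iloc[0]
print(frame)
print(elapsed.days)
```

Lag and rolling-window features introduce missing values at the start of the series, which is where the recipe on handling missing values in time series becomes relevant.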
Feature Engineering
- Dimensionality Reduction On Sparse Feature Matrix
- Dimensionality Reduction With Kernel PCA
- Dimensionality Reduction With PCA
- Feature Extraction With PCA
- Group Observations Using K-Means Clustering
- Selecting The Best Number Of Components For LDA
- Selecting The Best Number Of Components For TSVD
- Using Linear Discriminant Analysis For Dimensionality Reduction
Feature Selection
- ANOVA F-value For Feature Selection
- Chi-Squared For Feature Selection
- Drop Highly Correlated Features
- Recursive Feature Elimination
- Variance Thresholding Binary Features
- Variance Thresholding For Feature Selection
Model Evaluation
- Accuracy
- Create Baseline Classification Model
- Create Baseline Regression Model
- Cross Validation Pipeline
- Cross Validation With Parameter Tuning Using Grid Search
- Cross-Validation
- Custom Performance Metric
- F1 Score
- Generate Text Reports On Performance
- Nested Cross Validation
- Plot The Learning Curve
- Plot The Receiver Operating Characteristic Curve
- Plot The Validation Curve
- Precision
- Recall
- Split Data Into Training And Test Sets
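Before relying on library functions for accuracy, precision, recall and F1, it can help to compute them once by hand. The sketch below does this in plain Python for a binary problem; the labels are made up, and the helper `confusion_counts` is a hypothetical name of my own.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


# Illustrative labels: 1 is the positive class
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Note: these divisions assume at least one predicted and one actual positive
accuracy = (tp + tn) / (tp + fp + fn + tn)           # share of all predictions that are right
precision = tp / (tp + fp)                           # share of predicted positives that are right
recall = tp / (tp + fn)                              # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

Here there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives, so precision and recall differ, which is exactly the situation where accuracy alone can mislead.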
Model Selection
- Find Best Preprocessing Steps During Model Selection
- Hyperparameter Tuning Using Grid Search
- Hyperparameter Tuning Using Random Search
- Model Selection Using Grid Search
- Pipelines With Parameter Optimization
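The model selection recipes mostly combine two scikit-learn pieces: a `Pipeline` and a search object. Here is a minimal sketch, assuming the Iris dataset, a logistic regression classifier and a small grid of `C` values that I picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing and model in one object, so scaling is re-fitted inside each fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter name>
param_grid = {"classify__C": [0.01, 0.1, 1.0, 10.0]}

# Try every candidate with 5-fold cross-validation
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_C = search.best_params_["classify__C"]
best_score = search.best_score_
print(best_C, round(best_score, 3))
```

Putting the scaler inside the pipeline matters: fitting it on the full dataset before cross-validation would leak information from the validation folds into training.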
Linear Regression
- Adding Interaction Terms
- Create Interaction Features
- Effect Of Alpha On Lasso Regression
- Lasso Regression
- Linear Regression
- Linear Regression Using Scikit-Learn
- Ridge Regression
- Selecting The Best Alpha Value In Ridge Regression
Logistic Regression
- Fast C Hyperparameter Tuning
- Handling Imbalanced Classes In Logistic Regression
- Logistic Regression
- Logistic Regression On Very Large Data
- Logistic Regression With L1 Regularization
- One Vs. Rest Logistic Regression
Trees And Forests
- Adaboost Classifier
- Decision Tree Classifier
- Decision Tree Regression
- Feature Importance
- Feature Selection Using Random Forest
- Handle Imbalanced Classes In Random Forest
- Random Forest Classifier
- Random Forest Classifier Example
- Random Forest Regression
- Select Important Features In Random Forest
- Titanic Competition With Random Forest
- Visualize A Decision Tree
Nearest Neighbors
- Identifying Best Value Of k
- K-Nearest Neighbors Classification
- Radius-Based Nearest Neighbor Classifier
Support Vector Machines
- Calibrate Predicted Probabilities In SVC
- Find Nearest Neighbors
- Find Support Vectors
- Imbalanced Classes In SVM
- Plot The Support Vector Classifiers Hyperplane
- SVC Parameters When Using RBF Kernel
- Support Vector Classifier
Naive Bayes
- Bernoulli Naive Bayes Classifier
- Calibrate Predicted Probabilities
- Gaussian Naive Bayes Classifier
- Multinomial Logistic Regression
- Multinomial Naive Bayes Classifier
- Naive Bayes Classifier From Scratch
Clustering
- Agglomerative Clustering
- DBSCAN Clustering
- Evaluating Clustering
- Meanshift Clustering
- Mini-Batch k-Means Clustering
- k-Means Clustering
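To make the clustering recipes concrete, here is a short sketch that runs k-means on simulated data and evaluates the result with a silhouette score. The simulated blobs, the number of clusters and the `random_state` values are assumptions chosen for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Simulate 300 two-dimensional points around 3 centres
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# Fit k-means with k=3 and get a cluster label for each point
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

# Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters
score = silhouette_score(X, labels)

print(model.cluster_centers_)
print(round(score, 3))
```

In real data the number of clusters is usually unknown, so a common exercise is to repeat this for several values of k and compare the silhouette scores.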
Deep Learning
Keras
- Feedforward Neural Network For Binary Classification
- Feedforward Neural Network For Multiclass Classification
- Feedforward Neural Networks For Regression
- Adding Dropout
- Convolutional Neural Network
- LSTM Recurrent Neural Network
- Neural Network Early Stopping
- Neural Network Weight Regularization
- Preprocessing Data For Neural Networks
- Save Model Training Progress
- Tuning Neural Network Hyperparameters
- Visualize Loss History
- Visualize Neural Network Architecture
- Visualize Performance History
- k-Fold Cross-Validating Neural Networks