Author: Vincent Granville
By Ajit Jaokar and Dan Howarth. With contributions from Ayse Mutlu.
Exclusively for Data Science Central members, with free access. You can download this book (PDF) here.
This tutorial began as a series of weekend workshops created by Ajit Jaokar and Dan Howarth. The idea was to work through one specific, longish program and explore as much of it as possible in a single weekend. This book is an attempt to take that idea online. The best way to use the book is to work with the Python code as much as you can. The code is commented, and you can extend those comments with the concepts explained here.
Contents
1. Introduction and approach
2. Background, tools and philosophy
- What you will learn from this book?
- Components for book
- Big Picture Diagram
3. Code outline
- Regression code outline
- Classification Code Outline
4. Exploratory data analysis and graphics
- Numeric descriptive statistics
- Interpreting descriptive statistics
- Understanding the distribution
- Histograms
- Boxplots and IQR
- Correlation
- heatmaps for co-relation
- Analysing the target variable
5. Pre-processing data
- Dealing with missing values
- Treatment of categorical values
- Normalise the data
- Split the data
6. Choose a Baseline algorithm
- Defining / instantiating the baseline model
- Fitting the model we have developed to our training set
- Define the evaluation metric
- Predict scores against our test set and assess how good it is
7. Evaluation metrics for classification
- Improving a model – from baseline models to final models
- Understanding cross validation
- Feature engineering
- Regularization to prevent overfitting
- Ensembles – typically for classification
- Test alternative models
- Hyperparameter tuning
8. Conclusion
A1. Regression Code
A2. Classification Code
If you are not yet a DSC member, you can register by following this link to access the book.