Modules

Module 6

Intro to Machine Learning

Common Questions

How do we choose the threshold when splitting the training and test set data?

First, you can treat the split ratio as one of the settings to tune during cross-validation (meaning that you can try different ratios between train and test data and see which yields the best model). In general, ratios such as 80-20, 70-30, 75-25, or 90-10 all work well. The training set is usually larger than the test set.
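As a rough illustration of trying several ratios, here is a minimal sketch using only the Python standard library. The helper name `split_train_test`, its `test_fraction` and `seed` parameters, and the 100-row stand-in dataset are assumptions made for this example, not part of the course material.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 labelled examples

# Try the ratios mentioned above: 90-10, 80-20, 75-25, 70-30.
for fraction in (0.1, 0.2, 0.25, 0.3):
    train, test = split_train_test(data, test_fraction=fraction)
    print(f"{1 - fraction:.0%}/{fraction:.0%} split: {len(train)} train, {len(test)} test")
```

In practice you would train a model on each `train` list, score it on the matching `test` list, and compare the scores. If scikit-learn is available, its `train_test_split` function offers the same idea through the `test_size` parameter.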

For linear regression, is there a good way to identify the outliers? Do we get rid of the outliers?

Using the training data, fit the best line or hyperplane, then look for points that lie far away from it (that is, points with large residuals). Points that are very far from the line or hyperplane can be treated as outliers and removed. Alternatively, there are linear regression algorithms that help minimize the effect of outliers (e.g., Huber, RANSAC, Theil-Sen).
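The residual-based approach can be sketched as follows for a single feature. This is a minimal sketch under stated assumptions: the answer does not say how far is "far", so the cutoff used here (3 times a scale estimated from the median absolute deviation of the residuals) is one common choice picked for the example, and the function names and the toy data are hypothetical.

```python
from statistics import mean, median

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def flag_outliers(xs, ys, threshold=3.0):
    """Return indices of points whose residual is far from the typical residual.

    Assumption for this sketch: "far" means more than `threshold` times a
    robust scale estimate (1.4826 * median absolute deviation).
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = median(residuals)
    deviations = [abs(r - center) for r in residuals]
    scale = 1.4826 * median(deviations)
    return [i for i, d in enumerate(deviations) if d > threshold * scale]

# Toy data: roughly y = 2x + 1, with one corrupted value at x = 5.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 30.0, 13.1, 14.8, 17.2, 18.9]

outliers = flag_outliers(xs, ys)

# Refit on the remaining points only.
clean_xs = [x for i, x in enumerate(xs) if i not in outliers]
clean_ys = [y for i, y in enumerate(ys) if i not in outliers]
clean_slope, clean_intercept = fit_line(clean_xs, clean_ys)
print("flagged indices:", outliers)
print("slope before/after:", fit_line(xs, ys)[0], clean_slope)
```

Whether to drop a flagged point is a judgment call: check first whether it is a data-entry error or a genuine observation. For the robust alternatives, scikit-learn provides `HuberRegressor`, `RANSACRegressor`, and `TheilSenRegressor` in `sklearn.linear_model`.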

Can KNN be used for more than 2 classes?

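Yes. In general, KNN methods can handle more than 2 classes (this is called “multi-class classification”). The prediction is a majority vote among the k nearest neighbors, and that vote works the same way for any number of classes.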
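To make the majority-vote idea concrete, here is a minimal sketch of a KNN classifier with three classes, written with the standard library only. The function name `knn_predict`, the choice of Euclidean distance, and the toy points are assumptions made for this example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    labelled = zip(train_points, train_labels)
    nearest = sorted(labelled, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common returns the label that was counted first,
    # which here is the label of the closest of the tied neighbors.
    return votes.most_common(1)[0][0]

# Three classes, each a small cluster of 2-D points.
points = [(0, 0), (0, 1), (1, 0),   # "red" cluster
          (5, 5), (5, 6), (6, 5),   # "green" cluster
          (0, 5), (0, 6), (1, 5)]   # "blue" cluster
labels = ["red"] * 3 + ["green"] * 3 + ["blue"] * 3

for query in [(0.5, 0.5), (5.5, 5.5), (0.5, 5.5)]:
    print(query, "->", knn_predict(points, labels, query))
```

Nothing in the function depends on there being only two labels: `Counter` simply tallies whichever labels appear among the neighbors.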

Resources

Lesson & Assignment Notebook
