First, you can treat the split threshold as one of the parameters to choose during cross-validation: try several train/test ratios and see which yields the best model. In practice, common splits such as 80-20, 75-25, 70-30, or 90-10 all tend to work well (there is usually more training data than testing data!).
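As a rough sketch of comparing split ratios, assuming scikit-learn is available: the synthetic data, the `LinearRegression` model, and the list of ratios below are illustrative assumptions, not part of the original question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

scores = {}
for test_size in (0.1, 0.2, 0.25, 0.3):
    # test_size is the fraction of the data held out for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    scores[test_size] = model.score(X_test, y_test)  # R^2 on the held-out part

for test_size, r2 in scores.items():
    print(f"test_size={test_size:.2f}  R^2={r2:.3f}")
```

On a single random split the differences between ratios are often small and noisy, so repeating this over several seeds (or using k-fold cross-validation) gives a more reliable comparison.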
Using the training data, find the line or hyperplane that best fits the points. Then find the points that lie far away from that line or hyperplane and remove them, treating them as outliers. Alternatively, there are linear regression algorithms that help minimize the effect of outliers (e.g. Huber, RANSAC, Theil-Sen).
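Both options can be sketched roughly like this, assuming scikit-learn and NumPy. The synthetic data, the injected outliers, and the cutoff of 3 robust standard deviations are assumptions chosen for illustration; a suitable cutoff depends on your data.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Synthetic data (an assumption): y = 2x + 1 plus noise, with 10 injected outliers.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=100)
y[::10] += 30.0

# Option 1: fit, measure how far each point is from the line, drop far points, refit.
first_fit = LinearRegression().fit(X, y)
residuals = y - first_fit.predict(X)
centered = np.abs(residuals - np.median(residuals))
mad = np.median(centered)            # median absolute deviation, a robust spread estimate
keep = centered <= 3 * 1.4826 * mad  # cutoff of 3 is a judgment call
refit = LinearRegression().fit(X[keep], y[keep])

# Option 2: keep all points, but use a regressor that down-weights outliers.
huber = HuberRegressor(max_iter=1000).fit(X, y)

print("plain fit :", first_fit.coef_[0], first_fit.intercept_)
print("after drop:", refit.coef_[0], refit.intercept_)
print("Huber     :", huber.coef_[0], huber.intercept_)
```

`RANSACRegressor` and `TheilSenRegressor` from `sklearn.linear_model` can be swapped in for `HuberRegressor` in the same way.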
In general, kNN methods can handle more than 2 classes (this is called “multi-class classification”).
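For example, here is a minimal sketch with scikit-learn's `KNeighborsClassifier` on the three-class iris dataset. The dataset, `n_neighbors=5`, and the 80-20 split are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris has three classes (labels 0, 1, 2), so this is a multi-class problem.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# No special setting is needed for more than two classes:
# each test point gets the majority label among its 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

print("classes seen:", knn.classes_)
print("test accuracy:", knn.score(X_test, y_test))
```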