In this blog, we will talk about the KNN (K-Nearest Neighbours) algorithm, which is considered one of the simplest and most widely used Machine Learning algorithms based on the Supervised Learning technique.
Where did KNN start?
KNN dates back to 1951, when Evelyn Fix and Joseph Hodges wrote a technical report for the USAF School of Aviation Medicine; the research was done for the armed forces.
So, when do we use this algorithm?
The K-NN algorithm assumes that similar data points lie close to each other: it measures the similarity between a new data point and the available data points, and assigns the new point to the category of the points it most resembles. KNN can be used for both regression and classification problems, but in industry data analysts normally use it for classification.
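To make the regression case concrete, here is a minimal sketch using scikit-learn's KNeighborsRegressor. The toy data and the choice of 2 neighbours are made up for illustration; the idea is simply that, for regression, the prediction is the average target value of the closest training points rather than a majority vote.

```python
from sklearn.neighbors import KNeighborsRegressor

# Toy one-feature dataset (invented for illustration): the target equals the feature.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 2.0, 3.0]

# With n_neighbors=2 and the default uniform weights, the prediction is the
# mean target of the two closest training points.
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X_train, y_train)

prediction = reg.predict([[1.4]])[0]
print(prediction)  # the two nearest points are x=1 and x=2, so the mean is 1.5
```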
Two distinctive characteristics of the KNN algorithm are that it is non-parametric, i.e., it does not make any assumption about the distribution of the underlying data, and that it is sometimes called a lazy learner, as it does not learn from the training set immediately. Instead, it stores the dataset and only performs computation on it at the time of classification.
Let us consider an example to understand it better. Suppose we have an image of an animal that looks similar to both a leopard and a panther, and we want to know which of the two it is. For this identification we can use the KNN algorithm, because it works on a similarity measure. Our KNN model will compare the features of the new image with those of the leopard and panther images and, based on the most similar features, put it in one of the two categories.
What are the applications of the KNN algorithm?
- KNN is considered one of the most popular algorithms for text mining.
- The agriculture sector also makes use of KNN methods for climate forecasting and estimating soil water parameters.
- In the domain of finance, predicting the price of a stock is done with the help of KNN methods on the basis of company performance measures and economic data.
- In healthcare, KNN can help doctors predict whether a patient hospitalized due to a heart attack will have a second heart attack.
- KNN can also be used to estimate the amount of glucose in the blood of a diabetic person, which helps doctors prescribe medicines accordingly.
- Companies like Amazon or Netflix use KNN techniques to recommend books or other products to buy, or movies to watch, based on the customer's interests.
How does KNN work?
Put simply, suppose we have a data set with two different kinds of points, labelled Label A and Label B, and we are asked to figure out the label of a new point using the KNN algorithm. We do it by taking a vote among its k nearest neighbours: the new point receives the label held by the majority of them. The value of k can be anything from 1 up to the number of points in the data set, but in most cases it is less than 30.
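The voting idea can be sketched in a few lines of plain Python. This is only an illustration, not a production implementation: the function name knn_predict, the sample points and the use of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(points, labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(p, query), label) for p, label in zip(points, labels)]
    # Keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2D points: three with Label A, three with Label B.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))  # A
print(knn_predict(points, labels, (6.0, 6.5), k=3))  # B
```

With two labels, an odd value of k avoids tied votes, which is one reason odd values are a common choice.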
Let’s talk about the pros and cons of the KNN algorithm.
Some advantages of the KNN algorithm include:
- Being one of the simplest algorithms, it is easy to interpret and quick to train, since training only involves storing the data.
- It is versatile and, on many problems, its accuracy is competitive with other supervised learning models.
- There is no requirement to make assumptions about data or build a model.
On the other hand, some disadvantages of KNN are:
- Accuracy depends heavily on the quality of the data, and a huge amount of data leads to a slow prediction stage, since every prediction compares the new point with the stored training points.
- It requires a lot of memory, as it needs to store all of the training data, which also makes it computationally expensive.
A very common question that might have arisen in your mind after reading the above sections is: do I need to learn Python or R (or any other programming language) to use KNN?
Well, it is partially true that we need to learn at least one programming language to carry out such machine learning algorithms. KNN can certainly be run in GUI-based tools like KNIME, but let's still gain a working knowledge of how to implement it in Python.
You can check out its implementation using R in Understanding the Concept of KNN Algorithm Using R.
In the example below, we will go through the steps needed to implement the KNN algorithm in Python:
Step 1: Here, we shall be using the iris data set, which contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
The k-nearest neighbour classifier, KNeighborsClassifier, is imported from the scikit-learn package along with train_test_split and load_iris; numpy and matplotlib are imported as well.
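As a quick sanity check of Step 1, the sketch below loads the iris data set and confirms the class counts mentioned above. It imports only what this snippet needs; the variable name irisData follows the one used in Step 2.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the iris data set bundled with scikit-learn.
irisData = load_iris()

print(irisData.data.shape)           # (150, 4): 150 flowers, 4 measurements each
print(irisData.target_names)         # ['setosa' 'versicolor' 'virginica']
print(np.bincount(irisData.target))  # [50 50 50]: 50 instances per class
```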
Step 2: Once we have loaded our dataset, the next step is to create the feature and target variables using X = irisData.data and y = irisData.target.
After that, we have to split our data into training and test data using the train_test_split function. Once splitting is done, it’s time to create a KNN model with a chosen n_neighbors value.
Then, we have to fit the model to the training data using knn.fit(X_train, y_train).
Step 3: Once the model is fitted, we can predict on data which the model has not seen before and then calculate the accuracy of the model using print(knn.score(X_test, y_test)).
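Steps 2 and 3 can be put together roughly as follows. Treat this as a sketch under assumptions: the test_size, random_state, stratify and n_neighbors values are illustrative choices, not values prescribed above, and the accuracy you see will depend on them.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Feature and target variables.
irisData = load_iris()
X = irisData.data
y = irisData.target

# Hold out 20% of the data for testing (illustrative choice); stratify keeps
# the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create the model with an arbitrarily chosen number of neighbours and fit it.
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Predict the class of a few unseen samples, then report test accuracy.
print(knn.predict(X_test[:5]))
accuracy = knn.score(X_test, y_test)
print(accuracy)
```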
Step 4: Finally, let’s generate the plot showing the testing dataset accuracy and training dataset accuracy, with n_neighbors on the x-axis and accuracy on the y-axis.
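One way to produce such a plot is sketched below. The range of neighbour values (1 to 8) and the split settings are assumptions chosen for illustration; the loop simply refits the model for each value and records both accuracies.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

irisData = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    irisData.data, irisData.target, test_size=0.2, random_state=42,
    stratify=irisData.target
)

# Try n_neighbors = 1..8 (illustrative range) and record both accuracies.
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Accuracy on the y-axis against n_neighbors on the x-axis.
plt.plot(neighbors, test_accuracy, label="Testing dataset accuracy")
plt.plot(neighbors, train_accuracy, label="Training dataset accuracy")
plt.legend()
plt.xlabel("n_neighbors")
plt.ylabel("Accuracy")
plt.show()
```

A curve like this is a simple way to pick k: look for the value where the testing accuracy is high and the gap to the training accuracy is small.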
The ending lines
From the above, we can conclude that KNN is an effective machine learning algorithm that is quite easy to implement and works well on small datasets, with applications such as credit scoring, prediction of cancer cells, image recognition, and many others.
We have seen why we use the KNN algorithm with the help of some real-world examples. We have also understood its working principle: first, it calculates the distance between the new point and every point in the training data. Then, it finds the k points that are closest based on those distances. Finally, it chooses the category that contains the majority of those k points.
Hope you all have enjoyed reading it!
Acknowledgement: We are thankful to Mr Ram Tavva, Senior Data Scientist and Alumnus of IIM-C (Indian Institute of Management – Kolkata) for writing such a wonderful article. He has over 25 years of professional experience, specializing in Data Science, Artificial Intelligence, and Machine Learning.