# KNN Classification Using sklearn Module in Python

Classification is often used in machine learning to derive solutions to different business problems. In this article, we will discuss the implementation of the KNN classification algorithm using the sklearn module in Python.

## What is KNN Classification Algorithm?

KNN (K-Nearest Neighbors) is a popular machine-learning algorithm for classification tasks. The basic idea behind the KNN algorithm is to find the K data points in a training set that are closest to a new data point. Then the algorithm classifies the new data point based on the majority class of its K nearest neighbors.

## KNN Classification Algorithm

KNN (K-Nearest Neighbors) is a simple and powerful classification algorithm that is based on the principle of instance-based learning. The main idea behind KNN is to classify a new data point based on the class labels of its closest neighbors in the training data.

The algorithm consists of the following steps:

1. First, we choose the number of nearest neighbors, K.
2. Then, we calculate the distance between the new data point and all the points in the training data.
3. Next, we select the K training points that are closest to the new data point.
4. Finally, we determine the class label of the new data point based on the majority class of its K nearest neighbors.
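The following minimal plain-Python sketch walks through the four steps above. It is only an illustration and not how sklearn implements KNN internally. It assumes Euclidean distance (computed with `math.dist`, which needs Python 3.8+) and uses a hypothetical helper named `knn_predict`. When two labels receive the same number of votes, it picks whichever label appears first among the sorted neighbors. The data is the same 15-point dataset used later in this article.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Hypothetical helper: classify new_point by majority vote of its k nearest neighbors."""
    # Step 2: Euclidean distance from the new point to every training point
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k closest training points (sorted() is stable,
    # so equal distances keep their original training order)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: majority vote over the labels of those k neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]

print(knn_predict(data_points, class_labels, (10, 11), k=3))  # C3
print(knn_predict(data_points, class_labels, (1, 3), k=3))    # C1
```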

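The distance metric used in KNN can be any standard distance measure, such as Euclidean distance, Manhattan distance, or, more generally, Minkowski distance. Minkowski distance reduces to Manhattan distance when its parameter p is 1 and to Euclidean distance when p is 2.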
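The Minkowski distance of order p between two points a and b is (Σ |aᵢ − bᵢ|ᵖ)^(1/p). The short sketch below uses a hypothetical helper, `minkowski_distance`, to show how p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance for the points (0, 0) and (3, 4).

```python
def minkowski_distance(a, b, p=2):
    """Hypothetical helper: Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(minkowski_distance((0, 0), (3, 4), p=1))  # 7.0 (Manhattan distance)
print(minkowski_distance((0, 0), (3, 4), p=2))  # 5.0 (Euclidean distance)
```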

## The KNeighborsClassifier() Function

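The `KNeighborsClassifier()` function is used to perform KNN classification. It is the constructor of the `KNeighborsClassifier` class defined in the `sklearn.neighbors` module, and it has the following syntax.

`sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)`

The `*` means that every parameter after `n_neighbors` must be passed as a keyword argument.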

Here,

• The `n_neighbors` parameter is used to decide the number of neighbors to consider while classifying a new data point.
• The `weights` parameter decides how the neighbors of a sample are weighted. By default, it is set to `"uniform"`, which means that all neighbors are weighted equally.
• You can set the `weights` parameter to `"distance"` to weight points by the inverse of their distance. In this case, closer neighbors of a data point have a greater influence than neighbors that are farther away.
• You can also pass a user-defined function that accepts an array of distances, and returns an array of the same shape containing the weights. In this way, you can explicitly define the weight of a neighbor as a function of its distance from the data point.
• The `algorithm` parameter selects the method used to compute the nearest neighbors. Its options are `"ball_tree"`, `"kd_tree"`, `"brute"` (brute-force search), and `"auto"`. When it is set to `"auto"`, which is the default, the function tries to choose the most appropriate algorithm based on the training data passed to the `fit()` method.
• The `metric` parameter decides the metric used to compute distances between data points. By default, it is set to `"minkowski"`. With the default `p=2`, this is equivalent to Euclidean distance. Setting `p=1` gives Manhattan distance instead.
• If the `metric` parameter is set to `"precomputed"`, the input given to the `fit()` method must be a square distance matrix instead of the raw data points. This is useful for categorical and mixed data types, for example, because you can compute a suitable distance matrix yourself and pass it to the classifier.
• You can also pass a function to the `metric` parameter. In this case, the function must take two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. I.e., the function should calculate the distance between any two data points in the dataset.
• The `n_jobs` parameter decides the number of parallel jobs to run for the neighbor search. Its default value is `None`, which means a single job is used unless you are running inside a joblib parallel context. If you set `n_jobs` to -1, all available processors are used. You can also set a specific number of jobs using the `n_jobs` parameter.
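The sketch below tries several of these parameters on the article's dataset: `weights="distance"`, a user-defined weight function, a callable `metric`, and `metric="precomputed"` with a distance matrix built by `sklearn.metrics.pairwise_distances`. The helper names `inverse_square_weights` and `manhattan` are hypothetical and only serve as examples. The query points (10, 11) and (1, 3) were chosen so that all three nearest neighbors of each point share a class, which keeps the predictions unambiguous.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]
query = [(10, 11), (1, 3)]

# weights="distance": each neighbor's vote is weighted by 1 / distance
distance_weighted = KNeighborsClassifier(n_neighbors=3, weights="distance")
distance_weighted.fit(data_points, class_labels)

# Hypothetical user-defined weight function (inverse squared distance).
# It receives an array of distances and returns an array of the same shape.
def inverse_square_weights(distances):
    return 1.0 / (distances ** 2 + 1e-12)  # small constant avoids division by zero

custom_weighted = KNeighborsClassifier(n_neighbors=3, weights=inverse_square_weights)
custom_weighted.fit(data_points, class_labels)

# Hypothetical callable metric (Manhattan distance): takes two 1D arrays
# and returns a single distance value.
def manhattan(u, v):
    return np.sum(np.abs(u - v))

callable_metric = KNeighborsClassifier(n_neighbors=3, metric=manhattan)
callable_metric.fit(data_points, class_labels)

# metric="precomputed": fit() receives a square (15 x 15) Euclidean distance matrix,
# and predict() receives the distances from each query point to every training point.
precomputed = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
precomputed.fit(pairwise_distances(data_points), class_labels)
query_distances = pairwise_distances(query, data_points)

print(distance_weighted.predict(query))
print(custom_weighted.predict(query))
print(callable_metric.predict(query))
print(precomputed.predict(query_distances))
```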

After execution, the `KNeighborsClassifier()` function returns an untrained KNeighborsClassifier object. We can train this untrained classifier using the `fit()` method.

The `fit()` method takes the training data points as its first input argument and their class labels as its second input argument. After execution, it returns the same classifier object, which is now trained. You can use this trained model to predict class labels for new data points.
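The sketch below checks two points from the description above. First, `fit()` returns the same object it was called on. Second, calling `predict()` on a classifier that has not been fitted raises sklearn's `NotFittedError`. The variable names are only for illustration.

```python
from sklearn.exceptions import NotFittedError
from sklearn.neighbors import KNeighborsClassifier

data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]

# An unfitted classifier cannot predict yet
untrained = KNeighborsClassifier(n_neighbors=3)
try:
    untrained.predict([(5, 7)])
except NotFittedError:
    print("predict() needs a fitted model")

# fit() trains the classifier in place and returns that same object
classifier = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
trained = classifier.fit(data_points, class_labels)
print(trained is classifier)  # True

# predict() accepts a list of points and returns a NumPy array of labels
print(trained.predict([(10, 11), (1, 3)]))  # ['C3' 'C1']
```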

## KNN Classification Using the Sklearn Module in Python

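To perform KNN classification using the sklearn module in Python, we will use the following dataset.

| Data point | Class label |
|---|---|
| (2, 10) | C2 |
| (2, 6) | C1 |
| (11, 11) | C3 |
| (6, 9) | C2 |
| (6, 5) | C1 |
| (1, 2) | C1 |
| (5, 10) | C2 |
| (4, 9) | C2 |
| (10, 12) | C3 |
| (7, 5) | C1 |
| (9, 11) | C3 |
| (4, 6) | C1 |
| (3, 10) | C2 |
| (3, 8) | C2 |
| (6, 11) | C2 |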

The above dataset contains 15 two-dimensional data points that belong to three classes: C1, C2, and C3. We will use these data points to build a KNN classifier with the sklearn module.

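This dataset is clean and has no noise or outliers. Real-world data is rarely this clean, so you might need to perform preprocessing steps such as data cleaning, normalization, handling missing values, and removing bias from the data. Normalization matters especially for KNN. The algorithm compares points by distance, so a feature with a large numeric range can otherwise dominate the distance computation.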
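One common way to add normalization is to chain `StandardScaler` and `KNeighborsClassifier` with `make_pipeline`. The pipeline learns the scaling from the training data and then applies it automatically to new points at prediction time. The sketch below shows one reasonable choice of scaler, not a requirement. Other scalers, such as `MinMaxScaler`, can be plugged in the same way.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]

# StandardScaler rescales each feature to zero mean and unit variance
# before the classifier computes any distances.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(data_points, class_labels)

# New points are transformed with the scaling learned from the training data
print(scaled_knn.predict([(10, 11), (1, 3)]))
```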

In this article, let us just use the above dataset to understand how the KNN classification algorithm works using the sklearn module in Python.

To perform KNN classification using the above dataset, we will use the following steps.

• First, we will create a list of data points named `data_points` and another list named `class_labels` that contains the class label of each data point.
• Next, we will create an untrained KNN classifier using the `KNeighborsClassifier()` function defined in the sklearn module. Here, we will set the number of neighbors, `n_neighbors`, to 3. We will also set the `metric` parameter to `"euclidean"` to use Euclidean distance as the distance metric.
• Then, we will train the KNN classifier using the `fit()` method. The `fit()` method takes the list of data points as its first input argument and the list of class labels as its second input argument. After execution, the `fit()` method returns the trained machine-learning model. For datasets with multiple attributes, you can also pass a dataframe containing the attributes as the first input argument and a series or list of class labels for the data points as the second input argument.
• Once we have the trained model, we can use the `predict()` method to predict class labels. When invoked on the trained KNN classifier, the `predict()` method takes a list of data points as its input argument. After execution, it returns a NumPy array containing the predicted label for each input data point.

You can observe the entire process in the following example.

```python
from sklearn.neighbors import KNeighborsClassifier

# create list of data points
data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
# create list of class labels
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]
# create untrained model
untrained_model = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
# train model using fit method
trained_model = untrained_model.fit(data_points, class_labels)
# predict class for a new data point
predicted_class = trained_model.predict([(5, 7)])
print("The data points are:")
print(data_points)
print("The class labels are:")
print(class_labels)
print("The predicted class label for (5,7) is:")
print(predicted_class)
```

Output:

```
The data points are:
[(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
The class labels are:
['C2', 'C1', 'C3', 'C2', 'C1', 'C1', 'C2', 'C2', 'C3', 'C1', 'C3', 'C1', 'C2', 'C2', 'C2']
The predicted class label for (5,7) is:
['C1']
```

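In this example, we implemented the KNN classification algorithm using the sklearn module. We then used the trained model to predict the class label for the data point (5, 7). The nearest training point to (5, 7) is (4, 6), at distance √2. However, four training points, (6, 9), (6, 5), (4, 9), and (3, 8), are tied at distance √5. Only two of these four can be among the three nearest neighbors. The predicted label therefore depends on how the tie is broken, and the sklearn documentation notes that this can depend on the ordering of the training data.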
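To see which training points produced the prediction, you can call the `kneighbors()` method of the trained model. It returns the distances to the nearest training points and their indices. In the sketch below, the first neighbor printed should be (4, 6) at distance √2 ≈ 1.414. The other two are whichever of the tied points at √5 ≈ 2.236 sklearn happens to select.

```python
from sklearn.neighbors import KNeighborsClassifier

data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]

trained_model = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(data_points, class_labels)

# kneighbors() returns two arrays of shape (n_queries, n_neighbors):
# distances to the nearest training points and their indices in the training data
distances, indices = trained_model.kneighbors([(5, 7)])
for dist, idx in zip(distances[0], indices[0]):
    print(data_points[idx], class_labels[idx], round(dist, 3))
```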

## Find Class Labels in the sklearn KNN Classifier

To find the properties of a KNN classifier, we can use the attributes of the trained machine-learning model.

To find the class labels in the KNN classifier, we can use the `classes_` attribute of the trained model. This attribute holds the distinct class labels seen during training, in sorted order. You can observe this in the following example.

```python
from sklearn.neighbors import KNeighborsClassifier

# create list of data points
data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
# create list of class labels
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]
# create untrained model
untrained_model = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
# train model using fit method
trained_model = untrained_model.fit(data_points, class_labels)
print("The data points are:")
print(data_points)
print("The class labels are:")
print(class_labels)
print("The class labels in the model are:")
print(trained_model.classes_)
```

Output:

```
The data points are:
[(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
The class labels are:
['C2', 'C1', 'C3', 'C2', 'C1', 'C1', 'C2', 'C2', 'C3', 'C1', 'C3', 'C1', 'C2', 'C2', 'C2']
The class labels in the model are:
['C1' 'C2' 'C3']
```

In this example, you can observe that the training data contains three distinct classes. Hence, the `classes_` attribute of the KNN model contains the NumPy array `['C1' 'C2' 'C3']`.
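The order of `classes_` matters beyond listing the labels: the columns returned by `predict_proba()` follow the same order. The sketch below shows this with the query point (10, 11). All three of its nearest neighbors belong to class C3, so with the default uniform weights the probabilities for C1, C2, and C3 should be 0.0, 0.0, and 1.0.

```python
from sklearn.neighbors import KNeighborsClassifier

data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]

trained_model = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(data_points, class_labels)

# Column j of predict_proba() is the probability of the label trained_model.classes_[j]
probabilities = trained_model.predict_proba([(10, 11)])
for label, probability in zip(trained_model.classes_, probabilities[0]):
    print(label, probability)
```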

## Find the Number of Training Samples in the KNN Classifier

You can also find the number of data points used while training the K-Nearest neighbors classifier. For this, you can use the `n_samples_fit_` attribute of the trained model. The `n_samples_fit_` attribute contains the number of training samples passed to the `fit()` method. You can observe this in the following example.

```python
from sklearn.neighbors import KNeighborsClassifier

# create list of data points
data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
# create list of class labels
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]
# create untrained model
untrained_model = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
# train model using fit method
trained_model = untrained_model.fit(data_points, class_labels)
print("The data points are:")
print(data_points)
print("The class labels are:")
print(class_labels)
print("The number of data points in training is:")
print(trained_model.n_samples_fit_)
```

Output:

```
The data points are:
[(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
The class labels are:
['C2', 'C1', 'C3', 'C2', 'C1', 'C1', 'C2', 'C2', 'C3', 'C1', 'C3', 'C1', 'C2', 'C2', 'C2']
The number of data points in training is:
15
```

We have passed 15 data points to the `fit()` method. Hence, you can observe that the `n_samples_fit_` attribute of the trained KNN model contains the value 15.
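Along the same lines, recent sklearn versions (0.24 and later) also expose an `n_features_in_` attribute. It holds the number of features seen during `fit()`. The minimal sketch below checks both attributes on the article's dataset.

```python
from sklearn.neighbors import KNeighborsClassifier

data_points = [(2, 10), (2, 6), (11, 11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8), (6, 11)]
class_labels = ["C2", "C1", "C3", "C2", "C1", "C1", "C2", "C2", "C3", "C1", "C3", "C1", "C2", "C2", "C2"]

trained_model = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(data_points, class_labels)

print(trained_model.n_samples_fit_)  # 15 training samples
print(trained_model.n_features_in_)  # 2 features (x and y coordinates)
```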

## Conclusion

In this article, we have discussed the K-Nearest Neighbors classification algorithm using the sklearn module in Python. We also saw how to determine different attributes of a trained KNN classifier created using the `KNeighborsClassifier()` function defined in the sklearn module.