knn function in R: k-Nearest neighbour classification


In this article, we will show you how to use the knn function in R. The knn function classifies the rows of a test dataset using the labels of their k nearest neighbours in a labelled training dataset. Follow the explanation and examples below to learn more about it.

What does knn do in R?

The knn function in R is provided by the ‘class’ package. For each row of the test data, it finds the k nearest rows of the training data (by Euclidean distance) and assigns the label that wins the majority vote among those neighbours, breaking ties at random. There is no separate training step: the labelled training data are passed in on every call, together with the data you want to classify. k-nearest neighbour classification is a simple and widely used classification method in machine learning. Let’s take a look at the syntax of this function.
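To make the majority-vote idea concrete, here is a minimal base-R sketch of k-nearest neighbour classification. The function knn_sketch() is a hypothetical helper written only for this illustration; it is not part of the ‘class’ package and not how knn is implemented there. It also simplifies two details: a tied vote goes to the first factor level instead of being broken at random, and ties in distance are not expanded.

```r
# knn_sketch(): hypothetical helper illustrating k-NN, NOT class::knn.
# Simplifications: vote ties go to the first factor level (class::knn
# breaks them at random) and ties in distance are not expanded.
knn_sketch <- function(train, test, cl, k = 1) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  cl <- as.factor(cl)
  pred <- vapply(seq_len(nrow(test)), function(i) {
    # Euclidean distance from test row i to every training row
    d <- sqrt(rowSums(sweep(train, 2, test[i, ])^2))
    # Labels of the k closest training rows
    nearest <- cl[order(d)[seq_len(k)]]
    # Majority vote: count neighbours per class and keep the largest count
    votes <- table(nearest)
    names(votes)[which.max(votes)]
  }, character(1))
  factor(pred, levels = levels(cl))
}

# Hypothetical toy data: two well-separated groups in 2-D
train <- rbind(c(1, 1), c(1, 2), c(5, 5), c(6, 5))
cl <- factor(c("a", "a", "b", "b"))
test <- rbind(c(1.5, 1.5), c(5.5, 4.5))

knn_sketch(train, test, cl, k = 3)   # predicts a, b
class::knn(train, test, cl, k = 3)   # also predicts a, b
```

On this toy data there are no ties, so the sketch and class::knn agree; on real data with tied votes their answers can differ.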

Syntax:

knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)

Parameters:

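  • train: A matrix or data frame of training cases, one row per case.
  • test: A matrix or data frame of test cases to classify. A vector is treated as a single case.
  • cl: A factor holding the true class of each training case.
  • k: The number of neighbours considered. The default is 1.
  • l: The minimum vote needed for a definite decision; otherwise the result is doubt, returned as NA. The default is 0.
  • prob: The default is FALSE. If TRUE, the proportion of the votes for the winning class is returned as the attribute "prob".
  • use.all: Controls the handling of ties for the kth nearest neighbour. The default is TRUE, which includes every training case at the same distance as the kth nearest one; if FALSE, a random selection of those cases is used so that exactly k neighbours vote.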
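As a small illustration of the default k, of k = 3 and of prob = TRUE, here is a sketch on hypothetical one-dimensional toy data (the points and the labels "a" and "b" are made up for this example).

```r
library(class)

# Hypothetical toy data: five training points on a line (one feature column)
train <- matrix(c(1, 2, 3, 10, 11), ncol = 1)
cl <- factor(c("a", "a", "b", "b", "b"))
test <- matrix(c(2.2, 10.4), ncol = 1)

# Default k = 1: each test point takes the label of its single nearest point
knn(train, test, cl)                   # predicts a, b

# k = 3 with prob = TRUE: the vote share of the winning class is attached
pred <- knn(train, test, cl, k = 3, prob = TRUE)
as.character(pred)                     # "a" "b"
attr(pred, "prob")                     # 2/3 for 2.2 (votes a, a, b) and 1 for 10.4
```

The prob attribute is a quick way to spot borderline predictions: a value close to 1/k·(majority size) near one half or lower means the neighbours disagreed.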

Now that you know what the knn function does and what its syntax looks like, the next section shows how to use it.

How to use the knn function in R?

The knn function can help you find the labels of the dataset based on the k-nearest (in Euclidean distance) neighbor classification.

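But first, make sure the ‘class’ package is available. It is one of R’s recommended packages, so it is usually installed together with R; if it is missing, you can install it yourself.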

Install the ‘class’ package

You can install the ‘class’ package by running the following command.

install.packages('class')
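Because ‘class’ is often already installed, a common idiom, sketched here, installs it only when it cannot be found and then loads it. requireNamespace() is a base R function that returns FALSE instead of raising an error when a package is absent.

```r
# Install 'class' only if it cannot be found, then attach it
if (!requireNamespace("class", quietly = TRUE)) {
  install.packages("class")
}
library(class)

# knn() is now available in this session
is.function(knn)   # TRUE
```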

Once the ‘class’ package is installed, load it with library(class) in each R session before calling the knn function.

Use the knn function

The knn function labels each test observation with the majority label among its k nearest training observations.

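In this example, I will use the ‘iris’ dataset, which comes with R in the built-in ‘datasets’ package. It records the sepal length, sepal width, petal length and petal width of 150 flowers from three species: setosa, versicolor and virginica. The code below uses ‘iris3’, which stores the same measurements as a 50 x 4 x 3 array with one slice per species, in the order Setosa, Versicolor, Virginica.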

Let’s take a look at the data set.

iris

Output

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa

...
150          5.9         3.0          5.1         1.8  virginica
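Since the knn example uses iris3 rather than the iris data frame printed above, a quick check helps confirm the order of the species slices in iris3 and that they hold the same flowers as iris. This sketch compares column means, so row order does not matter; versi3 and versi are illustrative variable names.

```r
# iris3: the same flowers as iris, stored as a 50 x 4 x 3 array
# (flower, measurement, species)
dim(iris3)             # 50 4 3
dimnames(iris3)[[3]]   # "Setosa" "Versicolor" "Virginica"

# Slice "Versicolor" matches the versicolor rows of the iris data frame
versi3 <- colMeans(iris3[, , "Versicolor"])
versi  <- colMeans(iris[iris$Species == "versicolor", 1:4])
all.equal(unname(versi3), unname(versi))   # TRUE
```

This slice order is what the training labels in cl must follow: the second block of 25 rows is versicolor and the third is virginica.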

I will use the knn function to predict the species of each flower from its four measurements: sepal length, sepal width, petal length and petal width.

Look at the example below.

library(class)

# Training data: the first 25 flowers of each species (75 rows x 4 measurements)
trainData <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])

# Testing data: flowers 26 to 50 of each species (75 rows)
testData <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])

# True labels of the training rows, in the order of iris3's third
# dimension: Setosa, Versicolor, Virginica
cl <- factor(c(rep("setos",25), rep("versi",25), rep("virgi",25)))

# Predict each test flower's species from its 3 nearest neighbours
knn(trainData, testData, cl, k = 3, prob=TRUE)

Output

[1] setos setos setos setos setos setos setos setos setos setos setos setos setos setos setos setos setos setos setos
[20] setos setos setos setos setos setos versi versi virgi versi versi versi versi versi virgi versi versi versi versi
[39] versi versi versi versi versi versi versi versi versi versi versi versi virgi versi versi virgi virgi virgi virgi
[58] virgi virgi virgi virgi virgi virgi versi virgi virgi virgi virgi virgi virgi virgi virgi virgi virgi virgi
attr(,"prob")
…
Levels: setos versi virgi
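The printed factor is easier to judge when it is compared with the true species. The sketch below applies a 25/25 split per species to the iris data frame, whose rows are grouped as setosa (1 to 50), versicolor (51 to 100) and virginica (101 to 150), and tabulates predictions against the truth. The names train_idx, test_idx and actual are illustrative. Because knn breaks ties at random, the accuracy can vary slightly between runs, so the seed is fixed first.

```r
library(class)

set.seed(42)  # ties are broken at random; fix the seed for repeatable results

# First 25 flowers of each species for training, flowers 26 to 50 for testing
train_idx <- c(1:25, 51:75, 101:125)
test_idx  <- c(26:50, 76:100, 126:150)

pred <- knn(train = iris[train_idx, 1:4],
            test  = iris[test_idx, 1:4],
            cl    = iris$Species[train_idx],
            k     = 3, prob = TRUE)

# Confusion matrix: rows are predictions, columns are the true species
actual <- iris$Species[test_idx]
table(predicted = pred, actual = actual)

# Overall accuracy: share of test flowers that were labelled correctly
mean(pred == actual)

# Vote share of the winning class for the first few test flowers
head(attr(pred, "prob"))
```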

Summary

You have learned what the knn function in R does and how to use it. With the knn function, you can predict the labels of a dataset from the labels of its k nearest neighbours in a labelled training set. If you have any questions about this tutorial, leave your comment below, and I will answer your question. I hope this tutorial is helpful to you. Thanks!
