What is K-Nearest Neighbors (KNN)?
KNN is like the “ask your neighbors” rule in real life. When you need to make a decision or guess something, you check what the closest people (or examples) around you are doing and follow the majority.
It’s a simple machine learning algorithm that looks at who’s closest to you to make a prediction.
Real-Life Analogy: Picking a Movie
Imagine you’re deciding what movie to watch. You ask your neighbors (friends).
- 3 friends love action movies.
- 2 friends love comedy movies.
Since most of your friends love action movies, you decide to watch an action movie too!
Here, the K in KNN is the number of neighbors you ask. If K = 5, you ask 5 friends and follow the majority.
How Does It Work?
- Collect Data: You start with a group of labeled examples, such as the movie preferences of your friends.
- Find Closest Neighbors: When a new person (you!) arrives, you check the preferences of the 5 nearest friends.
- Decide the Majority: Count the labels (e.g., Action or Comedy).
- Predict: Choose the most common label among your neighbors.
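The four steps above can be sketched in plain Python. This is a minimal from-scratch version (the friends list is made-up illustrative data; scikit-learn, used later in this article, provides a production-ready implementation):

```python
from collections import Counter
import math

def knn_predict(examples, new_point, k=5):
    """Predict a label for new_point from its k nearest labeled examples.

    examples: list of (features, label) pairs, e.g. ([25, 8], "Action").
    """
    # Step 2: find the closest neighbors by Euclidean distance
    by_distance = sorted(examples, key=lambda ex: math.dist(ex[0], new_point))
    nearest = by_distance[:k]

    # Steps 3-4: count the labels and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Step 1: labeled data -- hypothetical friends described by
# (age, movies watched per month) and their favorite genre
friends = [
    ([25, 8], "Action"), ([30, 6], "Action"), ([27, 7], "Action"),
    ([40, 2], "Comedy"), ([45, 3], "Comedy"),
]

print(knn_predict(friends, [28, 7], k=5))  # -> Action (3 votes to 2)
```

With `k=5` all five friends vote: three say Action, two say Comedy, so the majority picks Action.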
Real-Life Example: Fruit Identification
Imagine you find a fruit and don’t know what it is. You compare it with nearby fruits based on:
- Size
- Color
- Weight
If it looks similar to apples near you, you call it an apple!
For example, suppose you have already recorded these known fruits (the same sizes and weights appear in the Python example later in this article):
- Apple: Size 7 cm, Red, 150 g
- Banana: Size 6 cm, Yellow, 120 g
- Apple: Size 8 cm, Red, 160 g
- Banana: Size 5 cm, Yellow, 110 g
- Banana: Size 6 cm, Yellow, 130 g
Now, you find a fruit that is:
- Size: 7.5 cm
- Color: Red
- Weight: 155 g
You compare it to the data above. It’s closest to the apples, so you classify it as an Apple.
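You can check this "closeness" numerically. Using the size and weight values from the Python example later in this article (color is left out to keep the arithmetic simple), the Euclidean distance from the new fruit to each known fruit is:

```python
import math

# Known fruits: (size cm, weight g, label) -- same numbers as the sklearn example
known = [
    (7, 150, "Apple"), (6, 120, "Banana"), (8, 160, "Apple"),
    (5, 110, "Banana"), (6, 130, "Banana"),
]
new_size, new_weight = 7.5, 155

# Distance from the new fruit to every known fruit
distances = [(math.dist((size, weight), (new_size, new_weight)), label)
             for size, weight, label in known]

for d, label in sorted(distances):
    print(f"{label}: distance = {d:.1f}")
```

The two apples come out far closer than any banana, so a 3-nearest-neighbor vote picks Apple. Note that weight dominates the distance here because grams span a much larger numeric range than centimeters; in practice you would scale the features first.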
How Does It Look on a Graph?
- Points on a Graph: Each data point represents a fruit.
- New Point: A new fruit (unknown) is plotted on the graph.
- Nearest Neighbors: The algorithm checks which points are closest.
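A quick way to see this is to plot the fruits yourself. This sketch uses matplotlib with the same data as the Python example below:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display window
import matplotlib.pyplot as plt

# Size (cm) vs. weight (g) for the known fruits
apples = [(7, 150), (8, 160)]
bananas = [(6, 120), (5, 110), (6, 130)]

fig, ax = plt.subplots()
ax.scatter(*zip(*apples), color="red", label="Apple")
ax.scatter(*zip(*bananas), color="gold", label="Banana")
ax.scatter([7.5], [155], color="black", marker="x", label="New fruit")
ax.set_xlabel("Size (cm)")
ax.set_ylabel("Weight (g)")
ax.legend()
fig.savefig("fruits_knn.png")
```

On the resulting plot, the unknown fruit's "x" lands right next to the apple cluster.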
Why is KNN Useful?
- Easy to Understand: It mimics real-life decision-making.
- Flexible: Works for classification (like movies) or regression (like predicting house prices).
- Easy to Grasp: The core idea needs no complex equations to understand.
How KNN Relates to Deep Learning
In deep learning, distance-based ideas like KNN still show up. Neural networks often learn embeddings that place similar items close together, and finding the nearest neighbors in that embedding space works just like KNN does.
Python Example: Classifying Fruit
# Step 1: Import libraries
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
# Step 2: Define data
# Features: [Size (cm), Weight (g)]
X = np.array([[7, 150], [6, 120], [8, 160], [5, 110], [6, 130]])
# Labels: 0 = Apple, 1 = Banana
y = np.array([0, 1, 0, 1, 1])
# Step 3: Create the KNN model
knn = KNeighborsClassifier(n_neighbors=3) # K = 3
knn.fit(X, y)
# Step 4: Predict for a new fruit
new_fruit = np.array([[7.5, 155]]) # New fruit: size=7.5 cm, weight=155 g
prediction = knn.predict(new_fruit)
# Step 5: Output the result
print("Prediction:", "Apple" if prediction[0] == 0 else "Banana")
Output:
Prediction: Apple
Recap of KNN
KNN is like saying: “When in doubt, ask your closest friends or neighbors!” It predicts outcomes by comparing a new item to the nearest items it already knows about.
Real-Life Example 1: Predicting Sports Preference
Imagine you just moved to a new school and want to find out what sport most students like to play.
- You meet five students and learn which sport each of them prefers.
- A new student arrives, and they look similar to Student 1, Student 3, and Student 5.
- Since most neighbors like football, you predict that this new student also likes football.
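The "follow the majority" step is just a vote count. A minimal sketch (the individual neighbor labels below are illustrative; the example above only says that most of them like football):

```python
from collections import Counter

# Sports liked by the new student's 3 nearest neighbors
# (Students 1, 3, and 5 from the example above; exact labels are assumed)
neighbor_sports = ["Football", "Football", "Basketball"]

prediction = Counter(neighbor_sports).most_common(1)[0][0]
print(prediction)  # -> Football
```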
Real-Life Example 2: House Price Prediction
Imagine you’re buying a house. You want to know the price of a house in a neighborhood based on its:
- Size (in square feet)
- Number of bedrooms
You check the size, number of bedrooms, and sale price of nearby houses.
A new house appears with:
- Size = 1300 sqft
- Bedrooms = 3
You compare this house with its nearest neighbors (based on size and bedrooms). Using KNN, you find that the house should cost around $300,000.
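For regression like this, scikit-learn's `KNeighborsRegressor` averages the neighbors' prices instead of taking a vote. The neighborhood data below is made up for illustration:

```python
from sklearn.neighbors import KNeighborsRegressor
import numpy as np

# Hypothetical nearby houses: [size (sqft), bedrooms] and their sale prices ($)
X = np.array([[1200, 3], [1350, 3], [1280, 3], [1500, 4], [1000, 2]])
y = np.array([280_000, 310_000, 310_000, 400_000, 200_000])

knn = KNeighborsRegressor(n_neighbors=3)  # average the 3 nearest houses
knn.fit(X, y)

new_house = np.array([[1300, 3]])  # size = 1300 sqft, 3 bedrooms
print(knn.predict(new_house))  # -> [300000.]
```

The three nearest houses (1280, 1350, and 1200 sqft) average to $300,000, matching the figure above. As with the fruit example, square footage dominates the distance because of its larger numeric range, so real projects usually scale the features first.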
Visual Example with Graph
Scenario: Classifying Students’ Grades
You’re classifying students as pass or fail based on their:
- Study hours
- Sleep hours
You plot these on a graph:
- Points: Each student is a point at (study hours, sleep hours), labeled Pass or Fail. Using the data from the Python example below: (6, 8) Pass, (5, 6) Fail, (8, 8) Pass, (4, 5) Fail, (3, 6) Fail.
- New Student: A new student studied for 6 hours and slept for 7 hours, so they are plotted at (6, 7).
- Prediction: The 3 nearest points are (6, 8) Pass, (5, 6) Fail, and (8, 8) Pass, so the majority vote predicts Pass.
Real-Life Example 3: Diagnosing Illness
Let’s say a doctor wants to diagnose whether a patient has Disease A or Disease B based on symptoms:
- Temperature (°C)
- Heart Rate (bpm)
The doctor has records of previous patients, each with a temperature, a heart rate, and a confirmed diagnosis.
A new patient has:
- Temperature = 38.0°C
- Heart Rate = 100 bpm
Using KNN:
- The algorithm compares the new patient with existing data.
- The closest patients suggest the diagnosis is likely Disease A.
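A sketch of this diagnosis in code. The patient records here are invented for illustration, and real medical data would need far more care (including feature scaling, since heart rate spans a much larger range than temperature):

```python
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Hypothetical past patients: [temperature (C), heart rate (bpm)]
X = np.array([[38.2, 102], [37.9, 98], [38.5, 105], [36.8, 72], [37.0, 75]])
# Labels: 0 = Disease A, 1 = Disease B
y = np.array([0, 0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

new_patient = np.array([[38.0, 100]])  # temperature = 38.0 C, heart rate = 100 bpm
prediction = knn.predict(new_patient)
print("Disease A" if prediction[0] == 0 else "Disease B")
```

All three of the new patient's nearest neighbors have Disease A, so that is the prediction.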
Python Example: Predicting Grades
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
# Data: Features = [Study Hours, Sleep Hours]
X = np.array([[6, 8], [5, 6], [8, 8], [4, 5], [3, 6]])
# Labels: 1 = Pass, 0 = Fail
y = np.array([1, 0, 1, 0, 0])
# Model: KNN with 3 neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
# Predict for a new student
new_student = np.array([[6, 7]]) # New student: Study=6 hours, Sleep=7 hours
prediction = knn.predict(new_student)
print("Prediction:", "Pass" if prediction[0] == 1 else "Fail")
Key Points to Remember
- KNN is Simple: It just checks which data points (neighbors) are closest to the new point.
- Uses Distances: The idea of “closeness” is based on the distance between data points.
- Flexibility: Can be used for classification (e.g., pass/fail) or regression (e.g., predicting house prices).
Summary
KNN is a real-life-inspired algorithm that mimics how we make decisions by asking our neighbors or checking similar examples. From classifying fruits to diagnosing diseases or predicting grades, KNN is a great starting point for understanding machine learning! 😊