
KNN example

  1. Concept Introduction:

    • KNN (k-nearest neighbors) is a supervised learning algorithm used for both classification and regression.
    • How it works: find the k training points closest to a query point and use their labels to classify it (or their values to predict).
  2. Mathematics Behind KNN:

    • Distance calculation (e.g., Euclidean distance): d(x, y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}
    • Decision rule: the majority class among the k nearest neighbors (see the from-scratch sketch after this list).
  3. Step-by-Step Example:
    We'll classify a 2D dataset, show decision boundaries, and explain the classification mathematically.
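
To make the mathematics concrete before turning to scikit-learn, here is a minimal from-scratch sketch. The helper names euclidean and knn_predict are our own, not from any library; the sketch just computes Euclidean distances and takes a majority vote:

import numpy as np
from collections import Counter

def euclidean(a, b):
    # d(a, b) = sqrt(sum_i (a_i - b_i)^2)
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, query, k=3):
    # Distance from the query to every training point
    distances = [euclidean(x, query) for x in X_train]
    # Labels of the k closest points
    nearest_labels = y_train[np.argsort(distances)[:k]]
    # Majority vote decides the predicted class
    return Counter(nearest_labels).most_common(1)[0][0]

# Tiny example: two clusters, around (0, 0) and (5, 5)
X_train = np.array([[0, 0], [1, 1], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4, 4]), k=3))  # -> 1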

Here’s the Python code to demonstrate KNN with a graph and a simple dataset:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Generate a simple dataset (2D for visualization)
X, y = make_classification(
    n_samples=100,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42
)

# Separate the points of each class (easier for visualization)
class_0 = X[y == 0]
class_1 = X[y == 1]

# Plot the dataset
plt.figure(figsize=(8, 6))
plt.scatter(class_0[:, 0], class_0[:, 1], c='blue', label='Class 0')
plt.scatter(class_1[:, 0], class_1[:, 1], c='red', label='Class 1')
plt.title("Simple 2D Dataset for KNN")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid()
plt.show()

# Train KNN on the dataset
k = 5
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X, y)

# Plot decision boundary
def plot_decision_boundary(X, y, model, title="Decision Boundary"):
    h = 0.02  # Step size of the mesh grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the class of every point on the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.Paired)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=plt.cm.Paired)
    plt.title(title)
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.grid()
    plt.show()

plot_decision_boundary(X, y, knn, title=f"KNN Decision Boundary (k={k})")

# Example classification: find the k nearest neighbors of a query point
query_point = np.array([[0, 0]])
distances, indices = knn.kneighbors(query_point, return_distance=True)

plt.figure(figsize=(8, 6))
plt.scatter(class_0[:, 0], class_0[:, 1], c='blue', label='Class 0')
plt.scatter(class_1[:, 0], class_1[:, 1], c='red', label='Class 1')
plt.scatter(query_point[:, 0], query_point[:, 1], c='green', label='Query Point', s=100)
# Draw a dashed line from the query point to each of its k nearest neighbors
for i in range(k):
    plt.plot(
        [query_point[0, 0], X[indices[0, i], 0]],
        [query_point[0, 1], X[indices[0, i], 1]],
        'k--'
    )
plt.title("Query Point and Nearest Neighbors")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid()
plt.show()
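
For completeness, the trained classifier can also label the query point directly; a quick check using the knn and query_point objects defined above:

# Majority vote among the 5 nearest neighbors determines the class
predicted_class = knn.predict(query_point)[0]
print(f"Predicted class for {query_point[0]}: {predicted_class}")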

Explanation of Code:

  1. Dataset:

    • A 2D dataset is created using make_classification, with two features and two classes.
    • Points from different classes are visualized in different colors.
  2. Visualization:

    • The scatter plot shows how the two classes are distributed in the feature space.
  3. Training:

    • The KNeighborsClassifier is trained with k=5.
  4. Decision Boundary:

    • The decision boundary is plotted to show how KNN partitions the feature space; the choice of k controls how smooth this boundary is (see the sketch after this list).
  5. Query Point:

    • A query point is plotted together with dashed lines connecting it to its 5 nearest neighbors.
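
Small k lets the boundary bend around individual points (low bias, high variance), while large k averages over more neighbors and smooths it out. One way to see this, reusing X, y, and plot_decision_boundary from above:

# Compare a very flexible model (k=1) with a much smoother one (k=15)
for k_value in (1, 15):
    model = KNeighborsClassifier(n_neighbors=k_value)
    model.fit(X, y)
    plot_decision_boundary(X, y, model, title=f"KNN Decision Boundary (k={k_value})")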

Mathematical Explanation:

  1. Distance:
    For the query point (x, y) = (0, 0), compute the Euclidean distance to every training point (x_i, y_i) and keep the 5 smallest:

    d_i = \sqrt{(x_i - x)^2 + (y_i - y)^2}
  2. Classification Rule:
    If most of the 5 neighbors belong to Class 0, classify the query point as Class 0; otherwise, classify it as Class 1. A numerical check is sketched below.
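
To tie both steps back to the code, the distances and indices arrays returned by kneighbors earlier can be inspected directly; a minimal sketch (run after the snippets above):

# Show each neighbor's distance and label, then take the majority vote
neighbor_labels = y[indices[0]]
for dist, label in zip(distances[0], neighbor_labels):
    print(f"distance = {dist:.3f}, class = {label}")
print("Majority class:", np.bincount(neighbor_labels).argmax())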