Instance-Based Learning: k-Nearest Neighbor Algorithm - Prof. Gregory J. Hamerly

An introduction to instance-based learning, specifically the k-nearest neighbor algorithm. The algorithm is outlined, including the key idea of storing all training examples and using the nearest neighbors to predict new instances. Applications in regression, classification, and probability density estimation are discussed, as well as the use of similarity metrics and p-norms for comparing points. The advantages and disadvantages of using nearest neighbor are also covered, along with the behavior of the classifier in the limit and the concepts of bias and variance.

Partial preview of the text (slides 3-4, 8-9, and 13-14 are not included in this preview)

Intro. to machine learning (CSI 5325)
Lecture 18: Instance-based learning
Greg Hamerly
Some content from Tom Mitchell.

Outline
1. k-Nearest Neighbor

Regression applications (slide 5/14)
Regression: the mean of the f values over the k nearest neighbors:
    \hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i \in N(x_q, k)} f(x_i)
[Figure: 1-D k-NN regression example; plot omitted.]
(A code sketch of the regression and classification rules follows the slides.)

Classification applications (slide 6/14)
Classification: the most popular f value over the k nearest neighbors:
    \hat{f}(x_q) \leftarrow \operatorname{mode}_{i \in N(x_q, k)} f(x_i)
[Figure: 2-D k-NN classification example; plot omitted.]

Probability density applications (slide 7/14)
    \hat{f}(x_q) \leftarrow \frac{k - 1}{n \, V(k, x_q, d)}
where n is the number of examples in the dataset and V(k, x_q, d) is the volume of the smallest d-sphere centered at x_q that covers the k neighbors.
[Figure: 2-D k-NN density-estimation example; plot omitted.]
(A density-estimation sketch follows the slides.)

Classifier behavior in the limit (slide 10/14)
Suppose p(x) is the probability that instance x is labeled 1 (positive) rather than 0 (negative).
Nearest neighbor: as n → ∞, it approaches the Gibbs algorithm.
    Gibbs: with probability p(x) predict 1, else predict 0.
k-nearest neighbor: as n → ∞ and k grows large, it approaches the Bayes optimal classifier.
    Bayes optimal: if p(x) > 0.5 predict 1, else predict 0.
Recall that the Gibbs algorithm has at most twice the expected error of the Bayes optimal classifier.
(A brief numeric check of this bound follows the slides.)

Bias and variance in k-NN (slide 11/14)
What is the inductive bias of k-NN?
What are the statistical bias and statistical variance of k-NN? In particular, how do they vary with k?
Think about extreme values of k, e.g. k = 1 or k = n.

Distance-weighted k-NN (slide 12/14)
We might want to weight nearer neighbors more heavily:
    \hat{f}(x_q) \leftarrow \frac{\sum_{i \in N(x_q, k)} w_i f(x_i)}{\sum_{i \in N(x_q, k)} w_i}, \qquad w_i \equiv \frac{1}{d(x_q, x_i)^2}
where d(x_q, x_i) is the distance between x_q and x_i.
Note: we could use all training examples instead of just k; this is called Shepard's method.
(A distance-weighted sketch follows the slides.)
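Below is a minimal Python sketch of the regression and classification rules from slides 5-6, assuming a Euclidean distance and a small in-memory training set; the function names (knn_neighbors, knn_regress, knn_classify) and the toy data are illustrative, not from the lecture.

import numpy as np
from collections import Counter

def knn_neighbors(X_train, x_q, k):
    # Indices of the k training points nearest to the query x_q (Euclidean distance).
    dists = np.linalg.norm(X_train - x_q, axis=1)
    return np.argsort(dists)[:k]

def knn_regress(X_train, y_train, x_q, k):
    # Regression: mean of the target values over the k nearest neighbors.
    return y_train[knn_neighbors(X_train, x_q, k)].mean()

def knn_classify(X_train, y_train, x_q, k):
    # Classification: the most common (mode) label among the k nearest neighbors.
    labels = y_train[knn_neighbors(X_train, x_q, k)]
    return Counter(labels.tolist()).most_common(1)[0][0]

# Toy usage
X = np.array([[0.1], [0.3], [0.6], [0.9]])
y = np.array([0.0, 0.5, 0.4, 1.0])
print(knn_regress(X, y, np.array([0.5]), k=2))   # mean of the two nearest targets

For large training sets one would typically use a spatial index (e.g. a k-d tree) rather than this brute-force scan, but the brute-force version matches the formulas on the slides directly.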
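The density formula on slide 7 can be sketched the same way: the smallest ball covering the k neighbors has radius equal to the distance to the k-th nearest neighbor, and the volume of a d-dimensional ball of radius r is \pi^{d/2} r^d / \Gamma(d/2 + 1). The helper names below are again illustrative.

import math
import numpy as np

def ball_volume(radius, d):
    # Volume of a d-dimensional ball of the given radius.
    return (math.pi ** (d / 2)) * (radius ** d) / math.gamma(d / 2 + 1)

def knn_density(X_train, x_q, k):
    # k-NN density estimate: (k - 1) / (n * V), where V is the volume of the
    # smallest ball centered at x_q that contains the k nearest neighbors.
    n, d = X_train.shape
    dists = np.sort(np.linalg.norm(X_train - x_q, axis=1))
    radius = dists[k - 1]           # distance to the k-th nearest neighbor
    return (k - 1) / (n * ball_volume(radius, d))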
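For slide 10, the "at most twice the error" relationship can be checked pointwise: at a point x with p = p(x), the Bayes-optimal rule errs with probability min(p, 1 - p), while the Gibbs rule (predict 1 with probability p) errs with probability p(1 - p) + (1 - p)p = 2p(1 - p), which never exceeds 2 min(p, 1 - p). A quick numeric check of that inequality:

import numpy as np

p = np.linspace(0.0, 1.0, 101)
gibbs_error = 2 * p * (1 - p)          # expected error of the Gibbs predictor
bayes_error = np.minimum(p, 1 - p)     # expected error of the Bayes-optimal predictor
assert np.all(gibbs_error <= 2 * bayes_error + 1e-12)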
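Finally, a sketch of the distance-weighted rule from slide 12, using the same conventions as the earlier sketch. The small eps guard (to avoid dividing by zero when the query coincides with a training point) is an added practical detail, not part of the slide.

import numpy as np

def knn_regress_weighted(X_train, y_train, x_q, k, eps=1e-12):
    # Distance-weighted k-NN: each neighbor contributes with weight 1 / d^2.
    # Setting k = len(X_train) uses all training examples (Shepard's method).
    dists = np.linalg.norm(X_train - x_q, axis=1)
    idx = np.argsort(dists)[:k]
    w = 1.0 / (dists[idx] ** 2 + eps)   # eps avoids division by zero at d = 0
    return float(np.dot(w, y_train[idx]) / w.sum())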