Intro. to machine learning (CSI 5325)
Lecture 19: Instance-based learning

Greg Hamerly
Spring 2008

Some content from Tom Mitchell.

Outline

1. Advanced topics in instance-based learning

Advanced topics in instance-based learning

PCA (2)

PCA is a linear transformation of the data into a new basis. It assumes that some orthogonal basis of the original space captures the variance of the data better than the original coordinates: PCA finds orthogonal basis vectors pointing in the directions of maximum variance, so that in the transformed data the first dimension has the largest variance, the second dimension the second largest, and so on.

PCA (3)

PCA is based on finding the eigenvectors and eigenvalues of the covariance matrix of the data:

    [v, lambda] = eig(cov(x));

If we find the principal components and use only the top few, we have reduced the dimension of our data while keeping the majority of the information (a.k.a. variance).

Random linear projections

Basic idea:
- choose k random vectors in the original space, where k ≪ d
- linearly project all the data onto the k new vectors
- learn in this lower-dimensional space

Even though it sounds crazy, this has amazing properties. Johnson-Lindenstrauss lemma: if we project n points into k = O(log n) dimensions, then with high probability all of the roughly n² inter-point distances remain nearly the same.

Note: there is no dependence on the original dimension d!

Multidimensional scaling

Given a set of pairwise distances (not the original points), find a set of vectors which preserve those distances.

Given: d(x_i, x_j), the pairwise distances for n points (1 ≤ i, j ≤ n).

Find the minimum of

    S(y_1, y_2, ..., y_n) = ∑_{i ≠ j} ( ||y_i − y_j|| − d(x_i, x_j) )²

where the y_i are vectors in a space of some chosen dimension.

- We want sufficient dimension for the y_i, but not too high!
- Method: minimize S by gradient descent on the y_i vectors.

Alternative distance metrics (L1, etc.)

Minkowski (L_p) norm:

    L_p(x) = ( ∑_{i=1}^{d} |x_i|^p )^{1/p}

    d_p(x, y) = ||x − y||_p = L_p(x − y)

- L_2 = Euclidean distance
- L_1 = Manhattan distance
- L_∞ = max_i |x_i| (the max norm)

General observation: in high dimensions, lower-p norms tend to work better (Aggarwal 2001).

Mahalanobis distance

Given a scaling matrix A ∈ R^{d×d} (typically symmetric positive semidefinite, so that the square root below is real), the Mahalanobis distance between vectors x, y ∈ R^d is

    ||x − y||_A = √( (x − y)^T A (x − y) )

This is relevant to:
- the Gaussian probability distribution (A = Σ^{−1}, where Σ is the covariance matrix)
- principal components analysis
- metric learning

Minimal Octave/MATLAB sketches of the techniques above follow.
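First, a minimal sketch of PCA-based dimension reduction, assuming a data matrix x with one point per row and a user-chosen target dimension k (these variable names are illustrative, not from the lecture):

    % PCA sketch: reduce an n-by-d data matrix x to k dimensions.
    xc = x - repmat(mean(x), size(x, 1), 1);        % center the data
    [v, lambda] = eig(cov(xc));                     % eigenvectors/values of covariance
    [vals, order] = sort(diag(lambda), 'descend');  % rank directions by variance captured
    vtop = v(:, order(1:k));                        % top-k principal directions
    xproj = xc * vtop;                              % n-by-k projected data

The rows of xproj are the same points expressed in the top-k principal directions, so most of the variance survives the reduction.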
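A sketch of a random linear projection in the same spirit; a Gaussian random matrix scaled by 1/sqrt(k) is one common construction used with the Johnson-Lindenstrauss lemma (names again illustrative):

    % Random projection sketch: map an n-by-d data matrix x to k dimensions.
    [n, d] = size(x);
    r = randn(d, k) / sqrt(k);   % k random Gaussian directions, scaled so that
                                 % squared lengths are preserved in expectation
    xproj = x * r;               % each point is now k-dimensional

Note that r is chosen without looking at the data at all, which is what makes the method so cheap compared to PCA.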
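A sketch of MDS by gradient descent on the stress S, assuming an n-by-n symmetric distance matrix D and a chosen embedding dimension q; the step size and iteration count are arbitrary illustrative choices:

    % MDS sketch: find points y (n-by-q) whose pairwise distances
    % approximate the given distance matrix D, by gradient descent on S.
    n = size(D, 1);
    y = randn(n, q);                  % random initial configuration
    eta = 0.001;                      % learning rate
    for t = 1:500
      grad = zeros(n, q);
      for i = 1:n
        for j = 1:n
          if i ~= j
            diff = y(i, :) - y(j, :);
            dist = norm(diff);
            if dist > 0               % avoid dividing by zero
              % gradient of (||y_i - y_j|| - D(i,j))^2 with respect to y_i
              grad(i, :) = grad(i, :) + 2 * (dist - D(i, j)) * diff / dist;
            end
          end
        end
      end
      y = y - eta * grad;             % descent step on all points at once
    end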
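The Minkowski distance is a one-liner; for vectors x and y and p ≥ 1:

    % Minkowski (L_p) distance between vectors x and y.
    % p = 1 gives Manhattan, p = 2 Euclidean; as p grows it
    % approaches max(abs(x - y)).
    dp = sum(abs(x - y) .^ p) ^ (1 / p);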
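Finally, a sketch of the Mahalanobis distance with A taken to be the inverse covariance of a data matrix xdata, which is one common choice (x and y are row vectors; names illustrative):

    % Mahalanobis distance with A = inverse of the data covariance.
    A = inv(cov(xdata));              % Sigma^{-1}
    diff = x - y;                     % 1-by-d row vector
    dmahal = sqrt(diff * A * diff');  % ||x - y||_A

With A = Σ^{−1} this rescales each direction by the variance of the data along it, which is exactly the quantity appearing in the exponent of the Gaussian density.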