Templates and Classifiers: Image Recognition using Templates and Classifiers - Prof. Jane (Study notes, Computer Science)

These notes cover the use of templates and classifiers for image recognition: the basics of template matching and recognition, the concept of classifiers, and the importance of decision boundaries. They also explore various classifier types, including histogram-based classifiers and nearest neighbors, along with techniques for feature selection and dimensionality reduction. Based on notes from CMU's F&P course, specifically Chapter 22.


Templates and Classifiers

Last day: structure from motion (feature-based and dense). Today: templates and classifiers (F&P Ch. 22; Hebert notes, CMU).
- Recognition by template matching
- Recognition by finding patterns

Recognition by Finding Patterns
- We have seen very simple template matching (under filters).
- Some objects behave like quite simple templates, e.g. frontal faces.
- Strategy: find image windows, correct the lighting, and pass them to a statistical test (a classifier) that accepts faces and rejects non-faces.

Templates for Recognition
- Some objects can be identified by simple tests on image windows (faces, stop signs).
- Template matching: take all windows of a particular shape and test whether the relevant object is present, possibly searching over scale (size) and orientation (see the correlation sketch below).
- More complicated shapes and objects can be identified by looking at relationships among groups of templates.

Classifiers
- How do we test whether the object is present? A classifier takes a feature set as input and produces a class label.
- Build it from a training set of feature-label examples (x_i, y_i): find a rule that takes a plausible measurement x_i and computes its label y_i.

Basic Ideas in Classifiers
- Loss: some errors may be more expensive than others. For a fatal disease that is easily cured by a cheap medicine with no side effects, false positives in diagnosis are better than false negatives.
- We discuss two-class classification; L(1→2) is the loss caused by calling a 1 a 2.
- Total risk of using classifier s:
  R(s) = Pr{1→2 | s} L(1→2) + Pr{2→1 | s} L(2→1)
  We want to minimize the total risk R.
- Generally, we should classify an observation x as 1 if the expected loss of classifying it as 1 is lower than that of classifying it as 2: choose 1 if p(1 | x) L(1→2) > p(2 | x) L(2→1), and 2 otherwise.
- Crucial notion: the decision boundary, the set of points where the loss is the same for either choice.

Probabilistic Formulation
- Learn p(feature | object); by Bayes' rule, p(object_j | feature) ∝ p(feature | object_j) p(object_j).
- Decision boundary: classify as object 1 if
  p(object_1 | feature) / p(object_2 | feature) > λ
  (a decision-rule sketch follows below).
- Bayes risk: E[R({1,2}, s)] over the label set {1,2}; for λ = 1 the rule attains the Bayes risk.
- (Slide figure: the posteriors p(object_1 | feature) and p(object_2 | feature) plotted against the feature.)

Issues
- How do we represent and learn p(feature | object_j), or the decision boundary directly?
- How do we approach the Bayes risk given a small number of samples?
- What features should we use, and how do we reduce the feature space?
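The template-matching strategy above can be made concrete with a few lines of NumPy. This is a minimal sketch, not from the notes: the function names, the mean-subtraction "lighting correction", and the 0.8 threshold are illustrative assumptions, and a real system would also search over scales and orientations.

```python
# Minimal sketch of window-based template matching with normalized
# cross-correlation, using only NumPy. Function names, the mean-subtraction
# "lighting correction", and the 0.8 threshold are illustrative assumptions.
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation of `template` against every image window."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum()) + 1e-8
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()                       # crude lighting correction
            w_norm = np.sqrt((w ** 2).sum()) + 1e-8
            out[r, c] = (w * t).sum() / (w_norm * t_norm)
    return out

def detect(image, template, threshold=0.8):
    """Return the window positions whose NCC score exceeds the threshold."""
    scores = ncc_map(image, template)
    return np.argwhere(scores > threshold), scores

# Toy example; a real detector would also search over scale and orientation.
img = np.random.rand(64, 64)
tmpl = img[20:32, 20:32].copy()
hits, scores = detect(img, tmpl)
print(hits[:5], scores.max())
```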
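To make the likelihood-ratio rule concrete, here is a minimal sketch assuming 1-D Gaussian class-conditional densities; the means, standard deviations, and priors are made up for illustration and are not from the notes. With λ = 1 and equal priors, the decision boundary sits where the two posteriors cross.

```python
# Minimal sketch of the likelihood-ratio rule, assuming 1-D Gaussian
# class-conditional densities with made-up parameters.
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def classify(x, lam=1.0):
    """Return 1 if p(object_1 | x) / p(object_2 | x) > lambda, else 2."""
    prior1, prior2 = 0.5, 0.5                             # assumed equal priors
    post1 = gaussian_pdf(x, mean=0.0, std=1.0) * prior1   # proportional to p(object_1 | x)
    post2 = gaussian_pdf(x, mean=2.0, std=1.0) * prior2   # proportional to p(object_2 | x)
    return 1 if post1 / post2 > lam else 2

# With lam = 1 this is the Bayes rule for 0/1 loss; the decision boundary
# is where the two posteriors are equal (here at x = 1.0).
print([classify(x) for x in (-1.0, 0.9, 1.1, 3.0)])       # [1, 1, 2, 2]
```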
Evaluating Classifier Performance
- Detection rate = Prob(a feature from the object is correctly classified as object).
- False positive rate = Prob(a feature from the background is classified as object).
- Operating point: classify as object if
  p(object | feature) / p(background | feature) > λ
- (Slide figure: receiver operating characteristic (ROC), detection rate versus false positive rate, traced out as λ decreases.)

ROC Curve
- Four cases: true positive (sensitivity), false positive, true negative (specificity), false negative.
- The ROC tells us what happens as we vary the test threshold (a threshold-sweep sketch follows below).

Approaches
- Every single pattern classification/learning approach has been applied to this problem. Pick your favorite: naive Bayes, boosting, neural networks, SVMs, nearest neighbors, PCA/LDA/ICA dimensionality reduction, etc.

Feature Selection
- Each feature is a set of variables (wavelet coefficients) S = {C_1, ..., C_N}. Find the feature set which best classifies the dataset.
- Problem: if N is large, the feature is very discriminative (S is equivalent to the entire window if N is the total number of variables), but representing the corresponding distribution is very expensive. If N is small, the feature is not discriminative, but classification is very fast.
- Standard problem: we can have either discriminative or efficient features, but not both, so we cannot do classification in one shot.
- Solutions: a classifier cascade, or PCA, which captures most of the variance (later).

Classifier Cascade
- First apply a classifier with simple features: it is fast and eliminates the most obvious non-object locations.
- Then apply a classifier with more complex features: more expensive, but applied only to the locations that survived the previous stage.
- Cascade example: stage 1 applies a classifier with very simple (and fast) features and eliminates most of the image; stages 2 and 3 apply classifiers with more complex features on what is left.

Using Weak Features
- Don't try to design strong features from the beginning; just use really stupid but really fast features, and a lot of them. A weak learner is a very fast (but very inaccurate) classifier.
- Example: multiply the input window by a very simple box operator and threshold the output (example from Paul Viola, distributed by Intel as part of the OpenCV library); a box-feature sketch follows below.

Feature Selection
- The operators are defined over all possible shapes and positions within the window: for a 24x24 window, 45,396 combinations!
- How do we select the "useful" features, and how do we combine them into classifiers?
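The ROC curve can be traced numerically by sweeping the threshold λ over classifier scores. The sketch below uses synthetic scores and labels purely for illustration; the helper name roc_points is an assumption, not an API from the notes.

```python
# Minimal sketch of tracing a ROC curve by sweeping the threshold lambda
# over classifier scores; scores and labels are synthetic.
import numpy as np

def roc_points(scores, labels, thresholds):
    """Detection rate and false positive rate at each threshold.
    labels: 1 = object, 0 = background; scores: e.g. a log posterior ratio."""
    points = []
    for lam in thresholds:
        predicted = scores > lam
        detection_rate = predicted[labels == 1].mean()
        false_pos_rate = predicted[labels == 0].mean()
        points.append((false_pos_rate, detection_rate))
    return points

rng = np.random.default_rng(0)
labels = np.concatenate([np.ones(100), np.zeros(100)]).astype(int)
scores = np.concatenate([rng.normal(1.5, 1.0, 100), rng.normal(0.0, 1.0, 100)])
for fpr, dr in roc_points(scores, labels, thresholds=[2.0, 1.0, 0.0, -1.0]):
    print(f"FPR={fpr:.2f}  detection={dr:.2f}")  # detection rises as lambda drops
```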
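Below is a minimal sketch of one Viola-style box feature evaluated with an integral image, so each feature costs only a handful of additions regardless of its size. The two-rectangle layout, position, size, and threshold are illustrative assumptions; the notes only describe the idea of a box operator followed by a threshold.

```python
# Minimal sketch of a two-rectangle box feature evaluated via an integral
# image; the rectangle layout, position, size, and threshold are illustrative.
import numpy as np

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] computed from the integral image `ii`."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """Left-minus-right two-rectangle response at (r, c), size h x 2w."""
    left = box_sum(ii, r, c, r + h, c + w)
    right = box_sum(ii, r, c + w, r + h, c + 2 * w)
    return left - right

def weak_classifier(window, r=4, c=4, h=8, w=8, threshold=0.0):
    """Threshold a single box feature: a very fast, very inaccurate learner."""
    ii = integral_image(window)
    return 1 if two_rect_feature(ii, r, c, h, w) > threshold else -1

window = np.random.rand(24, 24)                # same window size as in the notes
print(weak_classifier(window))
```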
Boosting (AdaBoost) for Feature Selection
- Input: training examples {x_i} with labels {y_i} ("face" or "non-face" = +/-1) and weights w_i (initially w_i = 1).
- Repeat T times:
  - Choose the feature (weak classifier h_t) with minimum weighted error
    ε_t = Σ_i w_i [h_t(x_i) ≠ y_i]
    (the sum of weights over misclassified examples).
  - Update the weights so that w_i is increased if x_i is misclassified and decreased if x_i is correctly classified.
  - Compute a weight α_t for classifier h_t; α_t is large if ε_t is small.
- Final classifier:
  H(x) = sgn( Σ_t α_t h_t(x) )
- The automatic selection process selects "natural" features. (A boosting-loop sketch follows below.)

Discriminative Approaches
- It is difficult to represent the distribution in high-dimensional feature spaces, so find the decision boundary directly.
- General idea: much less training data is needed to construct the decision boundary than the distributions, and maximizing the separation between the classes gives better generalization.
- Fewer parameters: two Gaussians with equal covariance need 7 parameters; a line needs 2.

Nearest Neighbors
- Does not require recovery of distributions or decision surfaces.
- Asymptotically at most twice the Bayes risk.
- The choice of distance metric is critical, and indexing may be difficult.

Large Feature Spaces: PCA
- X is a feature vector of high dimension: indexing in high-dimensional space is difficult, and most of the dimensions are probably not useful.
- Principal components: the dominant eigenvectors of the scatter matrix. Center the features, X̃ = X - X̄, and form
  S = Σ_i X̃_i X̃_i^T   (entry (j, k) is Σ_i x̃_{j,i} x̃_{k,i})
- Most of the information is contained in the space spanned by (V_1, ..., V_k), the k eigenvectors associated with the k largest eigenvalues:
  X̃ ≈ (V_1 · X̃) V_1 + ... + (V_k · X̃) V_k

Principal Component Analysis
- PCA: project first into the lower-dimensional space spanned by the principal components, so indexing happens in a much lower-dimensional space. This is a form of feature selection; the features are linear and independent.
- Method: collect a set of pictures of m objects, compute the eigenvector representation, and for each object compute the coefficients in the spanned space V_p. For a new image p, compute its projection into V_p and identify the object by the minimum distance d = |w - w_k|.

PCA for Recognition
- Assume centered features. Given training features X_1, ..., X_m, compute the principal directions V_1, ..., V_k and project the training features onto them:
  X'_i ≈ (V_1 · X_i) V_1 + ... + (V_k · X_i) V_k
- Given an input feature vector X, project it onto the principal component space:
  X' ≈ (V_1 · X) V_1 + ... + (V_k · X) V_k
- Find the object whose projected feature vector X'_{i0} is closest to X':
  i_0 = argmin_i ||X' - X'_i||
  (a projection-and-matching sketch follows below).
- Appearance-based matching: Nayar et al. '96, Columbia.

Difficulties with PCA
- The projection may suppress important detail: the smallest-variance directions may not be unimportant.
- The method does not take the discriminative task into account; typically we wish to compute features that allow good discrimination, which is not the same as largest variance.

Linear Discriminant Analysis
- We wish to choose linear functions of the features that allow good discrimination. Assuming the class-conditional covariances are the same, we want the linear feature that maximizes the spread of the class means for a fixed within-class variance.
- Problems: variation in appearance due to illumination and expression versus variation due to identity; the method assumes a "linear" distribution of features; and the best choice for compression may not be the best choice for discrimination.

LDA
- Find the projection direction V that best separates the two classes by maximizing the ratio of the distance between the class means after projection to the scatter of the classes after projection:
  J(V) = (V · (X̄_1 - X̄_2))² / (V^T (C_1 + C_2) V)
- This leads to a generalized eigenvalue problem (a two-class sketch follows below).
- A similar application of LDA to faces: FisherFaces (Belhumeur, Yale/Columbia).
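A minimal sketch of the boosting loop above, using threshold stumps as the weak classifiers. The notes do not spell out the exact weight update or the formula for α_t, so the standard discrete-AdaBoost choices α_t = 0.5 log((1 - ε_t)/ε_t) and w_i ← w_i exp(-α_t y_i h_t(x_i)) are assumed here, and the data is synthetic.

```python
# Minimal sketch of the boosting loop with threshold stumps as weak
# classifiers; standard discrete-AdaBoost formulas for alpha_t and the
# weight update are assumed, since the notes describe them only qualitatively.
import numpy as np

def stump_predict(X, feature, thresh, sign):
    return sign * np.where(X[:, feature] > thresh, 1, -1)

def best_stump(X, y, w):
    """Pick the weak classifier h_t with minimum weighted error eps_t."""
    best = None
    for feature in range(X.shape[1]):
        for thresh in np.unique(X[:, feature]):
            for sign in (1, -1):
                pred = stump_predict(X, feature, thresh, sign)
                eps = w[pred != y].sum()
                if best is None or eps < best[0]:
                    best = (eps, feature, thresh, sign)
    return best

def adaboost(X, y, T=10):
    n = len(y)
    w = np.ones(n) / n                         # initial weights (normalized here)
    classifiers = []
    for _ in range(T):
        eps, feature, thresh, sign = best_stump(X, y, w)
        eps = max(eps, 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t large when eps_t is small
        pred = stump_predict(X, feature, thresh, sign)
        w *= np.exp(-alpha * y * pred)         # raise weights of misclassified x_i
        w /= w.sum()
        classifiers.append((alpha, feature, thresh, sign))
    return classifiers

def predict(classifiers, X):
    votes = sum(a * stump_predict(X, f, t, s) for a, f, t, s in classifiers)
    return np.sign(votes)                      # H(x) = sgn(sum_t alpha_t h_t(x))

X = np.random.rand(200, 5)
y = np.where(X[:, 2] > 0.5, 1, -1)             # synthetic face / non-face labels
model = adaboost(X, y, T=5)
print((predict(model, X) == y).mean())
```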
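The PCA-for-recognition recipe above, as a minimal sketch: center the training features, keep the top-k eigenvectors of the scatter matrix, project everything, and match a query by nearest neighbor in the subspace. The data, the value of k, and the function names are illustrative assumptions.

```python
# Minimal sketch of PCA-based recognition: center, keep top-k eigenvectors
# of the scatter matrix, project, and match by nearest neighbor.
import numpy as np

def pca_fit(X, k):
    """X holds one training feature vector per row; returns mean and V_1..V_k."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # centered features X tilde
    scatter = Xc.T @ Xc                         # sum_i Xtilde_i Xtilde_i^T
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]                 # k eigenvectors, largest eigenvalues
    return mean, V

def project(X, mean, V):
    return (X - mean) @ V                       # coefficients (V_1 . x, ..., V_k . x)

def recognize(query, train_coeffs, mean, V):
    q = project(query, mean, V)
    dists = np.linalg.norm(train_coeffs - q, axis=1)
    return int(np.argmin(dists))                # index of the closest training object

X_train = np.random.rand(20, 100)               # 20 objects, 100-D feature vectors
mean, V = pca_fit(X_train, k=5)
coeffs = project(X_train, mean, V)
query = X_train[7] + 0.01 * np.random.rand(100) # slightly perturbed view of object 7
print(recognize(query, coeffs, mean, V))        # expected: 7
```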
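For the two-class LDA criterion, the direction maximizing the ratio above has the closed form V ∝ (C_1 + C_2)^(-1) (X̄_1 - X̄_2), which is the solution a generalized eigensolver would return for this rank-one problem. The sketch below uses synthetic 2-D Gaussian classes; the midpoint threshold is an illustrative choice, not from the notes.

```python
# Minimal sketch of two-class LDA using the closed-form direction
# V ~ (C1 + C2)^(-1) (mean1 - mean2); data and threshold are synthetic.
import numpy as np

def fisher_direction(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    C1 = np.cov(X1, rowvar=False)               # class-conditional covariances
    C2 = np.cov(X2, rowvar=False)
    V = np.linalg.solve(C1 + C2, m1 - m2)       # closed form for the 2-class case
    return V / np.linalg.norm(V)

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
X2 = rng.normal([2.0, 1.0], 1.0, size=(200, 2))
V = fisher_direction(X1, X2)

# Project onto V and threshold midway between the projected class means.
t = 0.5 * (X1.mean(axis=0) @ V + X2.mean(axis=0) @ V)
accuracy = ((X1 @ V > t).mean() + (X2 @ V <= t).mean()) / 2
print(V, round(float(accuracy), 3))
```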