November 25, 2014 Data Mining: Concepts and Techniques

Chapter 7. Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Density-Based Methods
7. Grid-Based Methods
8. Model-Based Methods
9. Clustering High-Dimensional Data
10. Constraint-Based Clustering
11. Outlier Analysis
12. Summary

Density-Based Clustering Methods
• Clustering based on density (a local cluster criterion), such as density-connected points
• Major features:
  – Discover clusters of arbitrary shape
  – Handle noise
  – One scan
  – Need density parameters as termination condition
• Several interesting studies:
  – DBSCAN: Ester, et al. (KDD’96)
  – OPTICS: Ankerst, et al. (SIGMOD’99)
  – DENCLUE: Hinneburg & Keim (KDD’98)
  – CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based)

Density-Reachable and Density-Connected
• Density-reachable:
  – A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p_1, …, p_n, with p_1 = q and p_n = p, such that p_{i+1} is directly density-reachable from p_i
• Density-connected:
  – A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
[Figures: points p, q, p1 (density-reachable); points p, q, o (density-connected)]

DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
[Figure: core, border, and outlier points; Eps = 1 cm, MinPts = 5]

DBSCAN: The Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database
• Continue the process until all of the points have been processed

OPTICS: A Cluster-Ordering Method (1999)
• OPTICS: Ordering Points To Identify the Clustering Structure
  – Ankerst, Breunig, Kriegel, and Sander (SIGMOD’99)
  – Produces a special order of the database w.r.t. its density-based clustering structure
  – This cluster-ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings
  – Good for both automatic and interactive cluster analysis, including finding intrinsic clustering structure
  – Can be represented graphically or using visualization techniques

OPTICS: Some Extensions from DBSCAN
• Index-based:
  – k = number of dimensions
  – N = 20
  – p = 75%
  – M = N(1 − p) = 5
  – Complexity: O(kN^2)
• Core Distance
• Reachability Distance: max(core-distance(o), d(o, p))
[Figure: MinPts = 5, Eps = 3 cm; r(p1, o) = 2.8 cm, r(p2, o) = 4 cm]

[Figure: reachability plot; vertical axis: reachability-distance (may be undefined), horizontal axis: cluster-order of the objects]
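
The DBSCAN procedure from the slide "DBSCAN: The Algorithm" can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, a brute-force neighborhood search, and that a point counts as its own neighbor. The names `region_query`, `dbscan`, and `NOISE` are hypothetical helpers introduced here.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that end up in no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet processed"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point: provisionally noise. It may be relabeled
            # later as a border point of a cluster.
            labels[i] = NOISE
            continue
        # points[i] is a core point, so a new cluster is formed.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is core too: keep expanding
        cluster_id += 1
    return labels


# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With this data, the four points of each square form one cluster and the isolated point is labeled `NOISE`. The brute-force `region_query` makes the sketch quadratic in the number of points; a spatial index would replace it in practice.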
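
The core distance and reachability distance named on the slide "OPTICS: Some Extensions from DBSCAN" can be illustrated with a small Python sketch. It follows the slide's formula max(core-distance(o), d(o, p)) and assumes Euclidean distance and that o counts toward its own MinPts neighbors; conventions differ on that last point. The function names and the point layout are hypothetical, chosen so that the distances match the slide's example values.

```python
from math import dist  # Euclidean distance, Python 3.8+


def core_distance(points, o, eps, min_pts):
    """Smallest radius within which o has min_pts points (o included).

    Returns None (undefined) if o is not a core point w.r.t. eps.
    """
    dists = sorted(dist(o, q) for q in points)
    if len(dists) < min_pts:
        return None
    d = dists[min_pts - 1]
    return d if d <= eps else None


def reachability_distance(points, p, o, eps, min_pts):
    """max(core-distance(o), d(o, p)); None if o is not a core point."""
    cd = core_distance(points, o, eps, min_pts)
    if cd is None:
        return None
    return max(cd, dist(o, p))


# Hypothetical layout (units: cm) with o at the origin.
o = (0.0, 0.0)
p1 = (-2.0, 0.0)  # inside the core-distance radius of o
p2 = (4.0, 0.0)   # farther from o than the core distance
points = [o, (1.0, 0.0), (0.0, 1.5), p1, (0.0, -2.8), p2]

r1 = reachability_distance(points, p1, o, eps=3.0, min_pts=5)
r2 = reachability_distance(points, p2, o, eps=3.0, min_pts=5)
```

For this layout the core distance of o is 2.8 cm, so `r1` comes out to 2.8 and `r2` to 4 (up to floating-point rounding), mirroring r(p1, o) = 2.8 cm and r(p2, o) = 4 cm on the slide. A point whose neighborhood is too sparse has an undefined core distance, which the sketch represents as `None`.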