Data Mining - Density-Based Clustering, Study notes of Data Mining

This document covers Cluster Analysis: What is Cluster Analysis?, Types of Data in Cluster Analysis, Hierarchical Methods, Grid-Based Methods, and Model-Based Methods.

Typology: Study notes

2010/2011

Uploaded on 09/04/2011

amit-mohta
Data Mining: Concepts and Techniques (November 25, 2014)

Chapter 7. Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Density-Based Methods
7. Grid-Based Methods
8. Model-Based Methods
9. Clustering High-Dimensional Data
10. Constraint-Based Clustering
11. Outlier Analysis
12. Summary

Density-Based Clustering Methods
• Clustering based on density (a local cluster criterion), such as density-connected points
• Major features:
– Discover clusters of arbitrary shape
– Handle noise
– One scan
– Need density parameters as termination condition
• Several interesting studies:
– DBSCAN: Ester, et al. (KDD’96)
– OPTICS: Ankerst, et al. (SIGMOD’99)
– DENCLUE: Hinneburg & D. Keim (KDD’98)
– CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based)

Density-Reachable and Density-Connected
• Density-reachable:
– A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi
• Density-connected:
– A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
[Figure: points q, p1, p (density-reachable) and points p, q, o (density-connected)]

DBSCAN: Density Based Spatial Clustering of Applications with Noise
• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
[Figure: core, border, and outlier points, with Eps = 1 cm and MinPts = 5]

DBSCAN: The Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database
• Continue the process until all of the points have been processed

OPTICS: A Cluster-Ordering Method (1999)
• OPTICS: Ordering Points To Identify the Clustering Structure
– Ankerst, Breunig, Kriegel, and Sander (SIGMOD’99)
– Produces a special order of the database w.r.t. its density-based clustering structure
– This cluster-ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings
– Good for both automatic and interactive cluster analysis, including finding intrinsic clustering structure
– Can be represented graphically or using visualization techniques

OPTICS: Some Extension from DBSCAN
• Index-based:
• k = number of dimensions
• N = 20
• p = 75%
• M = N(1-p) = 5
– Complexity: O(kN^2)
• Core Distance
• Reachability Distance: Max (core-distance (o), d (o, p))
[Figure: database D with object o and points p1, p2; MinPts = 5, ε = 3 cm; r(p1, o) = 2.8 cm, r(p2, o) = 4 cm]
[Figure: reachability plot of reachability-distance (including undefined values) against the cluster-order of the objects, with thresholds ε and ε′]
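The DBSCAN procedure listed in the slides (pick an unvisited point, retrieve its Eps-neighborhood, grow a cluster from it if it is a core point, otherwise move on) can be sketched in a few lines of Python. This is a minimal illustration and not the implementation from Ester et al.: the function names `region_query` and `dbscan`, the use of Euclidean distance, the brute-force neighborhood search, and the convention that a point counts as its own neighbor are all assumptions made for the example.

```python
from math import dist

NOISE = -1  # label used here for outliers (an assumption of this sketch)


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point: mark as noise for now; it may be relabeled
            # later as a border point of some cluster.
            labels[i] = NOISE
            continue
        # points[i] is a core point, so a new cluster is formed.
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is itself a core point, so its neighbors are
                # density-reachable and are added to the cluster.
                seeds.extend(j_neighbors)
    return labels
```

For example, calling `dbscan(points, eps=1.5, min_pts=3)` on two well-separated unit squares of four points each plus one distant point assigns label 0 to the first square, label 1 to the second, and -1 to the distant point. The brute-force `region_query` makes this sketch quadratic in the number of points; a spatial index would be the usual replacement.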
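The two OPTICS quantities named in the slides, core distance and reachability distance, can be illustrated with a short Python sketch. The slide only gives the formula Max(core-distance(o), d(o, p)); everything else here is an assumption for illustration: the function names, Euclidean distance, counting o itself among its MinPts nearest points, and returning `None` for the "undefined" case when o is not a core object.

```python
from math import dist


def core_distance(points, o, eps, min_pts):
    """Distance from points[o] to its min_pts-th nearest point (counting o itself),
    or None (undefined) if fewer than min_pts points lie within eps of points[o]."""
    within_eps = sorted(d for d in (dist(points[o], q) for q in points) if d <= eps)
    if len(within_eps) < min_pts:
        return None  # o is not a core object
    return within_eps[min_pts - 1]


def reachability_distance(points, p, o, eps, min_pts):
    """max(core-distance(o), d(o, p)), or None if o is not a core object."""
    cd = core_distance(points, o, eps, min_pts)
    if cd is None:
        return None
    return max(cd, dist(points[o], points[p]))
```

With this convention, a point p lying closer to o than o's core distance gets the core distance as its reachability distance, while a point farther away gets its actual distance to o.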