Database Design: Special Databases, Data Warehouses, Data Mining and Information Retrieval, Study notes of Principles of Database Management

These notes cover special databases, data warehouses, data mining, and information retrieval in the context of database design. Topics include biological data, geographic data, movies, new types of queries, and storage/retrieval issues. Examples use R-trees, OLAP, and data cubes. The notes also cover data federation, data warehouses, and data mining techniques such as association rules.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-mqt-2

CMSC 424 – Database design
Lecture 25: Special databases, Data warehouses, Data mining/Information retrieval
Mihai Pop

Admin
• Course evaluation: http://www.CourseEvalUM.umd.edu
• Review sessions: Thursday & Monday
  – e-mail me topics to cover, questions, problems, etc.

R-tree (chap. 24)
• Binary search tree on Y-coordinate
• Each internal node contains search structure on X-coordinate for all points with Y coordinates in the corresponding subtree

OLAP (chap. 18)
■ On-line Analytical Processing
■ Why?
  • Exploratory analysis
  • Interactive
  • Different queries than typical SQL queries
■ Data CUBE
  • A summary structure used for this purpose
    – E.g. give me total sales by zipcode; now show me total sales by customer employment category
  • Much much faster than using SQL queries against the raw data
    – The tables are huge
■ Applications:
  – Sales reporting, Marketing, Forecasting, etc.

Cross Tabulation of sales by item-name and color
■ The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot-table.
  – Values for one of the dimension attributes form the row headers
  – Values for another dimension attribute form the column headers
  – Other dimension attributes are listed on top
  – Values in individual cells are (aggregates of) the values of the dimension attributes that specify the cell.
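
The cross-tab and data cube ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own example: the `SALES` rows, the `ALL` marker, and the helper names `data_cube` and `cross_tab_lines` are all assumptions made up for this sketch. It sums a quantity over every subset of the two dimension attributes (item-name and color), then lays the result out with item-name as row headers and color as column headers.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows (item_name, color, quantity); invented for illustration.
SALES = [
    ("skirt", "dark", 8),
    ("skirt", "pastel", 35),
    ("dress", "dark", 20),
    ("dress", "pastel", 10),
    ("shirt", "dark", 14),
    ("shirt", "dark", 6),
]
ALL = "all"  # marker for a dimension that has been summed out


def data_cube(rows, n_dims):
    """Sum the measure over every subset of the dimension attributes."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                # Keep the chosen dimensions, replace the others with ALL.
                key = tuple(dims[i] if i in kept else ALL for i in range(n_dims))
                cube[key] += measure
    return dict(cube)


def cross_tab_lines(cube):
    """Render the 2-D cube as text: item_name rows, color columns, totals last."""
    items = sorted({k[0] for k in cube} - {ALL}) + [ALL]
    colors = sorted({k[1] for k in cube} - {ALL}) + [ALL]
    lines = ["".join(f"{h:>8}" for h in ["", *colors])]
    for item in items:
        # Combinations with no sales rows are absent from the cube; show 0.
        cells = [cube.get((item, color), 0) for color in colors]
        lines.append(f"{item:>8}" + "".join(f"{c:>8}" for c in cells))
    return lines


cube = data_cube(SALES, n_dims=2)
print("\n".join(cross_tab_lines(cube)))
```

Once the cube is built, a question such as "total sales of dark items" is a single dictionary lookup, `cube[("all", "dark")]`, rather than a scan of the raw rows. That is the sense in which the summary structure is faster than querying the raw data, at the cost of storing the precomputed aggregates.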
Data warehouses
• Brute-force solution to federation:
  – download all databases
  – convert them to a common schema
  – provide a common interface
• Problems:
  – data storage & duplication
  – hard to keep up to date
  – performance (single point of entry/failure)
• Examples:
  – GenBank (US biological data repository)
  – Ensembl (EU biological data repository)

Data Mining
• Searching for patterns in data
  – Typically done in data warehouses
■ Association Rules:
  • When a customer buys X, she also typically buys Y
  • Use?
    – Move X and Y together in supermarkets
  • A customer buys a lot of shirts
    – Send him a catalogue of shirts
■ Patterns are not always obvious
  • Classic example: It was observed that men tend to buy beer and diapers together (may be an urban legend)
■ Other types of mining
  • Classification
  • Decision Trees

Information retrieval (chap. 19)
• Extracting meaning from data
• Examples:
  – Google (document indexing/ranking)
  – Image search
  – Automatic annotation of documents, e.g. extracting information from bio-medical literature
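
Returning to the association rules in the Data Mining slide ("when a customer buys X, she also typically buys Y"): one common way to make "typically" precise is with two measures, support (the fraction of baskets containing both X and Y) and confidence (among baskets containing X, the fraction that also contain Y). The notes do not define these measures, so treat the following as a hedged sketch. The `BASKETS` data, the thresholds, and the function names are all hypothetical, and the exhaustive pair search is only practical for tiny inputs.

```python
from itertools import permutations

# Hypothetical market baskets; invented for illustration only.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(x, y, baskets):
    """Among baskets containing x, the fraction that also contain y."""
    sx = support({x}, baskets)
    return support({x, y}, baskets) / sx if sx else 0.0


def pair_rules(baskets, min_support, min_confidence):
    """Brute-force search for single-item rules x -> y above both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for x, y in permutations(items, 2):
        s = support({x, y}, baskets)
        c = confidence(x, y, baskets)
        if s >= min_support and c >= min_confidence:
            rules.append((x, y, s, c))
    return rules


rules = pair_rules(BASKETS, min_support=0.3, min_confidence=0.6)
for x, y, s, c in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

Note that a rule and its reverse can have different confidence: in this toy data every basket with chips also has beer, but not every basket with beer has chips.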