Data Recording and Storage - Data Warehousing - Lecture Handouts, Lecture notes of Data Warehousing

Data recording and storage, data sets, size of data sets, cost of data storage, total hardware and software cost, businesses demand intelligence, and the DBMS approach are the points discussed in this document. This handout is for the data warehousing course from Virtual University of Pakistan.

Typology: Lecture notes

2011/2012

Uploaded on 11/06/2012

ahsen
Data Warehousing
Course Code: CS614, Cs614@vu.edu.pk
Virtual University of Pakistan

Lecture Handout
Data Warehousing
Lecture No. 02

Why a DWH?

• Data recording and storage is growing.
• History is an excellent predictor of the future.
• Gives a total view of the organization.
• Intelligent decision-support is required for decision-making.

Data recording and storage is growing.

The growth in CPU performance and the fall in CPU cost described by Moore's law have been surpassed by the growth in storage space and the fall in storage cost. That is, CPUs are indeed getting cheaper and faster, but storage is getting larger and cheaper at an even higher rate, i.e. cheap storage space is becoming available faster than fast CPUs.

As you may have experienced, when your (or your father's) briefcase seems too small for the contents carried in it, it seems a good idea to buy a new and larger briefcase. However, after some time the new briefcase also seems too small for the contents carried. On the practical side, it has been noted that the amount of data recorded in an organization doubles every year, which is an exponential increase.

Reason-1: Data Sets are growing

How Much Data is that?

Amount   Size in bytes                 Roughly equivalent to
1 MB     2^20 or 10^6 bytes            Small novel; a 3½-inch disk
1 GB     2^30 or 10^9 bytes            Paper reams that could fill the back of a pickup van
1 TB     2^40 or 10^12 bytes           50,000 trees chopped, converted into paper and printed
2 PB     1 PB = 2^50 or 10^15 bytes    Academic research libraries across the U.S.
5 EB     1 EB = 2^60 or 10^18 bytes    All words ever spoken by human beings

Table-2.1: Quantifying size of data

The size of data sets is going up ↑. The cost of data storage is coming down ↓.
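As a rough check on the unit sizes in Table-2.1 and on the yearly-doubling observation, the following minimal Python sketch may help. The function names are hypothetical, chosen only for this illustration, and the assumption that data exactly doubles every year is a simplification of the claim made in the handout.

```python
# Binary vs. decimal byte counts for the units in Table-2.1,
# plus a simple yearly-doubling calculation.
# All names here are illustrative, not part of the handout.

# Each unit is the k-th power of 1024 (binary) or of 1000 (decimal).
UNIT_POWER = {"MB": 2, "GB": 3, "TB": 4, "PB": 5, "EB": 6}


def binary_bytes(unit: str) -> int:
    """Size of one unit using powers of two, e.g. 1 MB = 2**20 bytes."""
    return 1024 ** UNIT_POWER[unit]


def decimal_bytes(unit: str) -> int:
    """Size of one unit using powers of ten, e.g. 1 MB = 10**6 bytes."""
    return 1000 ** UNIT_POWER[unit]


def years_to_reach(start_bytes: int, target_bytes: int) -> int:
    """Whole years of doubling needed for start_bytes to reach target_bytes."""
    years = 0
    size = start_bytes
    while size < target_bytes:
        size *= 2  # assumption: data volume doubles once per year
        years += 1
    return years


if __name__ == "__main__":
    for unit in UNIT_POWER:
        print(unit, binary_bytes(unit), decimal_bytes(unit))
    # Years for a 1 TB data set to grow to 1 PB under yearly doubling.
    print(years_to_reach(binary_bytes("TB"), binary_bytes("PB")))
```

Because 1 PB is 2^10 times 1 TB, ten doublings are enough under this assumption, which shows how quickly exponential growth moves a data set from one unit in the table to the next.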
Total hardware and software cost to store and manage 1 Mbyte of data:

• 1990: ~ $15
• 2002: ~ ¢15 (down 100 times)
• By 2007: < ¢1 (down 150 times)

A Few Examples

• WalMart: 24 TB (terabytes)
• France Telecom: ~ 100 TB
• CERN: up to 20 PB (petabytes) by 2006
• Stanford Linear Accelerator Center (SLAC): 500 TB

However, this is another example of using historical data to predict the future. So I can predict today which customers will leave me in the next 3 months, before they even leave. There can be, and there are, whole courses on data mining, but we will just have an applied overview of data mining in this course.

Reason-2: Businesses demand intelligence

• Complex questions from integrated data.
• "Intelligent Enterprise"

DBMS Approach
• List of all items that were sold last month?
• List of all items purchased by Khizar?
• The total sales of the last month grouped by branch?
• How many sales transactions occurred during the month of January?

Intelligent Enterprise
• Which items sell together? Which items to stock?
• Where and how to place the items? What discounts to offer?
• How best to target customers to increase sales at a branch?
• Which customers are most likely to respond to my next promotional campaign, and why?

Table-2.2: Comparison of queries

Let's take a close look at the typical queries for a DBMS. They are either about listing the contents of tables or running aggregates of values, i.e. rather simple and straightforward queries that are fairly easy to program. The queries follow rather predefined paths into the database and are unlikely to come up with something new or abnormal.

Reason-3: Businesses want much more…

1. What happened?
2. Why did it happen?
3. What will happen?
4. What is happening?
5. What do you want to happen?
These questions primarily point to what are called the different stages of a data warehouse, i.e. starting from the first stage and going all the way to stage 5. The first stage is not actually a data warehouse, but a pure batch processing system. Note that as the stages evolve the amount of batch processing decreases, being at its maximum in the first stage and at its minimum in the last (5th) stage. At the same time the amount of ad-hoc query processing increases. Finally, in the most developed stage there is a high level of event-based triggering. As the system moves from stage 1 to stage 5 it becomes what is called an active data warehouse.

What is a DWH?

A complete repository of historical corporate data extracted from transaction systems that is available for ad-hoc access by knowledge workers.

The other key points in this standard definition are listed below:

Complete repository
• All the data is present from all the branches/outlets of the business.
• Even the archived data may be brought online.
• Data from arcane and old systems is also brought online.

Transaction System
• Management Information System (MIS)
• Could be typed sheets (NOT transaction system)

Ad-Hoc access
• Does not have a certain predefined database access pattern.
• Queries not known in advance.
• Difficult to write SQL in advance.

Knowledge workers
• Typically NOT IT literate (Executives, Analysts, Managers).
• NOT clerical workers.
• Decision makers.

The users of a data warehouse are knowledge workers; in other words, they are the decision makers in the organization. They are not the clerical people entering the data or overseeing the transactions, nor the people doing programming or performing system design/analysis. These are really decision makers in the organization like General Manager Marketing, or Executive Director or CEO (Chief Executive Officer).
Typically those decision makers are people in areas like marketing, finance and strategic planning.

Completeness: There is a misnomer here about completeness. As per the standard definition a data warehouse is a complete repository of corporate data. The reality is that it can never be complete. We will discuss this in detail very shortly.

Transaction System: Unlike databases where data is directly entered, the input to the data warehouse can come from OLTP or transactional systems or other third-party databases. This is not a rule; the data could come from typed or even hand-filled sheets, as was the case for the census data warehouse.

Ad-Hoc access: It does not have a certain repeatable pattern and is not known in advance. Consider financial transactions like a bank deposit: you know exactly what records will be inserted, deleted or updated. That is the case in an OLTP system and in an ERP system. But in a data warehouse there are really no fixed patterns. Say the marketing person, just
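
To make the contrast between predefined, DBMS-style queries and ad-hoc questions more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, its columns, the branch names and the sample rows are all hypothetical, invented only for this illustration; they do not come from the handout, and a real data warehouse would be far larger and differently modelled.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (txn_id INTEGER, branch TEXT, item TEXT, "
        "amount REAL, sold_on TEXT)"
    )
    rows = [  # hypothetical sample data
        (1, "Lahore", "bread", 50.0, "2012-01-05"),
        (1, "Lahore", "butter", 120.0, "2012-01-05"),
        (2, "Karachi", "bread", 50.0, "2012-01-09"),
        (2, "Karachi", "butter", 120.0, "2012-01-09"),
        (3, "Lahore", "tea", 200.0, "2012-02-02"),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def january_sales_by_branch(conn: sqlite3.Connection) -> list:
    """Predefined aggregate: total January sales grouped by branch."""
    return conn.execute(
        "SELECT branch, SUM(amount) FROM sales "
        "WHERE sold_on BETWEEN '2012-01-01' AND '2012-01-31' "
        "GROUP BY branch ORDER BY branch"
    ).fetchall()


def items_sold_together(conn: sqlite3.Connection) -> list:
    """Ad-hoc style question: which item pairs share a transaction, and how often?"""
    return conn.execute(
        "SELECT a.item, b.item, COUNT(*) AS times "
        "FROM sales a JOIN sales b "
        "ON a.txn_id = b.txn_id AND a.item < b.item "
        "GROUP BY a.item, b.item ORDER BY times DESC"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(january_sales_by_branch(conn))
    print(items_sold_together(conn))
```

The first query follows a path that can be written once and reused every month, which is typical of the DBMS approach. The second is the kind of question an analyst might think of only after looking at the data, so its SQL is hard to prepare in advance.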