Download Relational Databases-Introduction to Database Systems-Lecture 02 Slides-Computer Science and more Slides Introduction to Database Management Systems in PDF only on Docsity! 2-1 Relational Databases Basic concepts Data model: organize data as tables A relational database is a set of tables Advantages Simple concepts Solid mathematical foundation ➠ set theory Powerful query languages Efficient query optimization strategies Design theory Industry standard Relational model SQL language 2-2 Relation A relation R with attributes A={A1, A2, …, An} defined over n domains D={D1, D2, ..., Dn} (not necessarily distinct) with values {Dom1, Dom2, ..., Domn } is a finite, time varying set of n-tuples <d1, d2, ..., dn> such that d1 ∈ Dom1, d2 ∈ Dom2, ..., dn ∈ Domn and A1 ∈ D1, A2 ∈ D2, ..., An ∈ Dn. Notation: R(A1, A2, …, An) or R(A1: D1, A2: D2, …, An: Dn) Alternatively, given R as defined above, an instance of it at a given time is a set of n-tuples: {< A1: d1, A2: d2, …, An: dn> | d1∈Dom1, d2∈Dom2, ..., dn∈Domn} Tabular structure of data where R is the table heading attributes are table columns each tuple is a row Relational Model 2-3 Relation Schemes and Instances Relational scheme A relation scheme is the definition; i.e., a set of attributes A relational database scheme is a set of relation schemes: ➠ i.e., a set of sets of attributes Relation instance (simply relation) An relation is an instance of a relation scheme a relation r over a relation scheme R = {A1, ..., An} is a subset of the Cartesian product of the domains of all attributes, i.e., r ⊆ Dom1 × Dom2 × … × Domn 2-4 A domain is a type in the programming language sense Name: String Salary: Real Domain values is a set of acceptable values for a variable of a given type. Name: CdnNames = {…}, Salary: ProfSalary = {45,000 - 150,000} Simple/Composite domains ➠ Address = Street name+street number+city+province+ postal code Domain compatibility Binary operations (e.g., comparison to one another, addition, etc) can be performed on them. Full support for domains is not provided in many current relational DBMSs Domains 2-9 Key Constraints Key: a set of attributes that uniquely identifies tuples Candidate key: a minimum set of attributes that form a key Superkey: A set of one or more attributes, which, taken collectively, allow us to identify uniquely a tuple in a relation. Primary key: a designated candidate key Data Constraints Functional dependency, multivalued dependency, … Check constraints Others Null constraints Referential constraints Integrity Constraints 2-10 Views Views can be defined on single relations PROJECT(PNO, PNAME) on multiple relations SAL(ENO,TITLE,SALARY) Relations from which they are derived are called base relations View relations can be virtual; never physically created ➠ updates to views is a problem materialized: physical relations exist ➠ propagation of base table updates to materialized view tables 2-11 View Updates Views that are derived from multiple tables may cause problems ENO ENAME TITLE E1 J. Doe Elect. Eng. E2 M. Smith Syst. Anal. E3 A. Lee Mech. Eng. E4 J. Miller Programmer E5 B. Casey Syst. Anal. E6 L. Chu Elect. Eng. E7 R. Davis Mech. Eng. E8 J. Jones Syst. Anal. EMP TITLE SALARY PAY Elect. Eng. 55000 Syst. Anal. 70000 Mech. Eng. 45000 Programmer 60000 ENO TITLE SALARY E1 Elect. Eng. 55000 E2 Syst. Anal. 70000 E3 Mech. Eng. 45000 E4 Programmer 60000 E5 Syst. Anal. 70000 E6 Elect. Eng. 55000 E7 Mech. Eng. 45000 E8 Syst. Anal. 70000 SAL ➯ How do you delete a tuple from SAL? 2-12 Form <Operator><parameters> <Operands> → <Result> ↓ ↓ Relation (s) Relation Relational Algebra 2-13 Fundamental union set difference selection projection Cartesian product Additional rename intersection join quotient (division) Union compatibility same degree corresponding attributes defined over the same domain Relational Algebra Operators 2-14 Similar to set union General form R ∪ S={t | t∈R or t∈S} where R, S are relations, t is a tuple variable Result contains tuples that are in R or in S, but not both (duplicates removed) R, S should be union-compatible Union 2-19 Projection Example ΠPNO,BUDGET(PROJ) PNO BUDGET P1 150000 P2 135000 P3 250000 P4 310000 P5 500000 PROJ PNO BUDGET P2 135000 P3 250000 P4 310000 P5 500000 PNAME P1 150000Instrumentation Database Develop. CAD/CAM Maintenance CAD/CAM 2-20 Given relations R of degree k1 , cardinality n1 S of degree k2 , cardinality n2 Cartesian (cross) product: R × S = {t [A1,…,Ak1, Ak1+1,…,Ak1+k2] | t[A1,…,Ak1]∈R and t[Ak1+1,…,Ak1+k2]∈S} The result of R × S is a relation of degree (k1+ k2) and consists of all (n1* n2)-tuples where each tuple is a concatenation of one tuple of R with one tuple of S. Cartesian (Cross) Product 2-21 Cartesian Product Example ENO ENAME EMP.TITLE PAY.TITLE SALARY E1 J. Doe Elect. Eng. E1 J. Doe Elect. Eng. E1 J. Doe Elect. Eng. E1 J. Doe Elect. Eng. Elect. Eng. 55000 Syst. Anal. 70000 Mech. Eng. 45000 Programmer 60000 E2 M. Smith Syst. Anal. E2 M. Smith Syst. Anal. E2 M. Smith Syst. Anal. E2 M. Smith Syst. Anal. Elect. Eng. 55000 Syst. Anal. 70000 Mech. Eng. 45000 Programmer 60000 Elect. Eng. 55000 Syst. Anal. 70000 Mech. Eng. 45000 Programmer 60000 Elect. Eng. 55000 Syst. Anal. 70000 Mech. Eng. 45000 Programmer 60000 E3 A. Lee Mech. Eng. E3 A. Lee Mech. Eng. E3 A. Lee Mech. Eng. E3 A. Lee Mech. Eng. E8 J. Jones Syst. Anal. E8 J. Jones Syst. Anal. E8 J. Jones Syst. Anal. E8 J. Jones Syst. Anal. EMP × PAY ENO ENAME TITLE E1 J. Doe Elect. Eng E2 M. Smith Syst. Anal. E3 A. Lee Mech. Eng. E4 J. Miller Programmer E5 B. Casey Syst. Anal. E6 L. Chu Elect. Eng. E7 R. Davis Mech. Eng. E8 J. Jones Syst. Anal. EMP TITLE SALARY PAY Elect. Eng. 55000 Syst. Anal. 70000 Mech. Eng. 45000 Programmer 60000 2-22 Typical set intersection R ∩ S = {t | t∈R and t∈S} = R – (R – S) R, S union-compatible Intersection 2-23 General form R F(R.Ai,S.Bj) S={t[A1,…,An,B1,…,Bm]| t[A1,…,An]∈R and t[B1,…,Bm]∈S and F(R.Ai, S.Bj) is true} where R, S are relations, t is a tuple variable F (R.Ai, S.Bj)is a formula defined as that of selection. A derivative of Cartesian product R F S = σF(R × S) Join 2-24 Join Example ENO ENAME TITLE E1 J. Doe Elect. Eng E2 M. Smith Syst. Anal. E3 A. Lee Mech. Eng. E4 J. Miller Programmer E5 B. Casey Syst. Anal. E6 L. Chu Elect. Eng. E7 R. Davis Mech. Eng. E8 J. Jones Syst. Anal. EMP EMP EMP.ENO>WORKS.ENOWORKS ENO PNO RESP E1 P1 Manager 12 DUR E2 P1 Analyst 24 E2 P2 Analyst 6 E3 P3 Consultant 10 E3 P4 Engineer 48 E4 P2 Programmer 18 E5 P2 Manager 24 E6 P4 Manager 48 E7 P3 Engineer 36 E8 P3 Manager 40 WORKS E7 P5 Engineer 23 EMP. ENAME TITLE WORKS. PNO RESP DUR ENO. ENO E2 M. Smith Elect. Eng. E1 P1 Manager 12 E3 A. Lee Syst. Anal. E1 P1 Manager 12 E3 A. Lee Syst. Anal. E2 P1 Analyst 24 E3 A. Lee Syst. Anal. E2 P2 Analyst 6 E4 J. Miller Programmer E1 P1 Manager 12 E4 J. Miller Programmer E2 P1 Analyst 24 E4 J. Miller Programmer E2 P2 Analyst 6 E4 J. Miller Programmer E3 P3 Consultant 10 E4 J. Miller Programmer E3 P4 Engineer 48 E5 B. Casey Syst. Anal. E1 P1 Manager 12 E5 B. Casey Syst. Anal. E2 P1 Analyst 24 E5 B. Casey Syst. Anal. E2 P2 Analyst 6 E5 B. Casey Syst. Anal. E3 P3 Consultant 10 E5 B. Casey Syst. Anal. E3 P4 Engineer 48 E5 B. Casey Syst. Anal. E4 P2 Programmer 18 E6 L. Chu Elect. Eng. E1 P1 Manager 12 E6 L. Chu Elect. Eng. E2 P1 Analyst 24 E6 L. Chu Elect. Eng. E2 P2 Analyst 6 E6 L. Chu Elect. Eng. E3 P3 Consultant 10 E6 L. Chu Elect. Eng. E3 P4 Engineer 48 E6 L. Chu Elect. Eng. E4 P2 Programmer 18 E6 L. Chu Elect. Eng. E5 P2 Manager 24 … … … … … … … 2-29 Division (cont’d) x1 y1 x2 y1 x3 y1 x4 y1 x1 y2 x3 y2 x2 y3 x3 y3 x4 y3 x1 y4 x2 y4 x3 y4 X Y R x1 x2 x3 X S y1 y4 T Y T1 ← ΠY(R) T2 ← ΠY((S × T1) − R) T ← T1 − T2 2-30 Given relations R of degree k1 (R = {A1,…,Ak1}) S of degree k2 (S = {B1,…,Bk2}) Division of R by S (given , {B1,…,Bk2} ⊆{A1,…,Ak1}) R ÷ S = {t[{A1,…,Ak1}−{B1,…,Bk2}] | ∀u∈S∃v∈R(v[S]=u ∧v[R−S]=t)} = ΠR-S(R)−ΠR-S ((ΠR-S (R)×S)− R) R ÷ S results in a relation of degree (k1− k2) and consists of all (k1− k2)-tuples t such that for all k1-tuples u in S, the tuple tu is in R. Division - Formally 2-31 Division Example ENO PNO PNAME E1 P1 Instrumentation 150000 BUDGET E2 P1 Instrumentation 150000 E2 P2 Database Develop. 135000 E3 P4 Maintenance E4 P2 Instrumentation E5 P2 Instrumentation E6 P4 E7 P3 CAD/CAM E8 P3 CAD/CAM 310000 150000 150000 310000 250000 250000 EMP Maintenance E3 P1 Instrumentation 150000 ENO E3 EMP÷PROJ PROJ PNO BUDGET P2 135000 P3 250000 P4 310000 PNAME P1 150000Instrumentation Database Develop. CAD/CAM Maintenance E3 P2 Database Develop. 135000 E3 P3 CAD/CAM 250000 2-32 Emp (Eno, Ename, Title, City) (note we added City) Project(Pno, Pname, Budget, City) (note we added City) Pay(Title, Salary) Works(Eno, Pno, Resp, Dur) List names of all employees. ΠEname(Emp) List names of all projects together with their budgets. ΠPname,Budget(Project) Example Queries 2-33 Emp (Eno, Ename, Title, City) (note we added City) Project(Pno, Pname, Budget, City) (note we added City) Pay(Title, Salary) Works(Eno, Pno, Resp, Dur) Find all job titles to which at least one employee has been hired. ΠTitle(Emp) Find the records of all employees who work in Toronto. σCity=‘Toronto’(Emp) Example Queries 2-34 Emp (Eno, Ename, Title, City) Project(Pno, Pname, Budget, City) Pay(Title, Salary) Works(Eno, Pno, Resp, Dur) Find all cities where either an employee works or a project exists. ΠCity(Emp) ∪ ΠCity(Project) Find all cities that has a project but no employees who work there. ΠCity(Project) − ΠCity(Emp) Example Queries 2-39 According to the primitive variable used in specifying the queries. tuple relational calculus domain relational calculus Types of Relational Calculus 2-40 The primitive variable is a tuple variable which specifies a tuple of a relation. In other words, it ranges over the tuples of a relation. In tuple relational calculus queries are specified as {t | F(t)} where t is a tuple variable and F is a formula consisting of the atoms and operators. F evaluates to True or False. t can be qualified for only some attributes: t[A] Tuple Relational Calculus 2-41 The atoms are the following: ❶ Tuple variables ➠ If the relation over which the variable ranges is known, the variable may be qualified by the name of the relation as R.t or R(t). ❷ Conditions ➠ s[A] θ t[Β], where s and t are tuple variables and A and B are components of s and t, respectively; θ ∈ {<, >, =, ≠, ≤, ≥}. Specifies that component A of s stands in relation θ to the B component of t (e.g., s[SALARY] > t[SALARY]). ➠ s[A] θ c, where s, A and θ are as defined above and c is a constant. For example, s[NAME] = “Smith”. Tuple Relational Calculus 2-42 A formula F is composed of atoms Boolean operators ∧, ∨, ¬ existential quantifier ∃ universal quantifier ∀ Formation rules: Each atom is a formula. If F and G are formulae, so are F∧G, F∨G, ¬F, and ¬G. If F is a formula, so is (F). If F is a formula and t is a free variable in F, then ∃t(F) and ∀t(F) are also formulae. These can also be written as ∃tF(t) and ∀tF(t ) Nothing else is a formula. Tuple Relational Calculus 2-43 Safety of Calculus Expressions Problem: the size of {t | F(t)}must be finite. {t | ¬t∈R}is not finite Safety: A query is safe if, for all databases conforming to the schema, the query result can be computed using only constants appearing in the database or the query itself. Since database is finite, the set of constants appearing in it is finite as well as the constants in the query; therefore, the query result will be finite 2-44 Emp (Eno, Ename, Title, City) Project(Pno, Pname, Budget, City) Pay(Title, Salary) Works(Eno, Pno, Resp, Dur) List names of all employees. {t[Ename] | t ∈ Emp} List names of all projects together with their budgets. {<t[Pname],t[Budget]> | t ∈ Project]} Example Queries 2-49 Tuple Calculus and Relational Algebra Π is like ∃ “there exists” ... ÷ is like ∀ “for all” ... Expressing ÷ using basic operators R÷S = ΠΑ(R) - ΠΑ(ΠΑ(R) S - R) Similar to ∀x F(x) = ¬ (∃x ¬ F(x) ) 2-50 The primitive variable is a domain variable which specifies a component of a tuple. ➱ the range of a domain variable consists of the domains over which the relation is defined. Other differences from tuple relational calculus: The atoms are the following : ➠ Each domain is an atom. ➠ Conditions which can be defined as follows are atoms : ✦ x θ y, where x and y are domain variables or constants; ✦ <x1, x2, ..., xn> ∈R where R is a relation of degree n and each xi is a domain variable or constant. Formulae are defined in exactly the same way as in tuple relational calculus, with the exception of using domain variables instead of tuple variables. Domain Relational Calculus 2-51 The queries are specified in the following form : {<x1, x2, ..., xn> | F(x1, x2, ..., xn)} where F is a formula in which x1,..., xn are the free variables. Domain Relational Calculus 2-52 Emp (Eno, Ename, Title, City) Project(Pno, Pname, Budget, City) Pay(Title, Salary) Works(Eno, Pno, Resp, Dur) List the names and budgets of projects in which employee E1 works. {<b,c> | ∃a,d(<a,b,c,d> ∈ Project ∧ ∃e,f,g(<e,a,f,g> ∈ Works ∧ e = ‘E1’))} Domain Relational Calculus 2-53 Emp Ename CityTitle Project Pno Pname Budget City Eno Pay SalaryTitle Works Eno Pno Resp Dur Find the names of all employees QBE (Query-by-Example) Queries P. 2-54 Find the names of projects with budgets greater than $350,000. QBE Queries Emp Ename CityTitle Project Pno Pname Budget City Eno Pay SalaryTitle Works Eno Pno Resp Dur >350000P.