Comp 207: Tutorials and lab classes: Tutorial 4

Tutorials and lab classes: Tutorial 4

Complete the following exercises on query optimisation.

Exercise 1

Consider a relation r over the attributes A, B, C. r has 5,000 tuples, with 5 tuples stored per page. The attributes A, B, C have the following characteristics:

•Attribute A is a candidate key
•Unclustered hash index on attribute A
•Clustered B+ tree index on attribute B of depth d
•Attribute B has 1,000 distinct values in r
•Attribute C has 500 distinct tuples and an unclustered 3-level B+ tree index

•Assume that the the tuples are stored in 2 leaf nodes, rather than just in one.
•Hint: The total cost of accessing the leaf nodes in the tree is given by the cost of searching the index for the address of the first leaf node + 1 (cost of retrieving the second page of the index).

Estimate the cost of computing the following queries:

(a)σA=valuet(r) using the index
(b)σB=value(r) using the index
(c)σC=value(r) using the index
(d)πAC (r)

Optional question:

(e) Can you guess what the cost of writing the output of πAC (r) is? Assume that all attributes contribute equally to the tuple size.

Lectures:
Tue: 14.00 to 15.00 (LIFS-LT3)
Thu: 10.00 to 11.00 (SHER-LT1)
Fri: 15.00 to 16.00 (SHER-LT1)
Labs:
One session per week in Lab 4, G. Holt Building, starting from week 3 (Week starting on October 10th). Check your slot on Spider!
Lab timetable:
Mon: 12.00 - 13.00; 13.00 - 14.00 Tue: 10.00 - 11.00 Thu: 13.00 - 14.00; 15.00 - 16.00; 16.00 -17.00 Fri: 12.00 - 13.00; 13.00 - 14.00
Surgeries:
Tue: 15.00 to 16.00
Thu: 11.00 to 12.00
Assignments:
Assignment 1 (Deadline XX/XX/XX);
Assignment 2 (Deadline XX/XX/XX);
Assessment weightings:
80% multiple choice exam;
10% Practical Assignment 1;
10% Practical Assignment 2.

Exercise 2

Consider a relation s over the attributes A and B with the following characteristics:

•7,000 tuples with 70 tuples per page
•A hash index on attribute A
•The values that the attribute A takes in relation s are integers that are uniformly distributed in the range 1 – 200.

Estimate the cost of computing the following:

(a) Assuming that the aforesaid index on A is unclustered, estimate the number of disk accesses needed to compute the query σA=18(s).

Optional question:

(b) What would be the cost estimate if the index were clustered? Explain your reasoning.

Hint: The tuples might cluster around a page boundary.

Optional exercise

Estimate the cost of r ⋈ s using

(a) Sort-merge join
(b) Block nested loops

where r has 1,000 tuples, 20 tuples per page; s has 2,000 tuples, 4 tuples per page; and the main memory buffer for this operation is 22 pages long.