Complete the following exercises on query optimisation.

Exercise 1


Consider a relation r over the attributes A, B, C. r has 5,000 tuples, with 5 tuples stored per page. The attributes A, B, C have the following characteristics:

  1. Attribute A is a candidate key

  2. Unclustered hash index on attribute A

  3. Clustered B+ tree index on attribute B of depth d

  4. Attribute B has 1,000 distinct values in r

  5. Attribute C has 500 distinct tuples and an unclustered 3-level B+ tree index

    1. Assume that the the tuples are stored in 2 leaf nodes, rather than just in one.

    2. Hint: The total cost of accessing the leaf nodes in the tree is given by the cost of searching the index for the address of the first leaf node + 1 (cost of retrieving the second page of the index).


Estimate the cost of computing the following queries:

  1. (a)σA=valuet(r) using the index

  2. (b)σB=value(r) using the index

  3. (c)σC=value(r) using the index

  4. (d)πAC (r)


Optional question:

  1. (e) Can you guess what the cost of writing the output of πAC (r) is? Assume that all attributes contribute equally to the tuple size.

Lectures:
Tue: 14.00 to 15.00 (LIFS-LT3) 
Thu: 10.00 to 11.00 (SHER-LT1) 
Fri:   15.00 to 16.00 (SHER-LT1)
Labs:
One session per week in Lab 4, G. Holt Building, starting from week 3 (Week starting on October 10th).
Check your slot on Spider!
Lab timetable:
Mon: 12.00 - 13.00; 13.00 - 14.00 
Tue: 10.00 - 11.00
Thu: 13.00 - 14.00; 15.00 - 16.00; 16.00 -17.00
Fri: 12.00 - 13.00; 13.00 - 14.00
Surgeries:
Tue: 15.00 to 16.00
Thu: 11.00 to 12.00
Assignments:
Assignment 1 (Deadline XX/XX/XX);
Assignment 2 (Deadline XX/XX/XX);
Assessment weightings:
80% multiple choice exam;
10% Practical Assignment 1;
10% Practical Assignment 2.

Exercise 2


Consider a relation s over the attributes A and B with the following characteristics:

  1. 7,000 tuples with 70 tuples per page

  2. A hash index on attribute A

  3. The values that the attribute A takes in relation s are integers that are uniformly distributed in the range 1 – 200.


Estimate the cost of computing the following:

(a)  Assuming that the aforesaid index on A is unclustered, estimate the number of disk accesses needed to compute the query σA=18(s).

Optional question:

  1. (b) What would be the cost estimate if the index were clustered? Explain your reasoning.

Hint: The tuples might cluster around a page boundary.


Optional exercise


Estimate the cost of rs using

  1. (a) Sort-merge join

  2. (b) Block nested loops


where r has 1,000 tuples, 20 tuples per page; s has 2,000 tuples, 4 tuples per page; and the main memory buffer for this operation is 22 pages long.