[CI] About uncorrelated cost model

Hideaki Kimura hkimura at cs.brown.edu
Thu Jun 26 11:22:20 EDT 2008


Good point. I think about it and concluded that even B+Tree is affected 
by clustering attribute bucketing (here, I say non-unique clustering 
attribute is a bucketing).

I think you are pointing out following case, right?

Disk layout
c1 u9
c1 u9
c1 u9
c1 u9
c2 u1
c2 u1
c2 u1
c2 u1
c2 u2
c2 u2
c2 u2
c2 u2
....
c2 u9
c2 u9
c3 u2

SELECT ... WHERE u=u1

In this case, B+Tree scans just 4 tuples for each A_c and assuming 
c_pages=1 admittedly might underestimate the cost of B+Tree if 4 tuples 
occupy more than 1 pages. We need to know the selectivity of "WHERE u=u1 
AND c=c2" to estimate the number of sequentially scanned pages.



However, this can't happen actually. As it's clustered juts on C,
the actual disk layout would be:

Actual Disk layout
c1 u4
c1 u3
c1 u5
c1 u9
c2 u3
c2 u2
c2 u4
c2 u1
c2 u7
c2 u1
c2 u9
....
c2 u3
c2 u1
c3 u2

Because u=u1 is randomly scattered over all pages where c=c2, Bitmap 
Index Scan will read almost all pages where c=c2. So, B+Tree also reads 
c_pages pages for each A_c, too.

In sum, for *both* B+Tree and CT, c_pages is 1 to 20 depending on 
clustering attribute bucketing (not-unique clustering attribute is a 
bucketing, too). We can use the unified cost model, and actually it was 
accurate for both in the previous experiments.

Also, note that the c_pages only contributes to the sequential scan cost 
and it's much less than seek cost (disk_seek_cost*btree_height=28ms, 
seqscan_cost=0.08ms!).


George Huo wrote:
> On Thu, Jun 26, 2008 at 6:49 AM, Hideaki Kimura <hkimura at cs.brown.edu> wrote:
>> c_pages makes sense both for CT and B+Tree.
>> For B+Tree, it's just one page.
> 
> What if there are enough (A_c, A_u, ...) tuples to fill two pages?
-- 
Hideaki Kimura <hkimura at cs.brown.edu>


More information about the CI mailing list