[CI] About uncorrelated cost model
George Huo
george.huo at gmail.com
Thu Jun 26 11:43:32 EDT 2008
On Thu, Jun 26, 2008 at 8:22 AM, Hideaki Kimura <hkimura at cs.brown.edu> wrote:
> Good point. I think about it and concluded that even B+Tree is affected by
> clustering attribute bucketing (here, I say non-unique clustering attribute
> is a bucketing).
>
> I think you are pointing out following case, right?
>
> Disk layout
> c1 u9
> c1 u9
> c1 u9
> c1 u9
> c2 u1
> c2 u1
> c2 u1
> c2 u1
> c2 u2
> c2 u2
> c2 u2
> c2 u2
> ....
> c2 u9
> c2 u9
> c3 u2
>
> SELECT ... WHERE u=u1
>
> In this case, B+Tree scans just 4 tuples for each A_c and assuming c_pages=1
> admittedly might underestimate the cost of B+Tree if 4 tuples occupy more
> than 1 pages. We need to know the selectivity of "WHERE u=u1 AND c=c2" to
> estimate the number of sequentially scanned pages.
>
>
>
> However, this can't happen actually. As it's clustered juts on C,
> the actual disk layout would be:
>
> Actual Disk layout
> c1 u4
> c1 u3
> c1 u5
> c1 u9
> c2 u3
> c2 u2
> c2 u4
> c2 u1
> c2 u7
> c2 u1
> c2 u9
> ....
> c2 u3
> c2 u1
> c3 u2
>
> Because u=u1 is randomly scattered over all pages where c=c2, Bitmap Index
> Scan will read almost all pages where c=c2. So, B+Tree also reads c_pages
> pages for each A_c, too.
Yeah, I think we'll have to make an assumption like this. The old
synthetic experiments were designed precisely to make this happen and
make B+Trees look bad :) In general there's no real expectation that
the B+Tree will touch _all_ of the c_pages, though.
>
> In sum, for *both* B+Tree and CT, c_pages is 1 to 20 depending on clustering
> attribute bucketing (not-unique clustering attribute is a bucketing, too).
> We can use the unified cost model, and actually it was accurate for both in
> the previous experiments.
>
> Also, note that the c_pages only contributes to the sequential scan cost and
> it's much less than seek cost (disk_seek_cost*btree_height=28ms,
> seqscan_cost=0.08ms!).
>
>
> George Huo wrote:
>>
>> On Thu, Jun 26, 2008 at 6:49 AM, Hideaki Kimura <hkimura at cs.brown.edu>
>> wrote:
>>>
>>> c_pages makes sense both for CT and B+Tree.
>>> For B+Tree, it's just one page.
>>
>> What if there are enough (A_c, A_u, ...) tuples to fill two pages?
>
> --
> Hideaki Kimura <hkimura at cs.brown.edu>
>
More information about the CI
mailing list