[CI] Refactoring Section 3 and 4

Samuel Madden madden at csail.mit.edu
Tue Jun 24 18:03:54 EDT 2008


Guys -- what phone # are we using for the call?  Just tried Stan's  
office, but no answer.

On Jun 24, 2008, at 5:41 PM, Hideaki Kimura wrote:

> Alex and I had reviewed Section 3 (MODEL) and 4 (ESTIMATING CT  
> COST). Following is the suggestion from us to be discussed at the  
> upcoming meeting at 6p.
>
> 3. MODEL
> 3.1-3.2 (Preliminaries, Seq Scan)
> No change.
>
> 3.3 Cost of CT
> This subsection will remain without much change except that we'll  
> say that c_per_u depends on bucketing and therefore too wide  
> bucketing affects query performance.
>
> About Sam's comment on this section:
> We think the cost model is not similar to CORDS (actually, CORDS  
> didn't provide cost model), so we don't have to refer CORDS/BHUNT  
> here. Is it right?
>
> 3.4-3.5 Cost of Unclustered B+Tree, Discussion
> We want to do a substantial respinning here.
>
> First, the previous 3.4 and 3.5 assumes that the B+Tree  
> implementation doesn't have Bitmap Index Scan access, which isn't  
> true and might make reviewers think that we overlooked the access  
> method. We should just assume B+Tree with Bitmap Index Scan and  
> clearly say so in this section.
>
> Second, however, building a cost model for B+Tree with Bitmap Index  
> Scan is not easy; we need c_per_u-like parameter to estimate the  
> cost. Selectivity information is not enough to estimate the cost as  
> Bitmap Index Scan is affected by how much the selected tuples are  
> scattered over the disk. We suggest to have a small section to say  
> that:
>
> a) Existing DBMS like PostgreSQL didn't have a decent cost model for  
> Bitmap Index Scan because it needs c_per_u to estimate the cost.
> b) Our CT cost model is applicable to B+Tree with Bitmap Index Scan  
> and more accurate because it's aware of page clustering.
> c) The only difference b/w Bitmap Index Scan and CT is that CT has  
> bucketing and c_per_u is variable while B+Tree c_per_u is fixed.
>
> It's a bit challenging to claim, but it will clarify the difference  
> b/w CT and Bitmap Index Scan so that reviewers will know we are  
> comparing apple to apple.
>
>
> 4. Estimating CT COST
> We think this section should be a subsection of Section 3 right  
> after the cost model because CT cost model should be followed by how  
> to estimate c_per_u in the cost model. Instead, Section 5 should be  
> renamed to "CT Advisor" or something.
>
> We should say in this subsection that c_per_u is same as FD strength  
> defined in CORDS paper but we used the value for many other  
> purposes; CT size estimation and query cost estimation, which  
> requires more accuracy on c_per_u estimation and that's why we used  
> Distinct Sampling unlike CORDS.
> -- 
> Hideaki Kimura <hkimura at cs.brown.edu>
> _______________________________________________
> CI mailing list
> CI at list.cs.brown.edu
> http://list.cs.brown.edu/mailman/listinfo/ci



More information about the CI mailing list