[CI] Refactoring Section 3 and 4
Samuel Madden
madden at csail.mit.edu
Tue Jun 24 18:03:54 EDT 2008
Guys -- what phone # are we using for the call? Just tried Stan's
office, but no answer.
On Jun 24, 2008, at 5:41 PM, Hideaki Kimura wrote:
> Alex and I had reviewed Section 3 (MODEL) and 4 (ESTIMATING CT
> COST). Following is the suggestion from us to be discussed at the
> upcoming meeting at 6p.
>
> 3. MODEL
> 3.1-3.2 (Preliminaries, Seq Scan)
> No change.
>
> 3.3 Cost of CT
> This subsection will remain without much change except that we'll
> say that c_per_u depends on bucketing and therefore too wide
> bucketing affects query performance.
>
> About Sam's comment on this section:
> We think the cost model is not similar to CORDS (actually, CORDS
> didn't provide cost model), so we don't have to refer CORDS/BHUNT
> here. Is it right?
>
> 3.4-3.5 Cost of Unclustered B+Tree, Discussion
> We want to do a substantial respinning here.
>
> First, the previous 3.4 and 3.5 assumes that the B+Tree
> implementation doesn't have Bitmap Index Scan access, which isn't
> true and might make reviewers think that we overlooked the access
> method. We should just assume B+Tree with Bitmap Index Scan and
> clearly say so in this section.
>
> Second, however, building a cost model for B+Tree with Bitmap Index
> Scan is not easy; we need c_per_u-like parameter to estimate the
> cost. Selectivity information is not enough to estimate the cost as
> Bitmap Index Scan is affected by how much the selected tuples are
> scattered over the disk. We suggest to have a small section to say
> that:
>
> a) Existing DBMS like PostgreSQL didn't have a decent cost model for
> Bitmap Index Scan because it needs c_per_u to estimate the cost.
> b) Our CT cost model is applicable to B+Tree with Bitmap Index Scan
> and more accurate because it's aware of page clustering.
> c) The only difference b/w Bitmap Index Scan and CT is that CT has
> bucketing and c_per_u is variable while B+Tree c_per_u is fixed.
>
> It's a bit challenging to claim, but it will clarify the difference
> b/w CT and Bitmap Index Scan so that reviewers will know we are
> comparing apple to apple.
>
>
> 4. Estimating CT COST
> We think this section should be a subsection of Section 3 right
> after the cost model because CT cost model should be followed by how
> to estimate c_per_u in the cost model. Instead, Section 5 should be
> renamed to "CT Advisor" or something.
>
> We should say in this subsection that c_per_u is same as FD strength
> defined in CORDS paper but we used the value for many other
> purposes; CT size estimation and query cost estimation, which
> requires more accuracy on c_per_u estimation and that's why we used
> Distinct Sampling unlike CORDS.
> --
> Hideaki Kimura <hkimura at cs.brown.edu>
> _______________________________________________
> CI mailing list
> CI at list.cs.brown.edu
> http://list.cs.brown.edu/mailman/listinfo/ci
More information about the CI
mailing list