[CI] Initial pass at new intro
Hideaki Kimura
hkimura at cs.brown.edu
Mon Jun 16 19:21:45 EDT 2008
The new introduction counts to me. It covers what we actually
did and what we think related (and will not provoke the forerunners:-).
One thing I'd like to do is to clearly point out that CORDS/BHUNT
doesn't describe a query cost model nor a detailed system to utilize the
correlation therefore our contribution (cost model and CT designer) is
significant in Section 7 (related work).
> We need to say if there is anything about c_per_u / our estimation of
> c_per_u that is different / better than CORDS. If not we should cut
> down sections 3 and 4 and say we use the same method as CORDS so that we
> can dedicate more space to bucketing and multicolumn search in section 5.
Actually not same because CORDS uses cardinality information in system
catalogs (page6 "Step 1". chi-square test for correlation detection is
sampling-based though), but anyway our cardinality estimation is
other's work or combination of others' work.
BTW, I found that saying CORDS does no bucketing might be a bit risky;
It does Hash-based bucketing for chi-square test in Page 6 "Step 5".
However, (if I understand correctly) their bucketing is totally
different from us because bucketing with hashing does not preserve
adjacency of values; it's just for chi-square test in Step 5-c.
Actually, they prune out many-valued columns in Step 1, which will
neglects correlated pairs that need bucketing like "price-category" in
ebay data.
I committed Bucketing.pdf and Bucketing2.pdf. Could you combine
the 2 pdfs again, Sam?
Samuel Madden wrote:
> All --
>
> I checked in a new intro which I think addresses the concerns about
> related work a little more clearly. Please look it over and see what
> you think -- Alex, I believe I've captured all the ideas you guys
> mentioned in your mail, but I might have missed on or two.
>
> I'm sure we will need to iterate on this, so pls read and comment!
>
> There are a number of comments throughout the paper where I note things
> that need to fix in light of related work, etc. Everyone should read
> the BHUNT and CORDS papers and understand the basic approaches since
> they appear to be the state of the art and are clearly pretty close to
> what we have done.
>
> I also made two other changes:
>
> 1) A new title, making the scheme sound less like a new index and more
> like a way to exploit soft function dependencies at query execution time
>
> 2) Changed CI to CT (correlation table), so that it doesn't seem as much
> like we are proposing a new index (which we aren't.)
>
> I think we need to clean up our story re: unclustered B+trees a little
> more -- they aren't really a competitive technique, just another way to
> exploit correlations (rather than predicate introduction) once
> correlations have been found. Problem is that one could use an existing
> tool like CORDS and then create indices on the unclustered attributes
> which might be a win.
>
> We need to say if there is anything about c_per_u / our estimation of
> c_per_u that is different / better than CORDS. If not we should cut
> down sections 3 and 4 and say we use the same method as CORDS so that we
> can dedicate more space to bucketing and multicolumn search in section 5.
>
> Also, there are some references to CT in the figures that I couldn't fix
> -- if one of you could take care of that, that'd be great.
>
>
>
> -Sam
> _______________________________________________
> CI mailing list
> CI at list.cs.brown.edu
> http://list.cs.brown.edu/mailman/listinfo/ci
>
--
Hideaki Kimura <hkimura at cs.brown.edu>
More information about the CI
mailing list