[CI] Initial pass at new intro

Hideaki Kimura hkimura at cs.brown.edu
Mon Jun 16 19:21:45 EDT 2008


The new introduction counts to me. It covers what we actually
did and what we think related (and will not provoke the forerunners:-).

One thing I'd like to do is to clearly point out that CORDS/BHUNT 
doesn't describe a query cost model nor a detailed system to utilize the 
correlation therefore our contribution (cost model and CT designer) is 
significant in Section 7 (related work).

 > We need to say if there is anything about c_per_u / our estimation of
 > c_per_u that is different / better than CORDS.  If not we should cut
 > down sections 3 and 4 and say we use the same method as CORDS so that we
 > can dedicate more space to bucketing and multicolumn search in section 5.
Actually not same because CORDS uses cardinality information in system 
catalogs (page6 "Step 1". chi-square test for correlation detection is 
sampling-based though), but anyway our cardinality estimation is
other's work or combination of others' work.


BTW, I found that saying CORDS does no bucketing might be a bit risky; 
It does Hash-based bucketing for chi-square test in Page 6 "Step 5". 
However, (if I understand correctly) their bucketing is totally 
different from us because bucketing with hashing does not preserve 
adjacency of values; it's just for chi-square test in Step 5-c.
Actually, they prune out many-valued columns in Step 1, which will
neglects correlated pairs that need bucketing like "price-category" in 
ebay data.


I committed Bucketing.pdf and Bucketing2.pdf. Could you combine
the 2 pdfs again, Sam?

Samuel Madden wrote:
> All --
> 
> I checked in a new intro which I think addresses the concerns about 
> related work a little more clearly.  Please look it over and see what 
> you think -- Alex, I believe I've captured all the ideas you guys 
> mentioned in your mail, but I might have missed on or two.
> 
> I'm sure we will need to iterate on this, so pls read and comment!
> 
> There are a number of comments throughout the paper where I note things 
> that need to fix in light of related work, etc.  Everyone should read 
> the BHUNT and CORDS papers and understand the basic approaches since 
> they appear to be the state of the art and are clearly pretty close to 
> what we have done.
> 
> I also made two other changes:
> 
> 1) A new title, making the scheme sound less like a new index and more 
> like a way to exploit soft function dependencies at query execution time
> 
> 2) Changed CI to CT (correlation table), so that it doesn't seem as much 
> like we are proposing a new index (which we aren't.)
> 
>  I think we need to clean up our story re: unclustered B+trees a little 
> more -- they aren't really a competitive technique, just another way to 
> exploit correlations (rather than predicate introduction) once 
> correlations have been found.  Problem is that one could use an existing 
> tool like CORDS and then create indices on the unclustered attributes 
> which might be a win.
> 
> We need to say if there is anything about c_per_u / our estimation of 
> c_per_u that is different / better than CORDS.  If not we should cut 
> down sections 3 and 4 and say we use the same method as CORDS so that we 
> can dedicate more space to bucketing and multicolumn search in section 5.
> 
> Also, there are some references to CT in the figures that I couldn't fix 
> -- if one of you could take care of that, that'd be great.
> 
> 
> 
> -Sam
> _______________________________________________
> CI mailing list
> CI at list.cs.brown.edu
> http://list.cs.brown.edu/mailman/listinfo/ci
> 


-- 
Hideaki Kimura <hkimura at cs.brown.edu>


More information about the CI mailing list