[CI] About uncorrelated cost model
Samuel Madden
madden at csail.mit.edu
Thu Jun 26 09:19:55 EDT 2008
(Alex note that you originally sent this message to the wrong email
address)
I don't really buy it, sorry.
I do agree that its important for the cost model to account not just
for c_per_u but the cardinality of two attributes, but the cost model
we propose does do that already. Maybe the cost formula we have will
just work -- I do agree that a unified model would be better.
I don't think we should have different forumlas for the bucketed/
unbucketed case -- bucketing just affects the overall cardinality of
the columns.
On Jun 25, 2008, at 11:09 AM, Alexander Rasin wrote:
>
> We talked about this some more with Hideaki yesterday, and here's
> another way to phrase it - see if you believe it.
>
> The reason why c_per_u doesn't make sense for uncorrelated
> attributes is not because it doesn't apply, because technically it
> does. The reason why we might want a different cost model is
> because uncorrelated attributes are never bucketed. In the absence
> of bucketing c_per_u is a less useful measure.
> Very many-valued column might have low c_per_u due to uniqueness,
> but isn't useful because that never applies to a range of values
> (and bucketed low c_per_u means that up to a range equal to a bucket
> of values, locality applies)
>
> So we want to unify the cost model for BTree/CT and then perhaps try
> calling the two different cases are bucketed/un-bucketed, not
> clustered/unclustered, even though no bucketing technically means
> bucket size of 1 row.
>
> Alex.
>
>> I agree that a unified model might be good -- I just don't think it
>> makes sense to use c_per_u in a cost model for uncorrelated
>> attributes.
>>
>> On Jun 24, 2008, at 7:53 PM, Hideaki Kimura wrote:
>>
>>> I understand the cost model of uncorrelated case is correct, but
>>> it might give reviewers a wrong impression that we are claiming
>>> that B+Tree does something stupid in the case and our CT doesn't.
>>> As the CT base cost model shifts to sequential scan if c_per_u is
>>> high (which is almost accurate in figure 3), I think we don't need
>>> a different cost model for uncorrelated case.
>>> --
>>> Hideaki Kimura <hkimura at cs.brown.edu>
>>> _______________________________________________
>>> CI mailing list
>>> CI at list.cs.brown.edu
>>> http://list.cs.brown.edu/mailman/listinfo/ci
>>
>> _______________________________________________
>> CI mailing list
>> CI at list.cs.brown.edu
>> http://list.cs.brown.edu/mailman/listinfo/ci
More information about the CI
mailing list