[CI] About uncorrelated cost model
George Huo
george.huo at gmail.com
Thu Jun 26 09:36:31 EDT 2008
On Thu, Jun 26, 2008 at 6:19 AM, Samuel Madden <madden at csail.mit.edu> wrote:
> (Alex note that you originally sent this message to the wrong email address)
>
> I don't really buy it, sorry.
>
> I do agree that its important for the cost model to account not just for
> c_per_u but the cardinality of two attributes, but the cost model we propose
> does do that already. Maybe the cost formula we have will just work -- I do
> agree that a unified model would be better.
A note from reading over the section -- the model currently says 'you
visit each A_c value, then you read c_pages pages for that c value.'
This is true for CIs, but not for btrees, which only need to read
pages where A_u tuples appear -- how would we fix it to unify the two?
>
> I don't think we should have different forumlas for the bucketed/unbucketed
> case -- bucketing just affects the overall cardinality of the columns.
>
>
> On Jun 25, 2008, at 11:09 AM, Alexander Rasin wrote:
>
>>
>> We talked about this some more with Hideaki yesterday, and here's another
>> way to phrase it - see if you believe it.
>>
>> The reason why c_per_u doesn't make sense for uncorrelated attributes is
>> not because it doesn't apply, because technically it does. The reason why
>> we might want a different cost model is because uncorrelated attributes are
>> never bucketed. In the absence of bucketing c_per_u is a less useful
>> measure.
>> Very many-valued column might have low c_per_u due to uniqueness, but
>> isn't useful because that never applies to a range of values (and bucketed
>> low c_per_u means that up to a range equal to a bucket of values, locality
>> applies)
>>
>> So we want to unify the cost model for BTree/CT and then perhaps try
>> calling the two different cases are bucketed/un-bucketed, not
>> clustered/unclustered, even though no bucketing technically means bucket
>> size of 1 row.
>>
>> Alex.
>>
>>> I agree that a unified model might be good -- I just don't think it makes
>>> sense to use c_per_u in a cost model for uncorrelated attributes.
>>>
>>> On Jun 24, 2008, at 7:53 PM, Hideaki Kimura wrote:
>>>
>>>> I understand the cost model of uncorrelated case is correct, but it
>>>> might give reviewers a wrong impression that we are claiming that B+Tree
>>>> does something stupid in the case and our CT doesn't.
>>>> As the CT base cost model shifts to sequential scan if c_per_u is high
>>>> (which is almost accurate in figure 3), I think we don't need a different
>>>> cost model for uncorrelated case.
>>>> --
>>>> Hideaki Kimura <hkimura at cs.brown.edu>
>>>> _______________________________________________
>>>> CI mailing list
>>>> CI at list.cs.brown.edu
>>>> http://list.cs.brown.edu/mailman/listinfo/ci
>>>
>>> _______________________________________________
>>> CI mailing list
>>> CI at list.cs.brown.edu
>>> http://list.cs.brown.edu/mailman/listinfo/ci
>
>
>
>
> _______________________________________________
> CI mailing list
> CI at list.cs.brown.edu
> http://list.cs.brown.edu/mailman/listinfo/ci
>
More information about the CI
mailing list