[CI] About uncorrelated cost model
Hideaki Kimura
hkimura at cs.brown.edu
Thu Jun 26 09:49:51 EDT 2008
c_pages makes sense both for CT and B+Tree.
For B+Tree, it's just one page.
As for CT, it's 1 to 20 depending on clustering attribute bucketing
(not-unique clustering attribute is a bucketing, too).
I think the cost model works well for both as CT and Bitmap Index Scan
on B+Tree is basically same in terms of access pattern.
George Huo wrote:
> On Thu, Jun 26, 2008 at 6:19 AM, Samuel Madden <madden at csail.mit.edu> wrote:
>> (Alex note that you originally sent this message to the wrong email address)
>>
>> I don't really buy it, sorry.
>>
>> I do agree that its important for the cost model to account not just for
>> c_per_u but the cardinality of two attributes, but the cost model we propose
>> does do that already. Maybe the cost formula we have will just work -- I do
>> agree that a unified model would be better.
>
> A note from reading over the section -- the model currently says 'you
> visit each A_c value, then you read c_pages pages for that c value.'
> This is true for CIs, but not for btrees, which only need to read
> pages where A_u tuples appear -- how would we fix it to unify the two?
>
>> I don't think we should have different forumlas for the bucketed/unbucketed
>> case -- bucketing just affects the overall cardinality of the columns.
>>
>>
>> On Jun 25, 2008, at 11:09 AM, Alexander Rasin wrote:
>>
>>> We talked about this some more with Hideaki yesterday, and here's another
>>> way to phrase it - see if you believe it.
>>>
>>> The reason why c_per_u doesn't make sense for uncorrelated attributes is
>>> not because it doesn't apply, because technically it does. The reason why
>>> we might want a different cost model is because uncorrelated attributes are
>>> never bucketed. In the absence of bucketing c_per_u is a less useful
>>> measure.
>>> Very many-valued column might have low c_per_u due to uniqueness, but
>>> isn't useful because that never applies to a range of values (and bucketed
>>> low c_per_u means that up to a range equal to a bucket of values, locality
>>> applies)
>>>
>>> So we want to unify the cost model for BTree/CT and then perhaps try
>>> calling the two different cases are bucketed/un-bucketed, not
>>> clustered/unclustered, even though no bucketing technically means bucket
>>> size of 1 row.
>>>
>>> Alex.
>>>
>>>> I agree that a unified model might be good -- I just don't think it makes
>>>> sense to use c_per_u in a cost model for uncorrelated attributes.
>>>>
>>>> On Jun 24, 2008, at 7:53 PM, Hideaki Kimura wrote:
>>>>
>>>>> I understand the cost model of uncorrelated case is correct, but it
>>>>> might give reviewers a wrong impression that we are claiming that B+Tree
>>>>> does something stupid in the case and our CT doesn't.
>>>>> As the CT base cost model shifts to sequential scan if c_per_u is high
>>>>> (which is almost accurate in figure 3), I think we don't need a different
>>>>> cost model for uncorrelated case.
>>>>> --
>>>>> Hideaki Kimura <hkimura at cs.brown.edu>
>>>>> _______________________________________________
>>>>> CI mailing list
>>>>> CI at list.cs.brown.edu
>>>>> http://list.cs.brown.edu/mailman/listinfo/ci
>>>> _______________________________________________
>>>> CI mailing list
>>>> CI at list.cs.brown.edu
>>>> http://list.cs.brown.edu/mailman/listinfo/ci
>>
>>
>>
>> _______________________________________________
>> CI mailing list
>> CI at list.cs.brown.edu
>> http://list.cs.brown.edu/mailman/listinfo/ci
>>
> _______________________________________________
> CI mailing list
> CI at list.cs.brown.edu
> http://list.cs.brown.edu/mailman/listinfo/ci
>
--
Hideaki Kimura <hkimura at cs.brown.edu>
More information about the CI
mailing list