[CI] About uncorrelated cost model
George Huo
george.huo at gmail.com
Thu Jun 26 10:26:42 EDT 2008
On Thu, Jun 26, 2008 at 6:49 AM, Hideaki Kimura <hkimura at cs.brown.edu> wrote:
> c_pages makes sense both for CT and B+Tree.
> For B+Tree, it's just one page.
What if there are enough (A_c, A_u, ...) tuples to fill two pages?
> As for CT, it's 1 to 20 depending on clustering attribute bucketing
> (not-unique clustering attribute is a bucketing, too).
>
> I think the cost model works well for both as CT and Bitmap Index Scan
> on B+Tree is basically same in terms of access pattern.
>
> George Huo wrote:
>>
>> On Thu, Jun 26, 2008 at 6:19 AM, Samuel Madden <madden at csail.mit.edu>
>> wrote:
>>>
>>> (Alex note that you originally sent this message to the wrong email
>>> address)
>>>
>>> I don't really buy it, sorry.
>>>
>>> I do agree that its important for the cost model to account not just for
>>> c_per_u but the cardinality of two attributes, but the cost model we
>>> propose
>>> does do that already. Maybe the cost formula we have will just work -- I
>>> do
>>> agree that a unified model would be better.
>>
>> A note from reading over the section -- the model currently says 'you
>> visit each A_c value, then you read c_pages pages for that c value.'
>> This is true for CIs, but not for btrees, which only need to read
>> pages where A_u tuples appear -- how would we fix it to unify the two?
>>
>>> I don't think we should have different forumlas for the
>>> bucketed/unbucketed
>>> case -- bucketing just affects the overall cardinality of the columns.
>>>
>>>
>>> On Jun 25, 2008, at 11:09 AM, Alexander Rasin wrote:
>>>
>>>> We talked about this some more with Hideaki yesterday, and here's
>>>> another
>>>> way to phrase it - see if you believe it.
>>>>
>>>> The reason why c_per_u doesn't make sense for uncorrelated attributes is
>>>> not because it doesn't apply, because technically it does. The reason
>>>> why
>>>> we might want a different cost model is because uncorrelated attributes
>>>> are
>>>> never bucketed. In the absence of bucketing c_per_u is a less useful
>>>> measure.
>>>> Very many-valued column might have low c_per_u due to uniqueness, but
>>>> isn't useful because that never applies to a range of values (and
>>>> bucketed
>>>> low c_per_u means that up to a range equal to a bucket of values,
>>>> locality
>>>> applies)
>>>>
>>>> So we want to unify the cost model for BTree/CT and then perhaps try
>>>> calling the two different cases are bucketed/un-bucketed, not
>>>> clustered/unclustered, even though no bucketing technically means bucket
>>>> size of 1 row.
>>>>
>>>> Alex.
>>>>
>>>>> I agree that a unified model might be good -- I just don't think it
>>>>> makes
>>>>> sense to use c_per_u in a cost model for uncorrelated attributes.
>>>>>
>>>>> On Jun 24, 2008, at 7:53 PM, Hideaki Kimura wrote:
>>>>>
>>>>>> I understand the cost model of uncorrelated case is correct, but it
>>>>>> might give reviewers a wrong impression that we are claiming that
>>>>>> B+Tree
>>>>>> does something stupid in the case and our CT doesn't.
>>>>>> As the CT base cost model shifts to sequential scan if c_per_u is high
>>>>>> (which is almost accurate in figure 3), I think we don't need a
>>>>>> different
>>>>>> cost model for uncorrelated case.
>>>>>> --
>>>>>> Hideaki Kimura <hkimura at cs.brown.edu>
>>>>>> _______________________________________________
>>>>>> CI mailing list
>>>>>> CI at list.cs.brown.edu
>>>>>> http://list.cs.brown.edu/mailman/listinfo/ci
>>>>>
>>>>> _______________________________________________
>>>>> CI mailing list
>>>>> CI at list.cs.brown.edu
>>>>> http://list.cs.brown.edu/mailman/listinfo/ci
>>>
>>>
>>>
>>> _______________________________________________
>>> CI mailing list
>>> CI at list.cs.brown.edu
>>> http://list.cs.brown.edu/mailman/listinfo/ci
>>>
>> _______________________________________________
>> CI mailing list
>> CI at list.cs.brown.edu
>> http://list.cs.brown.edu/mailman/listinfo/ci
>>
>
> --
> Hideaki Kimura <hkimura at cs.brown.edu>
>
More information about the CI
mailing list