[CS241] data sharing

Skip dave.hirshberg at gmail.com
Mon Oct 29 06:54:18 EDT 2007


An amendment

Where Avram and I have the same path, we have the same tense-counts for all
reasonable tenses (those whose tree.vals[8] matches [01][01][1-4]).
When I include unreasonable tenses, our counts differ.
Because I counted instances with unreasonable tenses when I selected the paths
with at least 10,000 instances, some of our selected paths differ.
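A minimal Python sketch of the filtering described above, for illustration only. It assumes tree.vals[8] holds a three-character tense code, that [01][01][1-4] is meant as a regular expression the whole code must match, and that "paths with 10,000 instances" means at least 10,000. The names is_reasonable and frequent_paths are hypothetical. The sketch shows how counting unreasonable tenses can push a path type over the cutoff and so change which paths get selected.

```python
import re
from collections import Counter

# Assumption: [01][01][1-4] is a regex that the whole tense code must match.
REASONABLE_TENSE = re.compile(r"[01][01][1-4]")


def is_reasonable(tense: str) -> bool:
    """True if the (hypothetical) tense code fully matches [01][01][1-4]."""
    return REASONABLE_TENSE.fullmatch(tense) is not None


def frequent_paths(instances, threshold=10_000, reasonable_only=True):
    """Return the path types with at least `threshold` instances.

    instances: iterable of (path, tense) pairs, one per path instance.
    If reasonable_only is True, instances with unreasonable tenses are
    dropped before counting.
    """
    counts = Counter(
        path
        for path, tense in instances
        if not reasonable_only or is_reasonable(tense)
    )
    return {path for path, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Toy data (not from the real counts): path B only reaches the cutoff
    # when its unreasonable-tense instance is counted.
    toy = [("A", "011")] * 3 + [("B", "012")] * 2 + [("B", "999")]
    print(frequent_paths(toy, threshold=3))  # {'A'}
    print(sorted(frequent_paths(toy, threshold=3, reasonable_only=False)))  # ['A', 'B']
```

Filtering before counting, rather than after selecting paths, is what keeps the selected path set the same for anyone who uses the same definition of a reasonable tense.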

The following lists the number of path instances for each of my top 12
path-types, as (mine, Avram's):
 (128041, 115487),
 (125588, 88564),
 (93718, 72037),
 (39514, 38546),
 (35535, 31282),
 (32972, 25657),
 (25879, 23351),
 (21643, 18062),
 (19329, 17029),
 (17293, 10623),
 (11499, 4074),
 (10465, 521)
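For a quick look at where the two sets of counts diverge, the Python sketch below (illustrative only) computes Avram's count as a fraction of mine for each row of the list above. Only rows where we have the same path are directly comparable; in those rows the gap should come from the unreasonable tenses I included. The names COUNTS and avram_fraction are hypothetical; the numbers are copied from the list.

```python
# (mine, Avram's) path-instance counts for my top 12 path-types,
# copied in order from the list above.
COUNTS = [
    (128041, 115487),
    (125588, 88564),
    (93718, 72037),
    (39514, 38546),
    (35535, 31282),
    (32972, 25657),
    (25879, 23351),
    (21643, 18062),
    (19329, 17029),
    (17293, 10623),
    (11499, 4074),
    (10465, 521),
]


def avram_fraction(counts):
    """Avram's count divided by mine, row by row."""
    return [avram / mine for mine, avram in counts]


if __name__ == "__main__":
    for rank, ((mine, avram), frac) in enumerate(
        zip(COUNTS, avram_fraction(COUNTS)), start=1
    ):
        print(f"{rank:2d}  {mine:7d}  {avram:7d}  {frac:.3f}")
```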

These are the corresponding path-pairs (in my notation, ^X^ => X is the
root):
       ^VP^ SBAR S VP   VP_SBAR_S_VP_topNode_VP0.txt
            ^VP^ S VP   VP_S_VP_topNode_VP0.txt
              ^VP^ VP   VP_VP_topNode_VP0.txt
        VP S ^S^ S VP   VP_S_S_S_VP_topNode_S2.txt
    ^VP^ NP SBAR S VP   VP_NP_SBAR_S_VP_topNode_VP0.txt
          VP ^S^ S VP   VP_S_SBAR_S_VP_topNode_S3.txt
     VP ^S^ SBAR S VP   VP_S_S_VP_topNode_S2.txt
 ^VP^ PP NP SBAR S VP   VP_PP_NP_SBAR_S_VP_topNode_VP0.txt
         ^VP^ PP S VP   VP_S_SBAR_NP_S_VP_topNode_S4.txt
  VP ^S^ NP SBAR S VP   VP_S_SINV_VP_topNode_SINV2.txt
       VP ^SINV^ S VP   VP_NP_VP_topNode_VP0.txt
           ^VP^ NP VP   VP_PP_S_VP_topNode_VP0.txt
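The two notations can be checked against each other mechanically. The Python sketch below is a hypothetical helper, not either of our actual scripts. It converts a path in my notation into the file-name form used in the right-hand column, assuming (as pairs like VP S ^S^ S VP and VP_S_S_S_VP_topNode_S2.txt suggest) that the topNode suffix is the root label followed by its 0-based position in the path. It then reports which rows name the same path.

```python
def to_avram_filename(path: str) -> str:
    """Convert e.g. "VP S ^S^ S VP" into "VP_S_S_S_VP_topNode_S2.txt".

    Assumption: the topNode suffix is the root label plus the root's
    0-based position, inferred from the listed pairs.
    """
    tokens = path.split()
    # Locate the single root token, written as ^X^.
    roots = [
        i for i, t in enumerate(tokens)
        if len(t) > 2 and t.startswith("^") and t.endswith("^")
    ]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one ^root^ in {path!r}")
    i = roots[0]
    labels = [t.strip("^") for t in tokens]
    return "_".join(labels) + f"_topNode_{labels[i]}{i}.txt"


# (my path, Avram's file) pairs, copied in order from the listing above.
PAIRS = [
    ("^VP^ SBAR S VP", "VP_SBAR_S_VP_topNode_VP0.txt"),
    ("^VP^ S VP", "VP_S_VP_topNode_VP0.txt"),
    ("^VP^ VP", "VP_VP_topNode_VP0.txt"),
    ("VP S ^S^ S VP", "VP_S_S_S_VP_topNode_S2.txt"),
    ("^VP^ NP SBAR S VP", "VP_NP_SBAR_S_VP_topNode_VP0.txt"),
    ("VP ^S^ S VP", "VP_S_SBAR_S_VP_topNode_S3.txt"),
    ("VP ^S^ SBAR S VP", "VP_S_S_VP_topNode_S2.txt"),
    ("^VP^ PP NP SBAR S VP", "VP_PP_NP_SBAR_S_VP_topNode_VP0.txt"),
    ("^VP^ PP S VP", "VP_S_SBAR_NP_S_VP_topNode_S4.txt"),
    ("VP ^S^ NP SBAR S VP", "VP_S_SINV_VP_topNode_SINV2.txt"),
    ("VP ^SINV^ S VP", "VP_NP_VP_topNode_VP0.txt"),
    ("^VP^ NP VP", "VP_PP_S_VP_topNode_VP0.txt"),
]


def matching_ranks(pairs):
    """Return the 1-based ranks where both of us have the same path."""
    return [
        rank for rank, (mine, avram) in enumerate(pairs, start=1)
        if to_avram_filename(mine) == avram
    ]


if __name__ == "__main__":
    for rank, (mine, avram) in enumerate(PAIRS, start=1):
        status = "same" if to_avram_filename(mine) == avram else "differs"
        print(f"{rank:2d}  {mine:<22} {status}")
```

Under that assumption, rows 1 to 5 and row 8 name the same path and the other six rows differ.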

Avram:
You said you hadn't counted some paths and attached a new file of
P(vp1|vp0,path)/P(vp1|path) ratios, but not a new file of counts.
If the counts you sent out aren't current, can you attach your new counts in
the same format?
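For reference, both probabilities in that ratio can be estimated from raw counts. The Python sketch below is illustrative only: it assumes each observation is a (path, vp0, vp1) triple, with vp0 and vp1 the tenses at the two VP ends of the path, and it uses plain relative frequencies with no smoothing. The name tense_ratios is hypothetical. From the definitions of the conditional probabilities, the ratio reduces to c(path, vp0, vp1) * c(path) / (c(path, vp0) * c(path, vp1)).

```python
from collections import Counter


def tense_ratios(triples):
    """Estimate P(vp1 | vp0, path) / P(vp1 | path) for each observed triple.

    triples: iterable of (path, vp0, vp1) observations (hypothetical format).
    Uses maximum-likelihood (relative-frequency) estimates, no smoothing.
    """
    triples = list(triples)
    c_full = Counter(triples)                              # c(path, vp0, vp1)
    c_path = Counter(p for p, _, _ in triples)             # c(path)
    c_path_vp0 = Counter((p, v0) for p, v0, _ in triples)  # c(path, vp0)
    c_path_vp1 = Counter((p, v1) for p, _, v1 in triples)  # c(path, vp1)

    ratios = {}
    for (p, v0, v1), n in c_full.items():
        p_vp1_given_vp0_path = n / c_path_vp0[(p, v0)]
        p_vp1_given_path = c_path_vp1[(p, v1)] / c_path[p]
        ratios[(p, v0, v1)] = p_vp1_given_vp0_path / p_vp1_given_path
    return ratios


if __name__ == "__main__":
    # Toy observations, not real data.
    obs = [("P", "011", "011"), ("P", "011", "011"),
           ("P", "012", "012"), ("P", "012", "011")]
    for key, r in sorted(tense_ratios(obs).items()):
        print(key, round(r, 3))
```

A ratio above 1 means vp0 makes that vp1 more likely on the path than it is on the path overall; since only counts enter the formula, a file of counts is enough to recompute the ratios.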

On 10/29/07, Skip <dave.hirshberg at gmail.com> wrote:
>
> I get the same paths you have, but different tense-counts.
> 4 or 5 of our paths differ from Avram's, but where Avram and I have the
> same path (our top 6 or so are the same), we have the same tense-counts.
>
> ?
>
> On 10/28/07, Tim St. Clair <tstclair at cs.brown.edu> wrote:
>
> > Here is my initial set of data.  It looks like it is different from
> > juris, but I haven't checked it out that closely yet.
> >
> > The listserv would not let me attach it, so here it is in google
> > document format.  Let me know if you want a copy of the csv file.
> >
> > http://spreadsheets.google.com/ccc?key=p3coYwZqOPPzwP5bOiMBrBQ&hl=en
> >
> > --
> > Tim St. Clair
> >
> > (617) 460 - 6497
> > _______________________________________________
> > CS241 mailing list
> > CS241 at list.cs.brown.edu
> > http://list.cs.brown.edu/mailman/listinfo/cs241
> >
> >
>