[CS241] Asgn2 questions
Eugene Charniak
ec at cs.brown.edu
Fri Sep 21 21:47:02 EDT 2007
Justin,
The files 00.strip and 01.strip were missing the empty lines
that signal a story end. 2-9.strip had them. I have
replaced 00.strip and 01.strip with correct versions.
Other than that I am picking up the Tree::newline signal
just fine.
Your interpretation of the assignment looks right to me.
Eugene
On Fri, Sep 21, 2007 at 06:27:09PM -0400, Justin Palmer wrote:
> Hi Lenora,
>
> Here's my take on your questions. I might be wrong...
>
> For KL divergence, yes, order matters. D(p || q) != D(q || p).
> That's why KL divergence is not a distance/metric. So, I'm using the
> order Eugene asked for in the assignment, e.g., D( p(tense) || p(tense
> | tense in prev sent) ). Regarding the log(0) issue, I ran into it
> too; my understanding is that for entropy calculations, we define 0 *
> log 0 = 0.
>
> To compute D( p(tense) || p(tense|tense in prev sent) ), I did:
>
> KL += p(tense) * log (p(tense) / p(tense | tense in prev sent))
>
> for all tenses. So we get 2 numbers saying how useful the two
> conditional probability distributions are as predictors of tense.
> Does that make sense?
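
A minimal sketch of the sum Justin describes, for anyone who wants to check their numbers. It is not the assignment's reference code. The function name, the dict representation, the base-2 log, and the probabilities are all assumptions made for illustration. Note that the 0 * log 0 = 0 convention only covers terms where p is zero; if q is zero where p is not, the divergence is infinite.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits; p and q are dicts mapping outcome -> probability."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue  # convention: 0 * log 0 = 0, so the term contributes nothing
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf  # p has mass where q has none
        total += p_x * math.log2(p_x / q_x)
    return total

# Made-up distributions, for illustration only.
p_tense = {"past": 0.5, "present": 0.5, "future": 0.0}
p_tense_given_prev = {"past": 0.25, "present": 0.25, "future": 0.5}

forward = kl_divergence(p_tense, p_tense_given_prev)
backward = kl_divergence(p_tense_given_prev, p_tense)
print(forward, backward)
```

With these made-up numbers the two orders give different values, which is the asymmetry mentioned above.
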
>
> I understand part d to mean calculate p(tense | tense any but prev
> sent). I'm computing:
>
> P(past | past), P(future | past), etc.
>
> So if you're at sentence i, and it's past, and a past tense also
> occurred in any of the previous sentences other than the last one,
> that counts. Also, if a future occurred in any sentence other than
> the previous sentence, add another count. And so on.
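
The counting scheme above could be sketched roughly as follows. This is one reading of the description, under several assumptions: each sentence carries a single tense label, a story is a plain list of those labels, each distinct earlier tense adds one count per sentence (not one per occurrence), and counts do not cross story boundaries. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def earlier_tense_counts(story):
    """Return counts[earlier][current] for one story (a list of tense labels)."""
    counts = defaultdict(Counter)
    for i, current in enumerate(story):
        # Sentences 0 .. i-2: everything before the immediately previous sentence.
        # max() guards against a negative slice bound when i is 0.
        for earlier in set(story[:max(i - 1, 0)]):
            counts[earlier][current] += 1
    return counts

def conditional(counts, current, earlier):
    """Estimate P(current | earlier tense occurred before the previous sentence)."""
    total = sum(counts[earlier].values())
    return counts[earlier][current] / total if total else 0.0

# Made-up story, for illustration only.
story = ["past", "past", "past", "future"]
counts = earlier_tense_counts(story)
print(conditional(counts, "past", "past"), conditional(counts, "future", "past"))
```
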
>
> If you'd like to compare numbers, please let me know.
>
> Also, is anyone else having problems with the Tree::newstory flag?
> It appears that it never gets set, but I'm probably missing something
> obvious.
>
> Thanks,
>
> -- j