[Brown CS Talks] Brown CS Colloquium: Robert Moore talk in Lubrano on 10/24/02 at 4 pm
talks@list.cs.brown.edu
Mon, 07 Oct 2002 10:44:52 -0400
BROWN UNIVERSITY
COMPUTER SCIENCE COLLOQUIUM
presents
Robert C. Moore
Microsoft Research
Thursday, October 24, 2002 at 4:00 pm
Lubrano Conference Room (CIT 4th floor)
Refreshments will be served at 3:45 pm
An Experiment in Unsupervised Training of Statistical Translation
Models*
Abstract
A substantial amount of work in recent years has addressed the problem
of machine translation using statistical methods. Most of this work,
however, depends on having ``bitexts'' -- parallel corpora consisting
of the same text in two languages. While a number of bitexts are widely
available, they do not begin to cover the broad range of topics or
languages one might wish to be able to translate. Thus, statistical
translation models would be far more useful if they could be trained
without the need for bitexts.
In fact, the statistical framework most commonly used in the work with
bitexts, the ``source-channel model'', can be applied in principle to
learning translation models without the use of bitexts. Some work has
been done in this direction, but it has generally been limited in
scope, typically by relying on hand-compiled bilingual dictionaries in
some way.
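
For readers unfamiliar with the source-channel model, its usual textbook form is sketched below. The symbols are assumptions chosen for illustration (f for the observed text, e for a candidate source text); the abstract does not give the exact formulation used in the talk.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(e)\, P(f \mid e)
```

Here P(e) is a language model over source text and P(f | e) is the translation (channel) model. The second equality follows from Bayes' rule, since P(f) does not depend on e.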
In our experiment, we ask whether it might be possible to learn a
complete statistical word-translation model without any prior
knowledge of the translation of any word. To simplify other aspects
of the translation problem, we have created an artificial translation
task consisting of a word-level substitution cipher for half of a
corpus in a limited domain, and we attempt to learn the correct
decoding of this cipher, using only word frequency and sequence
statistics derived from the other half of the corpus. By applying the
source-channel model, we were able to obtain 89% word-translation
accuracy in this task.
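
To make the artificial task concrete, here is a minimal Python sketch of a word-level substitution cipher together with a deliberately crude decoder. Note that the decoder is a frequency-rank baseline, not the source-channel method from the talk: it uses word frequencies only and ignores sequence statistics. All function names are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def make_cipher(vocab, seed=0):
    """Build a random one-to-one mapping from each word to a code word."""
    rng = random.Random(seed)
    words = sorted(vocab)  # sorted for reproducibility
    codes = [f"w{i}" for i in range(len(words))]
    rng.shuffle(codes)
    return dict(zip(words, codes))


def encipher(tokens, cipher):
    """Replace every token by its code word."""
    return [cipher[t] for t in tokens]


def frequency_rank_decode(plain_tokens, cipher_tokens):
    """Guess a decoding by pairing words of equal frequency rank.

    Assumption: this is only a baseline. It matches the k-th most
    frequent cipher word with the k-th most frequent plain word.
    """
    plain_ranked = [w for w, _ in Counter(plain_tokens).most_common()]
    cipher_ranked = [w for w, _ in Counter(cipher_tokens).most_common()]
    return dict(zip(cipher_ranked, plain_ranked))


def token_accuracy(cipher_tokens, decoding, truth_tokens):
    """Fraction of cipher tokens decoded to the correct plain word."""
    hits = sum(decoding.get(c) == t for c, t in zip(cipher_tokens, truth_tokens))
    return hits / len(truth_tokens)
```

In a setup like the one described above, `plain_tokens` would come from the unenciphered half of the corpus and `cipher_tokens` from the enciphered half. On real text many words share similar frequencies, so a frequency-only baseline like this would be expected to do poorly, which is presumably why sequence statistics are needed as well.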
In this talk we will look at the method and results in detail,
including an analysis of the types of errors committed by the model. We
will conclude by considering the implications of this experiment for a
famous issue in philosophy of language, W.V.O. Quine's doctrine of
``indeterminacy of radical translation.''
*Joint work with Michele Banko of Microsoft Research
Host: Professor Eugene Charniak