Google Tech Talks
September 5, 2007
ABSTRACT
Statistical language processing tools are being applied to an
ever-wider and more varied range of linguistic data. Researchers and
engineers are using statistical models to organize and understand
financial news, legal documents, biomedical abstracts, and weblog
entries, among many other domains. Because language varies so widely,
collecting and curating training sets for each different domain is
prohibitively expensive. At the same time, differences in vocabulary
and writing style across domains can cause the error of state-of-the-art
supervised models to increase dramatically.
This talk describes structural correspondence learning (SCL), a method
for adapting models from resource-rich source domains to resource-poor
target domains. SCL uses unlabeled data from both domains to induce a
common feature representation for domain adaptation. We demonstrate
SCL on two NLP tasks: sentiment classification and part-of-speech
tagging. For each of these tasks, SCL significantly reduces the error
of a state-of-the-art discriminative model.
Speaker: John Blitzer
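
The abstract describes SCL only at a high level. As a rough
illustration of the pivot-feature idea behind it, the following is a
minimal Python/NumPy sketch, not the talk's actual implementation:
choose pivot features that occur frequently in both domains, learn a
linear predictor for each pivot from the remaining features on pooled
unlabeled data, and take the top singular vectors of the stacked
predictor weights as a shared projection. The least-squares fit below
stands in for the binary pivot classifiers used in the original work,
and all names are hypothetical.

import numpy as np

def scl_projection(X_unlabeled, pivot_idx, k=50):
    """Induce an SCL-style feature projection from unlabeled data.

    X_unlabeled : (n_examples, n_features) feature matrix pooled from
                  the source and target domains.
    pivot_idx   : indices of pivot features -- features frequent in
                  both domains (e.g. common words).
    k           : dimensionality of the shared representation.
    """
    n_features = X_unlabeled.shape[1]
    non_pivot = np.setdiff1d(np.arange(n_features), pivot_idx)

    # One linear predictor per pivot: predict each pivot feature from
    # the non-pivot features. A least-squares fit is an illustrative
    # stand-in for the pivot classifiers in the original method.
    A = X_unlabeled[:, non_pivot]
    B = X_unlabeled[:, pivot_idx]
    W, *_ = np.linalg.lstsq(A, B, rcond=None)  # (n_non_pivot, n_pivots)

    # SVD of the stacked pivot-predictor weights; the top-k left
    # singular vectors define the shared low-dimensional projection.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :k]                           # (n_non_pivot, k)
    return non_pivot, theta

def augment(X, non_pivot, theta):
    # Append the projected features to the original representation; a
    # supervised model trained on source data then sees both views.
    return np.hstack([X, X[:, non_pivot] @ theta])

Under these assumptions, a classifier trained on
augment(X_source, non_pivot, theta) can be applied to
augment(X_target, non_pivot, theta): because the projected features
are induced from unlabeled data in both domains, they are meant to
behave similarly across the domain shift.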