My research interests are very broad,
including natural language processing, artificial intelligence,
machine learning and computational genomics.
However, I am primarily interested in Natural Language Generation. I am currently
working with my advisor, Kathy McKeown, on the
joint Columbia
University, University of Colorado AQUAINT
Open Q&A
project.
Previously, I also worked under her supervision on
the MAGIC
(Multimedia Abstract Generation for
Intensive Care) project.
I am interested in the task of content planning
performed during deep generation (i.e., generation from semantic
input, as opposed to generation from shallow sources, e.g., the kind
that may be performed for machine translation or summarization).
The task of content
planning involves two subtasks: content selection and ordering.
My goal is to be able to automatically learn
schemas (a particular type of content planner)
from corpora of text and data (text and knowledge
resources). The ordering problem seems approachable using
biological sequence analysis (ACL'01). Genetic algorithms proved
useful for schema learning (INLG'02). My most recent work uses
cross-entropy metrics on text clusters induced from data
clusters to learn content selection rules (EMNLP'03).
I successfully defended my thesis
proposal on May 1st, 2003.