mpalmer@linc.cis.upenn
Title: Putting Meaning into Your Trees
Time: Thursday, January 29, 11:30 - 12:30
Place: CS Conference Room in MUDD
Abstract:
The current success of applications of machine learning techniques to
tasks such as part-of-speech tagging and parsing has kindled the hope
that these same techniques might have equal or greater success in
other areas such as lexical semantics. Advances in automated and
semi-automated methods of acquiring lexical semantics would release
the field from its dependence on well-defined sub-domains and enable
broad-coverage natural language processing. However, supervised
machine learning requires large amounts of publicly available training
data, and a prerequisite for this training data is general agreement
on which elements should be tagged and with what tags. With respect
to lexical semantics, this type of general agreement has been
strikingly elusive.
Consensus has recently been reached on a task-oriented level of semantic representation to be layered on top of the existing Penn Treebank syntactic structures. This level, known as the Proposition Bank, or PropBank, consists of argument labels for the semantic roles of individual verbs and similar predicating expressions such as participial modifiers and nominalizations. This talk will describe the PropBank verb semantic role annotation being done at Penn for both English and Chinese. The annotation process will be discussed, as well as the use of existing lexical resources such as WordNet, Levin classes and VerbNet. Similar projects include the FrameNet Project at Berkeley and the Prague Tectogrammatics project. PropBank annotation is shallower than that of the Prague Tectogrammatics project and broader in coverage than FrameNet, in that every verb instance in the corpus has to be annotated.
The talk will also briefly describe progress in developing automatic semantic role labelers based on this training data and investigations into the role of sense distinctions in improving performance.
About the speaker: Martha Palmer is an Associate Professor in the Computer and Information Sciences Department of the University of Pennsylvania. She has been a member of the Advisory Committee for the DARPA TIDES program, the Chair of SIGLEX, and the Chair of SIGHAN, and is now Vice-President-elect of the Association for Computational Linguistics. Her early work on lexically based semantic interpretation formed the basis of the successful DARPA-funded message processing system, Pundit, and fostered a continuing interest in Information Extraction (ACE) and Machine Translation (TIDES). Her interest in lexical semantics and verb classes led to her involvement in SENSEVAL and the development of the English, Chinese and Korean Proposition Banks.