NAACL-HLT 2012 Workshop on Inducing Linguistic Structure
Welcome to the homepage of the NAACL-HLT 2012 Workshop on Inducing Linguistic Structure. This workshop addresses the challenges of learning linguistic structure in an unsupervised or minimally supervised setting. It encompasses many popular themes in computational linguistics and machine learning, including grammar induction, shallow syntax induction (e.g., parts of speech), learning semantics, learning the structure of documents and discourse, and learning relations within multilingual text collections. Unlike supervised settings, where annotated training data is available, unsupervised induction is considerably more difficult, both in terms of modelling and evaluation.
For more information, see the CallForPapers and the following links:
Description of the SharedTask
The webpage for NAACL-HLT 2012
Programme
The workshop will be held on Thursday, June 7. Here is the tentative programme (subject to change):
9.00-10.00    Invited talk: Alex Clark
10.00-10.30   Spotlight talks:
              - Transferring Frames: Utilization of Linked Lexical Resources; Borin et al.
              - Unsupervised Induction of Frame-Semantic Representations; Modi et al.
              - Capitalization Cues Improve Dependency Grammar Induction; Spitkovsky et al.
10.30-11.00   Coffee break
11.00-12.00   Invited talk: Regina Barzilay
12.00-13.00   Spotlight talks:
              - Toward Tree Substitution Grammars with Latent Annotations; Ferraro et al.
              - Exploiting Partial Annotations with EM Training; Hovy and Hovy
              - Using Senses in HMM Word Alignment; Gelling and Cohn
              - Unsupervised Part of Speech Inference with Particle Filters; Dubbin and Blunsom
              - Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition; Täckström
13.00-14.15   Lunch break
14.15-15.15   Invited talk: Noah Smith
15.15-15.30   Overview of PASCAL challenge (shared task); Gelling et al.
15.30-16.00   Coffee break and poster session
16.00-17.30   Poster session continues
All the research papers will be presented in short back-to-back spotlight sessions of 10 minutes each, and this work will also be on display in the afternoon poster session. Alongside these posters, participants in the Grammar Induction Challenge (shared task) will present their work.
Invited Talks
Alexander Clark
What types of linguistic structure can be induced?
In NLP, linguistic structure is typically taken to be data that one wishes to model: data that is accurately represented in annotated corpora like the Penn Treebank, and which the computational linguist aims to recover using supervised or unsupervised learning. In this talk I will argue that this view is mistaken and misleading, making the uncontroversial claim that syntactic annotations are theoretical constructs rather than data, and the more controversial claim that computational linguists should instead aim to specify well-defined alternative structures that can be induced efficiently.
I will present two simple proposals along these lines: one at the lexical level, giving a precise analogue of part-of-speech tags, and a more controversial one at the level of syntactic structure. The final question is whether these structures are capable of performing the roles that traditional syntactic structures were meant to fulfill: supporting semantic interpretation and explaining certain syntactic phenomena.
Bio
Alexander Clark is in the Department of Computer Science at Royal Holloway, University of London. His research interests are in grammatical inference, theoretical and mathematical linguistics, and unsupervised learning. He is currently president of SIGNLL and chair of the steering committee of the ICGI. A book he coauthored with Shalom Lappin, 'Linguistic Nativism and the Poverty of the Stimulus', was published by Wiley-Blackwell in 2011.
Regina Barzilay
Selective-Sharing for Multilingual Syntactic Transfer
Today, we have at our disposal a significant number of linguistic annotations across many different languages. However, to achieve reliable performance on a new language, we still depend heavily on the annotations specific to that language. This limited ability to reuse annotations across languages stands in striking contrast with the unified treatment of syntactic structure given in linguistic theory. In this talk, I will put recent multilingual parsing models into the context of this unified view. I will explain some of the puzzling results in multilingual learning, such as the success of direct syntactic transfer over more sophisticated cross-lingual approaches. Finally, I will demonstrate the benefits of formulating multilingual parsing models that are consistent with this unified view and thereby can effectively leverage connections between languages.
Bio
Regina Barzilay is an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT and a member of the Computer Science and Artificial Intelligence Laboratory. Her research interests are in natural language processing. She is a recipient of various awards, including the NSF CAREER Award, a Microsoft Faculty Fellowship, the MIT Technology Review TR-35 Award, and best paper awards at top NLP conferences. She serves as an associate editor of the Journal of Artificial Intelligence Research (JAIR) and as an action editor for Transactions of the Association for Computational Linguistics (TACL).
Noah Smith
Rethinking Inducing Linguistic Structure
We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure. In this talk, I'll discuss some of the weaknesses of our current methodology. I'll present a new abstract framework for evaluating NLP models in general and unsupervised structure prediction models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation are more clearly defined and participants in all roles are offered ways to make measurable contributions to the larger goal. This framework can be instantiated in many ways, simulating some familiar intrinsic and extrinsic evaluations as well as some new evaluations. This talk is based entirely on preliminary ideas (no theoretical or experimental results) and is intended to spark discussion.
Bio
Noah Smith is the Finmeccanica Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. His book, Linguistic Structure Prediction, covers many of these topics. He serves on the editorial board of the journal Computational Linguistics and the Journal of Artificial Intelligence Research and received a best paper award at the ACL 2009 conference. His research group, Noah's ARK, is supported by the NSF (including an NSF CAREER award), DARPA, Qatar NRF, IARPA, ARO, Portugal FCT, and gifts from Google, HP Labs, IBM Research, and Yahoo Research.
Accepted Papers
Capitalization Cues Improve Dependency Grammar Induction
Valentin I. Spitkovsky, Hiyan Alshawi and Daniel Jurafsky
Exploiting Partial Annotations with EM Training
Dirk Hovy and Eduard Hovy
Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition
Oscar Täckström
Toward Tree Substitution Grammars with Latent Annotations
Francis Ferraro, Matt Post and Benjamin Van Durme
Transferring Frames: Utilization of Linked Lexical Resources
Lars Borin, Markus Forsberg, Richard Johansson, Kristiina Muhonen, Tanja Purtonen and Kaarlo Voionmaa
Unsupervised Induction of Frame-Semantic Representations
Ashutosh Modi, Ivan Titov and Alexandre Klementiev
Unsupervised Part of Speech Inference with Particle Filters
Gregory Dubbin and Phil Blunsom
Using Senses in HMM Word Alignment
Douwe Gelling and Trevor Cohn
Challenge Papers
Summary of the Grammar and POS Induction Challenge
Douwe Gelling, Trevor Cohn, Phil Blunsom and João Graça
Two baselines for unsupervised dependency parsing
Anders Søgaard
Unsupervised Dependency Parsing using Reducibility and Fertility features
David Mareček and Zdeněk Žabokrtský
Induction of Linguistic Structure with Combinatory Categorial Grammars
Yonatan Bisk and Julia Hockenmaier
Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction
Christos Christodoulopoulos, Sharon Goldwater and Mark Steedman
Hierarchical clustering of word class distributions
Grzegorz Chrupała
Combining the Sparsity and Unambiguity Biases for Grammar Induction
Kewei Tu
Paper submissions
We solicit papers from many subfields of computational linguistics and language processing. Topics include, but are not limited to:
- grammar learning
- part-of-speech and shallow syntax
- learning semantic representations
- inducing document and discourse structure
- learning/projecting structures across multilingual corpora
- relation induction across document collections
- evaluation of induced representations
Our aim is to bring together work on fully unsupervised methods along with minimally supervised approaches (e.g., domain adaptation and multilingual projection).
The workshop solicits short papers (6 pages of text, plus up to 2 pages of references) for either oral or poster presentation. We will consider allowing additional pages in accepted papers to give space to address reviewer comments. Please follow the standard NAACL guidelines for paper formatting (details here). You can submit your papers using the following link.
Important Dates
Workshop
Workshop papers submission: Extended to April 14
Acceptance notification: April 28
Workshop camera ready: May 7
NAACL-HLT 2012: Jun 4-6, 2012
Workshop: Jun 7-8, 2012
Shared Task
Training data released: Jan 27
Test data released: -
Test data submission: April 13
Evaluation results released: April 23
Shared task system description submission: May 4
NAACL-HLT 2012: Jun 4-6, 2012
Workshop: Jun 7-8, 2012
News
November 13 It has come to our attention that Table 3 in the paper reported the wrong results for directed accuracy (computed with a cutoff of 10 rather than with no cutoff). Unfortunately this invalidates a small part of the evaluation. A corrected paper with updated results can be found here. We thank Yonatan Bisk and Julia Hockenmaier for pointing out the errors.
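For reference, directed accuracy in dependency evaluation is the fraction of tokens whose predicted head matches the gold-standard head, and a cutoff restricts scoring to sentences of at most that many tokens (a cutoff of 10 is a common convention in grammar induction work). The following minimal Python sketch illustrates the difference between the two settings; the function name and data layout are illustrative assumptions, not the shared task's actual evaluation scripts.

    def directed_accuracy(gold_heads, pred_heads, cutoff=None):
        """Fraction of tokens whose predicted head matches the gold head.

        gold_heads/pred_heads: one list per sentence of head indices (0 = root).
        cutoff: if given, only sentences of at most `cutoff` tokens are scored.
        """
        correct = total = 0
        for gold, pred in zip(gold_heads, pred_heads):
            if cutoff is not None and len(gold) > cutoff:
                continue  # sentence is longer than the cutoff; skip it
            correct += sum(g == p for g, p in zip(gold, pred))
            total += len(gold)
        return correct / total if total else 0.0

    # Toy example: two sentences, one head index per token (0 = root).
    gold = [[2, 0, 2], [0]]
    pred = [[2, 0, 1], [0]]
    print(directed_accuracy(gold, pred))            # 0.75 (all sentences)
    print(directed_accuracy(gold, pred, cutoff=2))  # 1.0 (only the 1-token sentence scored)

As the toy example shows, the two settings can produce quite different scores, which is why reporting results under the wrong cutoff matters.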
April 20 Results for submitted systems and some baselines are now available (see ResultsPos for POS induction, ResultsDep for dependency induction, and ResultsPosDep for joint induction).
April 4 Baseline and evaluation scripts are now available (see SharedTask).
Feb 8 Accommodation after NAACL will be tight due to an overlap with a Formula 1 event. Book your hotel early if you plan to attend the workshops.
Jan 27 Training data has been released.
Organisers
- Trevor Cohn, The University of Sheffield
- João Graça, Spoken Language Systems Lab, INESC-ID Lisboa
- Phil Blunsom, The University of Oxford
Program Committee
- Ben Taskar - University of Pennsylvania
- Percy Liang - Stanford University
- Andreas Vlachos - University of Cambridge
- Chris Dyer - CMU
- Mark Dredze - Johns Hopkins University
- Shay Cohen - Columbia University
- Kuzman Ganchev - Google Inc.
- André Martins - CMU/IST Portugal
- Greg Druck - Yahoo
- Ryan McDonald - Google Inc.
- Nathan Schneider - CMU
- Partha Talukdar - CMU
- Dipanjan Das - CMU
- Mark Steedman - University of Edinburgh
- Luke Zettlemoyer - University of Washington
- Roi Reichart - MIT
- David Smith - University of Massachusetts
- Ivan Titov - Saarland University
- Alex Clark - Royal Holloway, University of London
- Khalil Sima'an - University of Amsterdam
- Stella Frank - University of Edinburgh
- Oscar Täckström - Swedish Institute of Computer Science
- Valentin Spitkovsky - Stanford University
Registration
Conference registration is expected to open in early-to-mid March 2012. See NAACL-HLT 2012 for more details.