NAACL-HLT 2012 Workshop on Inducing Linguistic Structure


Welcome to the homepage of the NAACL-HLT 2012 Workshop on Inducing Linguistic Structure. This workshop addresses the challenges of learning linguistic structure in an unsupervised or minimally supervised context. It encompasses many popular themes in computational linguistics and machine learning, including grammar induction, shallow syntax induction (e.g., parts of speech), learning semantics, learning the structure of documents and discourses, and learning relations within multilingual text collections. Compared with supervised learning, where annotated training data is available, unsupervised induction is considerably more difficult, both in terms of modelling and evaluation.

For more information see the CallForPapers and the following links


Programme

The workshop will be held on Thursday June 7. Here's the tentative programme (subject to change):

9.00-10.00

Invited talk: Alex Clark

10.00-10.30

Spotlight talks

Transferring Frames: Utilization of Linked Lexical Resources; Borin et al.

Unsupervised Induction of Frame-Semantic Representations; Modi et al.

Capitalization Cues Improve Dependency Grammar Induction; Spitkovsky et al.

10.30-11.00

Coffee break

11.00-12.00

Invited talk: Regina Barzilay

12.00-13.00

Spotlight talks

Toward Tree Substitution Grammars with Latent Annotations; Ferraro et al.

Exploiting Partial Annotations with EM Training; Hovy and Hovy

Using Senses in HMM Word Alignment; Gelling and Cohn

Unsupervised Part of Speech Inference with Particle Filters; Dubbin and Blunsom

Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition; Täckström

13.00-14.15

Lunch break

14.15-15.15

Invited talk: Noah Smith

15.15-15.30

Overview of PASCAL challenge (shared task); Gelling et al.

15.30-16.00

Coffee break and poster session

16.00-17.30

Poster session continues

All the research talks will be presented in short spotlight sessions, with 10-minute presentations back-to-back. This work will also be on display in the afternoon poster session. In addition to these posters, the participants in the Grammar Induction Challenge (shared task) will be presenting their work.

Invited Talks

Alexander Clark

What types of linguistic structure can be induced?

In NLP, linguistic structure is typically taken to be data that one wishes to model: data that is accurately represented in annotated corpora like the Penn Treebank; the role of the computational linguist is to recover this structure using supervised or unsupervised learning. In this talk we will claim that this view is mistaken and misleading; making the uncontroversial claim that the syntactic annotations are theoretical constructs and not data, and the more controversial claim that computational linguists should aim instead to specify well-defined alternative structures that are capable of being induced efficiently.

I will present two simple proposals along these lines: one at the lexical level, giving a precise analogue of part-of-speech tags, and one more controversial at the level of syntactic structure. The final question is whether these structures are capable of performing the roles that traditional syntactic structures were meant to fulfill: supporting semantic interpretation and explaining certain syntactic phenomena.

Bio

Alexander Clark is in the Department of Computer Science at Royal Holloway, University of London. His research interests are in grammatical inference, theoretical and mathematical linguistics and unsupervised learning. He is currently president of SIGNLL and chair of the steering committee of the ICGI. A book coauthored with Shalom Lappin, 'Linguistic Nativism and the Poverty of the Stimulus', was published by Wiley-Blackwell in 2011.

Regina Barzilay

Selective-Sharing for Multilingual Syntactic Transfer

Today, we have at our disposal a significant number of linguistic annotations across many different languages. However, to achieve reliable performance on a new language, we still depend heavily on the annotations specific to that language. This limited ability to reuse annotations across languages stands in striking contrast with the unified treatment of syntactic structure given in linguistic theory. In this talk, I will put recent multilingual parsing models into the context of this unified view. I will explain some of the puzzling results in multilingual learning, such as the success of direct syntactic transfer over more sophisticated cross-lingual approaches. Finally, I will demonstrate the benefits of formulating multilingual parsing models that are consistent with this unified view and thereby can effectively leverage connections between languages.

Bio

Regina Barzilay is an Associate Professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory. Her research interests are in natural language processing. She is a recipient of various awards including the NSF Career Award, Microsoft Faculty Fellowship, the MIT Technology Review TR-35 Award, and best paper awards in the top NLP conferences. She serves as an associate editor of the Journal of Artificial Intelligence Research (JAIR) and is an action editor for Transactions of the Association for Computational Linguistics (TACL).

Noah Smith

Rethinking Inducing Linguistic Structure

We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure. In this talk, I'll discuss some of the weaknesses of our current methodology. I'll present a new abstract framework for evaluating NLP models in general and unsupervised structure prediction models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation are more clearly defined and participants in all roles are offered ways to make measurable contributions to the larger goal. This framework can be instantiated in many ways, simulating some familiar intrinsic and extrinsic evaluations as well as some new evaluations. This talk is entirely based on preliminary ideas (no theoretical or experimental results) and is intended to spark discussion.

Bio

Noah Smith is the Finmeccanica Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. His book, Linguistic Structure Prediction, covers many of these topics. He serves on the editorial board of the journal Computational Linguistics and the Journal of Artificial Intelligence Research and received a best paper award at the ACL 2009 conference. His research group, Noah's ARK, is supported by the NSF (including an NSF CAREER award), DARPA, Qatar NRF, IARPA, ARO, Portugal FCT, and gifts from Google, HP Labs, IBM Research, and Yahoo Research.

Accepted Papers

CAPITALIZATION CUES IMPROVE DEPENDENCY GRAMMAR INDUCTION
Valentin I. Spitkovsky, Hiyan Alshawi and Daniel Jurafsky

EXPLOITING PARTIAL ANNOTATIONS WITH EM TRAINING
Dirk Hovy and Eduard Hovy

NUDGING THE ENVELOPE OF DIRECT TRANSFER METHODS FOR MULTILINGUAL NAMED ENTITY RECOGNITION
Oscar Täckström

TOWARD TREE SUBSTITUTION GRAMMARS WITH LATENT ANNOTATIONS
Francis Ferraro, Matt Post and Benjamin Van Durme

TRANSFERRING FRAMES: UTILIZATION OF LINKED LEXICAL RESOURCES
Lars Borin, Markus Forsberg, Richard Johansson, Kristiina Muhonen, Tanja Purtonen and Kaarlo Voionmaa

UNSUPERVISED INDUCTION OF FRAME-SEMANTIC REPRESENTATIONS
Ashutosh Modi, Ivan Titov and Alexandre Klementiev

UNSUPERVISED PART OF SPEECH INFERENCE WITH PARTICLE FILTERS
Gregory Dubbin and Phil Blunsom

USING SENSES IN HMM WORD ALIGNMENT
Douwe Gelling and Trevor Cohn

Challenge Papers

SUMMARY OF THE GRAMMAR AND POS INDUCTION CHALLENGE
Douwe Gelling, Trevor Cohn, Phil Blunsom and Joao Graca

Two baselines for unsupervised dependency parsing
Anders Søgaard

Unsupervised Dependency Parsing using Reducibility and Fertility features
David Marecek and Zdenek Zabokrtsky

Induction of Linguistic Structure with Combinatory Categorial Grammars
Yonatan Bisk and Julia Hockenmaier

Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction
Christos Christodoulopoulos, Sharon Goldwater and Mark Steedman

Hierarchical clustering of word class distributions
Grzegorz Chrupała

Combining the Sparsity and Unambiguity Biases for Grammar Induction
Kewei Tu


Paper submissions

We solicit papers from many subfields of computational linguistics and language processing. Topics include, but are not limited to:

- grammar learning
- part-of-speech and shallow syntax
- learning semantic representations
- inducing document and discourse structure
- learning/projecting structures across multilingual corpora
- relation induction across document collections
- evaluation of induced representations

Our aim is to bring together work on fully unsupervised methods along with minimally supervised approaches (e.g., domain adaptation and multilingual projection).

The workshop will solicit short papers (6 pages of text, up to 2 pages of references) for either oral or poster presentation. We will consider allowing additional pages for accepted papers to give space to address the reviewer comments. Please follow the standard NAACL guidelines for paper formatting (details here). You can submit your papers using the following link

Paper submissions


Important Dates

Workshop

Workshop Papers Submission: Extended to April 14

Acceptance Notification: April 28

Workshop Camera Ready: May 7

NAACL-HLT 2012: Jun 4-6, 2012

Workshop: Jun 7-8, 2012

Shared Task

Training Data Released: Jan 27

Test Data Release: -

Test Data Submission: April 13

Evaluation results released: April 23

Shared Task System Description Submission: May 4

NAACL-HLT 2012: Jun 4-6, 2012

Workshop: Jun 7-8, 2012


News

November 13 It's come to our attention that Table 3 in the paper reported the wrong results for the Directed Accuracy (with cutoff 10 instead of no cutoff). Unfortunately this has invalidated a small part of the evaluation. A corrected paper with updated results can be found here. We thank Yonatan Bisk and Julia Hockenmaier for pointing us to the errors.

April 20 Results for submitted systems and some baselines are now available (see ResultsPos for POS induction, ResultsDep for dependency induction and ResultsPosDep for joint induction).

April 4 Baseline and evaluation scripts are now available (see SharedTask).

Jan 27 Training data has been released.

Feb 8 Accommodation after NAACL will be tight due to an overlap with a Formula 1 event. You will need to book your hotel early so that you can attend the workshops.


Organisers

Program Committee


Registration

Conference registration is expected to open in early-to-mid March 2012. See NAACL-HLT 2012 for more details.

None: InducingLinguisticStructure (last edited 2012-11-13 13:53:37 by TrevorCohn)