#acl AdminGroup:read,write,delete,revert All:read
#format wiki
#language en

= NAACL-HLT 2012 Workshop on Inducing Linguistic Structure =
------

Welcome to the homepage of the NAACL-HLT 2012 Workshop on Inducing Linguistic Structure. This workshop addresses the challenges of learning linguistic structure in an unsupervised or minimally supervised context. It encompasses many popular themes in computational linguistics and machine learning, including grammar induction, shallow syntax induction (e.g., parts of speech), learning semantics, learning the structure of documents and discourses, and learning relations within multilingual text collections. Compared with supervised settings, where annotated training data is available, unsupervised induction is considerably more difficult, both in terms of modelling and evaluation.

For more information see the CallForPapers and the following links:
 * Description of the SharedTask
 * The webpage for [[http://www.naaclhlt2012.org/|NAACL-HLT 2012]]

-----
== Programme ==

The workshop will be held on Thursday June 7. Here's the tentative programme (subject to change):

|| 9.00-10.00   || Invited talk: '''Alex Clark''' ||
|| 10.00-10.30  || Spotlight talks ||
|| || Transferring Frames: Utilization of Linked Lexical Resources; ''Borin et al.'' ||
|| || Unsupervised Induction of Frame-Semantic Representations; ''Modi et al.'' ||
|| || Capitalization Cues Improve Dependency Grammar Induction; ''Spitkovsky et al.'' ||
|| 10.30-11.00  || Coffee break ||
|| 11.00-12.00  || Invited talk: '''Regina Barzilay''' ||
|| 12.00-13.00  || Spotlight talks ||
|| || Toward Tree Substitution Grammars with Latent Annotations; ''Ferraro et al.'' ||
|| || Exploiting Partial Annotations with EM Training; ''Hovy and Hovy'' ||
|| || Using Senses in HMM Word Alignment; ''Gelling and Cohn'' ||
|| || Unsupervised Part of Speech Inference with Particle Filters; ''Dubbin and Blunsom'' ||
|| || Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition; ''Täckström'' ||
|| 13.00-14.15  || Lunch break ||
|| 14.15-15.15  || Invited talk: '''Noah Smith''' ||
|| 15.15-15.30  || Overview of PASCAL challenge (shared task); ''Gelling et al.'' ||
|| 15.30-16.00  || Coffee break and poster session ||
|| 16.00-17.30  || Poster session continues ||

All the research talks will be presented in short ''spotlight'' sessions, with 10-minute presentations back-to-back.
This work will also be on display in the afternoon poster session. In addition to these posters, the participants
in the Grammar Induction Challenge (shared task) will be presenting their work.

== Invited Talks ==

=== Alexander Clark ===
''What types of linguistic structure can be induced?''

In NLP, linguistic structure is typically taken to be data that one
wishes to model: data that is accurately represented  in annotated
corpora like the Penn treebank; the role of the computational linguist
is to recover this structure using supervised or unsupervised
learning. In this talk we will claim that this view is mistaken and
misleading; making the uncontroversial claim that the syntactic
annotations are theoretical constructs and not data, and the more
controversial claim that computational linguists should aim instead to
specify well defined alternative structures that are capable of being
induced efficiently.

I will present two simple proposals along these lines: one at the
lexical level, giving a precise analogue of part-of-speech tags, and
one more controversial at the level of syntactic structure. The final
question is whether these structures are capable of performing the
roles that traditional syntactic structures were meant to fulfill:
supporting semantic interpretation and explaining certain syntactic
phenomena.

''Bio''

Alexander Clark is in the Department of Computer Science at Royal
Holloway, University of London. His research interests are in
grammatical inference, theoretical and mathematical linguistics and
unsupervised learning. He is currently president of SIGNLL and chair
of the steering committee of the ICGI; a book coauthored with Shalom
Lappin, 'Linguistic Nativism and the Poverty of the Stimulus', was
published by Wiley-Blackwell in 2011.

=== Regina Barzilay ===
''Selective-Sharing for Multilingual Syntactic Transfer''

Today, we have at our disposal a significant number of linguistic
annotations across many different languages. However, to achieve
reliable performance on a new language, we still depend heavily on the
annotations specific to that language. This limited ability to reuse
annotations across languages stands in striking contrast with the
unified treatment of syntactic structure given in linguistic
theory. In this talk, I will put recent multilingual parsing models
into the context of this unified view.  I will explain some of the
puzzling results in multilingual learning, such as the success of
direct syntactic transfer over more sophisticated cross-lingual
approaches. Finally, I will demonstrate the benefits of formulating
multilingual parsing models that are consistent with this unified view
and thereby can effectively leverage connections between languages.

''Bio''

Regina Barzilay is an Associate Professor in the Department of
Electrical Engineering and Computer Science and a member of the Computer
Science and Artificial Intelligence Laboratory at MIT. Her research interests
are in natural language processing. She is a recipient of various awards
including the NSF Career Award, Microsoft Faculty Fellowship, the MIT
Technology Review TR-35 Award, and best paper awards in the top NLP
conferences. She serves as an associate editor of the Journal of
Artificial Intelligence Research (JAIR) and is an action editor for
Transactions of the Association for Computational Linguistics (TACL).


=== Noah Smith ===
''Rethinking Inducing Linguistic Structure''

We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure. In this talk, I'll discuss some of the weaknesses of our current methodology. I'll present a new abstract framework for evaluating NLP models in general and unsupervised structure prediction models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation are more clearly defined and participants in all roles are offered ways to make measurable contributions to the larger goal. This framework can be instantiated in many ways, simulating some familiar intrinsic and extrinsic evaluations as well as some new evaluations. This talk is entirely based on preliminary ideas (no theoretical or experimental results) and is intended to spark discussion.

''Bio''

Noah Smith is the Finmeccanica Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. His book, Linguistic Structure Prediction, covers many of these topics. He serves on the editorial board of the journal Computational Linguistics and the Journal of Artificial Intelligence Research and received a best paper award at the ACL 2009 conference. His research group, Noah's ARK, is supported by the NSF (including an NSF CAREER award), DARPA, Qatar NRF, IARPA, ARO, Portugal FCT, and gifts from Google, HP Labs, IBM Research, and Yahoo Research.

== Accepted Papers ==

CAPITALIZATION CUES IMPROVE DEPENDENCY GRAMMAR INDUCTION  <<BR>>
Valentin I. Spitkovsky, Hiyan Alshawi and Daniel Jurafsky

EXPLOITING PARTIAL ANNOTATIONS WITH EM TRAINING  <<BR>>
Dirk Hovy and Eduard Hovy

NUDGING THE ENVELOPE OF DIRECT TRANSFER METHODS FOR MULTILINGUAL NAMED ENTITY RECOGNITION  <<BR>>
Oscar Täckström

TOWARD TREE SUBSTITUTION GRAMMARS WITH LATENT ANNOTATIONS  <<BR>>
Francis Ferraro, Matt Post and Benjamin Van Durme

TRANSFERRING FRAMES: UTILIZATION OF LINKED LEXICAL RESOURCES <<BR>>
Lars Borin, Markus Forsberg, Richard Johansson, Kristiina Muhonen, Tanja Purtonen and Kaarlo Voionmaa

UNSUPERVISED INDUCTION OF FRAME-SEMANTIC REPRESENTATIONS <<BR>>
Ashutosh Modi, Ivan Titov and Alexandre Klementiev

UNSUPERVISED PART OF SPEECH INFERENCE WITH PARTICLE FILTERS <<BR>>
Gregory Dubbin and Phil Blunsom

USING SENSES IN HMM WORD ALIGNMENT <<BR>>
Douwe Gelling and Trevor Cohn

== Challenge Papers ==

SUMMARY OF THE GRAMMAR AND POS INDUCTION CHALLENGE  <<BR>>
Douwe Gelling, Trevor Cohn, Phil Blunsom and João Graça

Two baselines for unsupervised dependency parsing <<BR>>
Anders Søgaard

Unsupervised Dependency Parsing using Reducibility and Fertility features <<BR>>
David Mareček and Zdeněk Žabokrtský

Induction of Linguistic Structure with Combinatory Categorial Grammars <<BR>>
Yonatan Bisk and Julia Hockenmaier

Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction <<BR>>
Christos Christodoulopoulos, Sharon Goldwater and Mark Steedman

Hierarchical clustering of word class distributions <<BR>>
Grzegorz Chrupała

Combining the Sparsity and Unambiguity Biases for Grammar Induction <<BR>>
Kewei Tu

-----
== Paper submissions ==

We solicit papers from many subfields of computational linguistics and language processing. Topics include, but
are not limited to:
 * grammar learning
 * part-of-speech and shallow syntax
 * learning semantic representations
 * inducing document and discourse structure
 * learning/projecting structures across multilingual corpora
 * relation induction across document collections
 * evaluation of induced representations

Our aim is to bring together work on fully unsupervised methods along
with minimally supervised approaches (e.g., domain adaptation and
multilingual projection).

The workshop will solicit short papers (6 pages of text, up to 2 pages of references) for either oral or
poster presentation. We will consider allowing additional pages for
accepted papers to give space to address the reviewer comments.
Please follow the standard NAACL guidelines for paper
formatting [[http://www.naaclhlt2012.org/downloads/download.php|(details here)]]. 
You can submit your papers using the following link:

[[https://www.softconf.com/naaclhlt2012/WILS2012/|Paper submissions]]

-----
== Important Dates ==

=== Workshop ===
||Workshop Papers Submission||'''Extended to April 14'''||
||Acceptance Notification||April 28||
||Workshop Camera Ready||May 7||
||NAACL-HLT 2012|| Jun 4-6, 2012||
||Workshop||Jun 7-8, 2012||

=== Shared Task ===
||Training Data Released||Jan 27||
||Test Data Release||-||
||Test Data Submission||April 13||
||Evaluation results released||April 23||
||Shared Task System Description Submission||May 4||
||NAACL-HLT 2012|| Jun 4-6, 2012||
||Workshop||Jun 7-8, 2012||
-----

== News ==

''' November 13 ''' It has come to our attention that Table 3 in the overview paper reported the wrong results for the Directed Accuracy (with cutoff 10 instead of no cutoff). Unfortunately this has invalidated a small part of the evaluation. A corrected paper with updated results can be found [[http://staffwww.dcs.shef.ac.uk/people/T.Cohn/wils/wils12overview.pdf|here]]. We thank Yonatan Bisk and Julia Hockenmaier for pointing out the errors.

''' April 20 ''' Results for submitted systems and some baselines are now available (see ResultsPos for POS induction, ResultsDep for dependency induction and ResultsPosDep for joint induction).

'''April 4''' Baseline and evaluation scripts are now available (see SharedTask).

'''Feb 8''' Accommodation after NAACL will be tight due to an overlap with a Formula 1 event. You will need to book your hotel early so that you can attend the workshops.

'''Jan 27''' Training data has been released.

-----

== Organisers ==
 * Trevor Cohn, The University of Sheffield
 * João Graça, Spoken Language Systems Lab, INESC-ID Lisboa
 * Phil Blunsom, The University of Oxford

== Program Committee ==
 * Ben Taskar - University of Pennsylvania
 * Percy Liang - Stanford University
 * Andreas Vlachos - University of Cambridge
 * Chris Dyer - CMU
 * Mark Dredze - Johns Hopkins University
 * Shay Cohen - Columbia University
 * Kuzman Ganchev - Google Inc.
 * André Martins - CMU/IST Portugal
 * Greg Druck - Yahoo
 * Ryan McDonald - Google Inc.
 * Nathan Schneider - CMU
 * Partha Talukdar - CMU
 * Dipanjan Das - CMU
 * Mark Steedman - University of Edinburgh
 * Luke Zettlemoyer - University of Washington
 * Roi Reichart - MIT
 * David Smith - University of Massachusetts
 * Ivan Titov - Saarland University
 * Alex Clark - Royal Holloway, University of London
 * Khalil Sima'an - University of Amsterdam
 * Stella Frank - University of Edinburgh
 * Oscar Täckström - Swedish Institute of Computer Science
 * Valentin Spitkovsky - Stanford University 
-----

== Registration ==
Conference registration is expected to open in early-to-mid March 2012. See [[http://www.naaclhlt2012.org/participants/registration.php/|NAACL-HLT 2012]] for more details.