#acl AdminGroup:read,write,delete,revert All:read
#format wiki
#language en
= NAACL-HLT 2012 Workshop on Inducing Linguistic Structure =
------
Welcome to the homepage of the NAACL-HLT 2012 Workshop on Inducing Linguistic Structure.

This workshop addresses the challenges of learning linguistic structure in an unsupervised or minimally supervised setting. It encompasses many popular themes in computational linguistics and machine learning, including grammar induction, shallow syntax induction (e.g., parts of speech), learning semantics, learning the structure of documents and discourses, and learning relations within multilingual text collections. Compared with supervised settings, where annotated training data is available, unsupervised induction is considerably more difficult, both in terms of modelling and evaluation.

For more information, see the CallForPapers and the following links:
 * Description of the SharedTask
 * The webpage for [[http://www.naaclhlt2012.org/|NAACL-HLT 2012]]
-----
== Programme ==
The workshop will be held on Thursday June 7.
Here's the tentative programme (subject to change):

|| 9.00-10.00 || Invited talk: '''Alex Clark''' ||
|| 10.00-10.30 || Spotlight talks ||
|| || Transferring Frames: Utilization of Linked Lexical Resources; ''Borin et al.'' ||
|| || Unsupervised Induction of Frame-Semantic Representations; ''Modi et al.'' ||
|| || Capitalization Cues Improve Dependency Grammar Induction; ''Spitkovsky et al.'' ||
|| 10.30-11.00 || Coffee break ||
|| 11.00-12.00 || Invited talk: '''Regina Barzilay''' ||
|| 12.00-13.00 || Spotlight talks ||
|| || Toward Tree Substitution Grammars with Latent Annotations; ''Ferraro et al.'' ||
|| || Exploiting Partial Annotations with EM Training; ''Hovy and Hovy'' ||
|| || Using Senses in HMM Word Alignment; ''Gelling and Cohn'' ||
|| || Unsupervised Part of Speech Inference with Particle Filters; ''Dubbin and Blunsom'' ||
|| || Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition; ''Täckström'' ||
|| 13.00-14.15 || Lunch break ||
|| 14.15-15.15 || Invited talk: '''Noah Smith''' ||
|| 15.15-15.30 || Overview of the PASCAL challenge (shared task); ''Gelling et al.'' ||
|| 15.30-16.00 || Coffee break and poster session ||
|| 16.00-17.30 || Poster session continues ||

All the research talks will be presented in short ''spotlight'' sessions, with 10-minute presentations back-to-back. This work will also be on display in the afternoon poster session. In addition to these posters, the participants in the Grammar Induction Challenge (shared task) will be presenting their work.

== Invited Talks ==
=== Alexander Clark ===
''What types of linguistic structure can be induced?''

In NLP, linguistic structure is typically taken to be data that one wishes to model: data that is accurately represented in annotated corpora like the Penn Treebank; the role of the computational linguist is to recover this structure using supervised or unsupervised learning.
In this talk I will claim that this view is mistaken and misleading, making the uncontroversial claim that syntactic annotations are theoretical constructs and not data, and the more controversial claim that computational linguists should instead aim to specify well-defined alternative structures that can be induced efficiently. I will present two simple proposals along these lines: one at the lexical level, giving a precise analogue of part-of-speech tags, and a more controversial one at the level of syntactic structure. The final question is whether these structures can perform the roles that traditional syntactic structures were meant to fulfill: supporting semantic interpretation and explaining certain syntactic phenomena.

''Bio''

Alexander Clark is in the Department of Computer Science at Royal Holloway, University of London. His research interests are in grammatical inference, theoretical and mathematical linguistics, and unsupervised learning. He is currently president of SIGNLL and chair of the steering committee of the ICGI; a book coauthored with Shalom Lappin, 'Linguistic Nativism and the Poverty of the Stimulus', was published by Wiley-Blackwell in 2011.

=== Regina Barzilay ===
''Selective-Sharing for Multilingual Syntactic Transfer''

Today, we have at our disposal a significant number of linguistic annotations across many different languages. However, to achieve reliable performance on a new language, we still depend heavily on the annotations specific to that language. This limited ability to reuse annotations across languages stands in striking contrast with the unified treatment of syntactic structure given in linguistic theory. In this talk, I will put recent multilingual parsing models into the context of this unified view. I will explain some of the puzzling results in multilingual learning, such as the success of direct syntactic transfer over more sophisticated cross-lingual approaches.
Finally, I will demonstrate the benefits of formulating multilingual parsing models that are consistent with this unified view and can thereby effectively leverage connections between languages.

''Bio''

Regina Barzilay is an Associate Professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory. Her research interests are in natural language processing. She is a recipient of various awards, including the NSF CAREER Award, a Microsoft Faculty Fellowship, the MIT Technology Review TR-35 Award, and best paper awards at the top NLP conferences. She serves as an associate editor of the Journal of Artificial Intelligence Research (JAIR) and is an action editor for Transactions of the Association for Computational Linguistics (TACL).

=== Noah Smith ===
''Rethinking Inducing Linguistic Structure''

We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure. In this talk, I'll discuss some of the weaknesses of our current methodology. I'll present a new abstract framework for evaluating NLP models in general and unsupervised structure prediction models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation are more clearly defined and participants in all roles are offered ways to make measurable contributions to the larger goal. This framework can be instantiated in many ways, simulating some familiar intrinsic and extrinsic evaluations as well as some new evaluations. This talk is based entirely on preliminary ideas (no theoretical or experimental results) and is intended to spark discussion.

''Bio''

Noah Smith is the Finmeccanica Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He received his Ph.D.
in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. His book, ''Linguistic Structure Prediction'', covers many of these topics. He serves on the editorial boards of the journal Computational Linguistics and the Journal of Artificial Intelligence Research and received a best paper award at the ACL 2009 conference. His research group, Noah's ARK, is supported by the NSF (including an NSF CAREER award), DARPA, Qatar NRF, IARPA, ARO, Portugal FCT, and gifts from Google, HP Labs, IBM Research, and Yahoo Research.

== Accepted Papers ==
CAPITALIZATION CUES IMPROVE DEPENDENCY GRAMMAR INDUCTION <<BR>>
Valentin I. Spitkovsky, Hiyan Alshawi and Daniel Jurafsky

EXPLOITING PARTIAL ANNOTATIONS WITH EM TRAINING <<BR>>
Dirk Hovy and Eduard Hovy

NUDGING THE ENVELOPE OF DIRECT TRANSFER METHODS FOR MULTILINGUAL NAMED ENTITY RECOGNITION <<BR>>
Oscar Täckström

TOWARD TREE SUBSTITUTION GRAMMARS WITH LATENT ANNOTATIONS <<BR>>
Francis Ferraro, Matt Post and Benjamin Van Durme

TRANSFERRING FRAMES: UTILIZATION OF LINKED LEXICAL RESOURCES <<BR>>
Lars Borin, Markus Forsberg, Richard Johansson, Kristiina Muhonen, Tanja Purtonen and Kaarlo Voionmaa

UNSUPERVISED INDUCTION OF FRAME-SEMANTIC REPRESENTATIONS <<BR>>
Ashutosh Modi, Ivan Titov and Alexandre Klementiev

UNSUPERVISED PART OF SPEECH INFERENCE WITH PARTICLE FILTERS <<BR>>
Gregory Dubbin and Phil Blunsom

USING SENSES IN HMM WORD ALIGNMENT <<BR>>
Douwe Gelling and Trevor Cohn

== Challenge Papers ==
SUMMARY OF THE GRAMMAR AND POS INDUCTION CHALLENGE <<BR>>
Douwe Gelling, Trevor Cohn, Phil Blunsom and João Graça

Two baselines for unsupervised dependency parsing <<BR>>
Anders Søgaard

Unsupervised Dependency Parsing using Reducibility and Fertility features <<BR>>
David Mareček and Zdeněk Žabokrtský

Induction of Linguistic Structure with Combinatory Categorial Grammars <<BR>>
Yonatan Bisk and Julia Hockenmaier

Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction <<BR>>
Christos Christodoulopoulos, Sharon Goldwater and Mark Steedman

Hierarchical clustering of word class distributions <<BR>>
Grzegorz Chrupała

Combining the Sparsity and Unambiguity Biases for Grammar Induction <<BR>>
Kewei Tu
-----
== Paper submissions ==
We solicit papers from many subfields of computational linguistics and language processing. Topics include, but are not limited to:
 * grammar learning
 * part-of-speech and shallow syntax
 * learning semantic representations
 * inducing document and discourse structure
 * learning/projecting structures across multilingual corpora
 * relation induction across document collections
 * evaluation of induced representations

Our aim is to bring together work on fully unsupervised methods along with minimally supervised approaches (e.g., domain adaptation and multilingual projection).

The workshop solicits short papers (6 pages of text, up to 2 pages of references) for either oral or poster presentation. We will consider allowing additional pages for accepted papers to give authors space to address the reviewers' comments. Please follow the standard NAACL guidelines for paper formatting [[http://www.naaclhlt2012.org/downloads/download.php|(details here)]]. You can submit your papers using the following link: [[https://www.softconf.com/naaclhlt2012/WILS2012/|Paper submissions]]
-----
== Important Dates ==
=== Workshop ===
||Workshop Papers Submission||'''Extended to April 14'''||
||Acceptance Notification||April 28||
||Workshop Camera Ready||May 7||
||NAACL-HLT 2012||Jun 4-6, 2012||
||Workshop||Jun 7-8, 2012||
=== Shared Task ===
||Training Data Released||Jan 27||
||Test Data Release||-||
||Test Data Submission||April 13||
||Evaluation results released||April 23||
||Shared Task System Description Submission||May 4||
||NAACL-HLT 2012||Jun 4-6, 2012||
||Workshop||Jun 7-8, 2012||
-----
== News ==
'''November 13'''
It has come to our attention that Table 3 in the shared task overview paper reported the wrong results for Directed Accuracy (with a cutoff of 10 instead of no cutoff). Unfortunately, this has invalidated a small part of the evaluation.
A corrected paper with updated results can be found [[http://staffwww.dcs.shef.ac.uk/people/T.Cohn/wils/wils12overview.pdf|here]]. We thank Yonatan Bisk and Julia Hockenmaier for pointing out the errors.

'''April 20'''
Results for submitted systems and some baselines are now available (see ResultsPos for POS induction, ResultsDep for dependency induction and ResultsPosDep for joint induction).

'''April 4'''
Baseline and evaluation scripts are now available (see SharedTask).

'''Feb 8'''
Accommodation after NAACL will be tight due to an overlap with a Formula 1 event. You will need to book your hotel early so that you can attend the workshops.

'''Jan 27'''
Training data has been released.
-----
== Organisers ==
 * Trevor Cohn, The University of Sheffield
 * João Graça, Spoken Language Systems Lab, INESC-ID Lisboa
 * Phil Blunsom, The University of Oxford
== Program Committee ==
 * Ben Taskar - University of Pennsylvania
 * Percy Liang - Stanford University
 * Andreas Vlachos - University of Cambridge
 * Chris Dyer - CMU
 * Mark Dredze - Johns Hopkins University
 * Shay Cohen - Columbia University
 * Kuzman Ganchev - Google Inc.
 * André Martins - CMU/IST Portugal
 * Greg Druck - Yahoo
 * Ryan McDonald - Google Inc.
 * Nathan Schneider - CMU
 * Partha Talukdar - CMU
 * Dipanjan Das - CMU
 * Mark Steedman - University of Edinburgh
 * Luke Zettlemoyer - University of Washington
 * Roi Reichart - MIT
 * David Smith - University of Massachusetts
 * Ivan Titov - Saarland University
 * Alex Clark - Royal Holloway, University of London
 * Khalil Sima'an - University of Amsterdam
 * Stella Frank - University of Edinburgh
 * Oscar Täckström - Swedish Institute of Computer Science
 * Valentin Spitkovsky - Stanford University
-----
== Registration ==
Conference registration is expected to open in early-to-mid March 2012. See [[http://www.naaclhlt2012.org/participants/registration.php/|NAACL-HLT 2012]] for more details.