
This page contains short system descriptions for all the participants in the shared task.

Yonatan Bisk

In this work we automatically induce a stochastic Combinatory Categorial Grammar (CCG) from the text. Our lexical items are the gold part-of-speech tags, which we have classified as nouns, verbs, punctuation and other (mostly relying on Petrov et al.'s universal tag set, with the exception that we add numbers and pronouns to the nominal class). We seed the lexicon with the atomic categories N and S for noun and verb POS tags respectively, and then use the data to iteratively create new lexical entries, consisting of complex categories (of the form X/Y or X\Y, etc.), for all POS tags. We distinguish head categories, which take arguments (e.g. S/N, S\N, (S/N)/N, (S\N)\N, (S\N)/N, etc.), from modifier categories (e.g. S/S, S\S, N/N, (S/S)/(S/S)). These complex categories are induced in an iterative fashion: any item with a lexical category X (which may itself be a complex category) that appears adjacent to an item with a lexical non-modifier category Y can take Y as an argument, resulting in a new lexical category X/Y or X\Y. Modifier categories are induced in a similar fashion.

We have a small number of additional restrictions: 1) if a verb is present in the sentence, TOP must go to an S (otherwise we also allow N as a start symbol); 2) N cannot take arguments; 3) constituents cannot include half of a bracket or quotation mark pair (unless the matching half does not exist in the sentence).
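As an illustration, here is a minimal, hypothetical Python sketch (not the actual system) of one round of the head-category induction step, with restriction 2 above (N takes no arguments) applied; categories are represented as atomic strings or (result, slash, argument) triples.

    def show(cat):
        """Render a category in the usual CCG notation."""
        if isinstance(cat, str):
            return cat
        res, slash, arg = cat
        wrap = lambda c: c if isinstance(c, str) else "(" + show(c) + ")"
        return wrap(res) + slash + wrap(arg)

    def is_modifier(cat):
        """Modifier categories have the form X/X or X\\X (e.g. N/N, S/S, S\\S)."""
        return isinstance(cat, tuple) and cat[0] == cat[2]

    def induce_once(sentence, lexicon):
        """One pass over a POS-tagged sentence, proposing new head categories:
        X adjacent to a non-modifier Y yields X/Y (Y right) or X\\Y (Y left)."""
        new_entries = set()
        for i, tag in enumerate(sentence):
            for x in lexicon[tag]:
                if x == "N":                        # restriction 2: N cannot take arguments
                    continue
                for j, slash in ((i + 1, "/"), (i - 1, "\\")):
                    if 0 <= j < len(sentence):
                        for y in lexicon[sentence[j]]:
                            if not is_modifier(y):  # only non-modifier categories become arguments
                                new_entries.add((tag, (x, slash, y)))
        return new_entries

    # Toy run: seed lexicon with the atomic categories, one POS-tagged sentence.
    lexicon = {"NOUN": {"N"}, "VERB": {"S"}}
    for tag, cat in induce_once(["NOUN", "VERB"], lexicon):
        print(tag, "->", show(cat))                 # VERB -> S\N

Iterating this step (together with the analogous step for modifier categories) grows the lexicon to include complex categories such as (S\N)/N.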

We tune the use of punctuation and the choice of coarse vs. fine tagsets on the development data. CCG parsing is performed with our CCG normal form (Hockenmaier and Bisk, 2010), and more details of our system are presented in (Bisk and Hockenmaier, 2012).

Phil Blunsom

????

Christos Christodoulopoulos

In this work we investigate how dependency information can be incorporated into an unsupervised PoS induction system by inducing both the PoS tags and the dependencies using an iterated learning method. We use BMMM (Christodoulopoulos et al., 2011), a PoS induction system that allows us to easily incorporate the dependency features as multinomial distributions, and the original DMV model (Klein and Manning, 2004) for inducing the dependency structures with unsupervised PoS tags. The iterated learning method works as follows: we first run the original BMMM PoS inducer (without dependency features) and use its output as input to the DMV parser. We then re-train the PoS inducer using the induced dependencies as additional features, use the new PoS tags to re-train the parser, and so forth for 5 iterations. Both systems are fully unsupervised; only raw text is used as input, although the BMMM system also uses morphological segmentation features obtained from Morfessor (Creutz and Lagus, 2005).
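The loop itself is simple; below is a schematic Python sketch of the iterated learning procedure described above. The functions run_bmmm and run_dmv are hypothetical stand-ins for the actual BMMM and DMV implementations, not real APIs.

    # Schematic sketch of the iterated learning loop (illustrative only).
    # run_bmmm and run_dmv are hypothetical wrappers around the real systems.

    def iterated_learning(raw_corpus, morph_features, n_iterations=5):
        """Alternate between PoS induction (BMMM) and dependency induction (DMV),
        feeding each system's output to the other."""
        dependencies = None                        # first round: no dependency features
        for _ in range(n_iterations):
            pos_tags = run_bmmm(raw_corpus, morph_features, dep_features=dependencies)
            dependencies = run_dmv(raw_corpus, pos_tags)
        return pos_tags, dependencies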

Grzegorz Chrupala

The unsupervised POS tagging method used in this submission consists of two components. The first is the word class induction approach using Latent Dirichlet Allocation proposed in [1]. As one of the outputs of this first stage, we obtain for each word type a probability distribution over classes. In the second stage we create a hierarchical clustering of the word types: we use an agglomerative clustering algorithm where the distance between clusters is defined as the Jensen-Shannon divergence between the probability distributions over classes associated with each cluster. When assigning POS tags, we find the tree leaf most similar to the current word and use the prefix of the path leading to this leaf as the tag. We tune the number of classes and the prefix length on the development data, for both coarse-grained and fine-grained POS inventories. Otherwise the system is entirely unsupervised and uses no resources other than the raw word forms in the provided data files.
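For concreteness, here is a minimal, self-contained Python sketch of the second stage under simplifying assumptions (a merged cluster is represented by the mean of its members' class distributions, and path prefixes are binary strings); the actual linkage and tag encoding in the submitted system may differ.

    import numpy as np

    def js_divergence(p, q):
        """Jensen-Shannon divergence between two discrete distributions."""
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(a[a > 0] * np.log(a[a > 0] / b[a > 0]))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def agglomerate(word_dists):
        """Greedy agglomerative clustering of word types; a merged cluster keeps
        the mean of its members' class distributions (a simplifying choice)."""
        dists = list(word_dists.values())
        trees = list(word_dists.keys())            # leaves are word types
        while len(trees) > 1:
            i, j = min(((a, b) for a in range(len(trees))
                        for b in range(a + 1, len(trees))),
                       key=lambda ab: js_divergence(dists[ab[0]], dists[ab[1]]))
            dists.append(0.5 * (dists[i] + dists[j]))
            trees.append((trees[i], trees[j]))
            for k in (j, i):                       # pop j first so index i stays valid
                dists.pop(k); trees.pop(k)
        return trees[0]

    def path_tags(tree, prefix_length, path=""):
        """Map each leaf (word type) to the first prefix_length bits of its path."""
        if not isinstance(tree, tuple):
            return {tree: path[:prefix_length] or "0"}
        tags = dict(path_tags(tree[0], prefix_length, path + "0"))
        tags.update(path_tags(tree[1], prefix_length, path + "1"))
        return tags

    # Toy input: per-word-type distributions over 3 induced classes (e.g. from LDA).
    word_dists = {"dog":  np.array([0.8, 0.1, 0.1]),
                  "cat":  np.array([0.7, 0.2, 0.1]),
                  "runs": np.array([0.1, 0.8, 0.1])}
    print(path_tags(agglomerate(word_dists), prefix_length=2))
    # e.g. {'runs': '0', 'dog': '10', 'cat': '11'}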

João Graça

???

David Mareček

Our approach is based on a dependency model that consists of three submodels: (i) an edge model (similar to P_CHOOSE in DMV), (ii) a fertility model (modeling the number of children of a given head), and (iii) a reducibility model. The fertility model utilizes the observation that the fertility of function words (typically the most frequent words in the corpus) is more determined than the fertility of content (less frequent) words. Reducibility is a feature of individual part-of-speech tags. We compute it from the reducibility of words in a large unannotated corpus (we used Wikipedia articles). A word is reducible if the sentence remains grammatically correct after the word is removed. The grammaticality of such newly created sentences is tested by searching for them in the corpus.
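As an illustration of the reducibility test, here is a toy Python sketch under the simplifying assumption that reducibility is estimated per POS tag by exact-match search for the reduced sentence in the same corpus; the actual system's scoring details differ.

    from collections import defaultdict

    def reducibility_scores(corpus_sentences):
        """For each POS tag, the fraction of its tokens whose removal leaves a
        sentence that is itself attested in the corpus."""
        attested = {tuple(w for w, _ in sent) for sent in corpus_sentences}
        reducible, total = defaultdict(int), defaultdict(int)
        for sent in corpus_sentences:
            words = [w for w, _ in sent]
            for i, (w, tag) in enumerate(sent):
                total[tag] += 1
                if tuple(words[:i] + words[i + 1:]) in attested:
                    reducible[tag] += 1
        return {tag: reducible[tag] / total[tag] for tag in total}

    # Toy corpus of (word, POS-tag) sentences.
    corpus = [
        [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
        [("the", "DT"), ("dog", "NN"), ("barked", "VBD"), ("loudly", "RB")],
    ]
    print(reducibility_scores(corpus))   # only RB gets a nonzero score here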

The inference itself was done on the test corpus using Gibbs sampling. Three hyperparameters were tuned on the English Penn Treebank with the fine-grained POS tags (the 5th column in the given CoNLL format). For parsing the other languages, and for all types of tags (CPOS, POS, UPOS), we used the same parameter settings.

The only additional data (not provided by the organizers) is the unannotated monolingual Wikipedia text, which was automatically POS-tagged with the TnT tagger trained on the provided corpora.

Anders Søgaard

Only references....

Kewei Tu

Our system incorporates two types of inductive biases for dependency structure induction: a sparsity bias and an unambiguity bias. The sparsity bias favors a grammar with fewer grammar rules. We employ two types of sparsity biases: parameter sparsity induced by Dirichlet priors over rule probabilities, and a particular form of structure sparsity induced using posterior regularization (Gillenwater et al., 2010). The unambiguity bias favors a grammar that leads to unambiguous parses; it is motivated by the observation that natural language is remarkably unambiguous, in the sense that the number of plausible parses of a natural language sentence is very small. We derive an approach named unambiguity regularization to induce unambiguity based on the posterior regularization framework (Tu et al., 2012). To combine Dirichlet priors with unambiguity regularization, we derive a mean-field variational inference algorithm. To combine the sparsity-inducing posterior regularization approach with unambiguity regularization, we employ a simple approach that optimizes the two regularization terms separately.
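Schematically, and in our own notation rather than that of the cited papers, both biases can be viewed as instances of a posterior-regularized objective of the form

    \max_{\theta} \; \log p_{\theta}(\mathbf{X})
      \;-\; \min_{q} \Big[ \mathrm{KL}\big(q(\mathbf{Z}) \,\|\, p_{\theta}(\mathbf{Z} \mid \mathbf{X})\big) \;+\; \sigma\, R(q) \Big]

where R(q) is the bias-specific regularization term (a sparsity-inducing penalty on expected dependency-type counts for the structure-sparsity bias, and a term penalizing the ambiguity, roughly the entropy, of q over parses for the unambiguity bias) and sigma controls its strength. This is only a generic schematic: the exact constraint sets and penalty terms are those of the papers cited above, and the Dirichlet priors enter separately as priors over the rule probabilities theta.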
