TEXAS: Taxonomy Extraction with Applications in Semantics

Workshop Description

Taxonomies form the backbone of knowledge-based systems by organizing knowledge in a machine-interpretable manner and facilitating information integration. Hierarchical structures provide valuable input to knowledge-intensive applications such as question answering and textual entailment, and are useful tools for browsing and navigating document collections, especially when used for exploration and discovery.

Although some taxonomies are readily available as part of language and web resources such as WordNet and Wikipedia, not all domains are covered and existing taxonomies are often too small to fully describe a domain. Automatic taxonomy extraction methods have been developed in recent years to address this problem, but issues remain in evaluation, comparison and application of extracted taxonomies [1, 2, 3, 4, 5, 6]. Depending on the application, multiple perspectives can be equally valid both in the selection of concepts and in the extraction of relations between them. This makes the resulting taxonomies difficult to compare, as they are based on different requirements. For instance, WordNet is a lexical semantic resource that is used mainly for tracking hyponymic substitution (e.g. ‘table’ can be replaced by ‘furniture’) with the main requirement of broad lexical coverage. On the other hand, subject hierarchies, such as the ACM Subject Classification, are used mainly for document collection browsing (e.g. fine-grained topic distinction such as ‘information retrieval’ vs. ‘information extraction’) with the main requirements of comprehensibility and coherence.

The TEXAS workshop aims to address these issues by providing a venue for presenting and discussing approaches that evaluate taxonomy extraction [7] and its subtasks (term/concept extraction, term/concept relation discovery, taxonomy construction and cleaning) in the context of semantic applications such as entity search, entity disambiguation and linking, information integration and summarization, knowledge acquisition, knowledge sharing, and inference in NLP tasks (question answering, textual entailment). In this way, progress towards automatically constructed hierarchies can be measured relative to other tasks and real-world applications.

[1] Kozareva, Zornitsa, and Eduard Hovy. “A semi-supervised method to learn and construct taxonomies using the web.” Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010.

[2] Navigli, Roberto, Paola Velardi, and Stefano Faralli. “A graph-based algorithm for inducing lexical taxonomies from scratch.” Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Three. AAAI Press, 2011.

[3] Medelyan, Olena, et al. “Constructing a Focused Taxonomy from a Document Collection.” The Semantic Web: Semantics and Big Data. Springer Berlin Heidelberg, 2013. 367-381.

[4] Stoica, Emilia, Marti A. Hearst, and Megan Richardson. “Automating Creation of Hierarchical Faceted Metadata Structures.” HLT-NAACL. 2007.

[5] Wang, Wei, Payam M. Barnaghi, and Andrzej Bargiela. “Probabilistic topic models for learning terminological ontologies.” IEEE Transactions on Knowledge and Data Engineering 22.7 (2010): 1028-1040.

[6] Velardi, Paola, Stefano Faralli, and Roberto Navigli. “OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction.” Computational Linguistics 39.3 (2013): 665-707.

[7] Zavitsanos, Elias, Georgios Paliouras, and George A. Vouros. “Gold Standard Evaluation of Ontology Learning Methods through Ontology Transformation and Alignment.” IEEE Transactions on Knowledge and Data Engineering 23.11 (2011): 1635-1648.


Expected research topics of relevance to the workshop:

Important Dates


Submissions should be made electronically, using the Softconf system at https://www.softconf.com/emnlp2014/texas2014/.

Submissions should follow the two-column format of ACL 2014 proceedings and should not exceed 8 pages of content and one additional references page.

The LaTeX style files and the Microsoft Word style files tailored for this year’s conference are available at:

The reviewing of papers will be double-blind, so please make sure your paper shows the title, but no author information. You should likewise not have any self-identifying references anywhere in the paper submitted for review. For example, rather than “We showed previously (Smith, 2001), …”, use citations such as “Smith (2001) previously showed …”. References to your own work in thesis proposals should also be anonymized: you may, for example, write “in X (2000) we showed”, and you should not add those papers to the reference list.

Program Committee

Organizing Committee


The TEXAS workshop will be supported by the following projects: