Multilingual extraction and editing of concept strings for the legal domain
Abstract
Identifying semantic expressions, so-called concept strings (CSs), in multilingual corpora is an important NLP task, as it allows web search engines to define and perform semantic queries over large collections of documents. Existing web search engines in the legal domain are mainly limited to keyword search, in which the query words are matched against the textual content of the documents. This paper presents a novel framework, the Concept Strings Framework, that uses CSs to represent the content of documents and to enable semantic search over them. A CS can consist of an individual knowledge base (KB) concept (e.g. a WordNet concept) or a combination of such concepts. In addition, this paper presents an interactive web-based toolkit, the Template Editor, that enables the creation, editing and evaluation of CSs. Experiments on two publicly available legislation websites show satisfactory results.