Contribution

Towards a Linked Open Data Cloud of Language Resources in the Legal Domain

The work described in this paper is part of the H2020 Lynx project, which aims to develop legal compliance services across different languages and jurisdictions, based on a legal knowledge graph (LKG). The LKG integrates and links heterogeneous compliance data sources, including legislation, case law, standards and private contracts.

To establish the foundations of this legal knowledge graph, a sound set of language resources needs to be generated. Language resources are understood here as pieces of structured data in machine-readable form, such as corpora, terminologies, glossaries, lexicons or dictionaries. Within the Lynx project, language resources are required to annotate and classify legal documents, train machine translation tools, test natural language processing algorithms and support related activities.

The first stage of the process described in this paper comprises the identification of existing legal language resources. Some examples are Jurivoc (https://www.bger.ch/ext/jurivoc/live/de/jurivoc/Jurivoc.jsp?interfaceLanguage=german), a juridical thesaurus for Swiss regulations; the UNESCO thesaurus (http://skos.um.es/unescothes/?l=en), which contains terms from various fields including the legal domain; and the STW thesaurus (http://zbw.eu/stw/version/latest/about), which covers the economics domain. These resources are available in different formats such as CSV, XML or PDF. Since the value of such assets increases when they are interlinked, the next step is their conversion into RDF, which is well suited to representing structured information and metadata and makes it easy to create links between resources. As a result of this linking process, a Linguistic Legal Linked Open Data (LLLOD) cloud, the core of the legal knowledge graph, is being generated as part of the Linguistic Linked Open Data (LLOD) cloud, which is in turn a subset of the Linked Open Data cloud.
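To make the linking step more concrete, the following is a minimal sketch, not the procedure used in Lynx. It assumes two resources that have already been reduced to URI and label pairs, and it links concepts whose normalized labels coincide using skos:exactMatch. The URIs, the sample data and the function names are hypothetical; real linking would need more than exact label matching.

```python
import csv
import io

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical sample data: these URIs and labels are invented for illustration.
RESOURCE_A = """uri,label
http://example.org/a/contract,Contract
http://example.org/a/tax-law,Tax law
"""
RESOURCE_B = """uri,label
http://example.org/b/c12,contract
http://example.org/b/c40,Labour law
"""


def read_labels(csv_text):
    """Map each normalized (trimmed, case-folded) label to its concept URI."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["label"].strip().casefold(): row["uri"] for row in reader}


def link_by_label(csv_a, csv_b):
    """Return N-Triples lines linking concepts whose normalized labels match."""
    labels_a = read_labels(csv_a)
    labels_b = read_labels(csv_b)
    shared = sorted(labels_a.keys() & labels_b.keys())
    return [
        f"<{labels_a[key]}> <{SKOS_EXACT_MATCH}> <{labels_b[key]}> ."
        for key in shared
    ]


links = link_by_label(RESOURCE_A, RESOURCE_B)
print("\n".join(links))
```

Only the two "contract" concepts are linked in this sample; the remaining concepts have no counterpart with the same label and stay unlinked.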

Part of the conversion work consists in selecting the vocabularies and properties to be applied to the datasets. Two vocabularies have been considered for this purpose: SKOS, which represents concepts and the hierarchical relations between them, and Ontolex, which is used to model linguistic information.
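As an illustration of how the two vocabularies can be combined, the sketch below converts a small CSV glossary into Turtle. Each row becomes a skos:Concept with a skos:prefLabel and an optional skos:broader relation, plus an ontolex:LexicalEntry that denotes the concept. The CSV layout, the ex: namespace and the function names are assumptions made for this example and do not reflect the actual Lynx data model.

```python
import csv
import io

PREFIXES = """@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix ex: <http://example.org/legal/> .
"""

# Hypothetical glossary: column names and identifiers are invented.
SAMPLE_CSV = """id,label,lang,broader
c1,law,en,
c2,contract law,en,c1
"""


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_turtle(row):
    """Serialize one CSV row as a SKOS concept plus an Ontolex lexical entry."""
    concept = "ex:" + row["id"]
    entry = "ex:" + row["id"] + "_entry"
    label = '"' + escape_literal(row["label"]) + '"@' + row["lang"]
    lines = [concept + " a skos:Concept ;", "    skos:prefLabel " + label]
    if row["broader"]:
        # Hierarchical relation to the parent concept, when one is given.
        lines[-1] += " ;"
        lines.append("    skos:broader ex:" + row["broader"])
    lines[-1] += " ."
    lines += [
        entry + " a ontolex:LexicalEntry ;",
        "    ontolex:canonicalForm [ ontolex:writtenRep " + label + " ] ;",
        "    ontolex:denotes " + concept + " .",
    ]
    return "\n".join(lines)


def csv_to_turtle(csv_text):
    """Convert the whole CSV glossary into a single Turtle document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    body = "\n\n".join(row_to_turtle(row) for row in rows)
    return PREFIXES + "\n" + body + "\n"


turtle = csv_to_turtle(SAMPLE_CSV)
print(turtle)
```

Keeping the concept (SKOS) and the lexical entry (Ontolex) as separate nodes means that further linguistic information, such as additional forms or translations, can be attached to the entry without touching the concept hierarchy.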

More language resources are to be identified and converted into RDF. Eventually, specific legal corpora provided by the Lynx project will be used to create new language resources by means of automatic term extraction techniques. Both sets of resources, those identified and those newly created, will be linked in order to enrich the Semantic Web in the legal domain.

Related Session:

October 11th, 2018
Session III.A. Semantic Interoperability of Legal Data
16:30-18:15
Aula Magna of the Rectorate of the University of Florence