Contribution

Improving Public Access to Legislation Through Legal Citation Detection and Linking: The Linkoln Project at the Italian Senate

Hyperlinks to cited norms, preferably resolved at the granularity of the individual provision, are essential to improving the readability of a legislative text. Beyond easing navigation of legislation, extracting legal references and annotating them as machine-readable metadata of legislative texts guarantees interoperability and enables higher-level applications in the Semantic Web and Linked Data domains. In Legal Information Retrieval, the incoming and outgoing legal references of documents can be exploited to improve search results.

To overcome the limitations of existing tools for the automatic extraction of legal references from Italian legal texts, in 2015 the Italian Senate promoted the design of new software that would cover a wide range of authorities and types of acts, support the main standard identifiers for legislative texts, and be efficient, maintainable and easy to integrate into different environments, such as web applications and existing legal platforms. To this end, ITTIG developed Linkoln, a Java project that was later integrated into ShowDoc, the application for viewing official acts (including legislative proposals, amendments, dossiers, etc.) on the public website of the Italian Senate.

To tackle the complexity of automatic legal reference extraction, Linkoln relies on a pipeline of specialized services that analyze the text to identify, normalize and annotate the relevant textual features of a reference. These annotation services are implemented with JFlex, a popular lexical scanner generator for Java, and support macros of regular expressions, rules and states, all compiled into efficient lexical automata. At the end of the pipeline, patterns of identified entities are annotated as reference objects anchored to the original text. Finally, Linkoln associates one or more standard identifiers for legislative texts, together with the corresponding URLs, with each reference found. Currently urn:nir, celex and eli are supported.
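The identify, normalize and annotate steps can be illustrated with a minimal sketch. It is not Linkoln's code: it uses plain java.util.regex rather than JFlex-generated scanners, it recognizes a single citation form (for example "legge 7 agosto 1990, n. 241"), and the class name ReferencePipelineSketch, the Reference type and the exact urn:nir serialization are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: names and URN layout are illustrative, not Linkoln's API.
class ReferencePipelineSketch {

    /** A detected reference, anchored to the source text by character offsets. */
    static final class Reference {
        final int start;   // offset of the first character of the citation
        final int end;     // offset just past the last character of the citation
        final String urn;  // identifier built from the normalized features

        Reference(int start, int end, String urn) {
            this.start = start;
            this.end = end;
            this.urn = urn;
        }
    }

    // Italian month names, in calendar order, used to normalize the date.
    private static final List<String> MONTHS = List.of(
            "gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno",
            "luglio", "agosto", "settembre", "ottobre", "novembre", "dicembre");

    // Identify step: type of act, day, month, year and number of the act.
    private static final Pattern LAW = Pattern.compile(
            "\\blegge\\s+(\\d{1,2})\\s+(" + String.join("|", MONTHS)
                    + ")\\s+(\\d{4}),\\s*n\\.\\s*(\\d+)",
            Pattern.CASE_INSENSITIVE);

    static List<Reference> extract(String text) {
        List<Reference> found = new ArrayList<>();
        Matcher m = LAW.matcher(text);
        while (m.find()) {
            // Normalize step: turn the textual date into numeric components.
            int day = Integer.parseInt(m.group(1));
            int month = MONTHS.indexOf(m.group(2).toLowerCase(Locale.ROOT)) + 1;
            // Annotate step: build an identifier modeled on the urn:nir scheme
            // (assumed layout) and keep the offsets of the matched span.
            String urn = String.format(Locale.ROOT,
                    "urn:nir:stato:legge:%s-%02d-%02d;%s",
                    m.group(3), month, day, m.group(4));
            found.add(new Reference(m.start(), m.end(), urn));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Vista la legge 7 agosto 1990, n. 241, e successive modifiche.";
        for (Reference r : extract(text)) {
            System.out.println(text.substring(r.start, r.end) + " -> " + r.urn);
        }
    }
}
```

Keeping the offsets alongside the identifier is what lets a later stage place annotations in the original text without re-scanning it. A real implementation would need many more act types, issuing authorities and citation variants than this single pattern covers.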

Besides plain text, Linkoln also accepts previously annotated text (HTML, XML) as input and can render the additional reference annotations while preserving any pre-existing annotation. ShowDoc exploits this capability to let users detect legislative citations and activate hyperlinks while reading a document or a document partition, by invoking Linkoln on the HTML fragment currently displayed. In this scenario, Linkoln receives previously marked-up HTML text as input and returns the enriched HTML, with <a> tags marking the detected legislative references.
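The idea of enriching marked-up input without disturbing existing annotations can be sketched as follows. This is an illustration under stated assumptions, not Linkoln's implementation: the class HtmlLinkSketch, the single citation form "legge n. 241 del 1990" and the example.org link target are all hypothetical, and the tokenizer assumes well-formed tags with no ">" inside attribute values.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not Linkoln's API; the link target is a placeholder.
class HtmlLinkSketch {

    // Either a whole tag (group 1) or a run of text between tags (group 2).
    private static final Pattern TOKEN = Pattern.compile("(<[^>]*>)|([^<]+)");

    // Illustrative citation form only, e.g. "legge n. 241 del 1990".
    private static final Pattern CITATION = Pattern.compile(
            "\\blegge\\s+n\\.\\s*(\\d+)\\s+del\\s+(\\d{4})",
            Pattern.CASE_INSENSITIVE);

    static String annotate(String html) {
        StringBuilder out = new StringBuilder();
        int anchorDepth = 0; // greater than zero while inside an existing <a> element
        Matcher token = TOKEN.matcher(html);
        while (token.find()) {
            String tag = token.group(1);
            if (tag != null) {
                // Existing markup is copied through unchanged.
                String lower = tag.toLowerCase(Locale.ROOT);
                if (lower.matches("<a(\\s[^>]*)?>")) {
                    anchorDepth++;
                } else if (lower.matches("</a\\s*>")) {
                    anchorDepth = Math.max(0, anchorDepth - 1);
                }
                out.append(tag);
            } else if (anchorDepth > 0) {
                // Text that is already a link is left alone.
                out.append(token.group(2));
            } else {
                out.append(linkCitations(token.group(2)));
            }
        }
        return out.toString();
    }

    // Wraps each citation found in a text run in an <a> element.
    private static String linkCitations(String text) {
        Matcher m = CITATION.matcher(text);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(text, last, m.start());
            sb.append("<a href=\"https://example.org/ref?type=legge&amp;year=")
              .append(m.group(2))
              .append("&amp;number=")
              .append(m.group(1))
              .append("\">")
              .append(m.group())
              .append("</a>");
            last = m.end();
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }
}
```

Because only text runs outside existing links are rewritten, running annotate on its own output leaves it unchanged. The sketch does not detect a citation whose text is split across several tags, which a production tool would have to handle.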

With the Linkoln project, we have made available in the public domain a robust and reliable framework for the automatic detection of legislative references in Italian legal texts. Following its adoption and integration into the publication workflow of the Italian Senate, and thanks to its compliance with open standards (both legal and web) and its release as open source software, several Italian enacting authorities and Public Administrations will be encouraged to test and adopt the software and to contribute to its iterative refinement, evolution and maintenance.

Related Session:

October 11th, 2018
Session I. Legal Data under (Free, Open, Linked, Big) Data Deluge
10:30-13:30
Aula Magna of the Rectorate of the University of Florence