Open Data Federation

Open Data Federation (ODF) is a web application that federates existing Open Data Management Systems (ODMSs) based on different technologies, providing a single access point to search and discover open datasets coming from the federated ODMSs. ODF provides a uniform representation of the collected open datasets by adopting the international DCAT-AP standard, and offers a set of APIs for developing third-party applications. ODF natively supports ODMSs based on CKAN, DKAN and Socrata, and provides a set of APIs for federating ODMSs that are not natively supported; such ODMSs have to implement and expose these APIs. In addition, a generic web portal can be federated either through the Web Scraping functionality or by uploading a dump of its datasets in DCAT-AP format. Finally, ODF provides a SPARQL endpoint for querying the 5-star RDF Linked Open Data collected from the federated ODMSs.

Architecture Overview

Open Data Federation provides access to the resources of federated ODMSs from a single entry point through a set of APIs, and is able to retrieve, search and visualize datasets from different ODMSs. The platform collects the Open Data metadata from the federated ODMS catalogues and translates them into a common, uniform format. In addition, it manages Linked Open Data (LOD), importing them into a dedicated repository so that queries can be run on them. The following picture illustrates the architecture of the Open Data Federation.

[Figure: Open Data Federation architecture]

Its main components are:

  • Federation Manager (FM): the core of the platform. It interacts with the federated ODMS catalogues and manages the internal federation processes. It exposes its main functionalities through the Platform API, so that they can be accessed by external applications or by the Federated Open Data Catalogue. The main functionalities provided by the FM are:
    • ODMS catalogues management: registration, removal and monitoring.
    • Federated full-text search: searching for specific Open Data across the federated ODMS catalogues.
    • Federated queries on Linked Open Data.
    • Federation configuration management.
  • LOD Repository: the central store for the Linked Open Data retrieved from the federated ODMS catalogues, used to run queries on them and to provide the results in different formats.
  • Federated Open Data Catalogue: a web application that gives end users access to the FM functionalities by calling the Platform API. In particular, it allows users to:
    • Manage administrator authentication.
    • Search for Open Data/Linked Open Data, and visualise and manage the results.
    • Manage the federation and its configuration.

The Federation Manager functionalities can also be accessed by a generic external system (e.g. a client application) through the Platform API. Each ODMS catalogue depicted in the picture is a generic system that manages OD/LOD; usually it consists of a web portal backed by a database. To be federated in ODF, an ODMS has to provide some basic functionalities through RESTful APIs. One of the objectives of ODF is to allow the federation of different ODMSs with minimum effort. ODF natively supports several types of ODMS catalogue: CKAN, Socrata, DKAN, and portals that provide their datasets through a DCAT-AP or DCAT-AP_IT dump. ODF also provides the Federation API Specification, which allows “custom ODMS catalogues” to join the federation; custom ODMS catalogues that do not provide APIs can join the federation through the scraping of their web portal.

Administration Manual

This section describes the administration functionalities. An administrator should be able to install and deploy the platform, perform the sanity checks on the environment, and manage the platform through the Federated Open Data Catalogue.

Installation

This section covers the steps needed to properly install the Open Data Federation.

Requirements

ODF requires the following software, which must be correctly installed and configured:

| Framework               | Version         | License                                |
|-------------------------|-----------------|----------------------------------------|
| Java SE Development Kit | 8.0             | Oracle Binary Code License             |
| Apache Tomcat           | 8.0             | Apache License v.2.0                   |
| MySQL                   | 5.7.5 Community | GNU General Public License Version 2.0 |
| RDF4J Server            | 2.2.1           | EDL 1.0 (Eclipse Distribution License) |
| RDF4J Workbench         | 2.2.1           | EDL 1.0 (Eclipse Distribution License) |

Libraries

ODF is based on the following software libraries and frameworks.

| Framework                           | Version      | License                                        |
|-------------------------------------|--------------|------------------------------------------------|
| Apache SOLR-Lucene (SOLR Core)      | 6.6.0        | Apache License                                 |
| Apache Http Client                  | 4.5.2        | Apache License                                 |
| Apache Http Core                    | 4.5.2        | Apache License                                 |
| Mysql connector (Community Release) | 5.1.39       | GPL 2.0 (GNU General Public License)           |
| Hibernate                           | 5.2.10.Final | LGPL 2.1 (GNU Lesser General Public License)   |
| Hikari                              | 2.6.1        | Apache License 2.0                             |
| Log4j                               | 2.7          | Apache License 2.0                             |
| CKANClient-J                        | 1.7          | AGPL 3.0 (GNU Affero General Public License)   |
| RDF4J-Runtime                       | 2.2.1        | EDL 1.0 (Eclipse Distribution License)         |
| AngularJS                           | 1.5.9        | MIT                                            |
| Angular-UI - bootstrap-ui           | 0.13.3       | MIT                                            |
| Bootstrap                           | 3.3.2        | MIT                                            |
| Bootstrap-Material                  | 3            | MIT                                            |
| Smart-table                         | 2.1.3        | MIT                                            |
| ngImageCrop                         | 0.3.2        | MIT                                            |
| spin.js                             | 2.3.2        | MIT                                            |
| angular-zeroclipboard               | 0.8.0        | MIT                                            |
| angular-xeditable                   | 0.1.8        | MIT                                            |
| angular-pagination                  | 0.11.0       | MIT                                            |
| Ace Editor                          | 1.2.0        | BSD                                            |
| Angular-UI - ace-ui                 | 0.2.3        | MIT                                            |

Prerequisites

The following tools, which are used by the build steps in this section, should be properly installed on your computer: Git, npm, Bower and Maven.

Proxy configurations

In order to use the different tools behind a proxy, execute the following commands (username and password are your credentials, proxyhost is the host name or the IP address of the proxy, and proxyport is the TCP port of the proxy):

  • Git: open a command prompt and execute:
    $ git config --global http.proxy http://username:password@proxyhost:proxyport
    $ git config --global https.proxy http://username:password@proxyhost:proxyport
    
  • Npm: open a command prompt and execute:
    $ npm config set proxy http://username:password@proxyhost:proxyport
    $ npm config set https-proxy http://username:password@proxyhost:proxyport
    
  • Bower: change the current directory to the one that contains the “bower.json” file and create/edit the “.bowerrc” file and add the proxy configuration:

    {
            "proxy" : "http://username:password@proxyhost:proxyport",
            "https-proxy" : "http://username:password@proxyhost:proxyport"
    }
    
  • Maven: edit the file “Path_Of_Maven/conf/settings.xml” and add to the “<proxies>” section the proper configuration following the example provided in the same file (please refer to maven guide https://maven.apache.org/guides/mini/guide-proxies.html)
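
The proxy URL used by the Git, npm and Bower settings above has the same shape in each case. As a minimal sketch (the function name and the sample values are hypothetical), the following Python helper builds that URL and percent-encodes the credentials, which is generally needed when the username or password contains characters such as "@" or ":" that would otherwise be read as URL delimiters:

```python
from urllib.parse import quote


def build_proxy_url(username, password, proxyhost, proxyport):
    """Return an http:// proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/".
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{proxyhost}:{proxyport}"


if __name__ == "__main__":
    # Hypothetical credentials and proxy address, for illustration only.
    print(build_proxy_url("jdoe", "p@ss:word", "proxy.example.org", 3128))
```

The resulting string can be pasted in place of http://username:password@proxyhost:proxyport in the commands above.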

Create WAR packages

Open a command prompt and execute the following commands to clone the repository and move into it:

$ git clone https://production.eng.it/gitlab/OPSI/OpenDataFederation.git
$ cd OpenDataFederation

In this folder you will find two subfolders:

  • FederationManager: this folder contains the server side application of the Open Data Federation
  • ODFCatalogue: this folder contains the client side application of the Open Data Federation

FederationManager.war

Move into the FederationManager folder and build the package:
$ cd FederationManager
$ mvn package

Note. Run this command on a network without a proxy, because of the JitPack dependency.

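ODFCatalogue.war

From the repository root, move into the ODFCatalogue folder, install the front-end dependencies with Bower, then build the package:

$ cd ODFCatalogue
$ cd src/main/webapp
$ bower install
$ cd ../../..
$ mvn package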

Deployment

This section describes the deployment procedure of the Open Data Federation.

Artefacts

These are the artefacts that must be installed in order to run ODF:

  • FederationManager.war
  • ODFCatalogue.war
  • rdf4j-workbench.war and rdf4j-server.war (both are available in the "war" folder of the RDF4J SDK distribution)
  • opendata_federation.sql

Database creation

ODF relies on a MySQL database to store all the application data and collected Open Datasets.

Before deploying the application, create the database by importing the provided SQL dump file into the MySQL server:

  • opendata_federation.sql

This dump already contains the statement that creates the “opendata_federation” DB automatically. In addition it creates an administration user with the following credentials:

username: admin

password: admin

Note. To change the administrator password, log in to the Open Data Catalogue with the credentials above, then go to the Administration -> Manage Configurations -> Update Password section.

WARs deployment

Move all the WAR artefacts to the “webapps” folder of the Tomcat installation, start Tomcat and wait until they are deployed.

RDF repository creation

Once the Tomcat server has started, open the URL “localhost:8080/rdf4j-workbench” in a browser.

Note. Change the port number according to the server.xml file in the Tomcat “conf” folder (default 8080).

Through the RDF4J GUI, select “new repository” on the left menu, then create a new repository of type “Native Java Store” called “ODF”.
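
To confirm that the repository was created, one option is to ask the RDF4J Server for its repository list. The following Python sketch assumes that the server is deployed at http://localhost:8080/rdf4j-server (as in the configuration examples of this document) and that its "repositories" endpoint returns a SPARQL JSON results document with an "id" binding per repository; the function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def repository_ids(payload):
    """Extract the repository ids from a SPARQL JSON results document."""
    bindings = payload["results"]["bindings"]
    return [b["id"]["value"] for b in bindings if "id" in b]


def fetch_repository_ids(server_url, timeout=10):
    """Ask the RDF4J Server (assumed URL) for the ids of its repositories."""
    request = Request(
        server_url.rstrip("/") + "/repositories",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return repository_ids(json.load(response))
```

For example, "ODF" in fetch_repository_ids("http://localhost:8080/rdf4j-server") should evaluate to True once the repository exists.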

Configuration

Once all the WAR files are deployed and the server has started, modify the following configuration files, located in the deployed application folders under the Tomcat “webapps” folder.

  • ODFCatalogue/WEB-INF/classes/
  • FederationManager/WEB-INF/classes/
    • In configuration.properties file, change the following properties:
      • DB_HOST, DB_USERNAME, DB_PASSWORD with the actual parameters of the MySQL server installation.
      • http.proxyHost, http.proxyPort, http.proxyUser, http.proxyPassword with the proxy parameters; leave them blank if there is no proxy. Set http.proxyEnabled to true if the proxy parameters are provided.
      • odmsDumpFilePath and dumpFilePath with the path of the folder where the DCAT-AP dump files are saved. NOTE: the path MUST end with "\" or "/".
      • sesameRepositoryName with the name of the newly created RDF repository.
      • enableRdf to true, in order to enable RDF retrieval. RDF retrieval is configured through the following parameters, which depend on the Tomcat configuration described in the “RDF repository creation” step:
        • sesameServerURI with the URL of the "repositories" endpoint of RDF4J. Example: http\://localhost\:8080/rdf4j-server/repositories/
        • sesameEndPoint with the URL of the "query" endpoint. Example: http\://localhost\:8080/rdf4j-workbench/repositories/ODF/query
    • In hibernate.properties file, change the following properties:
      • hibernate.connection.url, hibernate.connection.username, hibernate.connection.password with the actual parameters of the MySQL server installation.
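
Some of the constraints listed above are easy to get wrong, so it can help to check configuration.properties before restarting Tomcat. The following Python sketch is not part of ODF: it parses a simplified subset of the Java properties format and checks three of the rules stated above (trailing separator on the dump paths, proxy host and port present when the proxy is enabled, repository name present in the query endpoint). The function names and all sample values are hypothetical.

```python
import re

# Hypothetical sample content; only the property names come from this manual.
SAMPLE = r"""
# proxy disabled
http.proxyEnabled=false
odmsDumpFilePath=/var/odf/dumps
dumpFilePath=/var/odf/dumps/
sesameRepositoryName=ODF
enableRdf=true
sesameServerURI=http\://localhost\:8080/rdf4j-server/repositories/
sesameEndPoint=http\://localhost\:8080/rdf4j-workbench/repositories/ODF/query
"""


def parse_properties(text):
    """Parse 'key=value' lines; continuations and unicode escapes are ignored."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        key, _, value = line.partition("=")
        # Drop the backslash of escapes such as "\:" so URLs read normally.
        props[key.strip()] = re.sub(r"\\(.)", r"\1", value.strip())
    return props


def check_configuration(props):
    """Return a list of human-readable problems (empty when none are found)."""
    problems = []
    for key in ("odmsDumpFilePath", "dumpFilePath"):
        if not props.get(key, "").endswith(("/", "\\")):
            problems.append(f"{key} must end with '/' or '\\'")
    if props.get("http.proxyEnabled", "false").lower() == "true":
        for key in ("http.proxyHost", "http.proxyPort"):
            if not props.get(key):
                problems.append(f"{key} is required when http.proxyEnabled is true")
    if props.get("enableRdf", "false").lower() == "true":
        repo = props.get("sesameRepositoryName", "")
        if not repo or f"/{repo}/" not in props.get("sesameEndPoint", ""):
            problems.append("sesameEndPoint does not reference sesameRepositoryName")
    return problems


if __name__ == "__main__":
    for problem in check_configuration(parse_properties(SAMPLE)):
        print(problem)
```

In the sample, odmsDumpFilePath deliberately lacks the trailing separator, so it is the one value the check reports.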

Sanity Checks

Restart the Tomcat server to apply the previous changes. The sanity checks are the steps an administrator takes to verify that the installation is ready to be used and tested.

Note. Replace “BASEPATH” with the actual host and port where the runtime environment (Tomcat) is exposed.

Catalogue Access Testing

Once the server has restarted, open http://BASEPATH/ODFCatalogue in a browser.

When the home page is shown, perform the following steps:

  • Check that the message "There are no federated catalogues" is shown.
  • Check that you can log in as administrator, using the appropriate section in the top bar.

Platform API testing

  • Open a command prompt and execute: curl http://BASEPATH/FederationManager/api/v1/administration/info
  • Check that you get the version number as output, along with other information about API version and timestamp
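
The same check can be scripted. The following Python sketch calls the info endpoint shown above and returns the HTTP status and the raw response body; because this manual does not specify the response format, the body is returned as text without further parsing. The function names are hypothetical and BASEPATH is passed as a "host:port" string.

```python
from urllib.request import urlopen

INFO_PATH = "/FederationManager/api/v1/administration/info"


def info_url(basepath):
    """Build the info endpoint URL from a 'host:port' BASEPATH value."""
    return f"http://{basepath.strip('/')}{INFO_PATH}"


def fetch_info(basepath, timeout=10):
    """Call the endpoint and return (HTTP status, body text)."""
    with urlopen(info_url(basepath), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

For example, status, body = fetch_info("localhost:8080") is expected to give a status of 200 and a body containing the version information when the Federation Manager is running.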

Platform Management

This section provides the description of the Administration Functionalities. Through the Open Data Catalogue a logged administrator can:

  • Manage ODMS Catalogues;
  • Manage configuration parameters;
  • Manage datalets;
  • View platform logs.

Catalogues Management

In this page the administrator manages the Catalogues. In particular, he/she is able to:

  • Add/Edit/Delete a Catalogue;
  • Add a Catalogue from the remote catalogues list;
  • Activate/Deactivate a Catalogue;
  • Start the synchronization of a Catalogue;
  • Download a catalogue dump or the federation dump with the DCAT-AP profile.

[Figure: Catalogues management page]

The following picture depicts the functionality linked to each button or icon.

[Figure: functionalities of the buttons and icons]

Add/Edit/Delete a Catalogue

By clicking on the ADD button, the administrator is presented with the following Catalogue form.

[Figure: Catalogue form]

Here the administrator has to insert all of the information related to the catalogue and then click on the CREATE button.

By clicking on the edit icon on the Catalogue table, the user can edit most of the Catalogue's information. He/she cannot modify the host and type attributes.

By clicking on the delete icon on the Catalogue table, the user deletes the Catalogue and its datasets from the federation. This operation cannot be undone.

Remote Catalogues

New Catalogues can be added to the federation using the remote catalogues list. This remote list is a catalogue repository maintained by Engineering. In it, an ODF administrator can find certified catalogues and, by clicking on the plus icon, add the selected catalogue to his/her ODF instance.

[Figure: remote catalogues list]

Activate/Deactivate a Catalogue

This functionality allows the administrator to control which catalogues users can search. If a catalogue is active, users will find its datasets during a search; if it is inactive, users will not find any of its datasets.

Catalogue Synchronization

By default, the platform synchronizes Catalogues automatically, according to each catalogue's refresh period attribute. To force the synchronization of a catalogue, the administrator clicks on its synchronize button.

Download Dump

The administrator can download a DCAT-AP dump from the Federated Open Data Catalogue. He/she can choose to download a single catalogue dump or the complete federation dump by clicking, respectively, on the download button in the catalogue's row of the table or on the global download button located at the bottom of the table.

Configuration Parameters Management

An administrator can modify some of the configuration parameters that control the loading of RDF files into the LOD repository. In particular, he/she can set:

  • Enable RDF controls: if false, all RDFs are loaded into the LOD repository; if true, only the RDFs that pass the controls are loaded and the others are discarded.
  • Enable RDF max size check: if true, the size of each RDF is checked.
  • RDF max dimension: when the max size check is enabled, the size limit an RDF must not exceed in order to be loaded into the repository; RDFs exceeding it are discarded.

The administrator can also define the default refresh period for catalogues.
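
The three RDF parameters can be read as a small decision rule. The following Python sketch is one possible reading of the description above, not the actual ODF implementation: the class, field and function names are hypothetical, the size unit (bytes) is an assumption, and any control other than the size check is represented by a single boolean because this manual does not describe it.

```python
from dataclasses import dataclass


@dataclass
class RdfLoadingConfig:
    controls_enabled: bool  # "Enable RDF controls"
    max_size_check: bool    # "Enable RDF max size check"
    max_size: int           # "RDF max dimension" (unit assumed: bytes)


def should_load(config, rdf_size, passes_other_controls=True):
    """Decide whether an RDF file is loaded into the LOD repository."""
    if not config.controls_enabled:
        return True  # controls disabled: every RDF is loaded
    if config.max_size_check and rdf_size > config.max_size:
        return False  # over the size limit: discarded
    return passes_other_controls


if __name__ == "__main__":
    config = RdfLoadingConfig(controls_enabled=True, max_size_check=True, max_size=10_000_000)
    print(should_load(config, rdf_size=25_000_000))
```

Under this reading, the size limit has no effect unless both "Enable RDF controls" and "Enable RDF max size check" are true.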

[Figure: configuration parameters page]

The administrator can also update his/her password and manage the RDF prefixes through the console.

Datalets Management

Through this page, the administrator can manage all of the datalets produced by the end users.

[Figure: datalets management page]

The administrator can check the number of views of each datalet and the last time it was seen by end users. The administrator can also delete a datalet or see its preview.

[Figure: datalet preview]

Platform Logs

This page shows, in the GUI, the logs produced by the back-end server. The administrator can query the logs in order to search for a particular event. The following figure depicts this functionality.

[Figure: platform logs page]

End User Manual

This section provides the description of the End User Functionalities. Through the Open Data Catalogue a user can:

  • Search datasets filtering by their metadata;
  • Create graphical representations of dataset resources, called Datalets;
  • Execute SPARQL queries on RDF resources;
  • View the federated ODMS in the platform.

Datalet Creation

A Datalet is a view Web Component (WC) used to create rich, reusable visualizations of open data. It was developed under the ROUTE-TO-PA project. The datalet creator tool, called DatalEt-Ecosystem Provider (DEEP), was integrated with ODF in order to provide users with an open data visualization tool. For further information about datalets, please check https://github.com/routetopa/spod/wiki/Datalets.

In order to create a datalet the user should follow these steps:

  • Select fields
  • Select the graphical representation

Selecting fields

The datalet creation process starts with the selection of the fields from the resource. In this page the user can add all or a subset of the original fields. Moreover, the user can also filter the data through a dedicated panel. The user should then click on the right arrow to continue the process.

[Figure: field selection step]

Select the graphical representation

The next step is to choose the graphical representation of the selected fields and the proper association among the selected fields and the chart inputs. The following picture depicts a pie chart example.

[Figure: pie chart example]

In order to show the datalet in the ODF environment the user should click on the Add button.

SPARQL Queries

This functionality allows the user to search over the LOD downloaded from the federated datasets and stored in the RDF4J triple store. In this page the user can write his/her SPARQL query and select the format of the output, either XML or JSON.

[Figure: SPARQL query page]

The result of the query is shown to the user, who can download it.
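
The same kind of query can also be sent programmatically. The following Python sketch does not use the catalogue page: it assumes that the RDF4J Server repository created during deployment is reachable at a URL such as http://localhost:8080/rdf4j-server/repositories/ODF and that it accepts queries following the standard SPARQL 1.1 Protocol (form-encoded POST). The function names and the sample query are hypothetical; the output format is selected through the Accept header, mirroring the XML/JSON choice described above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Standard media types for SPARQL SELECT results.
ACCEPT = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
}


def build_sparql_request(endpoint, query, output="json"):
    """Build a SPARQL 1.1 Protocol POST request without sending it."""
    data = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=data,
        headers={
            "Accept": ACCEPT[output],
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def run_query(endpoint, query, output="json", timeout=30):
    """Send the query and return the raw result document as text."""
    request = build_sparql_request(endpoint, query, output)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, run_query("http://localhost:8080/rdf4j-server/repositories/ODF", "SELECT * WHERE { ?s ?p ?o } LIMIT 5") would return the first five triples as a JSON results document, provided the assumed endpoint is correct for the installation.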

[Figure: SPARQL query result]

Catalogues overview

In this page all of the federated Catalogues are shown to the user. The user can read a brief description of each Catalogue and check its country and category. Moreover, by clicking on the search button, the user can see all of its datasets. The user can select between two views:

  • Card

[Figure: Card view]

  • Table

[Figure: Table view]