Information Technology Reference
web-based public portal (Indice della Pubblica Amministrazione, 2012). The portal
allows anyone to search the contact information of PAs mentioned previously.
The next section describes the work carried out at our agency to convert the IPA
data set into RDF and thus create what we call Linked Open IPA.
12.4 Methodology Used in SPCData and the
Application to Linked Open IPA
Our work on creating Linked Open IPA was driven by LOD principles and
Semantic Web best practices. As a result, Linked Open IPA is rated at the
highest level of the W3C open data rating system (Berners-Lee, 2012).
The methodology we devised consists of five main phases, which are sketched
in Figure 12.2. We think this methodology is sufficiently general to be applied to
producing linked open data sets from any of the relational databases that constitute
the assets of both SPC and PAs. In fact, we have applied it to other data
sets that we have published, for example, a data set about public contracts.
12.4.1 Data Cleansing
The input to our whole process is a relational DBMS dump of IPA. As mentioned
before, IPA data were originally managed using LDAP and subsequently imported
into the relational DBMS MySQL. However, this import process, along with a lack
of proper modeling constraints, degraded the quality of the data represented
in the database. A data cleansing phase is therefore crucial.
We managed the data cleansing through MySQL itself; in particular, we implemented
an SQL script to automate the execution of this phase. An example of IPA
data on telephone numbers and URLs of services offered by PAs is the following.
As can be seen in these example data, the characters “1#” are prepended to the
values; they derive from LDAP and can cause problems when converting IPA into RDF.
With a MySQL script, we can successfully clean the data. The cleansing phase can
be performed before or after the modeling phase (see Section 12.4.2).
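The prefix removal described above can be sketched as follows. This is a minimal
illustration, not the script used for IPA: it runs the SQL through Python's built-in
sqlite3 module as a stand-in for MySQL, and the table name service_contact, the column
names telephone and url, and all sample values are hypothetical. The UPDATE statement
uses only LTRIM, TRIM, SUBSTR, and LIKE, which MySQL also provides, so a similar
statement should carry over with little change.

```python
import sqlite3

LDAP_PREFIX = "1#"  # the LDAP residue described in the text


def strip_prefix(conn, table, columns, prefix=LDAP_PREFIX):
    """Strip a leading prefix from each listed column; return how many values changed."""
    changed = 0
    for column in columns:
        # Table and column names are interpolated directly, so they must come
        # from trusted code, never from user input.
        sql = (
            f"UPDATE {table} "
            f"SET {column} = TRIM(SUBSTR(LTRIM({column}), ?)) "
            f"WHERE LTRIM({column}) LIKE ?"
        )
        # SUBSTR is 1-indexed: start right after the prefix.
        cur = conn.execute(sql, (len(prefix) + 1, prefix + "%"))
        changed += cur.rowcount
    conn.commit()
    return changed


def demo():
    """Build a tiny in-memory table with made-up rows and clean it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE service_contact ("
        "id INTEGER PRIMARY KEY, telephone TEXT, url TEXT)"
    )
    conn.executemany(
        "INSERT INTO service_contact (telephone, url) VALUES (?, ?)",
        [
            # Hypothetical dirty row: both values carry the prefix.
            ("1#06 0000 0000", "1#http://www.example.org/servizi"),
            # Hypothetical clean row: must be left untouched.
            ("06 1111 1111", "http://www.example.org/info"),
        ],
    )
    changed = strip_prefix(conn, "service_contact", ["telephone", "url"])
    rows = conn.execute(
        "SELECT telephone, url FROM service_contact ORDER BY id"
    ).fetchall()
    conn.close()
    return changed, rows


if __name__ == "__main__":
    print(demo())
```

Restricting the UPDATE with a WHERE clause leaves already clean values untouched, so
the step can be rerun safely whether it is executed before or after the modeling phase.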
In parallel to the cleansing phase, our modeling phase consists of an assessment of
the schema of the IPA database, in which the most important concepts to be published
are selected and then renamed if required. This phase also comprises an evaluation