Database Reference
are used to deeply analyze the candidate hypernodes and detect those containing
people.
Candidate hypernode detection. A person has a number of characteristics, such
as name, surname, birthday, address, and email. Some of these characteristics are
used when designing databases that contain persons. We collect these
characteristics from various ontologies, such as the FOAF ontology and the person
ontology (schemaWeb 9), and we manually build a person ontology (PO) containing
all of these characteristics and their synonyms (collected from WordNet). Using
the person ontology, the set of nodes related to each hypernode in the LHD is
analyzed.
If the node's name is one of the PO concepts, the number of characteristics for
this hypernode is incremented.
If the number of characteristics for the hypernode
1 and one of them contains
a name, the hypernode h is a candidate to contain persons.
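The detection step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `PERSON_ONTOLOGY` dictionary is a hypothetical, heavily abridged stand-in for the PO, the name normalization is an assumption, and the comparison of the characteristic count against 1 is exposed as the parameter `min_count` rather than fixed.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical, abridged person ontology (PO): concept -> synonyms.
# The real PO is built from FOAF, a person ontology, and WordNet synonyms.
PERSON_ONTOLOGY: Dict[str, Set[str]] = {
    "name": {"name", "firstname", "first_name", "fullname"},
    "surname": {"surname", "lastname", "last_name", "family_name"},
    "birthday": {"birthday", "birthdate", "date_of_birth", "dob"},
    "address": {"address", "street", "residence"},
    "email": {"email", "e_mail", "mail"},
}


def match_concept(node_name: str) -> Optional[str]:
    """Return the PO concept matching a node name, or None if there is none."""
    # Assumed normalization: lower-case, unify separators to underscores.
    key = node_name.strip().lower().replace("-", "_").replace(" ", "_")
    for concept, synonyms in PERSON_ONTOLOGY.items():
        if key in synonyms:
            return concept
    return None


def is_candidate(node_names: Iterable[str], min_count: int = 1) -> bool:
    """Decide whether a hypernode is a candidate to contain persons.

    Each node whose name matches a PO concept increments the count of
    characteristics. The hypernode is a candidate if the count reaches
    min_count (an assumed, configurable bound) and one match is a name.
    """
    matched = [c for c in map(match_concept, node_names) if c is not None]
    has_name = "name" in matched
    return len(matched) >= min_count and has_name
```

For example, a hypernode with nodes `["Name", "e-mail", "salary"]` matches two PO concepts, one of them a name, so it is kept as a candidate, while `["email", "address"]` is rejected because no node matches the name concept.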
Candidate hypernode analysis. Each candidate hypernode has a set of instance
hypernodes h_i. To analyze the name found in each hypernode instance (we take
only the first ten entities), the name is sent to a Web search engine (Bing
API). The top ten returned documents are downloaded and parsed using DOM. 10
Each document is analyzed using the NER (Named Entity Recognition) tool proposed
by Stanford, 11 which assigns three kinds of tags (person, location, or
organization).
We give each document a rank rd: if the name is tagged as Person in the
document, rd = 1; otherwise rd = 0. The average assigned to the name found in
the hypernode instance h_i (avghi) measures how often the name is considered a
person name in the documents (that is, where the tag of this name is Person):

$$\mathrm{avg}_{h_i} = \frac{\sum rd}{\text{number of documents}} \tag{9.1}$$
The average assigned to the hypernode (avgH) averages these values over its
hypernode instances, indicating how often the names found in them are considered
person names:

$$\mathrm{avg}_{H} = \frac{\sum \mathrm{avg}_{h_i}}{\text{number of } h_i} \tag{9.2}$$
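As a minimal sketch of Eqs. (9.1) and (9.2), the two averages can be computed as below. The function names are illustrative, the NER output is assumed to be a list of `(text, tag)` pairs with the tag label `"PERSON"`, and matching the name by case-insensitive equality is an assumption; returning 0.0 for empty input is also a choice made here, not something the text specifies.

```python
from typing import Sequence, Tuple


def document_rank(name: str, tagged_entities: Sequence[Tuple[str, str]]) -> int:
    """Rank rd of one document: 1 if the name is tagged Person, else 0."""
    target = name.strip().lower()
    return int(any(
        tag == "PERSON" and text.strip().lower() == target
        for text, tag in tagged_entities
    ))


def avg_instance(ranks: Sequence[int]) -> float:
    """Eq. (9.1): sum of document ranks over the number of documents."""
    return sum(ranks) / len(ranks) if ranks else 0.0


def avg_hypernode(instance_averages: Sequence[float]) -> float:
    """Eq. (9.2): sum of instance averages over the number of instances."""
    if not instance_averages:
        return 0.0
    return sum(instance_averages) / len(instance_averages)
```

For instance, a name tagged as Person in three of four retrieved documents has ranks `[1, 1, 0, 1]` and an instance average of 0.75; a hypernode whose two sampled instances score 0.75 and 0.25 has a hypernode average of 0.5.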
To identify persons, we use the NER proposed by Stanford, whose precision in
finding Person entities averages about 90%. A hypernode is therefore considered
representative of a person if more than 60% of its instances contain a person
name. We set the threshold at only 60% because problems such as misspelled names
and the use of abbreviations decrease the precision of the NER.
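The final decision can be sketched as a simple threshold test. This sketch assumes that the 60% criterion is applied to the hypernode average avgH computed from the per-instance averages; the text does not state this mapping explicitly, so treat it as one possible reading. The names `PERSON_THRESHOLD` and `represents_person` are illustrative.

```python
from typing import Sequence

# Threshold taken from the text (60%); applying it to avgH is an assumption.
PERSON_THRESHOLD = 0.6


def represents_person(instance_averages: Sequence[float],
                      threshold: float = PERSON_THRESHOLD) -> bool:
    """Return True if the hypernode average strictly exceeds the threshold."""
    if not instance_averages:
        return False
    avg_h = sum(instance_averages) / len(instance_averages)
    return avg_h > threshold
```

With instance averages `[1.0, 0.8, 0.3]` the hypernode average is about 0.7, so the hypernode is accepted; with `[0.5, 0.6, 0.4]` it is 0.5 and the hypernode is rejected.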
9 http://ebiquity.umbc.edu/ontology/person.owl
10 http://www.w3.org/DOM/
11 http://nlp.stanford.edu/ner/index.shtml