Information Technology Reference
In-Depth Information
These facts represent a creation and search tradeoff. We gain better creation
time results for certain parameter settings but on the other hand we get worse
search complexity results. This tradeoff has even one more dimension which is
the amount of paths returned that are longer then l . Due to the space limitations
we are not able to discuss this dependence more in detail.
10.5.5
Deploying the ρ -index to the Real Life Data
The other set od data represents a piece of the CiteSeer [12] database of sci-
entific publications and citations among them. Our data set was created taking
one publication and deploying the breadth first search for all weakly accessible
publications from our starting one until we got a certain amount of vertices in
the built data set. Weak accessibility ignores the orientation of the edge between
two vertices. The amount of publications in our data set we set to be 30, 000. The
amount of edges acquired among the vertices in the data set was 63, 584. The
distribution of the degrees of the vertices in this citation graph is demonstrated
in Figure 10.15. The x-axis represents the particular vertex degree - respectively
the amount of edges initiated, terminated and a total number of both in the
particular vertex - and the y-axis then represents an amount of vertices having
this degree. The x-axis is drawn using logarithmic scale to make a clearer view of
the curve's progress. Also to the values on the y-axis the logarithm was applied
to achieve better readability of the demonstrated distribution.
The indegree represents the number of citations of each particular publication.
Notice that this distribution follows the power law that states that in the testing
citation graph is a small number of vertices that have large indegree and a large
number of vertices which's indegree is very small. This exactly conforms with the
reality where most of the publications receive a small number of citations and
was also presented in [4]. To the contrary, the outdegree represents the number
of references that the particular publication refers to. This number is not always
accurate since CiteSeer does not contain all the references for each publication
in its database.
From the ρ -index evaluation demonstrated on the synthetic data in the pre-
vious sections we know that the structure is usable to search for paths between
a pair of vertices. But now, using the CiteSeer's citation graph we can give
those paths semantics, some meaning. Since each vertex in the citation graph
4
5
5
inDegree
outDegree
degree
3.5
4
4
3
2.5
3
3
2
1.5
2
2
1
1
1
0.5
0
0
0
-0.5
-1
-1
-1
1
10
100
1000
10000
0
20
40
60
80
100
120
140
160
180
200
1
10
100
1000
10000
indegree
outdegree
total degree
Fig. 10.15. Vertex degree distribution in the citation graph
 
Search WWH ::




Custom Search