Database Reference
In-Depth Information
TABLE 8.2
Order and Size of Networks
Name
Vertices
Edges
Description
YahooWeb
1413 million
6636 million
WWW pages in 2002
LinkedIn
7.5 million
58 million
Person-person in 2006
4.4 million
27 million
Person-person in 2005
1.6 million
6.8 million
Person-person in 2004
85 thousand
230 thousand
Person-person in 2003
Wikipedia
3.5 million
42 million
Doc-doc in 2007/02
3 million
35 million
Doc-doc in 2006/09
1.6 million
18.5 million
Doc-doc in 2005/11
Kronecker
177 thousand
1977 million
Synthetic
120 thousand
1145 million
Synthetic
59 thousand
282 million
Synthetic
19 thousand
40 million
Synthetic
WWW-Barabasi
325 thousand
1497 thousand
WWW pages in nd.edu
DBLP
471 thousand
112 thousand
Document-document
Flickr
404 thousand
2.1 million
Person-person
Epinions
75 thousand
508 thousand
Who trusts whom
machines (minimum number of machines, which Hadoop “distributed mode” sup-
ports), the fastest method ( GIM-V BL-CL) ran 5.27 times faster than GIM-V BASE.
With 90 machines, GIM-V BL-CL ran 2.93 times faster than GIM-V BASE. This is
expected since there are fixed component (JVM load time, disk I/O, network com-
munication), which cannot be optimized even if we add more machines.
Next, we show how the performance of our methods changes as the input size
grows. Figure 8.5c shows the running time of GIM-V with different number of edges
under 10 machines. As we can see, all of the methods scale linearly with the number
of edges.
Next, we compare the performance of GIM-V DI and GIM-V BL-CL for Hcc in
graphs with long chains. For this experiment, we made a new graph whose diameter
is 17, by adding a length 15 chain to the 282 million Kronecker graph, which has
diameter 2. As we see in Figure 8.6, GIM-V DI finished in 6 iterations, whereas
GIM-V BL-CL finished in 18 iterations. The running time of both methods for the
first 6 iterations are nearly same. Therefore, the diagonal block iteration method
decreases the number of iterations while not affecting the running time of each itera-
tion much.
Finally, we compare the number of iterations with/without renumbering.
Figure 8.7 shows the degree distribution of LinkedIn. Without renumbering, the
minimum vertex has degree 1, which is not surprising since about 46% of the
vertices have degree 1 due to the power law behavior of the degree distribution.
We show the number of iterations after changing the minimum vertex to each
of the top 5 highest degree vertices in Figure 8.8. We see that the renumbering
 
Search WWH ::




Custom Search