Information Technology Reference
In-Depth Information
To obtain the disruption score, we need to compute ˆ of the nodes in the
cascade. A citation network is a directed acyclic graph if cycles are considered
as errors. From Eq. (1), traversing the citation network in a topological order [8]
and updating ˆ values along the way will guarantee that no backtracking is nec-
essary to compute all ˆ values for all nodes. Therefore, we can apply topological
sortingtocompute ˆ and obtain the disruption scores. The time complexity of
topological sorting is O(
|
V C |
+
|
E C |
), which is linear to the sum of the number
of nodes and edges in cascade C .
2.3 Model-Based Approximation
Computation of the scores must be ecient given ever-growing number of
biomedical sciences papers and citations. One challenge of the proposed approach
is that the computation of the cascade structure and pairwise comparison can be
intractable. We have several strategies to accelerate the computation. One strat-
egy is to avoid exhaustive pairwise comparison by reusing intermediate results.
Suppose we would like to rank 100 candidate challengers by their disruption
scores. A brute-force approach is to compute the residue cascades for each of the
candidates. By sorting these candidates in their topological order in the citation
network, the ˆ values computed for the upstream candidates can be reused for
the downstream candidates and significantly reduce the computational costs.
Another strategy is by approximation, where we can take advantage of the
fact that citations decay exponentially over time to estimate the size of cas-
cades. Computing average cascade ʦ can be intractable. A brute-force algorithm
to compute ʦ is to traverse the citation network and update ˆ for each node
visited by a topological sorting algorithm. Such an exhaustive search algorithm
slows down as the size of the citation network increases exponentially in recent
years. Arbesman [9] shows that, for any paper, the longer away from the citing
paper, the less likely that the paper will be cited, and the decay is approximately
exponential. Also, modern papers cite more often and the average citations in-
crease each year. The result suggests that it is possible to model the citation
counts accurately and we will take advantage of that to derive an approximation
algorithm to accelerate the computation of cascade overtaking. We now present
accurate approximation algorithms scalable to very large scale citation networks
by taking advantage of properties of ʦ .
It has been suggested that citation counts of papers decay exponentially over
time [9] and the rate of decay can be estimated accurately. We plotted the curves
of the annual average citation count of the papers in the APS citation network
dataset as shown in Figure 2, which shows that the longer away from the citing
paper, the less likely that the paper will be cited, and the decay is approximately
exponential. Also, modern papers cite more often and the average citations in-
crease each year. The plot suggests that it is possible to model the citation
counts accurately and we will take advantage of that to derive an approximation
algorithm to accelerate the computation of cascade overtaking.
Given a cascade C , We model the number of citations by a function ʓ C ( t, ˄ ),
the average citations from papers published in time t to papers published ˄ time
 
Search WWH ::




Custom Search