Information Technology Reference
In-Depth Information
In a connected directed graph G, an acyclic sub-graph B is a branch in G if the in-
degree of each vertex is at the most 1. Let w(B) be the sum of the weighted arcs in
branch B. The maximum weight branching problem is the optimization problem that
finds the largest possible branch B and was proposed in [10] known as Edmonds'
branching algorithm. The algorithm is described in two steps: the condensation
process, to remove the cycles, and the unraveling process where the branch is created.
Fulkerson [11] presents the same problem with an additional constraint, the branch
must start in a vertex called the root. His algorithm for the maximum packing rooted
directed cuts in a graph is equal to the weight of the minimum spanning tree directed
away from the root. The algorithm is also described in two steps.
Both algorithms [10] and [11] generated trees, with in-degree zero or one. To find
the maximum weighted poly-tree, there are few or inexistent bibliographic references.
3
Ramex Algorithm
The aim of this approach is to create a poly-tree of events, with as many branches as
needed to visit all the vertexes. The Ramex algorithm uses a Poly-tree Sequence
model with two phases: the transformation of the problem into a network and the
search of the sequences.
Algorithm 1. Ramex, Two Phase Algorithm to get the Poly-tree Sequence Model
Input: raw data (#id, event stream);
Output: a poly-tree of events;
1) Problem transformation, by creating next-events and accumulating into the network
2) Search for the most weighed Poly-tree sequence of events
3.1
Problem Transformation
The event log is represented in a structure with two columns (#id, event stream). Our
approach uses a two-phase algorithm: the transformation of a database into a network
and the search of the maximum weighted poly-tree.
For each line in the table, starting with the first event, a new attribute next-event is
created, and continues until the last event. Then a network, i.e. a graph G with a
source and a sink is created, where each sequence has an initial node called source
and a final node called sink. The network, where cycles are allowed, condenses the
information of the database by incorporating all the event sequences. In network G
each state corresponds to an event and each transition represents the sequence from
one event to the next-event. The weight of each arc corresponds to the number of
times that one item precedes the next-item.
The transformation of a database into a network is identical in the Markov Chain or
Petri nets approaches. As the patterns that occur in nature, the accumulation of
elements produces a specific shape. For example, a dune, created by wind is
composed by several layers of sand and has a recognized configuration. The
advantage is getting a macro view of the system, instead of extracting rules from a
small subset of data, as sequence/episode mining algorithms do.
Search WWH ::




Custom Search