Information Technology Reference
In-Depth Information
MInPATh Algorithm
The MINCOST algorithm works as follows Anderson et al. (2001). Given a sequence of pages, called
prefix, it uses prediction models to find the possible next pages and their probabilities. These pages and
their savings are added to a shortcut list. For each page found by the prediction model, it is appended to
the sequence, forming a suffix, which is used to find more shortcuts following the same procedure. This
process continues recursively until the length of the suffix exceeds the depth bound or the probability
of the predicted page becomes less than the probability threshold. Once the recursive part is over, the
shortcuts found are sorted in descending order based on savings and the best shortcuts are returned.
The number of shortcuts to be returned are user specified.
To estimate savings of a shortcut, MINPATH counts the number of links saved by following the
shortcut. For example, if a prefix is A-B-C, and a suffix is A-B-C-D-E-F-G, the expected saving for a
shortcut D-> G is two.
There are two threshold parameters, depth bound and probability threshold; used in the MINPATH
algorithm to limit searches for shortcuts. Depth bound represents the maximum length of the suffix.
Probability threshold represents the minimum value of page probability.
MINPATH algorithm uses Markov models (Deshpande & Karypis, 2000; Sarukkai, 2000). Though
accurate these models are complex and require Web graphs as a part of their implementation. There is
a need to find less complex prediction models with comparable accuracy. Besides, MINPATH does not
consider page size in estimating saving.
t he Pro Posed aPProach
There are three steps in our approach to improve navigation for mobile Web users. First, Web server
logs are preprocessed to extract sessions. Second, an N-gram prediction model is built from these ses-
sions. Third, an algorithm that extends MINPATH, called MINCOST, recommends shortcuts based on
the N-gram model and the current browsing sequence.
Server Log Preprocessing
A record in a server log file contains raw browsing data, such as the IP address of the user, date and time
of the request, URL of the page, the return code of the server, and the size of the page, if the request is
successful. Since such records are in chronological order, they do not provide much meaningful infor-
mation about user browsing. The Web server log files are transformed into a set of sessions. A session
represents a single visit of a user. The following procedure is used to transform a server log file into
sessions (Mobasher, Cooley, & Srivastava, 1999b).
1 . Records about image files (.gif, .jpg, etc.) and unsuccessful requests (return code belonging to the
4XX series) are filtered out.
2. Requests from the same IP address are grouped into a session. A timeout of max-idle is used to
decide the end of a session, i.e., if the same IP address does not occur within a period of max-idle
seconds, the current session is closed. Subsequent requests from the same IP address will be treated
as a new session.
Search WWH ::




Custom Search