Improving Mobile Web Navigation Using N-Grams Prediction Models - Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications

Information Technology Reference

In-Depth Information

MInPATh Algorithm

The MINCOST algorithm works as follows Anderson et al. (2001). Given a sequence of pages, called

prefix, it uses prediction models to find the possible next pages and their probabilities. These pages and

their savings are added to a shortcut list. For each page found by the prediction model, it is appended to

the sequence, forming a suffix, which is used to find more shortcuts following the same procedure. This

process continues recursively until the length of the suffix exceeds the depth bound or the probability

of the predicted page becomes less than the probability threshold. Once the recursive part is over, the

shortcuts found are sorted in descending order based on savings and the best shortcuts are returned.

The number of shortcuts to be returned are user specified.

To estimate savings of a shortcut, MINPATH counts the number of links saved by following the

shortcut. For example, if a prefix is A-B-C, and a suffix is A-B-C-D-E-F-G, the expected saving for a

shortcut D-> G is two.

There are two threshold parameters, depth bound and probability threshold; used in the MINPATH

algorithm to limit searches for shortcuts. Depth bound represents the maximum length of the suffix.

Probability threshold represents the minimum value of page probability.

MINPATH algorithm uses Markov models (Deshpande & Karypis, 2000; Sarukkai, 2000). Though

accurate these models are complex and require Web graphs as a part of their implementation. There is

a need to find less complex prediction models with comparable accuracy. Besides, MINPATH does not

consider page size in estimating saving.

t he Pro Posed aPProach

There are three steps in our approach to improve navigation for mobile Web users. First, Web server

logs are preprocessed to extract sessions. Second, an N-gram prediction model is built from these ses-

sions. Third, an algorithm that extends MINPATH, called MINCOST, recommends shortcuts based on

the N-gram model and the current browsing sequence.

Server Log Preprocessing

A record in a server log file contains raw browsing data, such as the IP address of the user, date and time

of the request, URL of the page, the return code of the server, and the size of the page, if the request is

successful. Since such records are in chronological order, they do not provide much meaningful infor-

mation about user browsing. The Web server log files are transformed into a set of sessions. A session

represents a single visit of a user. The following procedure is used to transform a server log file into

sessions (Mobasher, Cooley, & Srivastava, 1999b).

1 . Records about image files (.gif, .jpg, etc.) and unsuccessful requests (return code belonging to the

4XX series) are filtered out.

2. Requests from the same IP address are grouped into a session. A timeout of max-idle is used to

decide the end of a session, i.e., if the same IP address does not occur within a period of max-idle

seconds, the current session is closed. Subsequent requests from the same IP address will be treated

as a new session.

Search WWH ::

Custom Search

Home