In our experiments, we used a max-idle threshold of 1800 seconds (30 minutes), a value commonly used in Web usage mining studies.
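The max-idle rule splits one user's request stream into sessions whenever the gap between consecutive requests exceeds the threshold. A minimal Python sketch of this sessionization step (the function name and the (timestamp, url) input format are illustrative assumptions, not the chapter's actual preprocessing code):

```python
MAX_IDLE = 1800  # seconds; a longer gap between requests starts a new session


def split_sessions(requests):
    """Split one user's (timestamp, url) requests, assumed sorted by
    timestamp, into sessions using the max-idle threshold."""
    sessions = []
    current = []
    last_ts = None
    for ts, url in requests:
        # Close the current session if the idle gap exceeds the threshold.
        if last_ts is not None and ts - last_ts > MAX_IDLE:
            sessions.append(current)
            current = []
        current.append(url)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


# Requests at t=0 and t=100 fall in one session; t=4000 starts a new one.
print(split_sessions([(0, "A"), (100, "B"), (4000, "C")]))
```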
Prediction Model Learning
Once the Web logs have been preprocessed, the next step is to build prediction models for the navigation patterns of Web users. If P1, P2, P3, ..., Pi are the pages browsed by the user so far, the prediction model tries to predict the next page, Pi+1.
We use an N-gram based prediction model. An N-gram is a string of N characters drawn from an alphabet; its order is N, the number of characters it contains. In the context of this work, the alphabet is the set of URLs of Web pages on the Web server, so an N-gram is a sequence of N URLs.
After Web server logs are preprocessed into a number of sessions, an N-gram prediction model can
be built as follows:
1. Each session is decomposed into a set of overlapping subpaths of length N.
2. These subpaths are entered into an N-gram table T as N-grams.
3. For each N-gram in T, the pages that immediately follow it in any session are recorded, together with their occurrence counts.
4. The probability of each possible next page is calculated from these occurrence counts.
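The four steps above can be sketched in Python as follows (a minimal illustration, assuming sessions are given as lists of URL strings; the function name is ours, not the chapter's):

```python
from collections import defaultdict


def build_ngram_table(sessions, n):
    """Build an N-gram prediction table from preprocessed sessions.

    Maps each length-n path (a tuple of URLs) to a dict of
    next-page -> probability.
    """
    # Step 1-3: slide a window of length n over each session and count
    # the page immediately following each n-gram.
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for i in range(len(session) - n):
            ngram = tuple(session[i:i + n])
            next_page = session[i + n]
            counts[ngram][next_page] += 1
    # Step 4: convert occurrence counts to probabilities.
    table = {}
    for ngram, nexts in counts.items():
        total = sum(nexts.values())
        table[ngram] = {page: c / total for page, c in nexts.items()}
    return table


# The two example sessions from the text:
sessions = [["A", "B", "C", "D", "H"], ["B", "C", "D", "G"]]
table = build_ngram_table(sessions, 3)
# table[("A", "B", "C")] == {"D": 1.0}
# table[("B", "C", "D")] == {"H": 0.5, "G": 0.5}
```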
For example, given a log file consisting of the following sessions, a 3-gram model for prediction can
be built as follows:
A, B, C, D, H
B, C, D, G
The first session is decomposed into two 3-grams: “A, B, C” and “B, C, D.” The second session is
decomposed into one 3-gram: “B, C, D.” For 3-gram “A, B, C,” the next page would be D, while for
3-gram “B, C, D,” the next page may be G or H. The complete N-gram table, T, is given in Table 1.
The N-gram table T is used as our prediction model. Given an observed path, it is matched against
N-grams in T. The predicted next pages and their corresponding probabilities of the matching entries
in T are returned for shortcut recommendations.
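A lookup against the table can be sketched as follows (again an illustrative Python fragment under our own naming; the small table literal is the one derived from the example sessions above):

```python
def predict_next(table, observed_path, n):
    """Match the last n pages of the observed path against the N-gram
    table; return (next_page, probability) pairs, highest probability
    first, or an empty list when there is no matching entry."""
    if len(observed_path) < n:
        return []
    key = tuple(observed_path[-n:])
    predictions = table.get(key, {})
    return sorted(predictions.items(), key=lambda kv: -kv[1])


# Table as built from the example sessions "A,B,C,D,H" and "B,C,D,G".
table = {
    ("A", "B", "C"): {"D": 1.0},
    ("B", "C", "D"): {"H": 0.5, "G": 0.5},
}
print(predict_next(table, ["X", "B", "C", "D"], 3))
```

All matching predictions are returned, not just the most probable one, so the caller can offer several shortcut recommendations at once.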
The algorithm for constructing an N-gram model is given in Figure 1. It is modified from the algorithm of Su et al. (2000) so that it stores all possible next pages for an N-gram, rather than only the most probable one.
This algorithm has a time complexity of O(|L| * |S|), where |L| is the number of sessions and |S| is the maximum session length. The running time is dominated by the first for loop: its outer loop runs |L| times and its inner loop runs at most |S| times, giving the time complexity of O(|L| * |S|).
An N-gram based prediction model was proposed by Su et al. (2000), who evaluated the model's accuracy without a specific application. In our study, the N-gram model is evaluated on its effectiveness in improving mobile Web navigation. To suit our application, the model is modified so that a lookup operation for an N-gram returns all predictions, instead of only the most probable one.