interests are stored. This information can be stored directly in a database that can
later be used for data mining. However, users often find it tedious to answer all
these questions. Therefore, on-line forms or questionnaires should be set up in such a
way that they do not take too much of the user's time and that the user is motivated to give
all the requested answers.
A newer trend is the Open Profiling Standard (OPS), which allows user profiles to be
accessed automatically from the client's browser. The OPS
standard defines the data format and the transaction rules for electronic profiles [3].
The user sets up his profile on a voluntary basis and thereby keeps track of
which information he chooses to provide. A further advantage of an electronic profile for
the user is that he needs to define his basic profile only once rather than each time he
enters a web site.
11
3.3 Web Documents and Web Meta Data
Web documents (HTML documents, see Fig. 3) contain information such as text,
images, video, or audio. Their structure makes it possible to recognize, for example, the title
of the page, the author, the keywords, and the main body. The formatting instructions must
be removed in order to access the information that we want to mine on these sites. An
example of an HTML document is given in Figure 3. The relevant information on this
page is marked in grey. Everything else is HTML code, which is enclosed in
angle brackets <>. The title of a page can be identified by searching the page for the tag
<title> to find the beginning of the title and for the tag </title> to find the end of the
title. Images can be identified by searching the page for file extensions such as .gif or
.jpg.
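The extraction steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by the author: the class name PageExtractor and the function extract_page are hypothetical, the standard-library html.parser module stands in for the plain string search described in the text, and only the two image extensions named above are checked. Script and style content is not filtered out.

```python
from html.parser import HTMLParser

# Assumption: only the two extensions named in the text are treated as images.
IMAGE_EXTENSIONS = (".gif", ".jpg")


class PageExtractor(HTMLParser):
    """Collects the title, the text between tags, and image references of one page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.text_parts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        # Any attribute value ending in a known image extension counts as an image.
        for _name, value in attrs:
            if value and value.lower().endswith(IMAGE_EXTENSIONS):
                self.images.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text between tags arrives here; the tags themselves are dropped.
        if self.in_title:
            self.title_parts.append(data)
        elif data.strip():
            self.text_parts.append(data.strip())


def extract_page(html):
    """Return title, body text, and image references of an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": "".join(parser.title_parts).strip(),
        "text": " ".join(parser.text_parts),
        "images": parser.images,
    }
```

Calling extract_page on the content of a page returns a dictionary whose "text" entry holds the page content with the formatting instructions removed, which is the input a later mining step would work on.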
Web meta data give us the topology of a website. This information is normally
stored as a site-specific index table implemented as a directed graph. Usually, these
web meta data are specified manually by the website administrator, which becomes
laborious for large websites. Therefore, methods have recently been developed to annotate
these documents automatically.
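As a rough illustration of an index table implemented as a directed graph, the following Python sketch stores, for each page, the pages it links to. The page names, the variable site_index, and the function click_distance are hypothetical and chosen only for this example; the text does not specify the actual table layout.

```python
from collections import deque

# Hypothetical index table: each page maps to the pages it links to,
# so every entry is a set of directed edges page -> target.
site_index = {
    "index.html": ["products.html", "contact.html"],
    "products.html": ["product_a.html", "index.html"],
    "product_a.html": ["products.html"],
    "contact.html": [],
}


def click_distance(index, start):
    """Breadth-first search: minimum number of links from start to each reachable page."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in index.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance
```

With such a table, a question about the topology, for example how many clicks separate a product page from the entry page, reduces to a standard graph traversal.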
4 Data Mining
4.1 Basic Problem Types
Data mining methods can be divided according to two main categories of data
mining problems [4]:
1. prediction and
2. knowledge discovery.
Prediction is the stronger goal; knowledge discovery is the weaker approach
and usually precedes prediction.
Classifying a customer as highly likely to buy a
product belongs to predictive data mining. In this example, we have to mine a data