Database Reference
In-Depth Information
• Log Files: As one widely used data collection method, log files are records
automatically generated by the data source system, which logs its activities in
designated file formats for subsequent analysis. Log files are used in
nearly all digital devices. For example, web servers record in log files the
number of clicks, click rates, visits, and other properties of web users [7]. To
capture user activity on web sites, web servers mainly use the following three
log file formats: the Common Log Format (NCSA), the Extended Log Format (W3C),
and the IIS log format (Microsoft). All three formats are ASCII text.
Databases rather than text files may sometimes be used to store log
information, to improve query efficiency over massive log stores [8, 9].
Other forms of data collection based on log files include stock indicators in
financial applications and the determination of operating states in
network monitoring and traffic management.
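As a minimal sketch of how such web server logs might be analyzed, the following Python snippet parses lines in the NCSA Common Log Format and counts requests per client host. The sample log lines, the field names in the regular expression, and the helper names `parse_clf_line` and `count_visits` are illustrative assumptions, not part of any particular server or of the cited work.

```python
import re
from collections import Counter

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size field means that no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def count_visits(lines):
    """Count requests per client host, skipping malformed lines."""
    visits = Counter()
    for line in lines:
        record = parse_clf_line(line)
        if record is not None:
            visits[record["host"]] += 1
    return visits


# Made-up sample data (addresses from the documentation range 192.0.2.0/24).
SAMPLE_LOG = [
    '192.0.2.1 - alice [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 404 -',
    '192.0.2.1 - alice [10/Oct/2000:13:57:12 -0700] "GET /news.html HTTP/1.0" 200 512',
    'not a log line',
]

if __name__ == "__main__":
    for host, hits in count_visits(SAMPLE_LOG).most_common():
        print(host, hits)
```

Because the format is plain ASCII text, a regular expression is sufficient here; for very large log stores, the parsed records could instead be loaded into a database, as noted above.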
• Sensors: Sensors are commonly used in daily life to measure physical quantities
and transform them into readable digital signals for subsequent
processing (and storage). Sensory data may be classified as sound wave, voice,
vibration, automobile, chemical, current, weather, pressure, temperature, etc.
Sensed information is transferred to a data collection point through wired or
wireless networks. For applications that can be easily deployed and managed,
e.g., video surveillance systems [10], a wired sensor network is a convenient
solution for acquiring the related information. Sometimes the exact position of a
specific phenomenon is unknown, or the monitored environment lacks
energy or communication infrastructure. In such cases, wireless
communication must be used to enable data transmission among sensor nodes
under limited energy and communication capability. In recent years, wireless
sensor networks (WSNs) have received considerable interest and have been
applied in many areas, such as environmental research [11, 12], water quality
monitoring [13], civil engineering [14, 15], and wildlife habitat monitoring [16].
A WSN generally consists of a large number of geographically distributed sensor
nodes, each being a micro device powered by a battery. The sensors are deployed
at designated positions as required by the application to collect remote sensing
data. Once the sensors are deployed, the base station sends control information
for network configuration/management or data collection to the sensor nodes.
Based on this control information, the sensory data is assembled at the
individual sensor nodes and sent back to the base station for further
processing. Interested readers are referred to [17] for more detailed discussions.
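The control and collection cycle described above can be illustrated with a small in-memory simulation. This is only a sketch under simplifying assumptions: the class names `ControlMessage`, `SensorNode`, and `BaseStation` are hypothetical, pre-recorded values stand in for real sensor hardware, and energy limits, routing, and packet loss are ignored.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ControlMessage:
    """Hypothetical control message sent by the base station."""
    quantity: str  # which physical quantity to report
    samples: int   # how many readings each node should return


class SensorNode:
    def __init__(self, node_id, readings):
        self.node_id = node_id
        # Pre-recorded values stand in for real measurements.
        self._readings = readings

    def handle(self, msg):
        """Assemble the requested readings into a packet for the base station."""
        values = self._readings.get(msg.quantity, [])[:msg.samples]
        return {"node": self.node_id, "quantity": msg.quantity, "values": values}


class BaseStation:
    def __init__(self, nodes):
        self.nodes = nodes

    def collect(self, msg):
        """Send the control message to every node and gather the replies."""
        return [node.handle(msg) for node in self.nodes]

    @staticmethod
    def summarize(packets):
        """Average the readings per node; nodes without data are skipped."""
        return {p["node"]: mean(p["values"]) for p in packets if p["values"]}


if __name__ == "__main__":
    nodes = [
        SensorNode("n1", {"temperature": [20.0, 22.0, 24.0]}),
        SensorNode("n2", {"temperature": [18.0, 19.0]}),
        SensorNode("n3", {"pressure": [101.3]}),
    ]
    station = BaseStation(nodes)
    packets = station.collect(ControlMessage(quantity="temperature", samples=2))
    print(BaseStation.summarize(packets))
```

In this sketch every node receives the same control message; a real deployment would typically address nodes individually or by region and would have to cope with nodes that fail to reply.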
• Methods for Acquiring Network Data: At present, network data acquisition is
accomplished using a combination of a web crawler, a word segmentation system,
a task system, an index system, etc. A web crawler is a program used by search
engines for downloading and storing web pages [18]. Generally speaking, a web
crawler starts from the uniform resource locator (URL) of an initial web page
and accesses other linked web pages, during which it stores and sequences all the
retrieved URLs. The crawler takes URLs in order of precedence from
a URL queue, downloads the corresponding web pages, identifies all URLs in the
downloaded pages, and extracts new URLs to be put in the queue. This