process is repeated until the web crawler is stopped. Data acquisition through
web crawlers is widely used in web-page-based applications such as search
engines and web caching. Traditional web page extraction technologies offer
multiple efficient solutions, and considerable research has been done in this
field. As more advanced web applications emerge, extraction strategies have
been proposed in [ 19 ] to cope with rich Internet applications.
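The repeated fetch-and-extract process mentioned above can be sketched as a breadth-first loop: take a URL from a queue, download the page, extract its links, enqueue the ones not yet seen, and repeat. This is a minimal sketch under stated assumptions, not the strategy of [ 19 ]. `fetch` is a hypothetical callable so the example runs offline (in practice it might wrap `urllib.request.urlopen`), and `max_pages` is an assumed stopping condition standing in for "until the web crawler is stopped".

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl: fetch a page, extract links, enqueue new ones,
    and repeat until the queue is empty or max_pages (an assumed stop
    condition) is reached."""
    queue = deque([seed])      # URLs waiting to be downloaded
    seen = {seed}              # URLs already queued, to avoid revisits
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)      # hypothetical fetcher; None means unreachable
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For a quick offline check, a dictionary can play the role of the web: `crawl("http://a.test/", web.get)` visits the seed first and then its links in discovery order, skipping any URL the dictionary does not contain. The `seen` set is what keeps the loop from revisiting pages that link back to each other.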
Current network data acquisition technologies mainly include traditional
Libpcap-based packet capture, zero-copy packet capture, and specialized
network monitoring software such as Wireshark, SmartSniff, and WinNetCap.
Libpcap-Based Packet Capture Technology: Libpcap (packet capture library) is
a widely used library of network packet capture functions. It is a general,
platform-independent tool mainly used to capture data at the data link layer.
It is simple, easy to use, and portable, but relatively inefficient; as a
result, considerable packet loss may occur when Libpcap is used in high-speed
network environments.
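Libpcap also defines the classic capture-file ("savefile") format that tcpdump and Wireshark can write and read. As a minimal sketch, assuming the classic microsecond-timestamp pcap layout (not the newer pcapng format), the Python standard library alone can build and parse such a file. The function names `write_pcap` and `read_pcap` are hypothetical, and capturing live traffic would still require Libpcap itself or a binding to it. The `snaplen` field limits how many bytes of each frame are stored: longer frames are saved truncated, while `orig_len` records their full length.

```python
import struct
from typing import List, Tuple

MAGIC = 0xA1B2C3D4          # classic pcap magic (microsecond timestamps)
MAGIC_SWAPPED = 0xD4C3B2A1  # same magic as seen from the other byte order
GLOBAL_HDR = "IHHiIII"      # magic, major, minor, thiszone, sigfigs, snaplen, network
RECORD_HDR = "IIII"         # ts_sec, ts_usec, incl_len, orig_len


def write_pcap(packets: List[Tuple[int, int, bytes]],
               snaplen: int = 65535, linktype: int = 1) -> bytes:
    """Build a classic little-endian pcap file in memory (linktype 1 = Ethernet)."""
    out = bytearray(struct.pack("<" + GLOBAL_HDR,
                                MAGIC, 2, 4, 0, 0, snaplen, linktype))
    for ts_sec, ts_usec, frame in packets:
        captured = frame[:snaplen]  # only the first snaplen bytes are stored
        out += struct.pack("<" + RECORD_HDR,
                           ts_sec, ts_usec, len(captured), len(frame))
        out += captured
    return bytes(out)


def read_pcap(blob: bytes):
    """Return (snaplen, linktype, records); each record is
    (timestamp_seconds, captured_length, original_length, data)."""
    (magic,) = struct.unpack_from("<I", blob, 0)
    if magic == MAGIC:
        order = "<"
    elif magic == MAGIC_SWAPPED:
        order = ">"             # file was written in big-endian byte order
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    fields = struct.unpack_from(order + GLOBAL_HDR, blob, 0)
    snaplen, linktype = fields[5], fields[6]
    offset = struct.calcsize("<" + GLOBAL_HDR)    # 24-byte global header
    rec_size = struct.calcsize("<" + RECORD_HDR)  # 16-byte record header
    records = []
    while offset + rec_size <= len(blob):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            order + RECORD_HDR, blob, offset)
        offset += rec_size
        data = blob[offset:offset + incl_len]
        offset += incl_len
        records.append((ts_sec + ts_usec / 1e6, incl_len, orig_len, data))
    return snaplen, linktype, records
```

For example, writing a 100-byte frame with `snaplen=64` stores only 64 bytes, and reading the file back reports a captured length of 64 and an original length of 100.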
Zero-Copy Packet Capture Technology: Zero-copy (ZC) means that no
memory-to-memory copies occur while a node receives or sends packets. When
sending, packets go directly from the application's user buffer through the
network interface to the external network; when receiving, the network
interface delivers packets directly to the user buffer. The basic idea of
zero-copy is to reduce the number of data copies, the number of system calls,
and the CPU load as datagrams pass from network devices to user program space.
Zero-copy first uses direct memory access (DMA) to transfer network datagrams
directly into an address space pre-allocated by the system kernel, so that the
CPU does not take part in the transfer. Meanwhile, it maps the kernel memory
holding the datagrams into the address space of the detection program, or
builds a buffer region in user space and maps it into kernel space. The
detection program then accesses this memory directly, which avoids copying
from kernel space to user space and reduces the number of system calls.
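The difference between copying and mapping can be illustrated in user space. The following is only an analogy under stated assumptions, not kernel-level zero-copy: a hypothetical `PacketRing` class pre-allocates one buffer (standing in for the DMA region), a simulated `nic_write` places frames directly into fixed-size slots, and the consumer receives either a `memoryview` window onto the slot (no copy) or a `bytes` copy (as with a conventional receive into a fresh buffer).

```python
class PacketRing:
    """Hypothetical fixed-slot ring buffer standing in for a DMA region."""

    def __init__(self, slots: int = 4, slot_size: int = 2048) -> None:
        self.slots = slots
        self.slot_size = slot_size
        self.buf = bytearray(slots * slot_size)  # allocated once, then reused
        self.mem = memoryview(self.buf)          # shared window, no copies
        self.lengths = [0] * slots
        self.head = 0                            # next slot the "NIC" fills

    def nic_write(self, frame: bytes) -> int:
        """Simulate the NIC/DMA writing a frame straight into a slot."""
        if len(frame) > self.slot_size:
            raise ValueError("frame larger than slot")
        slot = self.head
        start = slot * self.slot_size
        self.mem[start:start + len(frame)] = frame
        self.lengths[slot] = len(frame)
        self.head = (slot + 1) % self.slots      # wrap around the ring
        return slot

    def view(self, slot: int) -> memoryview:
        """Zero-copy access: a window onto the shared buffer."""
        start = slot * self.slot_size
        return self.mem[start:start + self.lengths[slot]]

    def copy(self, slot: int) -> bytes:
        """Copy-based access: the bytes are duplicated into a new object."""
        return bytes(self.view(slot))
```

Because a view shares memory with the ring, it always reflects the slot's current contents: once the producer wraps around and reuses the slot, the view shows the new frame, while an earlier copy is unaffected. This is the trade-off the design implies: a zero-copy consumer avoids the copy but must finish with a slot before it is recycled.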
Mobile Equipment: Mobile devices are now increasingly widely used. As their
functions grow more powerful, they offer more complex and diverse means of
data acquisition and a greater variety of data. Mobile devices may acquire
geographical location information through positioning systems; audio
information through microphones; pictures, videos, streetscapes,
two-dimensional barcodes, and other multimedia information through cameras;
and user gestures and other body-language information through touch screens
and gravity sensors. Over the years, wireless operators have improved the
service level of the mobile Internet by acquiring and analyzing such
information. For example, the iPhone itself is a “mobile spy”: it may collect
wireless data and geographical location information and then send such
information back to Apple Inc. for processing, of which the user may not be