Databases Reference
In-Depth Information
failover. To manage this, the client session ID information is associated with child zNodes and locks,
which will enable the failover client to synchronize.
Zookeeper is a highly available system, and it is critical that it can perform its functions in a
timely manner. It is recommended to run Zookeeper on dedicated machines. Running it in a shared-
services environment will adversely impact performance.
In Hadoop deployment, Zookeeper serves as the coordinator for managing all the key activities:
Manage configuration across nodes . Zookeeper helps you quickly push configuration changes
across dozens or hundreds of nodes.
Implement reliable messaging. A guaranteed messaging architecture to deliver messages can be
implemented with Zookeeper.
Implement redundant services. Managing a large number of nodes with a Zab approach will
provide a scalable redundancy management solution.
Synchronize process execution. With Zookeeper, multiple nodes can coordinate the start and end
of a process or calculation. This approach can ensure consistency of completion of operations.
Please see the Zookeeper configuration and administrators guide for further details
( http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html ).
Pig
Analyzing large data sets introduces dataflow complexities that become harder to implement in a
MapReduce program as data volumes and processing complexities increase. A high-level language
that is more user friendly, is SQL-like in terms of expressing dataflows, has the flexibility to manage
multistep data transformations, and handles joins with simplicity and easy program flow, was needed
as an abstraction layer over MapReduce.
Apache Pig is a platform that has been designed and developed for analyzing large data sets. Pig
consists of a high-level language for expressing data analysis programs and comes with infrastructure
for evaluating these programs. At the time of writing, Pig's current infrastructure consists of a com-
piler that produces sequences of MapReduce programs. Pig's language architecture is a textual lan-
guage platform called Pig Latin, of which the design goals were based on the requirement to handle
large data processing with minimal complexity and include:
Programming flexibility. . The ability to break down complex tasks comprised of multiple steps and
interprocess-related data transformations should be encoded as dataflow sequences that are easy to
design, develop, and maintain.
Automatic optimization . Tasks are encoded to let the system optimize their execution
automatically. This allows the user to have greater focus on program development, allowing the
user to focus on semantics rather than efficiency.
Extensibility. Users can develop user-defined functions (UDFs) for more complex processing
requirements.
Programming with pig latin
Pig is primarily a scripting language for exploring large data sets. It is developed to process multiple
terabytes of data in a half-dozen lines of Pig Latin code. Pig provides several commands to the devel-
oper for introspecting the data structures in the program, as it is written.
Search WWH ::




Custom Search