Defining MapReduce
Many of the big data toolsets offer MapReduce, a software framework originally developed by
Google to improve its indexing algorithms and heuristics. The idea behind MapReduce is that large
amounts of unstructured data can be split into many smaller pieces and processed in parallel across
many processors or stand-alone machines. Distributing the processing load this way lets programmers
handle massive amounts of data much faster and keep a job running despite communication or server
failures. MapReduce is most commonly used for data mining, analysis of large financial systems, and
data-intensive scientific simulations.
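To make the map and reduce steps described above concrete, here is a minimal single-machine sketch in Python. It is an illustration only, not the Google or Hadoop implementation: the function names (map_chunk, shuffle, reduce_group, word_count) are hypothetical, and a thread pool stands in for the separate machines of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key so each key can be reduced independently."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk is mapped independently; threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())


# Three small chunks play the role of a large dataset split across nodes.
chunks = ["big data big", "data tools", "big tools"]
counts = word_count(chunks)
print(counts)
```

Because every chunk is mapped on its own and every key is reduced on its own, either step can be spread over more workers without changing the result.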
Cons: Does not allow incremental updates of data, forcing a complete rebuild whenever the underlying
data changes. This is a significant limitation for large datasets because a reload can take a long time.
IBM Big SQL
Provider: IBM
Web Site: www.ibm.com/software/data/bigdata
Platforms: This tool is an SQL engine developed by IBM that sits on BigInsights, the IBM variant of
the Hadoop platform.
Technology Overview: Big SQL is a proprietary SQL engine that is designed for analytics purposes.
According to IBM, Big SQL takes the SQL syntax submitted by the user and translates it to individual
MapReduce jobs. Big SQL also supports real-time queries, but only over a single node.
Pros: On-premises or cloud installation. Allowing on-premises installation is critical if your organization
has a no-cloud policy.
Cons: Does not allow real-time SQL queries against multiple nodes, because multi-node queries are
run as MapReduce jobs.
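As a rough illustration of what translating SQL into a MapReduce job can mean, the sketch below expresses one aggregate query as a map step and a reduce step in Python. This is not Big SQL's actual translation logic, which is proprietary. The table name sales, its columns, and the functions map_row and run_group_by_sum are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table named "sales".
sales = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 250},
    {"region": "east", "amount": 50},
]

# Query being illustrated:
#   SELECT region, SUM(amount) FROM sales GROUP BY region


def map_row(row):
    """Map step: the GROUP BY column becomes the key, the summed column the value."""
    return row["region"], row["amount"]


def run_group_by_sum(rows):
    """Group mapped pairs by key, then reduce each group with SUM."""
    groups = defaultdict(list)
    for key, value in map(map_row, rows):
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


totals = run_group_by_sum(sales)
print(totals)
```

In a real cluster the map step would run on the nodes that hold the rows and the reduce step would run once per region, and that batch-style scheduling is one reason such jobs are not suited to real-time queries.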
Google BigQuery
Provider: Google
Web Site: http://cloud.google.com/products/bigquery
Platforms: This tool is a real-time SQL engine that was developed by Google using its proprietary
technology.
Technology Overview: Google BigQuery is a proprietary SQL engine that is designed for analytics
purposes and high scalability. BigQuery is the public version of Google's own Dremel query service
that Google has used for years to track device installation and analyze spam. Dealing in read-only
datasets, Google BigQuery allows programmers to use SQL-like queries to extract and analyze billions
of rows at a time.
 