Machine learning algorithms can be implemented on Hadoop with minimal configuration effort and can scale very effectively. While not every machine learning algorithm requires an enterprise data scientist, this is the most complex area in the processing of large data sets, and a team of data scientists will be useful to any enterprise.
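To illustrate the point about scale, the following is a minimal sketch of running one such algorithm (k-means clustering) across a Hadoop cluster. It assumes Spark on YARN with the RDD-based PySpark MLlib available; the HDFS path, feature format, and parameter values are illustrative placeholders, not from this chapter.

```python
# Minimal sketch: distributed k-means on a Hadoop cluster via PySpark MLlib.
# Assumes Spark running on YARN; the HDFS path and k are hypothetical.
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="KMeansSketch")

# Each input line is assumed to hold whitespace-separated numeric features.
points = (sc.textFile("hdfs:///data/features.txt")  # hypothetical path
            .map(lambda line: [float(x) for x in line.split()]))

# Train k-means; Spark distributes the iterations across the cluster,
# so the same code scales from gigabytes to terabytes of input.
model = KMeans.train(points, k=10, maxIterations=20)

for center in model.clusterCenters:
    print(center)

sc.stop()
```

The notable design point is that the parallelism is handled by the framework: no algorithm-specific cluster configuration is needed beyond submitting the job, which is what makes this class of processing accessible without a dedicated data scientist for every task.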
SUMMARY
As we see from the discussions in this chapter, processing Big Data is indeed a complex and challenging process. Because there is minimal room for error in this type of processing, the quality of the data used must be pristine. This can be accomplished by implementing a data-driven architecture that uses all the available enterprise data assets to create a powerful foundation for the analysis and integration of data across Big Data and DBMS environments. This foundational architecture defines the next generation of data warehousing, where all data types are stored and processed to empower an enterprise to make and execute profitable decisions.
The data in the next-generation data warehouse cannot be allowed to grow forever, since the initial size of the new data warehouse starts in the hundreds of terabytes and can easily reach a petabyte. The next chapter will focus on the relevance of information life-cycle management in the age of Big Data, where we need to ensure that the right data sets are always available for processing and consumption by the user at the right time, along with the right metadata. The discussion will also cover when to archive data from Hadoop or NoSQL and how to store the archived data sets if they are needed for reprocessing.