Figure 1-3. A general big data environment
Figure 1-3 does not show every type of feed, nor does it include the feedback loops that would probably exist in practice.
For instance, data warehouse feeds might form inputs, have their data enriched, and then feed outputs. Web log data might
arrive as inputs, be enriched with location and/or transaction data, and leave as enriched outputs. The point, however,
is that a single, central big data repository can hold an organization's big data.
Benefits of Big Data Systems
Why investigate the use of big data and a parallel processing approach? First, if your data can no longer be processed
by traditional relational database management systems (RDBMSs), your organization may face data problems in the
future. You might already have been forced to introduce NoSQL database technology to process very large data
volumes in an acceptable time frame. Because of its high latency, Hadoop might not be the immediate solution to your
processing problems, but it could provide a scalable big data storage platform.
Second, big data storage helps to establish a new skills base within the organization. Just as data warehousing
brought with it the need for new skills to build, support, and analyze the warehouse, big data calls for the same kind
of skills building. One of the biggest costs of a big data system is the specialized staff needed to maintain it
and use its data. By starting now, you can build a skills pool within your organization rather than having to hire
expensive consultants later. (Similarly, as an individual, gaining access to these technologies can help you launch a new and
lucrative career in big data.)
Third, by adopting a platform that can scale to a massive degree, a company can extend the shelf life of its system
and so save money, because the investment can be spread over a longer period. By contrast, a company limited to
interim solutions, such as a small cluster, might reach capacity within a few years and need to redevelop its system.
Fourth, by getting involved in the big data field now and building a vastly scalable distributed platform, a company
can future-proof itself and reduce risk. If it introduces the technologies and ideas now, it will feel no shock in
later years, when it needs to adopt them.
In developing any big data system, your organization needs to keep its goals in mind. Why are you developing the
system? What do you hope to achieve? How will the system be used? What will you store? Measure the system's use
over time against the goals that were established at its inception.