scientist must have the ability to bring the scenarios to life by using data and visualization
techniques: this is nothing but storytelling. Data scientists also need to collaborate
effectively with stakeholders across the enterprise, on both the business and technology
sides. Someone within the enterprise may hold extensive knowledge of the business context
behind the data patterns, so data scientists need to look beyond the realm of statistics
and mathematics and work effectively with such people.
To solve complex problems, find patterns within volumes of data, and develop
intuitive and easily understandable data visualizations, the data scientist must be
innovative in his or her thinking. Creativity is critical: without thinking outside the
box, you end up looking at the data with the same pair of eyes and the same
thoughts, without realizing that the data is actually revealing some interesting aspects.
The data scientist should also have enough leadership qualities to present
the findings convincingly to senior management within the enterprise. Often the data scientist
needs to put together a team of data management specialists and business analysts to
solve a complex problem. In such situations one must be able to lead a team
and manage the efforts of statisticians, data administrators, integration
professionals, and developers working on data visualization, reporting, and application integration.
The Big Data Workflow
A big data platform can provide a rich data ecosystem by combining data from traditional
data warehouses with unstructured data, machine-generated data, and free-form text.
Finding answers in this enriched and vast data platform is not a trivial pursuit.
In general, data analysis has many constituent parts. Data must be acquired from
myriad sources and cleansed. It must be sorted and joined so that queries can be made
against it. It needs to be stored in persistent repositories. Analysts and programmers must
then work together in a statistical environment such as R, SAS, or SPSS to query the data.
Then the data must be visualized in some format, such as a static report or a
2D or 3D visualization tool. The problem is that a business analyst does not do all of
this work alone. In large measure it is done by a team of specialists behind the
scenes in IT, and every step in the process requires involving someone else who
already has a substantial backlog of work.
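The stages described above (acquire, cleanse, join and sort, store, query, visualize) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not the tooling the text has in mind: the sample data, the table and column names, and every function name here are hypothetical, and an in-memory SQLite database stands in for a persistent repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw inputs standing in for two separate data sources.
ORDERS_CSV = """order_id,customer_id,amount
1,c1,120.50
2,c2,
3,c1,30.00
4,c3,75.25
5,c2,40.00
"""

CUSTOMERS_CSV = """customer_id,region
c1,East
c2,West
c3,East
"""


def acquire(text):
    """Acquire: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(orders):
    """Cleanse: drop rows with a missing amount and convert field types."""
    cleaned = []
    for row in orders:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer_id"], float(row["amount"]))
        )
    return cleaned


def store(conn, orders, customers):
    """Store: load both data sets into tables (in memory, for illustration)."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL)"
    )
    conn.execute("CREATE TABLE customers (customer_id TEXT, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(c["customer_id"], c["region"]) for c in customers],
    )
    conn.commit()


def query(conn):
    """Query: join the two tables and sort regions by total order amount."""
    sql = """
        SELECT c.region, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY total DESC
    """
    return conn.execute(sql).fetchall()


def report(rows):
    """Visualize: a static text report, the simplest output format."""
    return "\n".join(f"{region:<6}{total:>10.2f}" for region, total in rows)


def run_pipeline():
    """Run every stage in order and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        store(conn, cleanse(acquire(ORDERS_CSV)), acquire(CUSTOMERS_CSV))
        return query(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(report(run_pipeline()))
```

Even in this toy form, each function corresponds to a stage that, in the workflow described above, is typically owned by a different specialist, which is where the handoffs come from.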
When big-data-related unstructured data sources, streaming data, and the like are added
to this process, the complexity of managing the activities increases multifold. The work
involves a number of handoffs, and the high demand for specialized data and analytic
skills causes delays. The person closest to the business user, the data analyst or
business analyst, can't do most of the work, and so the time from question to insight
involves numerous delays. In fact, decisions are often made based on
limited information long before the answers come back from the data analysis workflow.
Figure 9-2 represents current practice in a data analysis workflow, which can be
contrasted with the workflow in a big data setting shown in Figure 9-3.