These are typical considerations that should be part of the thought process as the
team evaluates the datasets that are obtained for the project. Becoming deeply
knowledgeable about the data will be critical when it comes time to construct and
run models later in the process.
2.3.6 Common Tools for the Data Preparation Phase
Several tools are commonly used for this phase:
• Hadoop [10] can perform massively parallel ingest and custom analysis
for web traffic parsing, GPS location analytics, genomic analysis, and
combining of massive unstructured data feeds from multiple sources.
• Alpine Miner [11] provides a graphical user interface (GUI) for creating
analytic workflows, including data manipulations and a series of analytic
events such as staged data-mining techniques (for example, first select the
top 100 customers, and then run descriptive statistics and clustering) on
PostgreSQL and other Big Data sources.
• OpenRefine (formerly called Google Refine) [12] is “a free, open source,
powerful tool for working with messy data.” It is a popular GUI-based tool
for performing data transformations, and it is one of the most robust free
tools currently available.
• Similar to OpenRefine, Data Wrangler [13] is an interactive tool for data
cleaning and transformation. Wrangler was developed at Stanford
University and can be used to perform many transformations on a given
dataset. In addition, the data transformations can be exported as Java or
Python code. The advantage of this feature is that a subset of the data can be
manipulated in Wrangler via its GUI, and then the same operations can be
written out as Java or Python code to be executed against the full, larger
dataset offline in a local analytic sandbox.
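The Wrangler workflow described above can be sketched in Python. The field names and cleaning rules below are illustrative assumptions, not actual output of the Wrangler tool; the point is that transformations worked out interactively on a sample become plain code that can later run against the full dataset in the analytic sandbox.

```python
def clean_record(record):
    """Apply the cleaning steps prototyped on a sample of the data.

    Hypothetical transformations: trim whitespace, normalize casing,
    and coerce a numeric column, treating blanks as missing.
    """
    cleaned = {}
    # Trim stray whitespace picked up during ingest.
    cleaned["name"] = record.get("name", "").strip()
    # Normalize inconsistent casing in the state column.
    cleaned["state"] = record.get("state", "").strip().upper()
    # Coerce revenue to a float; an empty string becomes None (missing).
    raw = record.get("revenue", "").strip()
    cleaned["revenue"] = float(raw) if raw else None
    return cleaned

# A small sample stands in for the subset explored in the GUI; the same
# function would be applied row by row to the full, larger dataset offline.
sample = [
    {"name": "  Acme Corp ", "state": "ca", "revenue": "1200.50"},
    {"name": "Globex", "state": " NY", "revenue": ""},
]
cleaned_rows = [clean_record(r) for r in sample]
```

Because the exported code is ordinary Python, it can be scheduled, version-controlled, and rerun as new data arrives, which is harder to do with purely interactive GUI sessions.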
For Phase 2, the team needs assistance from IT, DBAs, or whoever controls the
Enterprise Data Warehouse (EDW) for data sources the data science team would
like to use.