Database Reference
In-Depth Information
data from multiple sources. The following subsections cover the available
tools and when you should consider using each.
Excel
Excel is one of the most popular data analysis tools on the market today.
Most data analysts are Excel power users and are quite comfortable slicing
and dicing data using this tool. This excellent tool proves especially useful
for users who want to explore data on their own. They can connect to
various data sources, create pivot tables and charts, filter and slice the data,
and perform what-if analysis. Excel contains an extensive set of built-in
functions to analyze the data, including financial, statistical, and
engineering functions. Excel also contains features to help clean the data,
such as removing duplicates, consolidating data, and validating data. Figure
11.1 shows some of the menu items on the data tab that help you clean,
validate, and shape the data.
Figure 11.1 Data cleansing features in Excel.
PowerPivot
When the amount of data you need to analyze pushes the limits (about
1 million rows for Excel 2013) of Excel's capabilities, consider moving to
PowerPivot. PowerPivot is a free add-in to Excel. It is based on a columnar
database structure and data compression that allows it to support millions
of rows. It builds on top of Excel's feature set and enables users to create
pivot tables and charts and supports slicing and filtering and time-based
analysis.
Instead of basing pivot tables and charts on data contained in a data sheet,
PowerPivot is based on a data model that contains tables and relationships
between the tables. This proves extremely beneficial when the user needs
to combine data from various sources into a single mode for the analysis.
For example, it is easy to combine data from a relational database with data
contained in a Hadoop file structure.
 
 
Search WWH ::




Custom Search