Building Analytics Workflows Using Python and Pandas - Data Just Right: Introduction to Large-Scale Data and Analytics


Score 1    6.2
Score 2    5.8

> scores.mean(skipna=True)
Score 1     6.2
Score 2     5.8
Score 3    15.0

# Fill in missing data explicitly
> scores['Score 3'].fillna(0)
0     0
1     0
2    17
3     0
4    13
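The listing above relies on a `scores` DataFrame that is built earlier in the chapter and is not shown here. As a minimal, self-contained sketch, the version below constructs a stand-in frame. The `Score 1` and `Score 2` values are invented for illustration and chosen only so that the column means match the listing; the positions of the missing `Score 3` entries are taken from the `fillna(0)` output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `scores` frame used in the listing.
# The "Score 1" and "Score 2" values are assumptions, not the book's data.
scores = pd.DataFrame({
    "Score 1": [5.0, 7.0, 6.0, 8.0, 5.0],
    "Score 2": [6.0, 5.0, 7.0, 4.0, 7.0],
    "Score 3": [np.nan, np.nan, 17.0, np.nan, 13.0],
})

# Column means, ignoring missing values (this is also the default behavior).
means_skipping_nan = scores.mean(skipna=True)

# With skipna=False, a single NaN makes the whole column mean NaN.
means_with_nan = scores.mean(skipna=False)

# fillna returns a new Series; the original frame is left unchanged.
score3_filled = scores["Score 3"].fillna(0)

print(means_skipping_nan)
print(means_with_nan)
print(score3_filled)
```

Filling with zero is not a neutral choice: it pulls the mean of `Score 3` down from 15.0 (two observed values) to 6.0 (five values, three of them zero), so whether to skip or fill depends on what a missing score means in the data.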

Taken together, NumPy, SciPy, and Pandas can accomplish many tasks that would normally be done in R. The underlying data structures are very performant, the syntax of these tools is very “Pythonic,” and the resulting code can be very readable.

For small scripting tasks using common computational algorithms, there's really not much of a difference between using R and Python. In general, R can be very useful for exploratory statistical tasks, and there's no question that far more ready-to-use modules have already been written in R for a large variety of tasks, especially in uncommon computational domains.

In terms of scientific computing, Python excels when an application needs to grow into anything more than a simple exploratory or interactive script. As soon as an interactive test starts to become something requiring robust application development, it's hard to argue against using Python. Many statisticians and mathematicians have knowledge of R, but a programmer on your staff might be more familiar with Python than with a more domain-specific language.
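To make the step from interactive test to application code concrete, here is a rough sketch of how a one-off call such as `scores.mean(skipna=True)` might be wrapped in a reusable, validated function. The function name `column_means` and its `fill_value` parameter are hypothetical, introduced only for this illustration.

```python
from typing import Optional

import pandas as pd


def column_means(frame: pd.DataFrame, fill_value: Optional[float] = None) -> pd.Series:
    """Return the mean of every column in `frame`.

    Missing values are skipped by default. If `fill_value` is given,
    missing values are replaced with it before averaging.
    """
    if frame.empty:
        raise ValueError("frame must contain at least one row and one column")

    # Reject non-numeric columns up front instead of failing deep inside pandas.
    non_numeric = [
        name for name in frame.columns
        if not pd.api.types.is_numeric_dtype(frame[name])
    ]
    if non_numeric:
        raise TypeError(f"non-numeric columns: {non_numeric}")

    if fill_value is not None:
        frame = frame.fillna(fill_value)  # returns a copy; caller's data is untouched
    return frame.mean(skipna=True)


# Small usage example with made-up data.
example = pd.DataFrame({"a": [1.0, None, 3.0]})
print(column_means(example))
print(column_means(example, fill_value=0))
```

The explicit checks and the type hints are the kind of scaffolding that an exploratory session can skip but that a longer-lived application usually needs.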

Tool Chain

It wouldn't be right to talk about Python's data-centric modules without mentioning IPython, a popular interactive Python shell that, over time, has blossomed into a full-fledged scientific-computing environment. After winning the 2012 Free Software Foundation award for his work, IPython's creator Dr. Fernando Perez stated that the project started as “sort of a hybrid of an interactive Python console and a Unix shell, but it has grown into a set of components for scientific computing from interactive
