many common statistical applications such as linear regression, clustering, and hypoth-
esis testing, but there are also plenty of modules available for the long tail of specific
scientific-computing needs. Developers can find R code that addresses such domain
spaces as baseball, astronomy, and even tools to visualize and statistically analyze ani-
mal movement. In short, if there is a statistical need, there is a chance someone has
contributed a useful R package that addresses it—and in many cases, there are several
packages that address the same problem.
R is popular and useful, and features an enormous collection of open-source code
for multiple problem domains. So why try to replace it with something else? The first
reason is that the widespread popularity of Python gives it a much larger community
of experienced developers. Within the field of statistics applications, R is often cited as
the most popular language for asking statistical questions about a dataset. Although it's
difficult to track exactly who is using which programming language, it's almost
certainly true that the Python community has more developers overall than the
communities around statistics- and science-focused languages like R or Julia. 1
The TIOBE Programming Community index is an attempt to use rankings from
popular search engines to rate the popularity of programming languages. Although
TIOBE is probably not the most scientific way to measure programming-language
popularity, in April 2013 Python ranked eighth in the index, whereas R was ranked
twenty-sixth. The RedMonk Programming Language Rankings, which uses data from
popular programming-community Web sites such as GitHub and Stack Overflow,
ranked Python fourth to R's seventeenth in their February 2013 rankings. RedMonk's
Stephen O'Grady went on to say that the “first tier of languages [including Python]
does not appear to be relinquishing its hold on developer time and attention” whereas
“R has seen steady traction but little growth,” 2 which was ascribed to competition
from other platforms.
Extending Existing Code
R is a special-purpose language that puts a large emphasis on the mathematics and
statistics domains. Python, on the other hand, is much more of a general-purpose lan-
guage, meant to handle general programming tasks. To put the scale of Python devel-
opment into perspective, the Python Package Index, or PyPI, contains nearly 10 times
as many libraries as CRAN.
An advantage to using Python for data analysis is that it might be something you
are already using. Python is excellent for general text processing. In fact, if you've
done any kind of scripting work at all, you have probably already used Python for
parsing, extracting, or ordering text. Python's list data type is a core aspect of the lan-
guage, and the many available list methods make it easy to slice and extract sequential
data. Python, of course, has a great deal of other functionality built in as well, includ-
ing frameworks for networking and writing Web applications. All of this, along with
1. http://redmonk.com/sogrady/2012/09/12/language-rankings-9-12/
2. http://redmonk.com/sogrady/2013/02/28/language-rankings-1-13/
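
The point made above about text processing and list slicing can be illustrated with a minimal sketch. The record layout and the helper names parse_record and top_n are assumptions made for illustration only; the rank values are the ones quoted in the text.

```python
def parse_record(line):
    """Split a comma-separated line into a list of stripped fields."""
    return [field.strip() for field in line.split(",")]


def top_n(records, n):
    """Sort records by the integer rank in their last field and slice off the first n."""
    ordered = sorted(records, key=lambda rec: int(rec[-1]))
    return ordered[:n]


# Hypothetical input layout: language, ranking source, rank.
lines = [
    "Python, TIOBE April 2013, 8",
    "R, TIOBE April 2013, 26",
    "Python, RedMonk February 2013, 4",
    "R, RedMonk February 2013, 17",
]

records = [parse_record(line) for line in lines]

# Slicing and indexing pull out pieces of sequential data.
best_two = top_n(records, 2)              # the two best-ranked entries
every_other = records[::2]                # a step of 2 keeps the Python rows here
languages = [rec[0] for rec in records]   # first field of each record

print(best_two)
print(languages)
```

Only built-in string and list operations are used here (split, strip, sorted, and slice notation), which is the kind of everyday scripting the passage describes.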
 