Database Reference
In-Depth Information
bindings for many types of database software, makes it possible to develop your entire
data processing and analysis software using Python.
As a programming language, Python has always had a reputation for simplicity and
readability. Python's syntactical design results in programming concepts that can often
be expressed in very few lines of code. Python also supports a culture of having fun
and has a developer community that is active and organized.
Tools and Testing
Popular general-purpose languages such as Python can provide another big advantage
over languages with a smaller user base. General-purpose languages often have the best
tools for robust application development and testing.
This is the same reason why one might choose to build data workf lows using Java
and the Cascading framework rather than a special-purpose language such as Apache
Pig (see Chapter 9, “Building Data Transformation Workf lows with Pig and Cascad-
ing”). By using a general-purpose language like Python, you automatically get all the
benefits, including debugging, testing, and functionality. of these popular tools. Most
importantly, you also gain access to a large number of programmers who are used to
working with these languages in the first place.
Python Libraries for Data Processing
Python is often described as a well-designed language, with many core principles that
help developers build powerful applications using very little code. Python is easy to
learn and read, because it has a very limited core syntax, and statements use common
English words instead of symbols when possible. One thing that Python is not opti-
mized for is speed. As an interpreted language, Python's ease of use is emphasized over
raw performance. For many scientific applications with lots of data processing needs,
speed is essential; how can Python be effective in this domain? One way is to write
new Python modules that are optimized for computational speed, and the most popu-
lar library for this purpose is NumPy.
NumPy
The fundamental data structure used in Python is the list, or what well-known
Python author Mark Pilgrim calls Python's workhorse data type . 3 Python lists are objects
that provide a large number of operations on sequential data, including index-based
retrieval, splicing, and the ability to iterate over elements using loops or list com-
prehensions. Lists are excellent for a large number of use cases, but for scientific data
analysis, it's often more important to run an operation on all fields at once.
NumPy is the fundamental Python module for scientific and statistical computing.
NumPy provides two very important extensions to Python's core data types. First of
3. www.diveinto.org/python3/native-datatypes.html
 
 
Search WWH ::




Custom Search