Working with Bad or Missing Records
Dealing with real data means that you will inevitably come across missing values.
Pandas makes it easy to ignore, drop, or even fill in values that are missing. The
isnull function returns a Boolean for each cell: True when the cell contains no
value and False otherwise. The fillna method enables you to replace missing values
with a default of your choosing. There are even interpolation methods that estimate
missing values from neighboring ones, along with methods to backfill missing data.
Listing 12.6 provides examples.
Listing 12.6 Pandas: Examples of working with broken data
> from pandas import DataFrame, isnull
> from numpy import random
# Generate some random data; in this case, scores
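# randint's upper bound is exclusive, so this draws integers from 1 through 9
# (random_integers is deprecated in NumPy)
> scores = DataFrame(random.randint(1, 10, size=(3, 2)),
columns=['Score 1', 'Score 2'])
> scores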
   Score 1  Score 2
0        2        6
1        6        3
2        4        3
# Add a new column, setting all values to None (Null)
> scores['Score 3'] = None
# Change a Null cell to a new value; .loc avoids chained assignment
> scores.loc[2, 'Score 3'] = 17
> scores
   Score 1  Score 2 Score 3
0        2        6    None
1        6        3    None
2        4        3      17
# Which cells are Null?
> isnull(scores)
   Score 1  Score 2  Score 3
0    False    False     True
1    False    False     True
2    False    False    False
# Find the mean; include or exclude columns
# that contain Null values
> scores.mean(skipna=False)
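The text above mentions dropping, filling, and interpolating missing values, which the listing does not reach. The following is a minimal sketch of those operations, not a continuation of the listing itself. The score values are made up for illustration, and the missing cells are written as NaN in float columns (an assumption, chosen so that the numeric methods apply cleanly).

```python
import numpy as np
import pandas as pd

# Hypothetical scores; np.nan marks the missing cells.
scores = pd.DataFrame({
    "Score 1": [2.0, 6.0, 4.0],
    "Score 2": [6.0, np.nan, 3.0],
    "Score 3": [np.nan, np.nan, 17.0],
})

# Count the missing cells in each column.
missing_counts = scores.isnull().sum()

# Drop every row that contains at least one missing value.
complete_rows = scores.dropna()

# Replace missing values with a default, here 0.
filled_zero = scores.fillna(0)

# Backfill: copy the next valid value upward into each gap.
backfilled = scores.bfill()

# Linear interpolation between neighboring valid values.
# With the default settings, leading gaps are left as NaN.
interpolated = scores.interpolate()

# Column means: skipna=True (the default) ignores missing cells,
# while skipna=False yields NaN for any column that has a gap.
mean_skip = scores.mean()
mean_strict = scores.mean(skipna=False)

print(missing_counts)
print(backfilled)
print(interpolated)
print(mean_skip)
print(mean_strict)
```

Each call returns a new object and leaves scores unchanged, so the different strategies can be compared side by side before you settle on one.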
 