plt.bar(pos, y_axis, width, color='lightblue')
plt.xticks(rotation=30)
fig = plt.gcf()
fig.set_size_inches(16, 10)
The image you have generated should look like the one here. It appears that the most
prevalent occupations are student, other, educator, administrator, engineer, and
programmer.
Distribution of user occupations
Spark provides a convenience method on RDDs called countByValue. This method
counts the occurrences of each unique value in the RDD and returns the counts to the
driver as a Python dict (or as a Map in Scala or Java). We can create the
count_by_occupation variable using this method:
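As a minimal sketch of what countByValue yields, the snippet below counts occupations in a few made-up records. The sample rows, the pipe-delimited layout with the occupation in the fourth field, and the helper names occupation_counts_local and occupation_counts_spark are assumptions for illustration and are not taken from the text. The Spark variant also assumes an existing SparkContext passed in as sc.

```python
from collections import Counter

# Hypothetical sample records (assumed layout: id|age|gender|occupation|zip).
sample_lines = [
    "1|24|M|student|85711",
    "2|53|F|other|94043",
    "3|23|M|student|32067",
    "4|24|M|engineer|43537",
]


def occupation_counts_local(lines):
    """Plain-Python analogue of countByValue: value -> number of occurrences."""
    occupations = (line.split("|")[3] for line in lines)
    return dict(Counter(occupations))


def occupation_counts_spark(sc, lines):
    """Same count with PySpark; `sc` is assumed to be a live SparkContext."""
    user_fields = sc.parallelize(lines).map(lambda line: line.split("|"))
    # countByValue collects the per-value counts back to the driver.
    return dict(user_fields.map(lambda fields: fields[3]).countByValue())


# Runs without Spark installed; the Spark helper is only defined, not called.
count_by_occupation = occupation_counts_local(sample_lines)
print(count_by_occupation)
```

Because the result is materialized on the driver, this approach suits columns with a modest number of distinct values, such as occupations.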