Building Analytics Workflows Using Python and Pandas - Data Just Right: Introduction to Large-Scale Data and Analytics


from nltk.tokenize import RegexpTokenizer
from pandas import Series
from twitter import OAuth, TwitterStream

# OAuth credentials to access the Twitter Streaming API
# Visit dev.twitter.com/apps to register an app
auth = OAuth(
    consumer_key='[your app consumer key]',
    consumer_secret='[your app consumer secret]',
    token='[your app access token]',
    token_secret='[your app access token secret]',
)

# Create an NLTK tokenizer that matches hashtags:
# '#', one ASCII letter, then one or more word characters
tokenizer = RegexpTokenizer(r'#[a-zA-Z]\w+')

hashtags = {}
tweet_count = 0

# Connect to the Twitter Streaming API and sample public tweets
twitter_stream = TwitterStream(auth=auth)
iterator = twitter_stream.statuses.sample()

for tweet in iterator:
    # Some stream messages (e.g. deletions) carry no 'text' field
    text = tweet.get('text')
    if text:
        # Increment the count for every hashtag found in this tweet
        for word in tokenizer.tokenize(text):
            hashtags[word] = hashtags.get(word, 0) + 1
        tweet_count += 1
        if tweet_count % 100 == 0:
            print("Looked at %d tweets..." % tweet_count)
        # Stop after 1,000 tweets that contained text
        if tweet_count >= 1000:
            break

# Print out a summary of Tweet statistics using pandas
s = Series(hashtags)
print('Top Hashtags in this dataset:')
# Sort counts in descending order and show the 15 most frequent hashtags
print(s.sort_values(ascending=False).head(15))
print('Hashtag dataset statistics')
print(s.describe())
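The streaming loop needs live API credentials, but its hashtag-matching and counting logic can be checked offline. The sketch below is an assumption-laden stand-in, not the book's code: it uses the standard library's re.findall with the same pattern (which is what NLTK's RegexpTokenizer does when gaps=False, its default) and collections.Counter in place of the hand-maintained dict. The function name count_hashtags and the sample texts are hypothetical.

```python
import re
from collections import Counter

# Same pattern the listing passes to RegexpTokenizer: a '#', one ASCII
# letter, then one or more word characters (so at least two characters
# must follow the '#').
HASHTAG_PATTERN = re.compile(r"#[a-zA-Z]\w+")


def count_hashtags(texts):
    """Count hashtag occurrences across an iterable of tweet texts.

    Hypothetical helper mirroring the listing's loop body: skip missing
    or empty text, find every hashtag match, and increment its count.
    """
    counts = Counter()
    for text in texts:
        if not text:  # stream messages without text are skipped
            continue
        counts.update(HASHTAG_PATTERN.findall(text))
    return counts


# Hypothetical sample texts standing in for tweets from the stream
sample_texts = [
    "Loving the game tonight #NBA #Finals",
    "No hashtags here, and #a is too short to match",
    None,
    "#nba is case-sensitive, so this differs from #NBA",
    "#1 is not matched: the first character after # must be a letter",
]

counts = count_hashtags(sample_texts)
# most_common(15) plays the role of sort_values(ascending=False).head(15)
for tag, n in counts.most_common(15):
    print(tag, n)
```

Because matching is case-sensitive, #NBA and #nba are counted as separate hashtags; normalizing with text.lower() before matching would merge them, if that is the behavior you want.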

We've combined three powerful Python libraries (the twitter package from Python Twitter Tools for streaming, NLTK for tokenizing, and pandas for summary statistics) and built the beginning of a useful data-analysis application in just a few lines of code.
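The last call in the listing, s.describe(), prints a fixed set of statistics for a numeric Series: count, mean, std, min, the 25%, 50%, and 75% quantiles, and max. As a rough illustration of what those numbers mean for hashtag counts, here is a standard-library sketch that computes the same set; it assumes pandas' defaults of a sample standard deviation (ddof=1) and linearly interpolated quantiles. The helper describe_counts and the sample counts are hypothetical.

```python
import statistics


def describe_counts(counts):
    """Hypothetical helper: compute the statistics pandas' Series.describe()
    reports for numeric data, from a dict of hashtag -> count.

    Assumes at least two values (statistics.stdev needs two). Uses the
    sample standard deviation and linear-interpolation quartiles.
    """
    values = sorted(counts.values())
    # method="inclusive" interpolates linearly between data points
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": values[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": values[-1],
    }


# Hypothetical counts, shaped like the listing's `hashtags` dict
hashtags = {"#NBA": 5, "#Finals": 3, "#nba": 1, "#python": 1, "#data": 2}

for name, value in describe_counts(hashtags).items():
    print(f"{name:>5}  {value}")
```

For these five counts the mean is 2.4 and the quartiles are 1, 2, and 3; in a real stream sample most hashtags appear only once, so expect the lower quartiles to sit at 1 and the max to be far above the mean.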
