from twitter import OAuth, TwitterStream
from nltk.tokenize import RegexpTokenizer
from pandas import Series

# OAuth credentials to access the Twitter Streaming API
# Visit dev.twitter.com/apps to register an app
auth = OAuth(
    consumer_key='[your app consumer key]',
    consumer_secret='[your app consumer secret]',
    token='[your app access token]',
    token_secret='[your app access token secret]'
)

# Create an NLTK tokenizer that matches hashtags only
tokenizer = RegexpTokenizer(r'#[a-zA-Z]\w+')
hashtags = {}
tweet_count = 0

# Connect to the Twitter Streaming API
twitter_stream = TwitterStream(auth=auth)
iterator = twitter_stream.statuses.sample()

for tweet in iterator:
    # Some stream messages (such as delete notices) have no 'text' field
    text = tweet.get('text')
    if text:
        # Count every hashtag found in the tweet
        for word in tokenizer.tokenize(text):
            hashtags[word] = hashtags.get(word, 0) + 1
        tweet_count += 1
        if tweet_count % 100 == 0:
            print("Looked at %d tweets..." % tweet_count)
        if tweet_count > 1000:
            break

# Print out a summary of Tweet statistics using Pandas
s = Series(hashtags)
print('Top Hashtags in this dataset:')
print(s.sort_values(ascending=False)[0:15])
print('Hashtag dataset statistics')
print(s.describe())
We've combined several powerful Python libraries (a Twitter API client, NLTK,
and Pandas) and built the beginning of a useful data-analysis application in
just a few lines of code.
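
To try the hashtag-counting step without Twitter credentials, here is a minimal offline sketch. It is not the listing above: it swaps in the standard library's re module and collections.Counter for the NLTK tokenizer and the Pandas Series, on the assumption that re.findall with the same pattern extracts the same hashtags. The count_hashtags helper and the sample tweets are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Same pattern as the tokenizer in the listing:
# '#', one ASCII letter, then one or more word characters.
HASHTAG_PATTERN = re.compile(r'#[a-zA-Z]\w+')


def count_hashtags(texts):
    """Count hashtag occurrences across an iterable of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(HASHTAG_PATTERN.findall(text))
    return counts


# Hypothetical sample tweets, invented for illustration only.
sample_tweets = [
    "Learning #python and #pandas today",
    "More #python tricks",
    "No hashtags here",
    "#a is too short to match, but #nltk is fine",
]

counts = count_hashtags(sample_tweets)

# Show the most frequent hashtags, most common first.
for tag, n in counts.most_common(3):
    print(tag, n)
```

Note that the pattern requires at least two characters after the '#', so a one-letter tag such as '#a' is skipped. Counter.most_common plays the role of sorting the Series in descending order.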
 