Chapter 3
Storing Twitter Data
In the previous chapter, we covered data collection methodologies. Using these
methods, one can quickly amass a large volume of Tweets, Tweeters, and network
information. Managing even a moderately sized dataset is cumbersome when the data
is stored in a text-based archive, and such an archive cannot deliver the performance
a real-time application needs. In this chapter, we present some common methodologies
for storing Twitter data using NoSQL.
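To make the performance concern concrete, here is a minimal sketch that assumes a hypothetical archive format of one raw Tweet (JSON) per line. Finding a Tweet by ID means parsing the archive line by line, so every lookup costs time proportional to the archive's size. Speeding this up requires building and maintaining an ID-to-offset index by hand. The names `scan_for_id`, `build_id_index`, and `lookup` are illustrative only and are not part of any library.

```python
import io
import json

# Hypothetical text-based archive: one raw Tweet (JSON object) per line.
ARCHIVE = io.StringIO(
    '{"id_str": "1", "text": "hello #nosql", "user": {"screen_name": "alice"}}\n'
    '{"id_str": "2", "text": "storing tweets", "user": {"screen_name": "bob"}}\n'
    '{"id_str": "3", "text": "#mongodb rocks", "user": {"screen_name": "alice"}}\n'
)


def scan_for_id(archive, tweet_id):
    """Linear scan: parse lines until the Tweet is found (O(n) per lookup)."""
    archive.seek(0)
    for line in archive:
        tweet = json.loads(line)
        if tweet["id_str"] == tweet_id:
            return tweet
    return None


def build_id_index(archive):
    """One full pass that maps id_str -> position of its line in the archive."""
    index = {}
    archive.seek(0)
    while True:
        offset = archive.tell()
        line = archive.readline()
        if not line:
            break
        index[json.loads(line)["id_str"]] = offset
    return index


def lookup(archive, index, tweet_id):
    """Jump straight to the indexed line instead of scanning the whole archive."""
    offset = index.get(tweet_id)
    if offset is None:
        return None
    archive.seek(offset)
    return json.loads(archive.readline())
```

For example, `lookup(ARCHIVE, build_id_index(ARCHIVE), "2")` returns the Tweet whose text is "storing tweets". The application still has to rebuild this index whenever the archive changes, add one index per field it wants to search on, and keep the indexes in memory. A database handles that bookkeeping itself.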
3.1 NoSQL Through the Lens of MongoDB
Keeping track of every purchase, click, and "like" has caused the data needs of many
companies to double every 14 months. Social media in particular has seen an explosion
in the volume of data generated, and this explosion calls for a new data storage
paradigm. At the forefront of this movement is NoSQL [3], which promises to store
big data in a more accessible way than the traditional relational model.
There are several NoSQL implementations. In this book, we use MongoDB¹ as an
example NoSQL implementation because it adheres to the following principles:
• Document-Oriented Storage. MongoDB stores its data in JSON-style objects.
  This makes it very easy to store raw documents from Twitter's APIs.
• Index Support. MongoDB allows for indexes on any field, which makes it easy
  to create indexes optimized for your application.
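• Straightforward Queries. MongoDB's queries, while syntactically quite different
  from SQL, are semantically very similar. In addition, MongoDB supports
  MapReduce, which allows for aggregate processing over the data (in recent
  MongoDB releases, the aggregation pipeline is the recommended replacement
  for MapReduce).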
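The principles above describe nested, JSON-style documents that are queried with filter documents resembling a SQL WHERE clause, together with MapReduce for aggregation. The following is a pure-Python sketch of those ideas. It is not MongoDB or PyMongo. It imitates a small subset of MongoDB filter syntax (dotted field paths, equality, `$gt`, `$lt`) and the map/reduce model on in-memory data. Field names follow Twitter's Tweet JSON (`id_str`, `retweet_count`, `user.screen_name`, `entities.hashtags`), the sample Tweets are made up, and `find`, `matches`, and `map_reduce` are hypothetical helpers. With PyMongo, the same filter dict would instead be passed to a collection's `find()` method.

```python
from collections import defaultdict

# Made-up sample Tweets using field names from Twitter's Tweet JSON.
TWEETS = [
    {"id_str": "1", "retweet_count": 250, "user": {"screen_name": "alice"},
     "entities": {"hashtags": [{"text": "nosql"}]}},
    {"id_str": "2", "retweet_count": 12, "user": {"screen_name": "bob"},
     "entities": {"hashtags": [{"text": "NoSQL"}, {"text": "mongodb"}]}},
    {"id_str": "3", "retweet_count": 180, "user": {"screen_name": "alice"},
     "entities": {"hashtags": []}},
]

_MISSING = object()
# Comparison operators supported by this toy matcher (a tiny subset of MongoDB's).
OPS = {"$gt": lambda v, a: v > a, "$lt": lambda v, a: v < a}


def get_path(doc, dotted):
    """Follow a dotted path such as 'user.screen_name' into nested dicts."""
    value = doc
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return _MISSING
        value = value[key]
    return value


def matches(doc, query):
    """All top-level conditions must hold, like predicates joined by AND in SQL."""
    for field, cond in query.items():
        value = get_path(doc, field)
        if value is _MISSING:
            return False  # simplification: a missing field never matches
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op not in OPS:
                    raise ValueError(f"unsupported operator: {op}")
                if not OPS[op](value, arg):
                    return False
        elif value != cond:
            return False
    return True


def find(docs, query):
    """Return every document matching the filter, in insertion order."""
    return [doc for doc in docs if matches(doc, query)]


def map_reduce(docs, mapper, reducer):
    """Toy MapReduce: mapper yields (key, value) pairs for each document, and
    reducer folds the list of values collected for each key into one result."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def emit_hashtags(tweet):
    """Map step: emit (lowercased hashtag, 1) for each hashtag in the Tweet."""
    for tag in tweet["entities"]["hashtags"]:
        yield tag["text"].lower(), 1


def sum_counts(key, values):
    """Reduce step: add up the counts emitted for one hashtag."""
    return sum(values)
```

The filter `{"user.screen_name": "alice", "retweet_count": {"$gt": 100}}` corresponds to `WHERE screen_name = 'alice' AND retweet_count > 100` over a hypothetical flattened table. `find(TWEETS, ...)` returns Tweets 1 and 3, and `map_reduce(TWEETS, emit_hashtags, sum_counts)` yields `{"nosql": 2, "mongodb": 1}`. In real MongoDB, the map and reduce functions are written in JavaScript and call `emit(key, value)`. MongoDB may also invoke reduce more than once per key, so reducers should be associative. This toy calls reduce exactly once per key.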