Database Reference
In-Depth Information
A compound index can only be used through a prefix of its fields, that is, the fields taken in the order in which the index was created. Say, for example, we create the index {"timestamp": 1, "retweet_count": 1, "keywords": 1}.
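This index can support queries on the following combinations of fields:
- timestamp, retweet_count, keywords
- timestamp
- timestamp, retweet_count
The order of the fields inside the query document does not matter, so a query on retweet_count, timestamp, and keywords can still use the full index. The index cannot support queries that omit its first field, timestamp, such as:
- retweet_count, keywords
- keywords
A query on timestamp and keywords skips retweet_count, so it can use only the timestamp prefix of the index, which is less efficient.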
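The prefix rule can be modeled in a few lines of plain Python. This is a simplified sketch, not MongoDB's query planner: the function name `usable_prefix` is hypothetical, and the model ignores details such as range versus equality conditions and sort order. `INDEX_KEYS` uses the list-of-(field, direction) shape that PyMongo's `create_index` accepts, but nothing here talks to a database.

```python
# Simplified model of MongoDB's compound-index prefix rule (not the real planner).
# Same (field, direction) list shape that PyMongo's create_index accepts.
INDEX_KEYS = [("timestamp", 1), ("retweet_count", 1), ("keywords", 1)]


def usable_prefix(index_keys, query_fields):
    """Return the leading index fields that a query on query_fields can use.

    The order of fields inside the query document is irrelevant, so the
    query fields are treated as a set. Walking the index from its first
    field, we stop at the first field the query does not constrain.
    """
    wanted = set(query_fields)
    prefix = []
    for field, _direction in index_keys:
        if field not in wanted:
            break
        prefix.append(field)
    return prefix


if __name__ == "__main__":
    for fields in (["timestamp", "retweet_count", "keywords"],
                   ["retweet_count", "timestamp", "keywords"],
                   ["retweet_count", "keywords"],
                   ["timestamp", "keywords"]):
        print(fields, "->", usable_prefix(INDEX_KEYS, fields) or "index not usable")
```

An empty result means the query constrains no prefix of the index, which corresponds to the "cannot support" cases listed above.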
A compound index can include at most one field that holds an array (MongoDB enforces this for each indexed document). Twitter provides some Tweet metadata in the form of arrays, but we can use only one of them in any given index.
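A minimal sketch of that restriction, written as a plain Python check rather than anything MongoDB exposes: MongoDB refuses to build a compound index, or to insert a document into one, when more than one indexed field of a single document holds an array. The function names and the `hashtags` field below are hypothetical.

```python
# Simplified check for MongoDB's compound multikey restriction:
# in any one document, at most one indexed field may hold an array.

def array_fields(doc, index_fields):
    """Return the indexed fields whose value in doc is an array (a list)."""
    return [f for f in index_fields if isinstance(doc.get(f), list)]


def can_be_indexed(doc, index_fields):
    """True if doc has arrays in at most one of the indexed fields."""
    return len(array_fields(doc, index_fields)) <= 1


tweet = {
    "timestamp": 1372636800000,          # hypothetical epoch-millisecond value
    "keywords": ["mongodb", "twitter"],  # array field
    "hashtags": ["bigdata"],             # hypothetical second array field
}

print(can_be_indexed(tweet, ["timestamp", "keywords"]))              # True
print(can_be_indexed(tweet, ["timestamp", "keywords", "hashtags"]))  # False
```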
3.8 Extracting Documents: Retrieving All Documents in a Collection
The simplest query we can provide to MongoDB is to return all of the data in a collection. We use MongoDB's find function to do this, an example of which is shown in Listing 3.3.
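A rough, in-memory model of this behavior may help; it does not touch a database, and the names `find` and `matches` are hypothetical. With PyMongo, the real call would be `collection.find({})` (or simply `collection.find()`), which returns a cursor over every document. The model shows why an empty filter returns everything: a document matches when it satisfies every condition in the filter, and an empty filter has no conditions.

```python
# In-memory model of MongoDB's find() with simple equality filters.
# Illustration only; it does not use a database or PyMongo.

def matches(doc, query):
    """A document matches if every key/value pair in the query matches.

    For an empty query, all() over zero conditions is True, so every
    document matches.
    """
    return all(key in doc and doc[key] == value for key, value in query.items())


def find(collection, query=None):
    """Return the documents in collection that match query (default: all)."""
    query = query or {}
    return [doc for doc in collection if matches(doc, query)]


tweets = [
    {"_id": 1, "retweet_count": 0},
    {"_id": 2, "retweet_count": 5},
    {"_id": 3, "retweet_count": 5},
]

print(len(find(tweets)))                        # 3: empty filter returns everything
print(len(find(tweets, {"retweet_count": 5})))  # 2
```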
3.9 Filtering Documents: Number of Tweets Generated in a Certain Hour
Suppose we want to know the number of Tweets in our dataset from a particular hour. To do this, we have to filter our data by the timestamp field with “operators”: special keys, prefixed with $, that act as functions when retrieving data. Listing 3.4 shows how we can drill down to extract data only from this hour.
We use the $gt (“greater than”) and $lte (“less than or equal to”) operators to pull dates from this time range. Notice that there is no explicit “AND” or “OR” operator specified. MongoDB treats all co-occurring key/value pairs as “AND”s unless they are explicitly combined with the $or operator.[5]
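The implicit AND can be made concrete with a small in-memory evaluator. It is a hypothetical sketch that supports only plain equality, `$gt`, `$lte`, and a top-level `$or`; MongoDB supports many more operators and evaluates queries quite differently. Every top-level key/value pair must hold (AND), and inside an operator document such as `{"$gt": a, "$lte": b}` every operator must hold as well.

```python
# Hypothetical in-memory evaluator illustrating MongoDB's implicit AND.
# Supports only equality, $gt, $lte, and a top-level $or.
# Simplification: a missing field never matches here.

OPERATORS = {
    "$gt": lambda field_value, bound: field_value > bound,
    "$lte": lambda field_value, bound: field_value <= bound,
}


def field_matches(doc, field, condition):
    if field not in doc:
        return False
    value = doc[field]
    if isinstance(condition, dict):
        # Several operators on one field are ANDed together.
        return all(OPERATORS[op](value, bound) for op, bound in condition.items())
    return value == condition


def matches(doc, query):
    for key, condition in query.items():
        if key == "$or":
            # $or holds if at least one of its sub-queries holds.
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif not field_matches(doc, key, condition):
            return False
    # Every top-level pair held: the implicit AND succeeded.
    return True


doc = {"timestamp": 15, "retweet_count": 3}
print(matches(doc, {"timestamp": {"$gt": 10, "$lte": 20}, "retweet_count": 3}))  # True
print(matches(doc, {"timestamp": {"$gt": 10, "$lte": 20}, "retweet_count": 4}))  # False
print(matches(doc, {"$or": [{"retweet_count": 4}, {"timestamp": {"$lte": 15}}]}))  # True
```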
Finally, the result of the query in Listing 3.4 is passed to the count function, which returns the number of documents matched by the find function.
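The hour-long window and the count step can be sketched together. The sketch assumes that `timestamp` holds Python `datetime` values (the stored type is not specified here) and that `$gt` is applied to the start of the hour and `$lte` to its end, so a Tweet stamped exactly at the start is excluded while one stamped exactly at the end is included. `hour_filter` and `count_in_hour` are hypothetical names. With PyMongo, a filter document like this could be passed to `collection.count_documents(...)`, which supersedes the deprecated cursor `count()`; here the counting is done in memory so the example runs without a database.

```python
from datetime import datetime, timedelta


def hour_filter(start):
    """Build a MongoDB-style filter for Tweets in the hour after `start`.

    Both operators sit in one sub-document on the same field, so they are
    ANDed: start < timestamp <= start + 1 hour.
    """
    return {"timestamp": {"$gt": start, "$lte": start + timedelta(hours=1)}}


def count_in_hour(tweets, start):
    """Count tweets matching hour_filter(start), evaluated in memory."""
    bounds = hour_filter(start)["timestamp"]
    return sum(
        1
        for tweet in tweets
        if bounds["$gt"] < tweet["timestamp"] <= bounds["$lte"]
    )


start = datetime(2013, 7, 1, 14, 0)  # hypothetical hour of interest
tweets = [
    {"timestamp": datetime(2013, 7, 1, 14, 0)},   # exactly at start: excluded by $gt
    {"timestamp": datetime(2013, 7, 1, 14, 30)},  # inside the hour
    {"timestamp": datetime(2013, 7, 1, 15, 0)},   # exactly at end: included by $lte
    {"timestamp": datetime(2013, 7, 1, 15, 1)},   # after the hour
]

print(hour_filter(start))
print(count_in_hour(tweets, start))  # 2
```

Putting both bounds in one sub-document on the same field is what lets the implicit AND express a range without an explicit "AND".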
[5] For more operators, see http://docs.mongodb.org/manual/reference/operator/.