"t" : 1000,
"i" : 0
},
"ns" : "cloud-docs.spreadsheets",
"min" : {
"username" : { $minKey : 1 },
"_id" : { $minKey : 1 }
},
"max" : {
"username" : { $maxKey : 1 },
"_id" : { $maxKey : 1 }
},
"shard" : "shard-a"
}
Can you figure out what range this chunk represents? If there's just one chunk, then it spans the entire sharded collection. That's borne out by the min and max fields, which show that the chunk's range is bounded by $minKey and $maxKey.
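The chunk metadata shown above lives in the config database's chunks collection, and you can query it yourself from a mongos. Here's a sketch (the namespace comes from the listing above; run this in the mongo shell):

```javascript
// Run from a mongo shell connected to a mongos.
// Lists every chunk for the sharded collection, lowest range first.
db.getSiblingDB("config").chunks.find(
  { ns : "cloud-docs.spreadsheets" }
).sort({ min : 1 })
```

With only one chunk, this returns a single document like the one above.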
MINKEY AND MAXKEY
$minKey and $maxKey are used in comparison operations as the boundaries of BSON types. $minKey always compares lower than all BSON types, and $maxKey compares greater than all BSON types. Because the value for any given field can contain any BSON type, MongoDB uses these two types to mark the chunk endpoints at the extremities of the sharded collection.
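Because MinKey and MaxKey sort below and above every other value, you can also use them as literals when querying chunk metadata. For instance, this sketch (run from a mongos) finds the chunk anchored at the collection's absolute lower bound:

```javascript
// MinKey is available as a constant in the mongo shell.
// Matches the chunk whose range begins at the minimum possible value.
db.getSiblingDB("config").chunks.findOne({ "min.username" : MinKey })
```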
You can see a more interesting chunk range by adding more data to the spreadsheets collection. You'll use the Ruby script again, but this time you'll run 100 iterations, which will insert an extra 20,000 documents totaling 100 MB:
$ ruby load.rb 100
Verify that the insert worked:
> db.spreadsheets.count()
20200
> db.spreadsheets.stats().size
103171828
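As a quick sanity check, dividing the reported size by the document count gives the average document size (both numbers taken from the output above):

```ruby
# Average document size implied by the stats output above.
avg_bytes = 103171828 / 20200
puts avg_bytes   # roughly 5,107 bytes, or about 5 KB per document
```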
Sample insert speed
Note that it may take several minutes to insert this data into the shard cluster. There are three reasons for the slowness. First, you're performing a round trip for each insert, whereas you might be able to perform bulk inserts in a production situation. Second, you're inserting using Ruby, whose BSON serializer is slower than that of certain other drivers. Finally, and most significantly, you're running all of the shard's nodes on a single machine. This places a huge burden on the disk, as four of your nodes are being written to simultaneously (two replica set primaries, and two replicating secondaries). Suffice it to say that in a proper production installation, this insert would run much more quickly.
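To address the first issue, you'd batch documents and issue one insert per batch instead of one per document. A minimal sketch of the batching logic (the batch size of 1,000 is an arbitrary choice, and the driver call is commented out; insert_many is the modern Ruby driver's bulk-insert method, which may differ from the driver version used here):

```ruby
# Build all the documents up front, then insert them in batches
# rather than one round trip per document.
docs = (1..20_000).map do |n|
  { "username" => "user#{n}", "updated_at" => Time.now.utc }
end

docs.each_slice(1_000) do |batch|
  # @spreadsheets.insert_many(batch)   # one round trip per 1,000 documents
end

puts docs.each_slice(1_000).count      # 20 batches instead of 20,000 round trips
```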