ObjectIds
ObjectId is the default type for "_id". It is designed to be lightweight, while still being easy to generate in a globally unique way across disparate machines. This is the main reason why MongoDB uses ObjectIds as opposed to something more traditional, like an autoincrementing primary key: it is difficult and time-consuming to synchronize autoincrementing primary keys across multiple servers. Because MongoDB was designed from the beginning to be a distributed database, dealing with many nodes is an important consideration. The ObjectId type, as we'll see, is easy to generate in a sharded environment.
ObjectIds use 12 bytes of storage, which gives them a string representation that is 24 hexadecimal digits: 2 digits for each byte. This causes them to appear larger than they are, which makes some people nervous. It's important to note that even though an ObjectId is often represented as a giant hexadecimal string, the string is actually twice as long as the data being stored.
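The size relationship is easy to check for yourself. The following Python sketch uses only the standard library and a made-up 24-digit string (an illustrative value, not one taken from a real database):

```python
# Hypothetical ObjectId string, chosen only for illustration.
hex_string = "4b2b9f6755a1c3d4e5f60718"

# Each pair of hexadecimal digits decodes to exactly one byte.
raw = bytes.fromhex(hex_string)

print(len(hex_string))  # number of characters in the string form
print(len(raw))         # number of bytes actually stored
```

The string form is what shells and logs usually display, while the 12-byte form is what is stored.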
If you create multiple new ObjectIds in rapid succession, you can see that only the last few digits change each time. In addition, a couple of digits in the middle of the ObjectId will change (if you space the creations out by a couple of seconds). This is because of the manner in which ObjectIds are created. The 12 bytes of an ObjectId are generated as follows:
 0  1  2  3 |  4  5  6 |  7  8 |  9 10 11
 Timestamp  | Machine  |  PID  | Increment
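To make the layout concrete, here is a minimal Python sketch that slices a 24-digit hexadecimal string into the four fields shown above. The function name split_object_id and the sample value are our own inventions for illustration; they are not part of any driver.

```python
def split_object_id(hex_string):
    """Split a 24-digit hex string into the four fields of the layout above."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return {
        "timestamp": raw[0:4],   # bytes 0-3
        "machine": raw[4:7],     # bytes 4-6
        "pid": raw[7:9],         # bytes 7-8
        "increment": raw[9:12],  # bytes 9-11
    }

# Hypothetical value, used only to show where each field sits.
fields = split_object_id("4b2b9f6755a1c3d4e5f60718")
for name, value in fields.items():
    print(name, value.hex())
```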
The first four bytes of an ObjectId are a timestamp in seconds since the epoch. This provides a couple of useful properties:
• The timestamp, when combined with the next five bytes (which will be described in a moment), provides uniqueness at the granularity of a second.
• Because the timestamp comes first, ObjectIds will sort in roughly insertion order. This is not a strong guarantee but does have some nice properties, such as making ObjectIds efficient to index.
• These four bytes carry an implicit timestamp of when each document was created. Most drivers expose a method for extracting this information from an ObjectId.
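The last two properties can be sketched in a few lines of Python. This is an illustration using only the standard library, not how any particular driver does it: the helper name creation_time and the sample values are hypothetical. It relies on the timestamp being stored most significant byte first. (If you use PyMongo, its ObjectId type offers the same information through the generation_time attribute.)

```python
from datetime import datetime, timezone

def creation_time(hex_string):
    """Decode the first four bytes of an ObjectId string as a UTC datetime."""
    raw = bytes.fromhex(hex_string)
    # The timestamp is a big-endian count of seconds since the epoch.
    seconds = int.from_bytes(raw[0:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Two hypothetical ObjectIds created two seconds apart.
older = "4b2b9f6755a1c3d4e5f60718"
newer = "4b2b9f6955a1c3d4e5f60719"

print(creation_time(older))
print(creation_time(newer))

# Because the timestamp comes first, plain sorting follows creation order.
print(sorted([newer, older]) == [older, newer])
```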
Because the current time is used in ObjectIds, some users worry that their servers will need to have synchronized clocks. This is not necessary because the actual value of the timestamp doesn't matter, only that it is often new (once per second) and increasing.
The next three bytes of an ObjectId are a unique identifier of the machine on which it was generated. This is usually a hash of the machine's hostname. By including these bytes, we guarantee that different machines will not generate colliding ObjectIds.
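As a rough sketch of the idea, the Python snippet below derives three bytes from a hostname. The text above only says "a hash of the machine's hostname", so the choice of SHA-256, the helper name machine_bytes, and the sample hostname are all assumptions made for illustration; a real driver may hash differently.

```python
import hashlib
import socket

def machine_bytes(hostname=None):
    """Return a 3-byte identifier derived from a hostname (illustrative only)."""
    if hostname is None:
        hostname = socket.gethostname()
    # Assumption: SHA-256 stands in for whatever hash a real driver uses.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return digest[:3]

print(machine_bytes().hex())                   # this machine
print(machine_bytes("db1.example.com").hex())  # hypothetical hostname
```

The same hostname always yields the same three bytes, so every ObjectId generated on one machine shares this portion.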