ObjectIds
ObjectId is the default type for "_id". It is designed to be lightweight, while still being
easy to generate in a globally unique way across disparate machines. This is the main
reason why MongoDB uses ObjectIds as opposed to something more traditional, like
an autoincrementing primary key: it is difficult and time-consuming to synchronize
autoincrementing primary keys across multiple servers. Because MongoDB was designed
from the beginning to be a distributed database, dealing with many nodes is an
important consideration. The ObjectId type, as we'll see, is easy to generate in a sharded
environment.
ObjectIds use 12 bytes of storage, which gives them a string representation that is 24
hexadecimal digits: 2 digits for each byte. This causes them to appear larger than they
are, which makes some people nervous. It's important to note that even though an
ObjectId is often represented as a giant hexadecimal string, the string is actually twice
as long as the data being stored.
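The size relationship can be checked with a short sketch that uses only the Python standard library. The 12 bytes here are random placeholder data, not a real ObjectId:

```python
import os

# 12 arbitrary bytes standing in for the stored form of an ObjectId
# (placeholder data, not generated by MongoDB).
raw = os.urandom(12)

# The hexadecimal string form uses 2 digits per byte.
text = raw.hex()

print(len(raw), len(text))  # 12 24

# The string is only a representation: it converts back to the same 12 bytes.
assert bytes.fromhex(text) == raw
```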
If you create multiple new ObjectIds in rapid succession, you can see that only the last
few digits change each time. In addition, a couple of digits in the middle of the
ObjectId will change (if you space the creations out by a couple of seconds). This is
because of the manner in which ObjectIds are created. The 12 bytes of an ObjectId are
generated as follows:
  0   1   2   3  |  4   5   6  |  7   8  |  9   10   11
  Timestamp      |  Machine    |  PID    |  Increment
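As a rough illustration, the layout above can be read back out of a 24-digit hexadecimal string. This is a sketch, not a driver API: the function name split_object_id and the sample value are hypothetical, the field widths are taken from the layout, and big-endian byte order for the PID is an assumption.

```python
def split_object_id(hex_string):
    """Split a 24-digit hex string into the four fields of the layout above.

    Hypothetical helper for illustration; real drivers ship their own
    ObjectId type.
    """
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return {
        # Bytes 0-3: seconds since the epoch.
        "timestamp": int.from_bytes(raw[0:4], "big"),
        # Bytes 4-6: machine identifier, kept as hex because it is opaque.
        "machine": raw[4:7].hex(),
        # Bytes 7-8: process ID (byte order assumed to be big-endian here).
        "pid": int.from_bytes(raw[7:9], "big"),
        # Bytes 9-11: increment.
        "increment": int.from_bytes(raw[9:12], "big"),
    }


# Made-up sample value, used only to show the slicing.
fields = split_object_id("4d3ed089fb60ab534684b7e9")
print(fields)
```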
The first four bytes of an ObjectId are a timestamp in seconds since the epoch. This
provides a couple of useful properties:
• The timestamp, when combined with the next five bytes (which will be described
in a moment), provides uniqueness at the granularity of a second.
• Because the timestamp comes first, ObjectIds will sort in roughly
insertion order. This is not a strong guarantee but does have some nice properties,
such as making ObjectIds efficient to index.
• In these four bytes exists an implicit timestamp of when each document was created.
Most drivers expose a method for extracting this information from an
ObjectId.
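The last two properties can be sketched with the standard library alone. The helper creation_time and the sample strings are hypothetical; an actual driver provides its own accessor (PyMongo, for example, exposes a generation_time property on its ObjectId class).

```python
from datetime import datetime, timezone


def creation_time(hex_string):
    """Return the creation time encoded in the first 4 bytes (8 hex digits)."""
    seconds = int(hex_string[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Made-up values: the first was "created" 60 seconds after the second,
# while the remaining 8 bytes are arbitrary.
later = "0000003c" + "ffffffffffffffff"
earlier = "00000000" + "aaaaaaaaaaaaaaaa"

print(creation_time(later))

# Because the timestamp leads, sorting the strings orders them by creation
# second, whatever the trailing bytes are.
ordered = sorted([later, earlier])
```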
Because the current time is used in ObjectIds, some users worry that their servers will
need to have synchronized clocks. This is not necessary because the actual value of the
timestamp doesn't matter, only that it changes often (once per second) and keeps increasing.
The next three bytes of an ObjectId are a unique identifier of the machine on which it
was generated. This is usually a hash of the machine's hostname. By including these
bytes, we guarantee that different machines will not generate colliding ObjectIds.
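One way such a three-byte identifier could be derived is sketched below. The text says only that it is usually a hash of the hostname, so the choice of MD5, the truncation to the first three bytes, and the function name machine_bytes are all assumptions made for illustration.

```python
import hashlib
import socket


def machine_bytes(hostname=None):
    """Derive a 3-byte machine identifier from a hostname.

    Illustrative only: MD5 and taking the first three digest bytes are
    assumptions, not a statement of what any particular driver does.
    """
    if hostname is None:
        hostname = socket.gethostname()
    digest = hashlib.md5(hostname.encode("utf-8")).digest()
    return digest[:3]


# The same hostname always yields the same three bytes, so every ObjectId
# generated on one machine carries the same machine field.
print(machine_bytes().hex())
```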
 