Figure A-2. Indexing reifies sets of entities in a document store
Where data hasn't been indexed, queries are typically much slower, because the entire
dataset has to be scanned. This is an expensive operation and is to be avoided wherever
possible. As we shall see, rather than process such queries internally, document database
users commonly externalize this kind of processing to parallel compute frameworks.
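To make the cost difference concrete, here is a minimal sketch of a hypothetical in-memory document store (the class TinyDocumentStore and its methods are invented for illustration and are not any product's API). An index maps each value of a field to the set of document ids holding that value, which is what "reifying" a set of entities amounts to; without it, a query must inspect every document. The find method reports how many documents it examined so the two paths can be compared.

```python
from collections import defaultdict


class TinyDocumentStore:
    """Hypothetical in-memory document store, for illustration only."""

    def __init__(self):
        self.docs = {}      # document id -> document (a plain dict)
        self.indexes = {}   # field name -> {field value -> set of document ids}

    def create_index(self, field):
        # Build the index once over existing documents; insert() maintains it afterwards.
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc_id, doc):
        old = self.docs.get(doc_id)
        if old is not None:
            # Replacing a document: drop its stale index entries first.
            for field, index in self.indexes.items():
                if field in old:
                    index[old[field]].discard(doc_id)
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        """Return (sorted matching ids, number of documents examined)."""
        if field in self.indexes:
            # Indexed: jump straight to the precomputed set of matching ids.
            ids = self.indexes[field].get(value, set())
            return sorted(ids), len(ids)
        # Not indexed: every document must be inspected (a full scan).
        matches = [i for i, d in self.docs.items() if d.get(field) == value]
        return sorted(matches), len(self.docs)
```

For example, after inserting 1,000 documents of which 100 have "city": "London", find("city", "London") examines all 1,000 documents; after create_index("city") the same query examines only the 100 matching ones. The price is that every insert must also update each index, which is the usual trade-off of write cost for read speed.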
Because the data model of a document store is one of disconnected entities, document
stores tend to have interesting and useful operational characteristics. They should scale
horizontally, because there is no contended state between mutually independent records
at write time, and no need to transact across replicas.
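As a single-process analogy (not how any particular document database is implemented), the sketch below gives each document its own lock, so concurrent writers to different documents never wait for one another; only writers to the same document serialize. PerDocumentLockStore and increment_many are hypothetical names. The small registry guard is held only while looking up or creating a document's lock, never during the write itself.

```python
import threading


class PerDocumentLockStore:
    """Hypothetical single-process analogy: one lock per document, no global write lock."""

    def __init__(self):
        self._docs = {}
        self._locks = {}
        # Guards only the lookup/creation of per-document locks, never a write itself.
        self._registry_guard = threading.Lock()

    def _lock_for(self, doc_id):
        with self._registry_guard:
            if doc_id not in self._locks:
                self._locks[doc_id] = threading.Lock()
            return self._locks[doc_id]

    def update(self, doc_id, fn):
        """Atomically replace one document with fn(copy of the current document)."""
        with self._lock_for(doc_id):
            current = self._docs.get(doc_id, {})
            self._docs[doc_id] = fn(dict(current))

    def get(self, doc_id):
        return self._docs.get(doc_id)


def increment_many(store, doc_id, times):
    # Each increment is a read-modify-write confined to a single document.
    for _ in range(times):
        store.update(doc_id, lambda d: {**d, "count": d.get("count", 0) + 1})
```

Running increment_many on four distinct documents in four threads involves no shared write lock between them; adding a fifth thread that also targets "doc-0" makes only those two threads contend, and that document ends with a count of 2,000 while the others end at 1,000. Across machines the same independence is what lets records be spread over many servers without distributed transactions.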
Sharding
Most document databases (e.g., MongoDB, RavenDB) require users to plan for sharding
of data across logical instances to support scaling horizontally. Scaling out thus
becomes an explicit aspect of development and operations. (Key-value and column
family databases, in contrast, tend not to require this planning, because they allocate
data to replicas as a normal part of their internal implementation.) This is sometimes
puzzlingly cited as a positive reason for choosing document stores, most likely because
it induces a (misplaced) excitement that scale is something to be embraced and lauded,
rather than something to be skillfully and diligently mastered.
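The planning burden is easier to see in code. The following is a minimal sketch of a hypothetical client-side router (ShardedCollection, shard_for and fraction_moved are invented names; this is not MongoDB's or RavenDB's mechanism). The user must choose a shard key and a shard count up front: queries on the shard key touch one shard, queries on any other field must visit every shard, and, with simple hash-modulo placement, changing the shard count relocates most documents.

```python
import hashlib


def shard_for(shard_key_value: str, num_shards: int) -> int:
    """Map a shard-key value to a shard using a stable hash and modulo placement."""
    # hashlib is used instead of hash(), which is randomized per process for strings.
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedCollection:
    """Hypothetical router: the user picks the shard key and the number of shards."""

    def __init__(self, shard_key: str, num_shards: int):
        self.shard_key = shard_key
        self.shards = [dict() for _ in range(num_shards)]

    def insert(self, doc_id, doc):
        value = str(doc[self.shard_key])
        self.shards[shard_for(value, len(self.shards))][doc_id] = doc

    def find_by_shard_key(self, value):
        # Targeted query: only the one shard that owns this key is consulted.
        shard = self.shards[shard_for(str(value), len(self.shards))]
        return [d for d in shard.values() if d[self.shard_key] == value]

    def find_by_other_field(self, field, value):
        # Scatter-gather query: every shard must be consulted.
        return [d for shard in self.shards for d in shard.values() if d.get(field) == value]


def fraction_moved(keys, old_n, new_n):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(shard_for(k, old_n) != shard_for(k, new_n) for k in keys)
    return moved / len(keys)
```

Going from four to five shards under modulo placement moves roughly four keys in five, which is one reason the shard key and layout deserve deliberate design rather than enthusiasm. Schemes such as consistent hashing or range partitioning reduce this movement, at the cost of extra bookkeeping.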
For writes, document databases tend to provide transactionality limited to the level of
an individual record. That is, a document database will ensure that writes to a single
document are atomically persisted—assuming the administrator has opted for safe