Write complexity and data integrity
The amount of work needed to write data in the fully denormalized strategy is basically
equal to what we needed with the partially denormalized layout. Our storage needs do
increase somewhat, since we now store one full copy of each status update for every follower
the author has. However, storage is cheap, and writing data in Cassandra is cheap, so we've
managed to make our timeline read pattern far more efficient at low cost.
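To make the write cost concrete, here is a minimal Python sketch of the fan-out. It uses plain tuples as stand-ins for rows; the function name plan_status_update_writes and the follower names are hypothetical, chosen only for illustration, and a real application would issue CQL inserts instead.

```python
def plan_status_update_writes(author, body, followers):
    """Return the (table, timeline_owner, author, body) rows for one update.

    Hypothetical helper: the tuples stand in for the CQL inserts a real
    application would issue.
    """
    # One canonical copy in the author's own timeline.
    writes = [("user_status_updates", author, author, body)]
    # One full copy per follower, so reading a home timeline needs no joins.
    for follower in followers:
        writes.append(("home_status_updates", follower, author, body))
    return writes


writes = plan_status_update_writes("dave", "Hello!", ["alice", "bob", "carol"])
print(len(writes))  # one author row plus one row per follower
```

The number of writes grows linearly with the follower count, which is the storage cost described above.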
One concern in any sort of denormalized scenario is data integrity. At the Cassandra level,
nothing stops us from adding a status update to the user_status_updates table and
forgetting to add the appropriate copies to the home_status_updates table, or vice
versa. Even worse, if a user deletes a status update and we don't properly remove its copies
from the home_status_updates table, the user's followers might see status updates
that they aren't supposed to.
For the most part, the responsibility for maintaining data integrity falls on the application,
and there's no magic formula: just well-factored data access logic and lots of tests.
However, there is one scenario that is outside the application's control: what if part of the
write operation succeeds but another part fails?
In our example, it's possible that, although writing dave's status update to the
user_status_updates table works fine, the write to the home_status_updates
table fails because the required nodes are unavailable. By the time the application knows
something is wrong, it has already written data to user_status_updates, leaving the
data in an invalid state.
One approach would be for the application to delete the row from
user_status_updates if it encounters an error on a subsequent write. However, this
sort of manual rollback is error-prone and burdensome, particularly as write operations
become more complex.
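The manual rollback just described might look roughly like the following Python sketch. Everything here is an assumption made for illustration: FakeTable is an in-memory stand-in for a Cassandra table, NodesUnavailable stands in for whatever error a real driver would raise, and post_status_update is a hypothetical application function.

```python
class NodesUnavailable(Exception):
    """Stand-in for the error a driver raises when replicas are down."""


class FakeTable:
    """In-memory stand-in for a Cassandra table (illustration only)."""

    def __init__(self, fail_writes=False):
        self.rows = {}
        self.fail_writes = fail_writes

    def insert(self, key, value):
        if self.fail_writes:
            raise NodesUnavailable("required nodes are unavailable")
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


def post_status_update(user_table, home_table, author, update_id, body, followers):
    """Write to both tables; undo earlier writes if a later one fails."""
    user_table.insert((author, update_id), body)
    written = []
    try:
        for follower in followers:
            home_table.insert((follower, update_id), body)
            written.append(follower)
    except NodesUnavailable:
        # Compensating deletes. In a real cluster these can fail too,
        # which is why this kind of rollback is fragile.
        for follower in written:
            home_table.delete((follower, update_id))
        user_table.delete((author, update_id))
        return False
    return True


user_updates = FakeTable()
home_updates = FakeTable(fail_writes=True)  # simulate unavailable nodes
ok = post_status_update(user_updates, home_updates, "dave", 1, "Hello!", ["alice"])
print(ok, user_updates.rows)  # False {}
```

Even in this toy version, the cleanup path needs as much care as the write path, and every additional table adds another case to undo.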
Happily, Cassandra gives us a cleaner way to ensure that write failures don't lead to a
breakdown of data integrity. Multiple write statements can be sent in a single batch;
Cassandra will guarantee that, if any statement in a batch succeeds, all of them will. So, we
can perform all the write operations to store dave's status update without worry:
BEGIN BATCH
INSERT INTO "user_status_updates"