Denormalizing Data for Maximum Performance - Learning Apache Cassandra


Chapter 6. Denormalizing Data for Maximum Performance

In the previous chapter, we created a structure that allows a user to follow other users. The goal of the follow system was to allow users to see all of their followed users' status updates in one place that we'll call the "home timeline". In this chapter, we will build a table to store users' home timelines.

The follow structures in Chapter 5, Establishing Relationships, introduced the concept of denormalization: the practice of storing the same piece of data in more than one place in order to optimize read performance. The denormalization we used for follows was fairly mild, however: each follow relationship is stored in exactly two places. For home timelines, we will create a much more aggressively denormalized data structure, in which a given piece of data will be stored in an arbitrary number of places.
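To make the idea of storing one piece of data in an arbitrary number of places concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code and not the schema this chapter builds. The dictionaries stand in for tables, and the names `followers`, `home_timelines`, `follow`, and `post_status` are hypothetical. It also assumes that the copies are made one per follower at write time, which is only one possible way to denormalize.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for tables; names are illustrative only.
followers = defaultdict(set)        # followed user -> users who follow them
home_timelines = defaultdict(list)  # user -> status updates on their home timeline


def follow(follower, followed):
    """Record that `follower` follows `followed`."""
    followers[followed].add(follower)


def post_status(author, body):
    """Copy one status update into every follower's home timeline.

    Returns the number of copies written, which grows with the
    author's follower count (assumed fan-out-on-write strategy).
    """
    update = (author, body)
    for follower in followers[author]:
        home_timelines[follower].append(update)
    return len(followers[author])


follow("bob", "alice")
follow("carol", "alice")
copies = post_status("alice", "Hello, world")
```

In this sketch, reading a home timeline is a single lookup by user, while the cost of a write depends on how many followers the author has. That is the trade-off between read performance and write complexity that denormalization makes.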

While this highly denormalized structure will be the end result of our work in this chapter, we'll explore several approaches along the way, starting with a fully normalized data structure, proceeding through a partially denormalized approach, and finally settling on a fully denormalized design. We'll discuss the advantages and disadvantages of each, both in terms of implementation complexity and runtime performance. By the end of this chapter, you'll have learned:

• How to retrieve data from more than one specific partition

• How to order and paginate in multipartition queries

• How to apply different denormalization strategies to a problem

• How to group multiple write statements into a single operation using logged batches
