Advanced Patterns for Data Modeling - HBase Design Patterns

Now that you are back, let's take a look at our proposed solution:

You can now see what we have done. We have denormalized the data and stored

each element twice. The first table lists all the courses for a given student in one row.

The second table lists all the students for a given course, also in one row. Thus, the

same student-course relationship is stored twice in our schema. It is true that you

will have to provide double the amount of storage volume. However, hard drive

space is cheap. Also, it is true that your code logic will have to maintain two tables.

However, what you get in return is virtually unlimited scalability, and this is the name of the

game nowadays.
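To make the "maintain two tables" cost concrete, here is a minimal sketch of the double write. It does not use the HBase client API; the two dictionaries are in-memory stand-ins for the two tables, and the names `students`, `courses`, `enroll`, and `drop` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# In-memory stand-ins for the two tables (an assumption, not the HBase API):
# row key -> {column qualifier -> value}.
students = defaultdict(dict)  # row key: student_id, one column per course
courses = defaultdict(dict)   # row key: course_id, one column per student


def enroll(student_id, course_id):
    """Record the relationship in both tables (the denormalized double write)."""
    students[student_id][course_id] = "enrolled"
    courses[course_id][student_id] = "enrolled"


def drop(student_id, course_id):
    """Remove the relationship from both tables so they stay consistent."""
    students[student_id].pop(course_id, None)
    courses[course_id].pop(student_id, None)


enroll("s1", "math101")
enroll("s1", "cs201")
enroll("s2", "math101")

print(sorted(students["s1"]))      # all courses for student s1, from one row
print(sorted(courses["math101"]))  # all students in math101, from one row
```

Every change to a relationship touches both tables, which is the extra code logic mentioned above. In exchange, each question ("which courses?" or "which students?") is answered from a single row.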

Let's take a look at some of the technical aspects of this design:

• The student-course relationship is represented as columns:

Remember, column names can be dynamic. This means that you can store

as many courses as you want and can call these columns anything you

like. Thus, you can get all the courses for a student in one read, without an

expensive join, which would otherwise become a bottleneck.

Here's a question for you: how do you write a query to find out whether a

student is taking a course? Again, please think of an answer first.

Here's our query that can do this for you:

Examine the value of Students[student_id][courses][course_id].

As an exercise, create such a table in the HBase shell and store/retrieve

some values. Check the book's repository on GitHub for our solution.

• The student-course relationship is represented in both the tables:

Note that both tables are denormalized. As you can see, this is a

common approach: if you need a relationship and HBase does not provide it

out of the box, you simply create another table to model that relationship.
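The lookup Students[student_id][courses][course_id] from the first point can be sketched as a nested-map access. This is an in-memory model only, assuming a table behaves like row key -> column family -> column qualifier -> value. The sample row and the helper name `is_taking` are hypothetical.

```python
# Hypothetical in-memory model of one table:
# row key -> column family -> column qualifier -> value.
students_table = {
    "student_42": {"courses": {"course_7": "enrolled", "course_9": "enrolled"}},
}


def is_taking(table, student_id, course_id, family="courses"):
    """Return True if the cell table[student_id][family][course_id] exists."""
    row = table.get(student_id, {})
    return course_id in row.get(family, {})


print(is_taking(students_table, "student_42", "course_7"))
print(is_taking(students_table, "student_42", "course_1"))
```

Because the course ID is the column name itself, the check is a single-cell read rather than a scan or a join. The same pattern applies to the second table with the roles of student and course swapped.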
