Database Reference
In-Depth Information
Now that you are back, let's take a look at our proposed solution:
You can now see what we have done. We have denormalized the data and stored
each element twice. The first table lists all the courses for a given student in one row.
The second table lists all the students for a given course, also in one row. Thus, the
same student-course relationship is stored twice in our schema. It is true that you
will have to provide double the amount of storage volume. However, hard drive
space is cheap. Also, it is true that your code logic will have to maintain two tables.
However, what you get in return is a no-limit scalability, and this is the name of the
game nowadays.
Let's take a look at some of the technical aspects of this design:
• The student-course relationship is represented as columns:
Remember, column names can be dynamic. This means that you can store
as many courses as you want and can call these columns anything you
like. Thus, you can get all the courses for a student in one read, without an
expensive join, which would otherwise become a bottleneck.
Here's a question for you: how do you write a query to find out whether a
student is taking a course? Again, please think of an answer first.
Here's our query that can do this for you:
Examine the value of Students [student_id] [courses][course_id] .
As an exercise, create such a table in the HBase shell and store/retrieve
some values. Check the topic's repository on GitHub for our solution.
• The student-course relationship is represented in both the tables:
Note that we have denormalized both the tables. As you can see, it is a
common approach; if I need a relationship and HBase does not provide it
out of the box, I will just create another table to model this relationship.
 
Search WWH ::




Custom Search