Database Reference
In-Depth Information
The index table will look like this:
Baghdad | 5 |
-------------------+-----------------------------------------
Cape Town | 1 | 4 |
-------------------+-----------------------------------------
Kabul | 2 | 6 |
-------------------+-----------------------------------------
Mogadishu | 3 |
-------------------+-----------------------------------------
So, indexes are basically wide-rows or dynamic rows with row keys as values in the
column being indexed (remember wide rows from the previous section?). These indexes
are stored locally, which means an index table on node X will have an index created only
for the rows that exist on node X. This means, every search that includes an indexed
column makes a query on the index table on all the nodes, but do not worry about it, it is
fast. The other things that you should keep in mind are as follows:
Avoid high cardinality columns : Cardinality is the number of unique values. In
our user's case, we have 40,000 or fewer cities across the world. If you have 40
million users, you have 1,000 users per city on average. If you have 25 nodes, you
are making 25 reads (one on each node) from the index table, and then 1,000
reads for 1,000 rows. This looks okay; 1,025 reads for 1,000 records is about one
read per record. Cities compared to users have low cardinality, so this helps us to
optimize a read request. What happens when we use something that has high car-
dinality? Let's say we use the user's phone number as the index. Each user has a
unique phone number. So, it is likely that when we search by phone number, we
will get at most one record. The read process makes 25 read requests to the nodes
to look into their local indexes; then if it exists, it will make one read request to
the node that has the row. So, we are basically making 26 read requests read one
row.
We have seen that high cardinality columns are not good for indexes and hence,
we should avoid them. It may be acceptable if it is a very low volume query (ac-
cessed once in a while), but it may be wise to create another table manually that
has a phone number as the row key (primary key) and one of the columns as
user_id from the users table. This way, we will make two read request fetch
one record. The downside of this approach is, it is an extra effort on the developer
side.
Search WWH ::




Custom Search