Identify Your Queries Ahead of Time
While it's very easy to insert a lot of data into Cassandra, it can be somewhat
harder to get that data back out. Traditional relational DBs allow ad hoc queries to
be performed on tables. If you ask the question, the DB will give you the answer,
although the answer may take more time if you've not defined the indexes ahead
of time. In contrast, Cassandra doesn't really allow ad hoc queries; you need to
identify the queries you will need ahead of time and define the appropriate
indexes. Unlike SQL, Cassandra doesn't allow queries on nonindexed columns, so if
you plan to run such queries, you might need a separate data warehousing
solution that provides such an interface.
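One way to picture "identifying your queries ahead of time" is to keep one lookup structure per planned query and write every record to each of them. The sketch below uses plain Python dictionaries as stand-ins for Cassandra tables; the names events_by_user and events_by_day and the record fields are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two tables, each keyed by the
# one lookup it has to serve. Nothing here talks to a real cluster.
events_by_user = defaultdict(list)  # planned query: "all events for a user"
events_by_day = defaultdict(list)   # planned query: "all events on a day"


def record_event(event):
    """Write the same event once per planned query pattern."""
    events_by_user[event["user_id"]].append(event)
    events_by_day[event["day"]].append(event)


record_event({"event_id": "e1", "user_id": "u7", "day": "2014-03-01"})
record_event({"event_id": "e2", "user_id": "u7", "day": "2014-03-02"})

# Each planned query is a direct key lookup; an unplanned one
# (say, "events by type") has no structure to read from.
user_events = [e["event_id"] for e in events_by_user["u7"]]
day_events = [e["event_id"] for e in events_by_day["2014-03-01"]]
```

The cost of this approach is that every new question about the data needs its own structure, populated at write time, which is why the queries have to be known in advance.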
This was a major source of frustration for our data analysts and operations
teams who traditionally ran ad hoc SQL queries to gather data on the business.
With a MySQL setup you can just point someone to a read-only instance and tell
them to grab whatever data they need. With Cassandra this isn't really possible.
Instead, we've found that the easiest thing to do is provide some mechanism to
export the raw data as CSV/XML/JSON or similar format so that consumers can
import it into their favorite tool.
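As a rough sketch of such an export mechanism, the following turns a list of rows into CSV and line-delimited JSON using only the Python standard library. The rows and field names are made up for the example, and reading the rows out of Cassandra is assumed to have happened already.

```python
import csv
import io
import json


def export_csv(rows, fieldnames):
    """Render rows (a list of dicts) as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def export_json_lines(rows):
    """Render rows as one JSON object per line."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


# Hypothetical rows, standing in for data already read from the cluster.
rows = [
    {"event_id": "e1", "user_id": "u7", "amount": 12.5},
    {"event_id": "e2", "user_id": "u9", "amount": 3.0},
]

csv_text = export_csv(rows, ["event_id", "user_id", "amount"])
json_text = export_json_lines(rows)
```

Either output can then be loaded into a spreadsheet, a relational database, or whatever analysis tool the consumer prefers.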
Cassandra Doesn't Do Transactions
Because there is no rollback provision, in the event of failure you cannot be
sure what state your data has been left in, so your systems need to take a
different approach to failure recovery. Design your systems to retry failed
operations, and make those operations idempotent.
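To see why retries and idempotence go together, consider a write that is applied but whose acknowledgement is lost. The sketch below simulates that case with a hypothetical FlakyStore class (not a Cassandra client) and a simple with_retries helper; both names are invented for this example.

```python
class FlakyStore:
    """Hypothetical store: writes are applied, but the first few
    acknowledgements are lost, so the caller sees a failure."""

    def __init__(self, fail_first_n):
        self.data = {}
        self.failures_left = fail_first_n

    def _maybe_fail(self):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TimeoutError("write applied, acknowledgement lost")

    def set_total(self, key, value):
        # Idempotent: applying it twice leaves the same state.
        self.data[key] = value
        self._maybe_fail()

    def add_to_total(self, key, delta):
        # Not idempotent: every application changes the state again.
        self.data[key] = self.data.get(key, 0) + delta
        self._maybe_fail()


def with_retries(operation, attempts=3):
    """Call operation until it succeeds or attempts run out."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise


safe = FlakyStore(fail_first_n=1)
with_retries(lambda: safe.set_total("orders", 10))

unsafe = FlakyStore(fail_first_n=1)
with_retries(lambda: unsafe.add_to_total("orders", 10))
```

After the retry, the idempotent write leaves "orders" at 10, while the non-idempotent one has been applied twice and leaves it at 20.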
Idempotence is particularly important for Hailo. We use NSQ for delivering
our statistics, where the model for reliable delivery is based on the producer
sending the same message to multiple brokers and then de-duping on the consumer
side. Idempotent operations enable us to bypass the need for explicit de-duping,
which can be complicated when running a truly distributed and stateless
service-oriented architecture.
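The following sketch illustrates the idea with plain Python lists standing in for brokers; it does not use the NSQ client API, and the message fields and the apply_message function are assumptions made for the example. Because each write is keyed on the message's own ID, a second delivery of the same message simply overwrites the first.

```python
def apply_message(table, message):
    """Upsert keyed on the message ID, so re-applying a message
    leaves the table unchanged."""
    table[message["id"]] = {
        "metric": message["metric"],
        "value": message["value"],
    }


# The producer sends the same message to two brokers (plain lists here).
message = {"id": "m-001", "metric": "signups", "value": 1}
brokers = [[message], [message]]

# The consumer applies everything it receives, with no de-dupe step.
stats = {}
for queue in brokers:
    for delivered in queue:
        apply_message(stats, delivered)

total_signups = sum(
    row["value"] for row in stats.values() if row["metric"] == "signups"
)
```

The consumer never has to remember which messages it has already seen, which matters when consumers are stateless and there may be many of them.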
Know Your Cluster
99% of the time, Cassandra just works and needs little to no intervention on
the part of the developer. For those times that you do need to tinker or monitor,
there is a great tool from DataStax called OpsCenter that graphs the performance
of various aspects of your cluster, giving you much greater insight into what's
actually happening, and it may help you identify the problem you're having.