eBay's Transactional Data Platform
eBay operates at an enormous scale. Every day, our data centers perform hundreds of billions of reads and writes on petabytes of data, and that volume is growing explosively. At the same time, there is increasing demand to process data at blistering speeds. Scalability and availability are not afterthoughts at eBay; they are primary requirements for all our systems. Our transactional data platform is a mixture of multiple SQL and NoSQL databases deployed on thousands of servers across multiple data centers. We've realized that no single database can solve the variety of challenges we face at eBay, and that has led us to polyglot persistence. Our transactional database platform is built on Oracle, MySQL, Cassandra, MongoDB, and XMP. We also use Hadoop/HBase for deep analytics, and our new search infrastructure, named Cassini, is built on top of it.
Why Cassandra?
There are many use cases that don't fit well in relational database systems. These include sparse data sets, data sets that require flexible schemas, and data sets that are incredibly large and require time-series storage. Cassandra's sparse, flexible, and sorted data model has enabled us to efficiently design systems that store various kinds of semistructured data. Because of its peer-to-peer (as opposed to master/slave) architecture, Cassandra gives us always-on availability for both reads and writes. Its linear scalability, with built-in sharding based on consistent hashing, makes data distribution painless. Anyone who has done manual sharding is aware of the pains of balancing the shards by hand! Also, linear scalability on commodity hardware makes Cassandra a good fit for eBay's cloud environment, where capacity requirements remain fluid.
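The consistent-hashing idea behind that built-in sharding can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cassandra's implementation: the class name `ConsistentHashRing`, the use of MD5 truncated to 64 bits, and the `vnodes_per_node` parameter are hypothetical choices made here. Cassandra's default partitioner is Murmur3, and Cassandra also places additional replicas on subsequent nodes, which this sketch omits. What it does show is why balancing is painless: each key belongs to the first token at or after its hash, so adding a node moves only the keys that now fall into the new node's ranges.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hashing ring (hypothetical sketch, not Cassandra's code).

    Each node owns several tokens on a ring of 2**64 positions. A key belongs to
    the first token at or after hash(key), wrapping around past the end.
    """

    def __init__(self, vnodes_per_node: int = 8) -> None:
        self.vnodes_per_node = vnodes_per_node
        self._tokens: list[int] = []       # token positions, kept sorted
        self._owners: dict[int, str] = {}  # token -> owning node name

    @staticmethod
    def _hash(value: str) -> int:
        # Assumption: MD5 truncated to 64 bits stands in for a real partitioner.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Give the node several tokens so its share of the ring is spread out.
        for i in range(self.vnodes_per_node):
            token = self._hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owners[token] = node

    def remove_node(self, node: str) -> None:
        # Dropping a node's tokens hands its ranges to the next tokens on the ring.
        self._tokens = [t for t in self._tokens if self._owners[t] != node]
        self._owners = {t: n for t, n in self._owners.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("ring is empty")
        # First token >= hash(key); an index past the end wraps to token 0.
        idx = bisect.bisect_left(self._tokens, self._hash(key))
        return self._owners[self._tokens[idx % len(self._tokens)]]


if __name__ == "__main__":
    ring = ConsistentHashRing()
    for name in ("node-a", "node-b", "node-c"):
        ring.add_node(name)
    keys = [f"item:{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    # Every key that changed owner now lives on the new node.
    print(f"{len(moved)} of {len(keys)} keys moved to node-d")
```

Because only keys in the new node's token ranges change owner, growing the cluster avoids the wholesale reshuffling that hand-rolled sharding schemes tend to require.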
Our databases are deployed in multiple data centers, and we always need to be ready for disaster recovery. We like Cassandra's multi-data-center support, which has been baked into its architecture from the get-go. Unlike many other NoSQL databases, Cassandra gives us active-active data centers instead of active-passive ones. In addition to always being available, active-active data centers keep 100% of requests from the application servers to the database servers local and low-latency, because our application servers never have to cross data centers. We leverage Cassandra's great write performance, distributed counters, and Hadoop to do real-time and near-real-time analytics. Cassandra's internal storage, based on log-structured merge trees, enables amazing write performance. But this write-optimized storage engine incurs compaction overhead, which can impact read performance.
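To make this write/read trade-off concrete, here is a deliberately tiny log-structured merge tree in Python. It is a hypothetical toy: `TinyLSM`, `memtable_limit`, and the `sstables_read` counter are illustrative names, not Cassandra APIs. It also leaves out pieces a real engine needs, such as Cassandra's commit log, Bloom filters, tombstones for deletes, and incremental compaction strategies. It shows the shape of the trade-off: a write is an in-memory update plus an occasional flush to an immutable sorted table, while a read may have to probe several tables, newest first, until compaction merges them.

```python
import bisect
from typing import Optional


class TinyLSM:
    """Toy log-structured merge tree (hypothetical sketch, not Cassandra's engine)."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: dict[str, str] = {}
        self.sstables: list[list[tuple[str, str]]] = []  # immutable sorted tables, oldest first
        self.sstables_read = 0  # how many tables reads have probed (read cost)

    def put(self, key: str, value: str) -> None:
        # Cheap write: no read-before-write and no in-place update on disk.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as a new sorted, immutable table.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # A key may appear in many tables; the newest version wins.
        for table in reversed(self.sstables):
            self.sstables_read += 1
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self) -> None:
        # Merge every table into one; later (newer) values overwrite older ones.
        merged: dict[str, str] = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    for i in range(10):
        db.put(f"user:{i % 3}", f"v{i}")  # repeated overwrites of three keys
    db.flush()
    print("tables before compaction:", len(db.sstables))
    db.compact()
    print("tables after compaction:", len(db.sstables))
```

Until `compact()` runs, stale versions of the same keys accumulate across tables and a read may touch several of them; the merge work that removes them is the compaction overhead described above.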
However, the Cassandra development team has made many optimizations to en-