Database Reference
databases. Gremlin can be invoked from a Java Virtual Machine via its implementation in
accordance with JSR 223.
We look at only a couple of graph databases here, but if you're a regular Hadoop user you
might also check out the Hama project, which is in Incubator status as of this writing. Hama
is a package on top of Hadoop that adds support for massive matrix and graph data. See
http://incubator.apache.org/hama. There is also a Google project called Pregel, which they've
been using internally for a couple of years and which they might open source. You can read
Google's announcement on Pregel at
http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html.
FlockDB
In April 2010, Twitter announced that it was open-sourcing its new graph database, FlockDB,
on GitHub. Twitter created FlockDB to store the adjacency lists for followers on Twitter, so
that it could readily determine who follows whom and who blocks whom. FlockDB scales
horizontally and is designed for online, low-latency, high-throughput environments. The Twitter
FlockDB cluster stores 13+ billion edges and sustains peak traffic of 20,000 writes per second
and 100,000 reads per second.
Website : http://github.com/twitter/flockdb
Orientation : Graph
Created : In 2010 by Twitter
Implementation language : Scala
License : Apache License v2
Distributed : Yes
Schema : The schema is very straightforward, as FlockDB does not attempt to solve every
database problem, but only the set of problems Twitter faces with its relationship graphs
and the size of its dataset. The graph contains entries with four attributes: a source ID,
a destination ID, a position, and a state.
Client : FlockDB uses the Thrift 0.2 client, and Twitter has also written a Ruby frontend that
offers a richer interface.
Replication : Yes
Storage : MySQL
Production use : Twitter
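
The four-attribute schema described above can be pictured with a small sketch. This is not
FlockDB code and does not use its Thrift or Ruby clients; it is a minimal in-memory model in
Python. The names Edge, State, and adjacency_list are hypothetical, the two state values are
illustrative, and the sketch assumes that position serves as a sort key (such as a timestamp).

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical state names; the text only says each entry has "a state".
    NORMAL = 0
    REMOVED = 1


@dataclass(frozen=True)
class Edge:
    source_id: int       # e.g. the user who follows
    destination_id: int  # e.g. the user being followed
    position: int        # assumed here to be a sort key, such as a timestamp
    state: State


def adjacency_list(edges, source_id):
    """Return destination IDs for one source, ordered by position, skipping removed edges."""
    live = [e for e in edges if e.source_id == source_id and e.state is State.NORMAL]
    return [e.destination_id for e in sorted(live, key=lambda e: e.position)]


edges = [
    Edge(1, 2, position=300, state=State.NORMAL),
    Edge(1, 3, position=100, state=State.NORMAL),
    Edge(1, 4, position=200, state=State.REMOVED),
    Edge(2, 1, position=150, state=State.NORMAL),
]

print(adjacency_list(edges, 1))  # [3, 2]
```

Keeping a state on each entry, rather than deleting the entry, is one way a store like this
could answer "who follows whom" while still remembering relationships that were removed.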