The datastore also uses GFS to store data and log files. GFS is a scalable, fault-
tolerant file system designed for large, distributed, data-intensive applications such
as Gmail and YouTube. Originally developed to store crawling data and search
indexes, GFS is now widely used to store user-generated content for numerous
Bigtable stores data as entities with properties organized by application-defined
kinds such as customers, sales orders, or products. Entities of the same kind are not
required to have the same properties or the same value types for the same properties.
Bigtable queries entities of the same kind and can use filters and sort orders on both
keys and property values. It also pre-indexes all queries, which results in impressive
performance even with very large data sets. The service also supports transactional
updates on a single entity or on application-defined groups of entities.
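To make the entity model above more concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not the App Engine API: the names ToyEntity, ToyStore, query, and sample are hypothetical and exist only for this illustration. It shows entities of one kind carrying different properties (and different value types for the same property), plus a query that filters on a property value and sorts by identifier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy in-memory model, NOT the App Engine API. All names are hypothetical.
final class ToyEntity {
    final String kind; // application-defined kind, e.g. "Customer"
    final long id;     // identifier, fixed when the entity is created
    final Map<String, Object> properties = new LinkedHashMap<>();

    ToyEntity(String kind, long id) {
        this.kind = kind;
        this.id = id;
    }

    // Adds or overwrites a property; no schema restricts names or value types.
    ToyEntity set(String name, Object value) {
        properties.put(name, value);
        return this;
    }
}

final class ToyStore {
    private final List<ToyEntity> entities = new ArrayList<>();

    void put(ToyEntity entity) {
        entities.add(entity);
    }

    // Returns entities of one kind whose property equals the given value,
    // sorted by identifier.
    List<ToyEntity> query(String kind, String property, Object value) {
        return entities.stream()
                .filter(e -> e.kind.equals(kind))
                .filter(e -> value.equals(e.properties.get(property)))
                .sorted(Comparator.comparingLong((ToyEntity e) -> e.id))
                .collect(Collectors.toList());
    }

    // Sample data: three customers with differing properties and one product.
    static ToyStore sample() {
        ToyStore store = new ToyStore();
        store.put(new ToyEntity("Customer", 2)
                .set("name", "Grace").set("city", "London").set("loyaltyPoints", 120));
        store.put(new ToyEntity("Customer", 1)
                .set("name", "Ada").set("city", "London"));
        // Same property name, different value type: allowed without a schema.
        store.put(new ToyEntity("Customer", 3)
                .set("name", "Linus").set("city", 94103));
        store.put(new ToyEntity("Product", 4)
                .set("name", "Widget").set("city", "London"));
        return store;
    }
}
```

Calling ToyStore.sample().query("Customer", "city", "London") returns customers 1 and 2 in that order. The product is excluded because it belongs to a different kind, and customer 3 is excluded because its city value is a number rather than the string being matched. This sketch scans every entity on each query, whereas the text describes the real service as answering queries from indexes.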
The first thing you'll notice about Bigtable is that it is not a relational database.
Bigtable utilizes a non-relational object model to store entities, allowing you to
create simple, fast, and scalable applications. Google isn't alone in offering this type
of architecture. Amazon's SimpleDB and many open-source datastores (for example,
CouchDB and Hypertable) use this same approach, which requires no schema while
providing auto-indexing of data and simple APIs for storage and access.
You can interact with Bigtable using either a standard API or a low-level API. With
the standard API, either a Java Data Objects (JDO) or Java Persistence API (JPA)
implementation, you can ensure that your applications are portable to other hosting
providers and database technologies if you decide to jump ship. This makes a good
argument for App Engine as it prevents vendor lock-in. If you are certain that your
application will always run on App Engine, you can utilize the low-level API as it
exposes the full capabilities of Bigtable. Both APIs achieve roughly the same results in
terms of ability and performance, so it comes down to personal preference. Do you
like working with low-level database functionality or abstracting this layer so that
your experience is applicable across multiple datastore implementations?
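The portability argument can be illustrated with a small sketch in plain Java. It does not use JDO, JPA, or the App Engine low-level API; the interface CustomerRepository and the classes InMemoryCustomerRepository and GreetingService are hypothetical names invented for this example. The point is only that application code written against an abstraction does not change when the storage implementation behind it is swapped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical application-level abstraction; not part of JDO, JPA,
// or the App Engine SDK.
interface CustomerRepository {
    void save(long id, String name);

    Optional<String> findName(long id);
}

// Stand-in backend for the sketch. A real application would supply an
// implementation backed by whichever persistence API it has chosen.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Long, String> names = new HashMap<>();

    @Override
    public void save(long id, String name) {
        names.put(id, name);
    }

    @Override
    public Optional<String> findName(long id) {
        return Optional.ofNullable(names.get(id));
    }
}

// Application logic depends only on the interface, so it stays unchanged
// when the storage implementation is replaced.
final class GreetingService {
    private final CustomerRepository customers;

    GreetingService(CustomerRepository customers) {
        this.customers = customers;
    }

    String greet(long id) {
        return customers.findName(id)
                .map(name -> "Hello, " + name)
                .orElse("Hello, guest");
    }
}
```

A standard API such as JDO or JPA plays a similar role at a larger scale: code written against it is tied to the specification rather than to one datastore. The trade-off described above still applies, since an abstraction can only expose what its interface declares.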
The datastore provides full CRUD (create, read, update, and delete) access to
entities in Bigtable and allows you to query against the datastore using a standard
SQL-like query language called JDOQL. The syntax is enough like SQL to lull you into
a sense of familiarity, but there are some differences when dealing with JDO-
enhanced objects. One notable difference is the lack of support for joins, which
relational databases provide. However, this is understandable since the datastore is
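Without joins, related data is typically combined in application code. The following plain Java sketch shows one way this might look, under stated assumptions: the class ManualJoinSketch and its methods are hypothetical, and the two static methods returning sample data stand in for two separate datastore queries. No App Engine or JDO calls are used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy data, NOT the App Engine API: a join performed in application code.
final class ManualJoinSketch {

    // Stands in for a query on a "Customer" kind: customer id -> name.
    static Map<Long, String> customers() {
        Map<Long, String> byId = new HashMap<>();
        byId.put(1L, "Ada");
        byId.put(2L, "Grace");
        return byId;
    }

    // Stands in for a query on a "SalesOrder" kind.
    // Each element is {orderId, customerId}.
    static List<long[]> orders() {
        List<long[]> rows = new ArrayList<>();
        rows.add(new long[] {100L, 1L});
        rows.add(new long[] {101L, 2L});
        rows.add(new long[] {102L, 1L});
        return rows;
    }

    // Combines the two result sets by looking up each order's customer id,
    // which is the work a relational JOIN would otherwise do.
    static List<String> ordersWithCustomerNames() {
        Map<Long, String> byId = customers();
        List<String> joined = new ArrayList<>();
        for (long[] order : orders()) {
            String name = byId.get(order[1]);
            joined.add("order " + order[0] + " -> " + name);
        }
        return joined;
    }
}
```

Here ordersWithCustomerNames() returns three strings, one per order, each paired with the matching customer name. The cost is a second lookup and some extra code, which is the usual price of a datastore without joins.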
Working with Entities
The fundamental unit of data in the datastore is an “entity,” which consists of an
immutable identifier and zero or more properties. Once again, entities are schema-
less and this allows for some interesting possibilities. Since entities are not required