not for a single user on a desktop, but for anyone who had access to an Internet connection. Much of the growth of the open-source MySQL database was due to the availability of easy-to-use integrations with Web-friendly scripting languages such as Perl and PHP.
Codd's concept of database structure requires an upfront understanding of the data: schemas and relationships must be defined before a single record is inserted. The relational model also demands a bit of work on the part of the database software itself. Consider the process of an application writing a record to a relational database; the data might be represented using more than one table. The database itself must take care to ensure that the data is consistent after the write has occurred, which incurs a bit of computational overhead.
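As a minimal sketch of what this looks like in practice, consider the following example using Python's built-in sqlite3 module; the orders and order_items tables and their columns are invented purely for illustration. The schema must be declared before any row can be inserted, and a logical record that spans two tables is written inside a single transaction so that the database can keep both tables consistent.

import sqlite3

conn = sqlite3.connect(":memory:")

# The schema must exist before a single row can be inserted.
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL
    );
    CREATE TABLE order_items (
        item_id  INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        product  TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );
""")

# One logical record (an order plus its line items) spans two tables.
# Wrapping the writes in a single transaction lets the database guarantee
# that either both tables are updated or neither is.
with conn:
    cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", ("alice",))
    conn.execute(
        "INSERT INTO order_items (order_id, product, quantity) VALUES (?, ?, ?)",
        (cur.lastrowid, "widget", 3),
    )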
However, as the user base of a Web site grows, so does the need to handle the scale of data. Some of the early Web pioneers such as Amazon and Google found that relational databases were not always the right tool for the job. The priorities of existing relational database systems were geared more toward consistency than availability.
Consider an online messaging system in which users post and share comments publicly with other users. A relational design might define a table to keep track of individual users, with each user assigned a unique identifier. In order to facilitate message sharing, we would also need a table relating each posted message to information about its target recipient. Although heavily simplified, this type of system is not unlike the many comment and blog systems currently used on the Web.
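A rough sketch of such a schema, again using Python's sqlite3 module, might look like the following; the table and column names are hypothetical, chosen only to mirror the description above.

import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Each user is assigned a unique identifier.
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );

    -- Each posted message is related to its author and target recipient.
    CREATE TABLE messages (
        message_id   INTEGER PRIMARY KEY,
        sender_id    INTEGER NOT NULL REFERENCES users(user_id),
        recipient_id INTEGER NOT NULL REFERENCES users(user_id),
        body         TEXT NOT NULL
    );
""")

# Listing everything shared with a given user requires joining both tables.
rows = conn.execute("""
    SELECT u.name, m.body
    FROM messages AS m
    JOIN users AS u ON u.user_id = m.sender_id
    WHERE m.recipient_id = ?
""", (42,)).fetchall()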
Now imagine that the Web site has gone viral and that millions of users access this online system at all times. How can we handle the scale? With computing prices dropping every day, servers and hard disks are available to handle quite a lot of transactional processing. At some point, however, a single machine might not be able to handle the load of many thousands of queries every second. Furthermore, Web traffic, log data, and other factors may mean that, over time, it is simply not possible to keep upgrading a single server. The need for higher capacity and greater data throughput requires other strategies.
Although commodity computer hardware tends to become cheaper over time, continually upgrading to ever more massive server hardware has historically been economically infeasible: spending twice as much money on a single huge machine may not provide double the performance. In contrast, smaller, more modest servers remain inexpensive. In general, it makes more economic sense to scale horizontally: in other words, to simply add more cheap machines to the system rather than try to run a single relational database on one expensive, massive server.
In order to guarantee the performance of this Web application, one might consider splitting the relational tables across a collection of machines. Perhaps each table could reside on a different machine; it might be possible to split, or shard, individual tables of a relational database system off to a dedicated server. At some point, however, the table with the most data might again become too large to host on a single machine. Situations like this create bottlenecks in our system. When faced with the onslaught of Web-scale data, the popular relational database model begins to create very challenging problems.
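To make the sharding idea concrete, here is a purely hypothetical sketch of application-level routing; the server names and the hashing scheme are invented for illustration. Each table lives on its own dedicated server, and the rows of the largest table are further split across several machines by hashing a shard key.

from zlib import crc32

# Hypothetical mapping of tables to dedicated database servers. The messages
# table has outgrown a single machine, so its rows are spread across three.
TABLE_SERVERS = {
    "users":    ["db-users.example.internal"],
    "messages": ["db-msg-0.example.internal",
                 "db-msg-1.example.internal",
                 "db-msg-2.example.internal"],
}

def server_for(table: str, shard_key: int) -> str:
    """Pick the server that holds a given row of a given table."""
    servers = TABLE_SERVERS[table]
    # Hash the shard key so rows are spread evenly across the servers.
    return servers[crc32(str(shard_key).encode()) % len(servers)]

# The application must now know where a row lives before it can query it,
# and a join between users and messages can no longer be answered by any
# single server.
host = server_for("messages", shard_key=42)

Routing logic like this pushes work that the relational database once handled, such as joins and consistency across tables, into the application itself.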
 