An SQL join statement can be written to create a view showing products that have appeared together in a transaction. That view can then be processed to compute support, and the support view can then be processed to compute confidence and lift.
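The chain of views described above can be sketched with Python's built-in `sqlite3` module standing in for the DBMS. This is a minimal sketch under stated assumptions: the `Sale` table, its columns `TransactionID` and `Product`, the view names, and the product data are all hypothetical, and each product is assumed to appear at most once per transaction. Support is the fraction of transactions containing both products, confidence is the fraction of transactions containing A that also contain B, and lift is confidence divided by the base probability of B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sales data: one row per (transaction, product) pair.
CREATE TABLE Sale (TransactionID INTEGER, Product TEXT);
INSERT INTO Sale VALUES
  (1, 'Mask'), (1, 'Fins'), (1, 'Tank'),
  (2, 'Mask'), (2, 'Fins'),
  (3, 'Mask'),
  (4, 'Fins'), (4, 'Weights'),
  (5, 'Mask'), (5, 'Fins'), (5, 'Weights');

-- Self-join: ordered pairs of distinct products bought in the same transaction.
CREATE VIEW PairCount AS
SELECT s1.Product AS ProductA, s2.Product AS ProductB,
       COUNT(DISTINCT s1.TransactionID) AS PairTransactions
FROM Sale AS s1 JOIN Sale AS s2
  ON s1.TransactionID = s2.TransactionID AND s1.Product <> s2.Product
GROUP BY s1.Product, s2.Product;

-- Number of transactions containing each individual product.
CREATE VIEW ItemCount AS
SELECT Product, COUNT(DISTINCT TransactionID) AS Transactions
FROM Sale GROUP BY Product;

-- Support = P(A and B); Confidence = P(B | A); Lift = Confidence / P(B).
CREATE VIEW Rules AS
SELECT p.ProductA, p.ProductB,
       1.0 * p.PairTransactions / t.N AS Support,
       1.0 * p.PairTransactions / a.Transactions AS Confidence,
       (1.0 * p.PairTransactions / a.Transactions)
         / (1.0 * b.Transactions / t.N) AS Lift
FROM PairCount AS p
JOIN ItemCount AS a ON a.Product = p.ProductA
JOIN ItemCount AS b ON b.Product = p.ProductB
CROSS JOIN (SELECT COUNT(DISTINCT TransactionID) AS N FROM Sale) AS t;
""")

def rule(a: str, b: str):
    """Return (support, confidence, lift) for the rule 'customers who buy a also buy b'."""
    return conn.execute(
        "SELECT Support, Confidence, Lift FROM Rules WHERE ProductA = ? AND ProductB = ?",
        (a, b),
    ).fetchone()

print(rule("Mask", "Fins"))
```

With this data, Mask and Fins appear together in 3 of 5 transactions, so support is 0.6; 3 of the 4 Mask transactions include Fins, so confidence is 0.75; and because Fins already appears in 4 of 5 transactions, lift is slightly below 1.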
A distributed database is a database that is stored and processed on more than one computer. A replicated database is one in which multiple copies of some or all of the database are stored on different computers. A partitioned database is one in which different pieces of the database are stored on different computers. A distributed database can be both replicated and partitioned.
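The difference between partitioning and replication can be sketched as a placement function that decides which computers hold which rows. This is an illustrative sketch only: the node names, the `customers` rows, and the hash-based placement rule are assumptions, not a description of any particular DBMS. `zlib.crc32` is used instead of Python's `hash()` because string hashing is randomized between runs.

```python
from zlib import crc32

NODES = ["node_a", "node_b", "node_c"]  # hypothetical computers in the system

def home_node(key) -> int:
    """Deterministic hash so the same key always maps to the same node."""
    return crc32(str(key).encode("utf-8")) % len(NODES)

def partition(rows, key):
    """Partitioned: each row is stored on exactly one node."""
    placement = {node: [] for node in NODES}
    for row in rows:
        placement[NODES[home_node(row[key])]].append(row)
    return placement

def replicate(rows):
    """Replicated: every node stores its own full copy of all rows."""
    return {node: [dict(row) for row in rows] for node in NODES}

def partition_and_replicate(rows, key, copies=2):
    """Both: each row lives on its home node plus the next copies - 1 nodes.
    Assumes copies <= len(NODES) so no node receives the same row twice."""
    placement = {node: [] for node in NODES}
    for row in rows:
        first = home_node(row[key])
        for i in range(copies):
            placement[NODES[(first + i) % len(NODES)]].append(dict(row))
    return placement

customers = [{"CustomerID": i, "Name": f"Customer {i}"} for i in range(10)]
```

In the partitioned layout every row has a single home, so a transaction touching customers on two nodes spans two computers; in the replicated layouts every update to a row must reach each of its copies.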
Distributed databases pose processing challenges. If a database is updated on a single computer, the challenge is simply to ensure that the copies of the database remain logically consistent after they are distributed. However, if updates are made on more than one computer, the challenges become significant. If the database is partitioned but not replicated, problems arise when transactions span data on more than one computer. If the database is replicated and updates occur to the replicated portions, a special locking algorithm called distributed two-phase locking is required. Implementing this algorithm can be difficult and expensive.
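The core rule of two-phase locking applied to replicas can be sketched in a few lines: a transaction must lock the item at every site holding a copy (the growing phase) before writing any copy, and once it releases a lock it may not acquire another (the shrinking phase). This is a simplified, single-process illustration; the `Site` and `Transaction` classes and the `update_replicated` helper are hypothetical. It aborts on a lock conflict instead of waiting, and it omits deadlock detection, failure handling, and the commit protocol (such as two-phase commit) that a real distributed DBMS also needs.

```python
class Site:
    """One computer holding a replica of some data items."""
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.locks = {}  # item -> id of the transaction holding an exclusive lock

    def lock(self, item, txn_id):
        holder = self.locks.get(item)
        if holder not in (None, txn_id):
            return False  # conflict: another transaction holds the lock
        self.locks[item] = txn_id
        return True

    def unlock(self, item, txn_id):
        if self.locks.get(item) == txn_id:
            del self.locks[item]

class Transaction:
    def __init__(self, txn_id):
        self.id = txn_id
        self.held = []          # (site, item) pairs currently locked
        self.shrinking = False  # becomes True once any lock is released

    def acquire(self, site, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after releasing a lock")
        if not site.lock(item, self.id):
            return False
        self.held.append((site, item))
        return True

    def release_all(self):
        self.shrinking = True
        for site, item in self.held:
            site.unlock(item, self.id)
        self.held.clear()

def update_replicated(txn, sites, item, value):
    """Growing phase: lock every replica. Then write all copies. Then shrink."""
    for site in sites:
        if not txn.acquire(site, item):
            txn.release_all()  # could not lock every copy: abort without writing
            return False
    for site in sites:
        site.data[item] = value
    txn.release_all()
    return True
```

Because no copy is written until every copy is locked, two transactions can never leave the replicas holding different values for the same item.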
Objects consist of methods and properties (data values). All objects of a given class have the same methods, but each object can have its own property values. Object persistence is the process of storing object property values. Relational databases are difficult to use for object persistence. Some specialized products, called object-oriented DBMSs, were developed in the 1990s but never gained wide commercial acceptance. Oracle and others have extended the capabilities of their relational DBMS products to provide support for object persistence. Such databases are referred to as object-relational databases.
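The idea that persistence stores property values, not methods, can be sketched by mapping a simple object to a relational row and back. This is a hypothetical example: the `Customer` class, its fields, and the `save` and `load` helpers are illustrative, and `sqlite3` stands in for a relational DBMS. Only the property values become columns; the `apply_payment` method lives in the class definition and is not stored.

```python
import sqlite3
from dataclasses import dataclass, astuple, fields

@dataclass
class Customer:
    customer_id: int
    name: str
    balance: float

    def apply_payment(self, amount: float) -> None:
        """A method: shared by every Customer object and never stored in the table."""
        self.balance -= amount

def save(conn, obj):
    """Persist the object's property values as one row (insert or overwrite)."""
    cols = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(
        f"INSERT OR REPLACE INTO Customer ({', '.join(cols)}) VALUES ({placeholders})",
        astuple(obj),
    )

def load(conn, customer_id):
    """Rebuild a new Customer object from its stored property values."""
    row = conn.execute(
        "SELECT customer_id, name, balance FROM Customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return Customer(*row) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
c = Customer(1, "Ann", 100.0)
c.apply_payment(25.0)
save(conn, c)
restored = load(conn, 1)
```

This flat object maps cleanly to one row. Objects that contain lists or references to other objects would need additional tables and join logic, which is part of why relational databases are awkward for object persistence.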
The NoSQL movement (now often read as “not only SQL”) grew out of the Big Data storage needs of companies such as Amazon.com, Google, and Facebook. The tools used to meet these needs are nonrelational DBMSs known as structured storage. Early examples were Dynamo and Bigtable; a more recent popular example is Cassandra. These products use a non-normalized table structure built on columns, super columns, and column families, tied together by rowkey values from a keyspace. The very large data sets found in Big Data are processed with MapReduce, which breaks a data processing task into many parallel tasks run on many computers in the cluster and then combines their results to produce a final result. An emerging product that is supported by Microsoft and Oracle Corporation is the Hadoop Distributed File System (HDFS), with its spinoffs HBase, a nonrelational storage component, and Pig, a query language.
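The map, group, and combine steps of MapReduce can be sketched on a single machine with the classic word-count task. This is a teaching sketch, not Hadoop's API: the function names are hypothetical, threads from `concurrent.futures` stand in for the computers in the cluster, and each string in `chunks` stands in for a block of a large input file.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit a (key, value) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the grouped values for one key into a final result."""
    return key, sum(values)

def map_reduce(chunks: list[str], workers: int = 4) -> dict[str, int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))  # map tasks run in parallel
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)

print(map_reduce(["big data big", "data cluster"]))
```

Because each map task sees only its own chunk and each reduce task sees only one key's values, both phases can be spread across as many machines as are available.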
Key Terms
Amazon Web Services (AWS)
Big Data
Bigtable
business intelligence (BI) system
Cassandra
click-stream data
cloud computing
conformed dimension
curse of dimensionality
data mart
data mining application
data warehouse
data warehouse metadata database
date dimension
dimension table
dimensional database
dirty data
distributed database
distributed two-phase locking
drill down
Dynamo
DynamoDB database service
EC2 service
enterprise data warehouse (EDW) architecture
Extract, Transform, and Load (ETL) system
F score
fact table
Hadoop Distributed File System (HDFS)
HBase
host machine
hypervisor
M score
MapReduce
measure
method
nonintegrated data
NoSQL
Not only SQL
object
object-oriented DBMS (OODBMS)
object-oriented programming (OOP)
 
 