Database Reference
In-Depth Information
By The WAy The NoSQL world is an exciting one, but you should be aware that, if you
want to participate in it, you will need to sharpen your OOP programming
skills. Whereas we can develop databases in Microsoft Access, Microsoft SQL Server,
Oracle Database, and Oracle MySQL using management and applications develop-
ment tools that are very user friendly (Microsoft Access itself, Microsoft SQL Server
Management Studio, Oracle SQL Developer, and MySQL Workbench), application
development in the NoSQL world is currently done in programming languages.
This, of course, may change, and we look forward to seeing the future develop-
ments in the NoSQL realm. For now, you'll need to sign up for that programming course!
Summary
Big Data is the name given to the current situation where
companies are gathering extremely large amounts of data,
and need some means to deal with it. The topics covered
in this chapter survey the development of tools to handle
increasing amounts of data.
Business intelligence (BI) systems assist managers and
other professionals in the analysis of current and past activi-
ties and in the prediction of future events. BI applications are
of two major types: reporting applications and data mining
applications. Reporting applications make elementary calcu-
lations on data; data mining applications use sophisticated
mathematical and statistical techniques.
BI applications obtain data from three sources: oper-
ational databases, extracts of operational databases, and
purchased data. BI systems sometimes have their own
DBMS, which may or may not be the operational DBMS.
Characteristics of reporting and data mining applications
are listed in Figure 12-3.
Direct reading of operational databases is not feasible
for all but the smallest and simplest BI applications and
databases for several reasons. Querying operational data can
unacceptably slow the performance of operational systems,
operational data have problems that limit their usefulness
for BI applications, and BI system creation and maintenance
require programs, facilities, and expertise that are normally
not available for an operational database.
Problems with operational data are listed in Figure 12-5.
Because of these, many organizations have chosen to create
and staff data warehouses and data marts. Data warehouses
extract and clean operational data and store the revised data
in data warehouse databases. Organizations may also pur-
chase and manage data obtained from data vendors. Data
warehouses maintain metadata that describes the source,
format, assumptions, and constraints about the data they
contain. A data mart is a collection of data that is smaller
than that held in a data warehouse and that addresses a
particular component or functional area of the business. In
Figure 12-7, the data warehouse distributes data to three
smaller data marts. Each data mart services the needs of a
different aspect of the business.
Operational databases and dimensional databases
have different characteristics, as shown in Figure 12-8.
Dimensional databases use a star schema with a fully nor-
malized fact table that connects to dimension tables that
may be non-normalized. Dimensional databases must deal
with slowly changing dimensions, and therefore a time
dimension is important in a dimensional database. Fact
tables hold measures of interest, and dimension tables hold
attribute values used in queries. The star schema can be
extended with additional fact tables, dimension tables, and
conformed dimensions.
The purpose of a reporting system is to create meaning-
ful information from disparate data sources and to deliver that
information to the proper users on a timely basis. Reports are
produced by sorting, filtering, grouping, and making simple
calculations on the data. RFM analysis is a typical reporting
application. Customers are grouped and classified according
to how recently they have placed an order (R), how frequently
they order (F), and how much money (M) they spend on
orders. The result of an RFM analysis is three scores. In a typi-
cal analysis, the scores range from 1 to 5. An RFM score of {1
1 4} indicates that the customer has purchased recently, pur-
chases frequently, but does not purchase large-dollar items.
OnLine Analytical Processing (OLAP) is a generic
category of reporting applications that enable users to
dynamically restructure reports. A measure is the data item
of interest. A dimension is a characteristic of a measure. An
OLAP cube is an arrangement of measures and dimensions.
With OLAP, users can drill down and exchange the order of
dimensions. Because of the high processing requirements,
some organizations designate separate computers to func-
tion as OLAP servers.
Data mining is the application of mathematical and sta-
tistical techniques to find patterns and relationships and to
classify and predict. Data mining has arisen in recent years
because of the confluence of factors shown in Figure 12-25.
 
 
 
Search WWH ::




Custom Search