[Figure: table CustomerDemographics (CustomerKey, BusinessType, OwnershipType, TotalEmployees, PermanentEmployees, AnnualRevenue, AnnualProfit, YearEstablished, StoreSurface, ParkingSurface, DateFirstOrder, DateLastOrder) and table Customer (CustomerKey, CustomerID, CompanyName, Address, PostalCode, CityKey); table NewCustomers (NewCustomerKey, CompanyName, StreetAddress, City, State, PostalCode, Country, BusinessType, OwnershipType, TotalEmployees, PermanentEmployees, AnnualRevenue, AnnualProfit, YearEstablished, StoreSurface, ParkingSurface)]
Fig. 9.1 Tables CustomerDemographics and NewCustomers added to the Northwind
data warehouse given in Fig. 5.4
Attributes AnnualRevenue and AnnualProfit have the following values:
• 1: Under $10,000
• 2: $10,000-50,000
• 3: $50,000-100,000
• 4: $100,000-500,000
• 5: $500,000-1,000,000
• 6: Over $1,000,000
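The coding scheme can be illustrated with a small Python sketch that maps a dollar amount to its category code. The function name `revenue_code` and the `LABELS` table are hypothetical. The text also does not say which category a boundary value such as $50,000 falls into; the sketch assumes that each bound belongs to the higher category.

```python
from bisect import bisect_right

# Lower bounds (in dollars) of categories 2..6; category 1 covers everything below $10,000.
# Assumption: a boundary value belongs to the higher category (e.g., exactly 50,000 -> 3),
# since the source does not specify how boundaries are assigned.
BOUNDS = [10_000, 50_000, 100_000, 500_000, 1_000_000]

# Hypothetical lookup table from code to the label used in the text.
LABELS = {
    1: "Under $10,000",
    2: "$10,000-50,000",
    3: "$50,000-100,000",
    4: "$100,000-500,000",
    5: "$500,000-1,000,000",
    6: "Over $1,000,000",
}

def revenue_code(amount: float) -> int:
    """Return the category code (1-6) for an annual revenue or profit amount."""
    # bisect_right counts how many bounds are <= amount (0..5); shift to 1..6.
    return bisect_right(BOUNDS, amount) + 1

print(revenue_code(75_000), LABELS[revenue_code(75_000)])  # 3 $50,000-100,000
```

Using a sorted list of bounds keeps the mapping in one place, so changing the category limits does not require touching the lookup logic.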
Attribute YearEstablished has a range of 1950-1997. Attributes StoreSurface
and ParkingSurface are expressed in square meters. Finally, attributes
DateFirstOrder and DateLastOrder are derived from the Sales fact table.
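Deriving DateFirstOrder and DateLastOrder amounts to taking, for each customer, the earliest and latest order date among that customer's sales. The Python sketch below assumes the sales are available as simple (customer key, order date) pairs; this input shape and the function name `first_last_order_dates` are hypothetical simplifications of the Sales fact table joined to its order dates.

```python
from datetime import date

def first_last_order_dates(sales):
    """Return {customer_key: (DateFirstOrder, DateLastOrder)}.

    `sales` is an iterable of (customer_key, order_date) pairs: a simplified,
    hypothetical stand-in for Sales fact rows joined to their order dates.
    """
    result = {}
    for customer_key, order_date in sales:
        if customer_key in result:
            first, last = result[customer_key]
            result[customer_key] = (min(first, order_date), max(last, order_date))
        else:
            # First order seen for this customer: it is both the earliest and latest so far.
            result[customer_key] = (order_date, order_date)
    return result

# Hypothetical fact rows for two customers.
sales = [
    (1, date(1997, 3, 2)),
    (2, date(1996, 7, 15)),
    (1, date(1996, 8, 20)),
    (1, date(1998, 1, 5)),
]
dates = first_last_order_dates(sales)
print(dates[1])  # (datetime.date(1996, 8, 20), datetime.date(1998, 1, 5))
```

In SQL, the same result corresponds to grouping the fact rows by customer and taking the MIN and MAX of the order date.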
9.1.2 Supervised Classification
Supervised classification is the process of assigning a set of objects in a
database to predefined classes according to a model built on the attributes
of these objects. To do so, a database DB is split into a training set E and
a test set T. The tuples of DB and T have the same format, while the tuples
in E have an additional field, the class identity, which stores the class of
each tuple in the training set. These classes are used to build a model that
classifies new data. Once the model has been built from the training set
(with labeled records), the correctness of the classification is evaluated
using the test set (with unlabeled records). Typical uses of classification
include credit approval (risk classification), marketing, and health planning.
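The following Python sketch illustrates this workflow under stated assumptions. It starts from a small, hypothetical set of historically labeled records, each pairing (TotalEmployees, AnnualRevenue code) with an invented class label. It splits these records into E and T and uses a 1-nearest-neighbor rule as the model. The 1-NN classifier is chosen only for brevity, since the text does not prescribe a particular model. The model sees only the attributes of each tuple in T, and their actual classes are used solely to measure correctness.

```python
import random
from math import dist

def train_test_split(db, test_fraction=0.3, seed=0):
    """Shuffle labeled records and split them into a training set E and a test set T."""
    records = list(db)
    random.Random(seed).shuffle(records)
    n_test = int(len(records) * test_fraction)
    return records[n_test:], records[:n_test]  # (E, T)

def predict(training_set, attributes):
    """1-nearest-neighbor model: return the class of the closest training tuple."""
    _, label = min(training_set, key=lambda rec: dist(rec[0], attributes))
    return label

def accuracy(training_set, test_set):
    """Fraction of test tuples whose predicted class equals their actual class."""
    # Only the attributes are passed to the model; the true class is used for scoring.
    correct = sum(predict(training_set, attrs) == label for attrs, label in test_set)
    return correct / len(test_set)

# Hypothetical labeled records: ((TotalEmployees, AnnualRevenue code), class).
db = [
    ((5, 1), "low-profit"), ((8, 2), "low-profit"), ((12, 2), "low-profit"),
    ((9, 1), "low-profit"), ((3, 1), "low-profit"),
    ((40, 4), "high-profit"), ((55, 5), "high-profit"), ((70, 6), "high-profit"),
    ((60, 5), "high-profit"), ((45, 4), "high-profit"),
]

E, T = train_test_split(db)
print(len(E), len(T), accuracy(E, T))  # 7 3 1.0
```

Any other classifier, such as a decision tree, could replace `predict` without changing the split or the evaluation step.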
 