Creating dimensions in OWB Part 2 (Oracle Warehouse Builder 11gR2)

The Product dimension

In the Product dimension, we will create the attributes that describe the products sold by ACME Toys and Gizmos. The principles of the Time dimension apply to this dimension as well. The same four characteristics need to be defined — Levels, Dimension Attributes, Level Attributes, and Hierarchies. The only difference will be that they are product-oriented instead of time/date-oriented.

Let’s begin by looking at the attributes of our products, and then we’ll group by levels and a hierarchy. The first thing we should consider is how each toy or gizmo sold by ACME is represented. As with any retail operation, a Stock Keeping Unit (SKU) is maintained that uniquely identifies each individual type of item sold. This is an individual number assigned by the main office that uniquely identifies each type of product sold by ACME, and there could be tens of thousands of different items. There could be more than one product with the same name, but they won’t have the same SKU. So the SKU, together with the NAME, forms the business identifier we can use for the products. A business identifier contains one or more attributes that have been selected by us to uniquely represent a record to differentiate it from another. These attributes are what we think of when we think about what differentiates an individual product from another. An SKU number all by itself is not very helpful. Therefore, in our Product dimension, we will want to make available more descriptive information about each product such as the description.


Every SKU can be grouped together by brand name — the toy manufacturer who makes the product—and then by the category of product, such as game, doll, action figure, sporting goods, and so on. Each category could be grouped by department in the store. Already, a list of attributes is starting to take shape and a product hierarchy is forming in our minds. For each of those levels in the hierarchy, that is the department, category, and brand, we need to have a business identifier. For that the NAME will be sufficient as there are no departments, categories, or brands that have the same name.

Let’s put this down on paper to formalize it and add some more details.

Product attributes (attribute type)

• ID (Dimension/Level)

• SKU (Level)

• Name (Dimension/Level)

• Description (Dimension/Level)

• List Price (Level)

Product levels

• Department located in

• Category of item

• Brand

• Item

Product hierarchy (highest to lowest)

• Department

• Category

• Brand

• Item

Looking at the product attributes, we see that they have been listed above with the type and that ID, Name, and Description are labeled as dimension attributes. This means they can appear on more than one level. Each level has a name (Item, Brand, Category, and Department) that identifies the level, but what about the names of the individual brands, or the different categories or departments? There has to be a place to store those names and descriptions, and that is the purpose of these dimension attributes. By labeling them as dimension attributes, they appear once for each level in the dimension. They are used to store the individual names and descriptions of the brands, categories, and departments. Likewise, each level will have a unique ID that will act as the surrogate key for that level, as well as one or more attributes defined as the business identifier. In our previous discussion about the Time dimension, we saw how a surrogate key was used as an identifier and how business identifiers were used; that same principle applies here, including the use of a surrogate key which wasn’t used for a time dimension.

As we want the computer to do most of the work for us, let’s use the OWB Dimension Wizard to create our Product dimension now that we’ve determined what will be in it.

Creating the Product dimension with the new Dimension Wizard

OWB provides a wizard that we can use to create a dimension. It is similar to the Time Dimension Wizard we used earlier, but is more generic for applying to other dimensions. As a result, there will be more steps involved in the wizard, just because it has to ask us more because it will not be able to make as many assumptions as it did with the Time dimension. This wizard can be used with any dimension, and therefore things such as attributes, levels, and hierarchies are going to need to be defined explicitly. Right-click on the Dimensions node under our acme_dwh Oracle module, which is under Databases in the Design Center Project Navigator. Choose New Dimension to launch the Create Dimension Wizard. The very first screen we’ll see is the Welcome screen that will describe for us the steps that we will be going through. We can see that it requires more steps than the Time Dimension Wizard:

tmp14-81_thumb[2]

We will have to provide a name for our dimension, and tell it what type of storage to use—relational or multidimensional—just as we did for the Time Dimension Wizard. It will then ask us to define our dimension attributes. We didn’t have to do that for the Time dimension. That wizard had a preset number of attributes it defined for us automatically because it knew it was creating a Time dimension. We then had to define the levels where we simply chose from a preset list of levels for the Time dimension. Here we have to explicitly name the levels. This is where we’ll have to pay close attention to aggregations. We will then choose our level attributes from the dimension attributes.

Then we see in the previous figure that we will have to choose the slowly changing dimension type, which is how we want to handle changes to values in our dimension attributes over time. This is a new concept we haven’t dealt with yet that pertains to dimensional modeling, and we’ll soon briefly discuss just what that involves when we see the choices we’ll be able to make for it. We’ll then get a last chance to review the settings, and then it will create the dimension for us showing us the progress, which is similar to the last two steps of the Time Dimension Wizard.

1. After reviewing the steps, the wizard will go to the next screen where we enter a name for the dimension that we will call Product.

2. We’ll then proceed to step 2, which is where we will select the ROLAP: Relational storage option for relational, as we did for the Time dimension.

3. Proceeding to step 3, we will be able to list the attributes that we want contained in our Product dimension. We see that the wizard was nice enough to create three attributes for us already — an ID, a NAME, and a DESCRIPTION as shown here:

tmp14-82_thumb[2]

Notice that the wizard has already labeled the ID as the Surrogate Identifier and the Name as the Business Identifier, and selected data types for those attributes for us. If we scroll that window to the right, we’ll see that it has chosen sizes for the character attributes also. We can change all of these options at this point, so let’s modify and add to this list to suit our Product dimension.

As we enter the attributes and decide on sizes and types for them, we can look back at topic 2 where we defined our source data structure for the ACME_POS transactional database in SQL Server to get an idea of what types and sizes to use. We should make them at least as large as the source data so as not to lose any data when it gets loaded into the data warehouse.

We’ll make the following changes:

° Enter SKU in the name column on line 4 and leave the data type as VARCHAR2, but change the length to 50. Scroll the window to the right if any columns are not visible that need to be changed. We can also expand the dialog box to show additional columns.

° Enter LIST_PRICE in the name column on line 5, leave the data type as NUMBER, and leave the precision and scale as eight and two as it suggested.

° Make SKU a Business Identifier field in addition to Name. (Click on the drop-down box in the identifier column for SKU, and select Business.)

° Change the length of the NAME column from 25 to 50.

° Change the length of the DESCRIPTION column from 40 to 200.

Notice how the precision and scale were entered automatically for us by the Wizard when we entered names for our attributes. Moreover, they tended to make sense for the type of attribute. The LIST_PRICE had a default of eight for precision and two for the scale that we did not have to modify. If we choose logical names for our measures, it is able to make very good guesses as to what the precision and scale should be. SKU is a character field created with a varchar2 type with a reasonable length. Likewise, a LIST_PRICE amount implies money which requires a number having two decimal places (scale 2).

Suppose we make a mistake and enter a value and then decide not to keep it. Then we can delete the row by right-clicking on the row number to the left of the row, and then selecting Delete from the pop-up menu.

If we were to scroll that window all the way to the right, or expand it completely, we’d see even more columns such as the Seconds Precision and Descriptor column. If we press the Help button, it will explain what each column is. Briefly, the Seconds Precision is applicable to only TIMESTAMP data types, and expresses the precision of the seconds’ portion of the value. The Descriptor is applicable to MOLAP (multidimensional) implementations and provides six standard descriptions that can be assigned to columns. It presets two columns, the Long description and the Short description. We can safely ignore them for our application.

The screen should now look like the following, expanded slightly to the right to see the additional length, precision, and scale columns:

tmp14-83_thumb[2]

4. The next step is where we can specify the levels in our dimension. There must be at least one level identified, but we are going to have four in our Product dimension. They are to be entered on this screen in order from top to bottom with the highest level listed first, then down to the lowest level. For our dimension, we’ll enter DEPARTMENT, CATEGORY, BRAND, and ITEM in that order from top to bottom.

You might have noticed there is no step where we get to input hierarchies. The wizard will automatically create a default hierarchy called Standard that will contain the levels we enter here in this order. To create additional hierarchies, we must use the data object editor for dimensions after creating the dimension in the wizard.

The dialog box should now look like this:

tmp14-84_thumb[2]

5. Moving on to the next screen, we get to specify the level attributes. At the top are the levels, and at the bottom is the list of attributes with checkboxes beside each. If we click on each level in the top portion of the dialog box, we can see in the bottom portion that the wizard has preselected attributes for us. It chooses the three default attributes it created for us to be level attributes for each level, and the other two attributes — the sku and list_price — that we entered as level attributes for the bottom-most level—the ITEM level. We are not going to make any changes on this screen. The wizard has chosen wisely in this case. We could edit the descriptions of each of the level attributes if we wanted to.

6. This brings us to step 6 where we get to choose the Slowly Changing Dimension (SCD) type. This refers to the fact that dimension values will change over time. Although this doesn’t happen often, they will change and hence the "slowly" designation. For example, we might have an sku assigned to a Super Ball made by the ACME Toy Manufacturing Company, which then gets bought out by the Big Toy Manufacturing Company. This causes the Brand that is stored in the dimension for that sku to change. In this screen we specify how we want to handle the change. We will have the following three choices, which are related to the issue of whether or how we want to maintain a history of that change in the dimension:

° Type 1: Do not keep a history. This means we basically do not care what the old value was and just change it.

° Type 2: Store the complete change history. This means we definitely care about keeping that change along with any change that has ever taken place in the dimension.

° Type 3: Store only the previous value. This means we only care about seeing what the previous value might have been, but don’t care what it was before that.

The Type 2 and Type 3 options require additional licensing for our database if we want OWB to handle them automatically. We will need a license for the Warehouse Builder Enterprise ETL Option for that. As we are considering only basic functionality in this topic, we’ll leave this selection as Type 1 for now.

Handling SCDs can be done manually in a relational implementation. The Type 2 option to maintain a complete history would result in needing additional attributes where we want to maintain historical information. We need attributes designated as Triggering attributes. If changed, these attributes will generate a historical record. We also need an Effective Date attribute and an Expiration Date attribute. The Effective Date is when the record is entered. If a triggering attribute changes, the Expiration Date is set and a new record created with the updated information.

For the slowly changing Type 3 option, a new attribute in the dimension will be required to store only the most recent value.

With the Enterprise ETL option, the addition of these extra attributes and describing certain attributes as triggering attributes would be handled automatically for us.

7. Moving on, we get our summary screen of the actions we performed. Here we can review our actions, and go back and make any changes if needed. It will look like the following, based on the selections we’ve made:

tmp14-85_thumb[2]

8. Everything looks fine, so we move on to step 8. This step creates the dimension, showing us a progress bar as it does its work. It will report a successful completion when it’s done, and clicking on the Next button at this point will bring us to the summary screen where we see the above information followed by additional information that the wizard has created for us based on our responses. The extra items are as shown here:

tmp14-86_thumb[2]

To reiterate, nothing has been physically created for us in the database yet. What the wizard has created for us are the definitions of our dimension, and the underlying table and other objects in OWB. The previous screen shows a Sequence, Table Name, and Unique Key that all correspond to objects that the wizard is creating for us in OWB. Later in topic 8, we’ll deploy these objects to create the actual physical database objects in the target schema.

The sequence is an object that will be created to populate the ID values with unique numbers. It is created automatically for us by the wizard. This ID value is used as what is called the Surrogate Identifier for a level record which we discussed earlier as a value standing in (acting as a surrogate) for the actual unique identifier for the record. The actual identifier (a Business Identifier which as we discussed above) contains one or more attributes to uniquely identify a record. When we link a dimension to a cube, it will use that surrogate identifier as the key to link to, as this is easier for the database to use than a potentially multi-attribute business identifier but we’ll use the business identifier to decide which actual dimension record to use since that’s easier for us to understand.

We can also see the additional attribute names that were added, which will become column names in the table to support the levels we identified. When we checked these boxes (or rather left checked the ones the wizard checked) beside the attributes for each level to indicate they were level attributes for that level back in step 5, it created additional names for each based on the level name.

Our Product dimension is now created and we can see it in the Project Navigator window under the Dimensions node under our acme_dwh Oracle module.

The Store dimension

We can create our Store dimension in a similar manner using the wizard. We will not go through it in much detail as it is very similar to how we created the Product dimension. The only difference is the type of information we’re going to have in our Store dimension. This dimension provides the location information for our data warehouse, and so it will contain address information.

The creation of this dimension will be left as an exercise for the reader using the following details about the dimension.

Store attributes (attribute type), data type and size, and (Identifier)

• ID (Dimension/Level): Leave default for type and size (Surrogate ID)

• Store_Number (Level, STORE only): VARCHAR2 length 10 (Business ID)

• Name (Dimension/Level): VARCHAR2 length 50 (Business ID)

• Description (Level, COUNTRY and REGION only): VARCHAR2 length 200

• Addressl (Level, STORE only): VARCHAR2 length 60

• Address2 (Level, STORE only): VARCHAR2 length 60

• City (Level, STORE only): VARCHAR2 length 50

• State (Level, STORE only): VARCHAR2 length 50

• ZipPostalCode (Level, STORE only): VARCHAR2 length 50

• County (Level, STORE only): VARCHAR2 length 255

Store levels

• Country

• Region

• Store

Store hierarchy (highest to lowest)

• Country

• Region

• Store

Creating the Store dimension with the New Dimension Wizard

We will follow the same procedure as we had seen in the creation of the Product dimension. There are a few steps that are a little different from the previous procedure, and they are mentioned here.

In Step 3, where we put in the attributes listed previously, we need to make sure not to forget to specify the surrogate and business identifiers. The surrogate identifier can stay as the default on the ID, but we will have to change the business identifier to be the store_number, which is a unique number that ACME Toys and Gizmos Company assigns to each of its stores.

You may have the urge to include the region and/or country as an attribute in step 3, but resist the urge. They are being designated as levels. By specifying the level attributes to include the Name dimension attribute, we’ll have our region and country included for us—as we’ll see in a moment when we get to the final summary screen.

In step 5 where we specify the level attributes, (the above-listed attributes that are applicable to each level) we need to specify all the attributes except description for the Store level, and then just id, name, and description for the Region and Country levels. This is how we will include the region and country information.

It may seem a bit redundant to include a description as well as a name for the Country and Region levels as our source data at the moment only includes one field to identify the country and region. However, this is needed to prevent an error from occurring later when we map data to this dimension. The same holds true for the Product dimension. If all we had were the ID and the NAME, those would be two key fields that cannot be changed for a record. There would be no descriptive information that could be changed, and the Warehouse Builder generates code for loading the dimension such that it requires at least one updatable field to be mapped, without which the following error would occur:

VLD-5005: No updatable inputs connected for dimension level <dimension><level>

At least one updatable input must be connected for level <dimension><level>, or the generated code will fail. Parent reference key and level natural key inputs are not updatable attributes in the target.

The New Dimension Wizard actually helps us to avoid this error by automatically including three attributes: an ID as the surrogate identifier, a NAME as the business identifier, and a DESCRIPTION as the updatable field.

In step 7, the Pre Create settings page, as shown next, we can see what we should have specified for the Store dimension. We can click on the Back button to go back to make any changes.

tmp14-87_thumb[2]

The final summary screen should look like the following when scrolled all the way to the bottom:

tmp14-88_thumb[2]

Notice the region and country level attributes that are shown in the above image. This is where we see that information included as levels, instead of being specified as dimension attributes.

We’ll make sure to save our work after creating this dimension, and then we’ll move on to creating the cube.

Next post:

Previous post: