Importing/defining source database object metadata Part 4 (Oracle Warehouse Builder 11gR2)

Case sensitivity of column names in SQL Server

We used all uppercase for the column names above because the Warehouse Builder defaults to uppercase for any column name we enter. However, this could cause a problem later when retrieving data from the SQL Server tables if the case does not match the way the columns are defined in SQL Server. SQL Server allows mixed case for column names, but the Oracle database folds unquoted column names to uppercase. It is possible to query a mixed-case column name in SQL Server from an Oracle database, but the name must be enclosed in double quotes. The code generated by the Warehouse Builder recognizes this and puts double quotes around any references to column names. If the import had worked, it would have created the column names with matching case and there would have been no problem. However, when we enter columns manually in the Table Editor, there is no option to enter a mixed-case name, so we'll run into errors later if the corresponding SQL Server column names are not in all uppercase. The database scripts that can be downloaded from the Packt website (http://www.packtpub.com/files/code/3449_Code) to build the database define the column names in all uppercase to avoid any problems.
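The quoting rule can be illustrated with a short sketch (in Python rather than the PL/SQL the Warehouse Builder actually generates, and with hypothetical table and column names): Oracle folds unquoted identifiers to uppercase, so generated code must wrap a mixed-case SQL Server name in double quotes to reach it.

```python
# Sketch: why generated code double-quotes mixed-case column names.
# An unquoted identifier is folded to uppercase by Oracle, so a SQL Server
# column defined as StoreNumber can only be referenced by quoting it exactly.
# Table and column names below are invented for illustration.

def quoted_select(table, columns):
    """Build a SELECT statement that quotes each column name verbatim,
    mirroring the double quotes the generated code places around names."""
    col_list = ", ".join(f'"{c}"' for c in columns)
    return f'SELECT {col_list} FROM {table}'

print(quoted_select("POS_TRANSACTIONS", ["StoreNumber", "SaleDate"]))
# SELECT "StoreNumber", "SaleDate" FROM POS_TRANSACTIONS
```

If the columns had been created in all uppercase, the quotes would be harmless; with mixed case, they are required.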

To import the remainder of the tables automatically using the wizard, just follow the steps described above for importing source table metadata, but start the import by right-clicking on the acme_pos database node under ODBC instead of the Oracle node. The remainder of the import steps are as documented previously.


Be sure to save each table as it is created to make sure no work gets lost. When all the tables are entered or imported, our defining and importing of source metadata is completed.

Importing source metadata from files

One final object type we need to discuss before we wrap up importing and defining source metadata is importing data object metadata from a file. The Warehouse Builder can take data from a flat file and make it available in the database as if it were a table, or just load it into an existing table. The metadata that describes this file must be defined or imported in the Warehouse Builder. The file format must be delimited, usually with commas separating each column and a carriage return at the end of a record (a CSV file). The option to use a flat file greatly expands the flexibility of the Warehouse Builder because it allows us to draw upon data from sources other than databases. That can also be of great assistance in loading data from a database that is not directly network accessible to our data warehouse; in that case, the data can be exported out of the source database tables and saved to a CSV file for us to import.
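As a concrete illustration of the expected file format, here is a minimal Python sketch (with invented county data) of a comma-delimited file being parsed, one record per line:

```python
import csv
import io

# Sketch: the kind of delimited file the Warehouse Builder expects.
# Fields are separated by commas and each record ends with a line break.
# The county data below is invented for illustration.
sample = "COUNTY_CODE,COUNTY_NAME\n101,Adams\n102,Baker\n"

reader = csv.reader(io.StringIO(sample))
rows = list(reader)
header, data = rows[0], rows[1:]
print(header)   # ['COUNTY_CODE', 'COUNTY_NAME']
print(data)     # [['101', 'Adams'], ['102', 'Baker']]
```

Each comma marks a column boundary and each line break marks a record boundary, which is exactly the structure the metadata we are about to define will describe.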

For our ACME Toys and Gizmos company data warehouse, we’ve been provided a flat file. This file contains information for counties in the USA that the management wanted to see in the data warehouse to allow analyzing by county. For stores in the USA, the store number includes a code that identifies the county the store is located in, and the flat file we’ve been provided with contains the cross reference of the code to the county that we’ll need.

The file name is counties.csv and it is available in the download files from the Packt website at http://www.packtpub.com/files/code/3449_Code.

The process of creating the module and importing the metadata for a flat file is different from Oracle or non-Oracle databases because we’re dealing with a file in the file system now instead of a database. The steps involved in creating the module and importing the metadata for a flat file are as follows:

1. The first task we need to perform, as we did earlier for the source databases, is to create a new module to contain our file definition. If we look in the Projects tab under our project, we’ll see that there is a Files node right below the Databases node. We will launch the Create Module Wizard the same way as we did earlier, but we’ll do it on the Files node and not the Databases node. We’ll right-click on the Files node and select New Flat File Module from the pop-up menu to launch the wizard.

2. When we click on the Next button on the Welcome screen, we notice a slight difference already. The Step 1 of the Create Module wizard only asks for a name and description. The other options we had for databases above are not applicable for file modules. We’ll enter a name of ACME_FILES and click on the Next button to move to Step 2.

3. We need to edit the connection in Step 2 just as we did for the database previously. So we’ll click on the Edit button and immediately notice the other major difference in the Create Module Wizard for a file compared to a database. As we see in the following image, it only asks us for a name, a description, and the path to the folder where the files are:

[Screenshot: Edit File System Location dialog showing the Name, Description, and Path fields]

The Name field is prefilled with a suggested name based on the module name. As it did for the database module location names, the wizard appends the number 1 to the end, so we'll just edit it to remove the number and leave it set to ACME_FILES_LOCATION.

4. Notice the Type drop-down menu. It has two entries: General and FTP. If we select FTP (File Transfer Protocol, used for getting a file over the network), it will ask us for slightly more information, as shown in the following image:

[Screenshot: Edit File System Location dialog with Type set to FTP, showing the additional host and logon fields]

The FTP option can be used if the file we need is located on another computer. We will need to know the name of that computer, and have a logon username and password to access it. We'll also need to know the path to the location of the file. This option is used in process flows, which are a more advanced feature than we'll be able to cover in this topic.

5. The simplest option is to store the file on the same computer on which we are running the database. This way, all we have to do is enter the path to the folder that contains the file. We should have a standard path we can use for any files we might need to import in the future, so we create a folder called GettingStartedWithOWB_files, which we'll put in the C: drive. Choose any available drive with enough space and just substitute the appropriate drive letter. We'll click on the Browse button on the Edit File System Location dialog box, choose the C:\GettingStartedWithOWB_files path, and click on the OK button.

6. We'll then check the box for Import after finish and click on the Finish button. We could click the Next > button here instead, but it would just take us to a screen summarizing what the wizard is going to do before we hit the Finish button.

That’s it for the Create Module Wizard for files. It’s really very straightforward.

The File Import window appears next. We'll work through this one in a little more detail as it is different from importing from a database. If the File Import window has not appeared, just right-click on the module name under the Files node under our project and select Import and then Flat File…. The following are the steps to be performed in the File Import screens:

1. The first screen for importing a file is shown in the following screenshot:

[Screenshot: File Import screen]

This is where we specify the file we wish to import. We'll click on the Add Sample File… button and select the counties.csv file from the resulting popup, which fills in the filename on the File Import screen. The View Sample File… button is now no longer grayed out, so we can click on it. It shows us a view of the file we've just selected so we can verify it contains the correct data, as shown next:

[Screenshot: sample view of the counties.csv file]

2. Once we've viewed the file, we'll click on OK to close the dialog, and then click the Import button on the File Import screen to begin the import process. We are presented with an entirely new screen that we haven't seen before: the Flat File Sample Wizard has now been started. The wizard offers two paths we can follow through it, a standard sequence for simple files and an advanced sequence for more complex files. The previous release combined all these steps into one path, so we had no choice; now, if we have a simple CSV file to import, we can save some time. The two sets of steps are indicated on the Welcome screen as shown below:

[Screenshot: Flat File Sample Wizard Welcome screen listing the simple and advanced sequences of steps]

3. Clicking the Next button will take us to the first step which is shown below:

[Screenshot: Flat File Sample Wizard Step 1 showing the sampled columns of data]

This screen displays the information the wizard pulled out of the file, displayed as columns of information. It knows what's in the columns because each column in the file is separated by a comma, but at this point it doesn't know what type of data or column name to use for each column, so it just displays the data. It picks a name based on the file name, which is fine, so we'll leave this and the remaining options set to the default. The following are the options on this screen:

° More information about what those fields mean can be found by clicking on the Help button.

° The Character set is language related. For the English language, the default character set will work fine.

° The Number of characters to sample determines how much of the file the wizard will read to get an idea of what’s in it. If we were to import a file with multiple record types, this field might have come into play. But for our purposes, the default is enough.

4. This Step 1 screen is also where we can choose to take the advanced path through the wizard, which consists of more steps, or just click the Next button to continue along the simple path. The simple path is for basic comma-delimited files with single rows separated by a carriage return. We'll follow the simple three-step path through the wizard and then go back and take a look at the extra steps the advanced option gives us. We'll just click the Next button to move on to the simple Step 2.

5. Step 2 of the simple steps includes the record and field delimiters choices as shown next:

[Screenshot: record and field delimiter choices]

Our fields are separated by a comma, and that is the default, so we'll leave it at that. The Enclosures: selection is OWB's way of specifying the characters that surround the text values in the file. Frequently, in text-based files such as CSV files, text is differentiated from numeric values by surrounding the text values in double quotes, single quotes, or something similar. This is where we specify the characters, if any, that surround the text-field values in this file. As our file does not use any character to surround text values and does not contain any double quotes, this setting will have no effect and we can safely ignore it. It is possible, however, for double quotes (or any of the other characters available from the drop-down menu) to appear in text strings in the file without acting as enclosures; in that case, we would need to set this field to blank to indicate that there is no text enclosure, so that the wizard doesn't get confused. We'll click Next at this point to move on to the final step.
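The effect of the Enclosures setting can be sketched with Python's csv module (the sample line is invented): with an enclosure character defined, commas inside the enclosure stay part of the value; with no enclosure, every comma splits a field.

```python
import csv
import io

# Sketch: how an enclosure character changes parsing.
# The sample line is invented for illustration.
line = '101,"Adams, North",2500\n'

# Treating double quotes as the enclosure (like OWB's Enclosures setting):
# the comma inside the quotes is kept as part of the value.
with_enclosure = next(csv.reader(io.StringIO(line), quotechar='"'))
print(with_enclosure)   # ['101', 'Adams, North', '2500']

# With no enclosure defined, every comma splits a field and the
# quote characters are treated as ordinary text.
without = next(csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONE))
print(without)          # ['101', '"Adams', ' North"', '2500']
```

This is why declaring (or blanking) the enclosure correctly matters: the same line parses into a different number of fields depending on the setting.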

[Screenshot: Field Properties screen]

6. The final step is where we specify the details about what each field contains, and give each field a name. Check the box that says Use the first record as the field names, and we'll see that all the column names change to the values from that first row. It's not uncommon to receive a file with a large number of columns, so this can be a big time-saver. After checking the box, our screen looks like the following screenshot:

[Screenshot: Field Properties screen with the Use the first record as the field names box checked]

Notice that the field type for the first column has changed. The ID field is now INTEGER instead of character, as the wizard has correctly detected that the rows after that first one all contain integer data. A Length is also specified there, which defaults to 0. If we scroll the window to the right, we'll notice an SQL data type that is set for each field name. The reason for these extra attributes is that the Warehouse Builder can use this file directly in a mapping, or indirectly by referencing it from an external table. An external table is a table definition within the Oracle database that actually refers to information stored in a flat file. Direct access is via the SQL*Loader utility, which loads files of information into a table in the database; when we specify the source in a mapping to be that file, the Warehouse Builder builds an SQL*Loader configuration to load it using the information provided here. More details about this can be found by clicking on the Help button on this screen. We do not need to worry about specifying a length here as the columns are delimited by commas, but we can enter a value if we happen to know the maximum length a field could be.
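The kind of type detection described above can be sketched in a few lines of Python (a simplified stand-in for what the wizard does, with invented sample rows): take the first record as the field names, then scan the remaining rows to decide whether a field holds integer or character data.

```python
# Sketch: simplified type inference in the spirit of the wizard.
# The first record supplies field names; the remaining rows determine
# whether each field is INTEGER or character data. Sample rows invented.
def infer_types(rows):
    header, data = rows[0], rows[1:]
    types = []
    for i, name in enumerate(header):
        values = [r[i] for r in data]
        kind = "INTEGER" if all(v.lstrip("-").isdigit() for v in values) else "CHAR"
        types.append((name, kind))
    return types

rows = [["ID", "COUNTY"], ["1", "Adams"], ["2", "Baker"]]
print(infer_types(rows))  # [('ID', 'INTEGER'), ('COUNTY', 'CHAR')]
```

The real wizard also assigns lengths and SQL data types per field, but the principle of sampling the data rows to infer a type is the same.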

7. Click on Next to get a summary screen of what the wizard will do, or just click on the Finish button to continue. After clicking Finish, the wizard creates our file definition under the Files node and we will be able to access it in the Projects tab. We can see that the imported file is displayed as counties_csv, which is the name it defaulted to and which we left unchanged.

What the Flat File Sample Wizard just asked us for is a direct example of what we mean when we talk about metadata: we just entered the data that describes the data contained in an imported file.

8. We’ll make sure to select Save All from the Design menu in Design Center to save the metadata we just entered.

Let’s take a quick look at what the extra Advanced option steps would be:

1. The first step we saw had the Advanced button on it to take us down the advanced path through the wizard. If we click on that, we are presented with a screen where we can specify even more information about the characteristics of the file, as shown in the following screenshot:

[Screenshot: advanced record characteristics screen]

This is where we can specify detailed record information, including whether the file contains logical records. It has the record delimiter we saw in the simple option's Step 2 above, but adds an option for logical records. The commas only determine where one column ends and another begins, but where does the next row start? We can see from the display that the wizard has already figured that out: it assumes that a carriage return <CR> character indicates the end of a row, and that is the default it uses.

The carriage return is an invisible character that gets entered into a text file when we press the Enter key to move to the next line. It's possible that we might get a file with some other character indicating the end of a row, but our file uses the carriage return, which is the most common, so we'll leave it set to that.

The other option here is to indicate whether or not the file contains logical records. Our file contains a physical record for each logical record. In other words, each row in the file is only one record. It’s possible that one record’s worth of information in a table might be contained in more than one physical row in the file. If this is the case, we could check that box and then specify the number of physical records that make up one complete logical record.
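The logical-record idea can be shown with a minimal Python sketch (the rows are invented): if each logical record spans a fixed number of physical rows, the reader concatenates that many rows into one record.

```python
# Sketch: assembling logical records when each logical record spans a
# fixed number of physical rows. Here, 2 physical rows make 1 logical
# record; the sample rows are invented for illustration.
def to_logical(physical_rows, rows_per_record):
    logical = []
    for i in range(0, len(physical_rows), rows_per_record):
        merged = []
        for row in physical_rows[i:i + rows_per_record]:
            merged.extend(row)
        logical.append(merged)
    return logical

physical = [["1", "Adams"], ["2500"], ["2", "Baker"], ["1800"]]
print(to_logical(physical, 2))
# [['1', 'Adams', '2500'], ['2', 'Baker', '1800']]
```

Checking the logical-records box and entering the number of physical records per logical record tells the wizard to do this kind of grouping for us.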

2. Hitting Next brings us to Step 3 of the advanced steps. For the advanced options, this is where the field delimiter that was on the simple Step 2 screen is specified, or we can define the file as having fixed-length columns. The simple option assumed delimited fields, whereas the advanced option also allows a file with fixed column lengths that aren't delimited.
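Fixed-length parsing can be sketched as follows (Python, with invented column widths): instead of splitting on a delimiter, each field is cut out of the line at a known offset and width.

```python
# Sketch: parsing a fixed-length record, the non-delimited layout the
# advanced path supports. Column widths are invented for illustration.
def parse_fixed(line, widths):
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields

print(parse_fixed("101  Adams     2500", [5, 10, 4]))
# ['101', 'Adams', '2500']
```

With this layout there are no delimiters at all, which is why the advanced path asks us for explicit column lengths.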

3. Hitting the Next button from here takes us to Step 4 of the advanced options, where we can specify whether there are any rows to skip and what record type we have in the file, single record or multi-record, as shown next:

[Screenshot: advanced Step 4 showing the rows-to-skip and record type options]

Sometimes the provided files contain a number of rows of preliminary text information before the actual rows of data start. We can tell the wizard to skip over those rows at the start, as they aren't formatted like the rows of data. All the rows in our file are formatted the same, so we will leave the default set to 0 because we want it to read all the rows. We might be tempted to set the number of rows to skip to 1, since the first row is just header information and not actual data; but header rows can be used to supply the column names, as we saw above, so if we skipped it we wouldn't have that option available to us.

4. The next step in the advanced path takes us to the Field Properties screen, the same as the final step in the simple path except that this is Step 5 of 5 instead of Step 3 of 3. Finishing from here accomplishes the same task as the simple path, creating the new file node under our new files module.

This concludes all the object types we want to cover here for importing. Let’s summarize what we’ve learned so far, and move on to the next step, which is defining our target.

Summary

That was a lot of information presented in this topic. We began with a brief discussion about the source data for our scenario using the ACME Toys and Gizmos company. We then went through the process of importing metadata about our sources, and saw how to import metadata from an Oracle database as well as a flat file. Because we are dealing with a non-Oracle database, we were also exposed to configuring the Oracle database to communicate with a non-Oracle database using Oracle Heterogeneous Services. We also saw how to configure it for generic connectivity using the supplied ODBC agent. We worked through the process of manually creating table definitions for source tables for a SQL Server database.

At this point, you should run through these procedures to import or define the remaining tables that were identified in the source databases, using the procedures we walked through above for practice. We'll be using the SQL Server database tables for the POS transactional database throughout the remainder of this topic, so be sure to have those defined at a minimum. Now that we have our sources all defined and imported, we need a target to load the data into from these sources. That's what we're going to discuss in the next topic, where we talk about designing and building our target data structures.
