sizes such as 36, 38, 40, and so on. We will need to transform these into the
North American equivalents.
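The size conversion itself is a simple lookup. The Studio job would do this with a mapping component, but the logic can be sketched in plain Python; note that the mapping values below are illustrative assumptions, since the real chart would come from the retailer:

```python
# Hypothetical lookup converting European dress sizes to North American
# equivalents; the actual mapping must come from the retailer's sizing
# chart, not from the supplier's data.
EU_TO_NA_SIZES = {36: 6, 38: 8, 40: 10, 42: 12, 44: 14}

def to_north_american(eu_size: int) -> int:
    """Return the North American size for a European size, failing
    loudly on sizes the chart does not cover."""
    try:
        return EU_TO_NA_SIZES[eu_size]
    except KeyError:
        raise ValueError(f"No North American equivalent for EU size {eu_size}")
```

Rejecting unknown sizes, rather than passing them through, surfaces bad supplier data at load time instead of on the website.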
5. Third and fourth, we have product data from another of the retailer's
suppliers, Runway Collections Ltd. This supplier's product data comes in
two files, one containing the main product's content: names, descriptions,
and so on, and the other containing SKUs and prices only (this allows the
supplier to send price changes without having to send all of the product
data). To make this integration even more interesting, there's no guarantee
that a product file and a price file will arrive at the same time, and, even
if they do, they might contain data for different products and SKUs. This
presents a challenge for us. There's a constraint on the website such that it
can accept products without prices, but not prices without products, so we'll
need to figure out how we can work around this issue.
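One workable approach is to split each incoming price file into records whose products already exist on the website (safe to load) and records that must wait for product data. A minimal sketch of that partitioning, with the record layout and field names assumed for illustration:

```python
def split_prices(price_records, known_skus):
    """Partition price records into those whose SKU already exists on
    the website (loadable now) and those that must be deferred until
    the matching product file arrives."""
    loadable, deferred = [], []
    for record in price_records:
        if record["sku"] in known_skus:
            loadable.append(record)
        else:
            deferred.append(record)
    return loadable, deferred
```

Deferred records would simply be fed back through the same check when the next Runway Collections product file arrives.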
6. All three data sources described previously can send multiple datafiles per
day, and there's no fixed time for each file to be sent. Furthermore, the
source systems will FTP the data onto the server hosting the Studio and the
website into some nominated directories.
7. There is no connection between the three systems supplying the data and it
is possible that they may use the same product and SKU IDs, so we'll need
some way of making the SKUs unique across the website platform.
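A common way to guarantee uniqueness is to namespace each SKU with an identifier for its source system. A one-line sketch (the separator and source codes are assumptions, not platform requirements):

```python
def namespaced_sku(source: str, sku: str) -> str:
    """Prefix a SKU with its data-source identifier so that identical
    SKUs arriving from different suppliers cannot collide."""
    return f"{source}_{sku}"
```

With this scheme, SKU `1001` from the ERP system and SKU `1001` from Fabulous Fashions become distinct keys on the website.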
8. The datafiles are presented with filenames of a similar format, namely:
[data_source]_[yyyyMMddhhmmss].[file_extension]
Examples of filenames are:
° erp_20120930120000.xml
° fabulous_fashions_20120930142524.csv
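Because the filename carries both the source system and a timestamp, each job can identify and order its input files by name alone. A sketch of parsing that convention (the function name is mine; the pattern follows the format above):

```python
import re
from datetime import datetime

# Matches [data_source]_[yyyyMMddhhmmss].[file_extension]
FILENAME_RE = re.compile(r"^(?P<source>.+)_(?P<ts>\d{14})\.(?P<ext>\w+)$")

def parse_datafile_name(filename: str):
    """Split a datafile name into its source system, timestamp, and
    extension, raising on names that do not follow the convention."""
    match = FILENAME_RE.match(filename)
    if not match:
        raise ValueError(f"Unexpected filename format: {filename}")
    return (match.group("source"),
            datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S"),
            match.group("ext"))
```

Sorting files by the parsed timestamp ensures that, when several files from one source are waiting, they are processed in the order they were produced.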
Our next task is to pick out the key information from the previous scenarios and use
this to define the high-level job requirements.
1. The scenario described previously is quite complex, and it often makes sense
to break down complex requirements into smaller, simpler requirements. An
obvious way to do this here is to define four separate jobs, one for each data
source, rather than trying to combine the requirements into one mega job.
Sometimes this will not be appropriate, but in this case, we'll go with
this approach.
2. The website has a standard import process and it requires a file named
catalog.xml. As we have four data sources feeding into this process on
an undefined schedule, we need some way of checking that a file has not
already been presented to the website import process before we try to
present another, otherwise files will be overwritten.
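The guard can be as simple as waiting until the previous catalog.xml has been consumed before moving the next one into place. A minimal sketch, with the polling interval, timeout, and directory handling all chosen for illustration:

```python
import os
import shutil
import time

def present_when_clear(src_path, import_dir, target_name="catalog.xml",
                       poll_seconds=5, timeout_seconds=300):
    """Wait until the website's import directory no longer contains
    the target file, then move the next file into place. Timing values
    are illustrative, not prescribed by the platform."""
    target = os.path.join(import_dir, target_name)
    waited = 0
    while os.path.exists(target):
        if waited >= timeout_seconds:
            raise TimeoutError(f"{target} was not consumed in time")
        time.sleep(poll_seconds)
        waited += poll_seconds
    shutil.move(src_path, target)
```

This serializes the four jobs' outputs through the single catalog.xml handover point, so no job overwrites a file the website has not yet imported.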