FIGURE B.14 Issues with text processing.
FIGURE B.15 Text processing with textual ETL.
Other obstacles
Text carries with it a whole other set of obstacles. Figure B.14 illustrates some of those obstacles.
Fortunately, there is technology designed to address the many obstacles of textual analytics.
That technology is textual ETL from Forest Rim Technology. The world has long had ETL technology,
called classical ETL technology here. The world has had Informatica, Ascential, and Ab Initio.
Classical ETL technology is designed to take data from older legacy transaction-oriented systems
and integrate the data into a form and structure useful for data warehouse processing. Classical
ETL processing is for repetitive transaction processing systems.
Forest Rim Technology, on the other hand, is dedicated to processing textual nonrepetitive
information. Forest Rim Technology uses textual information as the input and produces standard
databases as the output. As such, you can create your data warehouse using textual data as input
with the textual ETL provided by Forest Rim Technology. Forest Rim Technology has developed and
heavily patented the process of taking text and preparing text for data warehouse processing.
Although ETL for repetitive data and ETL for textual data appear to be conceptually the same, at a
detailed level classical repetitive ETL and textual nonrepetitive ETL are very different. (Note: If
you wish to copy Forest Rim's patented textual ETL technology, please contact Forest Rim
(http://www.forestrimtech.com) for licensing rights.)
Textual ETL is designed to take text and do the many things to it that are necessary to prepare the
text for entry into a database structure and to integrate the text so that textual analytical processing
can be done against it.
The net result of textual ETL is that medical documents can be read, the text integrated, and the
results placed into a standard database management system. Figure B.15 shows this type of
processing.
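The general idea of text as input and a standard database as output can be sketched in a few lines
of Python. This is only an illustrative sketch and not Forest Rim's patented textual ETL: the table
name doc_words, the function names, and the two sample notes are all invented for this example, and
SQLite stands in for whatever database management system is actually used.

```python
import re
import sqlite3


def load_documents(conn, documents):
    """Store every word of every document as one row: (doc_id, position, word)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_words ("
        "doc_id TEXT, position INTEGER, word TEXT)"
    )
    for doc_id, text in documents.items():
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z0-9']+", text.lower())
        conn.executemany(
            "INSERT INTO doc_words VALUES (?, ?, ?)",
            [(doc_id, pos, word) for pos, word in enumerate(words)],
        )
    conn.commit()


def documents_mentioning(conn, word):
    """Return the ids of all documents that contain the given word."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM doc_words WHERE word = ? ORDER BY doc_id",
        (word.lower(),),
    )
    return [row[0] for row in rows]


# Hypothetical sample notes, made up purely for illustration.
conn = sqlite3.connect(":memory:")
load_documents(conn, {
    "note-1": "Patient reports chest pain after exercise.",
    "note-2": "No pain reported; blood pressure normal.",
})
```

Once the words sit in a table, ordinary SQL can be run against what was originally free text. For
example, documents_mentioning(conn, "pain") finds both sample notes.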
The process of integration of text is at the heart of being able to do textual analytical processing.
(See Forest Rim Technology's patented technology for a detailed explanation of how to do textual
integration.) Some of the functions that must be achieved by textual integration include:
Stop word processing—the removal of extraneous words.
Stemming—the reduction of words to a common Latin or Greek stem.
Alternate spelling—the recognition that some words have alternate spellings.
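The three functions listed above can be illustrated with a minimal Python sketch. It is not Forest
Rim's implementation. The stop word list, the alternate spelling table, and the suffix list are
small hypothetical samples, and the stemmer is a crude suffix stripper rather than a true reduction
to Latin or Greek stems.

```python
import re

# Hypothetical sample data; a real system would use much larger lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "was", "in"}
ALTERNATE_SPELLINGS = {"colour": "color", "anaemia": "anemia", "tumour": "tumor"}
SUFFIXES = ("ing", "ed", "es", "s")


def normalize_spelling(word):
    """Map a known alternate spelling to one preferred spelling."""
    return ALTERNATE_SPELLINGS.get(word, word)


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def integrate(text):
    """Remove stop words, unify spellings, then stem the remaining words."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    return [stem(normalize_spelling(w)) for w in kept]


print(integrate("The tumour was treated and the tumors responded"))
# prints ['tumor', 'treat', 'tumor', 'respond']
```

After these steps, "tumour" and "tumors" both arrive in the database as the same term, so a single
query finds both.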