Biomedical Engineering Reference
cell anemia. Similarly, documents dealing with other proteins and experimental processes can be
identified by comparing their vectors with a library of vectors representing other concepts.
Figure 7-17. Documents Represented as Word Frequency Vectors. The
vector of a document under analysis (left) is compared to the standard
vector (right) that represents spotting of hemoglobin from patients
suffering from sickle-cell anemia. A vector library (top) contains vectors
representing a variety of concepts relevant to the researcher.
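The vector comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method behind Figure 7-17. Each document is reduced to raw word counts, with no stop-word removal, stemming, or TF-IDF weighting. Similarity is measured with cosine similarity. The concept library and its entries are hypothetical stand-ins for the reference vectors in the figure.

```python
import math
import re
from collections import Counter


def word_frequency_vector(text: str) -> Counter:
    """Lowercase the text, split it into alphabetic tokens, and count each token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_concept(document: str, library: dict[str, str]) -> tuple[str, float]:
    """Return the library concept whose vector is most similar to the document's vector."""
    doc_vec = word_frequency_vector(document)
    scores = {
        concept: cosine_similarity(doc_vec, word_frequency_vector(reference))
        for concept, reference in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical concept library: each entry stands in for a reference vector.
library = {
    "sickle-cell hemoglobin": "hemoglobin electrophoresis gel spot sickle cell anemia patient hemoglobin",
    "insulin signaling": "insulin receptor glucose uptake signaling pathway kinase",
}
doc = "Gel electrophoresis showed an abnormal hemoglobin spot in samples from sickle cell patients."
concept, score = best_matching_concept(doc, library)
print(concept, round(score, 3))
```

Cosine similarity is a common choice for this comparison because it depends on the direction of two count vectors rather than their magnitude. A long document is therefore not favored just for containing more words than the reference. The naive tokenizer treats "patients" and "patient" as different words. A stemmer would merge them.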
Text Summarization
In addition to NLP, text mining is facilitated by text summarization, a process that takes a page or
more of text as its input and generates a summary paragraph as the output. Because each summary
paragraph represents a sample of the source document, analysis of the summaries can be used as an
initial screen for data on a particular topic described in documents or document clusters. In practice,
text summarization utilities, such as the "AutoSummarize" feature within Microsoft Word, are useful
for creating a rough abstract of a document when the author has not provided one. Like
semantic-level NLP, text summarization is an imperfect, evolving technology that works well in niche
areas, but not universally.
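The source does not describe how AutoSummarize works internally. As a hedged illustration of one common approach, the sketch below performs extractive summarization with a classic word-frequency heuristic. Sentences whose content words occur often across the whole text are assumed to be central, and the top-scoring sentences are kept in their original order. The stop-word list, the sentence splitter, and the `summarize` function are illustrative assumptions, not part of any particular product.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use much larger lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "was", "were",
              "for", "on", "with", "by", "as", "that", "this", "it", "from", "be"}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Extractive summary: keep the highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average whole-text frequency of the sentence's content words,
        # so long sentences are not favored merely for their length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


text = ("Sickle-cell anemia is caused by a mutation in the hemoglobin gene. "
        "The mutant hemoglobin polymerizes when deoxygenated. "
        "Weather was pleasant during the conference. "
        "Hemoglobin electrophoresis detects the mutant hemoglobin.")
summary = summarize(text)
print(summary)
```

The output consists of sentences copied directly from the source. Each summary is therefore literally a sample of the document, which is what allows summaries to serve as a first-pass screen. The naive regex splitter also mis-splits abbreviations such as "e.g.", which is a small example of why simple summarizers are imperfect.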