functional discovery from different datasets. However, it should be noted that
the techniques used for this task have to be aware of the nature and distribution
of the dataset being processed. For instance, while techniques for microarray
data preprocessing should take into account the log-transformed nature of
the data from each experiment, techniques for interaction data should not ig-
nore the network structure of the complete dataset. Another complexity that
may have to be handled occurs when a dataset poses several preprocessing
issues simultaneously. For example, microarray data produced through exper-
iments often suffers from the problems of missing values and a difference in
the scales of values produced by the constituent experiments. Often, the ef-
fect of one preprocessing operation may have implications for the subsequent
operations. Thus, a systematic preprocessing pipeline needs to be developed
for each data type, as has been done for microarray data [61]. Finally, an effort
should be made toward finding biological justifications for the changes made
to the original dataset during the preprocessing step(s).
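
The interplay between preprocessing steps described above can be sketched in a few lines. This is a minimal illustration only, not the published microarray pipeline cited above: the function name, the row-mean imputation, and the median and standard-deviation scaling are all assumptions chosen for brevity, and the input matrix is made-up toy data.

```python
import numpy as np

def preprocess_expression(raw, log_transformed=False):
    """Hypothetical three-step pipeline: log2 -> impute -> per-experiment scaling.

    raw: 2-D array, rows = genes, columns = experiments, NaN = missing value.
    Assumes every gene has at least one observed value and that no
    experiment is constant after imputation.
    """
    data = np.asarray(raw, dtype=float).copy()

    # Step 1: move intensities onto the log scale unless already done.
    if not log_transformed:
        data = np.log2(data)

    # Step 2: fill each missing entry with the mean of the observed values
    # for the same gene (row). A deliberately simple stand-in for more
    # careful imputation methods.
    row_means = np.nanmean(data, axis=1, keepdims=True)
    data = np.where(np.isnan(data), row_means, data)

    # Step 3: center each experiment (column) on its median and divide by
    # its standard deviation so that experiments share a common scale.
    data = data - np.median(data, axis=0, keepdims=True)
    data = data / data.std(axis=0, keepdims=True)
    return data

# Toy matrix: 4 genes x 3 experiments, two missing values.
raw = np.array([
    [ 2.0,    8.0,  32.0],
    [ 4.0, np.nan,  64.0],
    [ 8.0,   32.0, 128.0],
    [16.0,   64.0, np.nan],
])
processed = preprocess_expression(raw)
```

The order of the steps is itself a design choice. Here imputation is performed on log-scale values before the experiments are rescaled, so the imputed entries influence each experiment's median and spread; imputing after scaling would generally give different results, which is the kind of dependency between operations the text warns about.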
8.5 Materials Informatics
Our third example is from the domain of materials science and focuses on
a different step in the data analysis process, namely, dimension reduction. It
illustrates how inferences from even small datasets can be made with a careful
application of the analysis techniques, coupled with domain expertise.
One may naturally assume that having large amounts of data is critical for
any serious informatics study. However, what constitutes “enough” data in
materials science applications can vary significantly. In studying structural
ceramics, for instance, fracture toughness measurements are difficult to make,
and in some of the more complex materials, just a few careful measurements
can be of great value. Similarly, having reliable measurements on fundamen-
tal constants or properties for a given material involves very detailed mea-
surement and/or computational techniques. In essence, datasets in materials
science fall into two broad categories. The first is datasets on the behavior of
a given material as related to mechanical or physical properties. The other is
datasets related to intrinsic information based on the chemical characteristics
of the material, such as thermodynamic datasets.
Historically, in the materials science community, crystallographic and thermochemical
databases have been two of the most well-established datasets.
The former serves as the foundation for interpreting crystal structure data of
metals, alloys, and inorganic materials. The latter involves the compilation of
fundamental thermochemical information in terms of heat capacity and calori-
metric data. While crystallographic databases are primarily used as a reference
source, thermodynamic databases were among the earliest examples
of informatics, as these databases were integrated into thermochemical