(0, 0) position of the loading plot. However, attributes such as pseudo-potential
radii are a dominant factor, while electronegativity and ionization energy
play an important but lesser role. The loading plot also indicates a noticeable
negative correlation between ionization energy and pseudo-potential radii in
terms of their influence on the average-valency clustering of the compounds, as
the two attributes reside in opposite quadrants. We can gather further
information by juxtaposing the scoring and loading plots and by using
different visualization schemes.
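As a rough illustration of how a loading plot is read, the sketch below computes loadings for a small synthetic dataset. The descriptor names, the random data, and the anti-correlation built into it are assumptions made for illustration only; this is not the dataset behind the plots discussed here. A variable's distance from the origin in the PC1-PC2 plane gauges its influence, and a negative cosine between two loading vectors corresponds to the two variables lying on opposite sides of the origin.

```python
import numpy as np

# Synthetic, illustrative data only: 200 hypothetical "compounds", 3 descriptors.
rng = np.random.default_rng(0)
n = 200
radius = rng.normal(size=n)
ionization = -0.8 * radius + 0.3 * rng.normal(size=n)  # built to anti-correlate
electroneg = rng.normal(size=n)                        # independent descriptor
names = ["pseudopotential_radius", "ionization_energy", "electronegativity"]
X = np.column_stack([radius, ionization, electroneg])

# Autoscale each descriptor (zero mean, unit variance) before PCA.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: rows of Vt are the principal directions.
_, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Loadings scaled by the square root of each eigenvalue, so that entry (i, j)
# is the correlation between descriptor i and principal component j.
loadings = Vt.T * s / np.sqrt(n - 1)

# Coordinates of each descriptor on the PC1-PC2 loading plot.
pc12 = loadings[:, :2]
distance = np.linalg.norm(pc12, axis=1)  # distance from the (0, 0) position
cosine = pc12[0] @ pc12[1] / (distance[0] * distance[1])

for name, (l1, l2), d in zip(names, pc12, distance):
    print(f"{name:24s} PC1={l1:+.2f} PC2={l2:+.2f} distance={d:.2f}")
print(f"cosine(radius, ionization) = {cosine:+.2f}")
```

Scaling the loadings by the singular values is one common convention among several; unscaled eigenvectors would give the same directions but different distances from the origin.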
The strong effect of valency on the linear pattern of clustering in the scoring
plot is consistent with its large distance from the origin on the loading plot.
In Figure 8.6, we present the same scoring plot as before, except that
each compound is now labeled according to structure type rather than transition
temperature. The compounds with the highest transition temperatures are the
cupric oxides (marked in light green).
To summarize, when we start with a multivariate data matrix, PCA
permits us to reduce the dimensionality of that dataset. This reduction in
dimensionality offers us better opportunities to
identify the strongest patterns in the data,
capture most of the variability of the data by a small fraction of the
total set of dimensions, and
eliminate much of the noise in the data, making it beneficial for both
data mining and other data analysis algorithms.
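The points summarized above can be sketched numerically. The example below is a minimal illustration on synthetic data, assuming six observed variables generated from two hidden factors plus a small amount of noise; the function name pca_reduce and all numbers are hypothetical choices, not taken from the text.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (samples x variables) onto its first k principal components.

    Returns the scores, the fraction of variance explained by every
    component, and the rank-k reconstruction of X.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                               # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)             # variance fraction per component
    scores = U[:, :k] * s[:k]                   # coordinates in the reduced space
    X_hat = scores @ Vt[:k] + mean              # back-projection to original space
    return scores, explained, X_hat

# Synthetic data: 2 hidden factors mixed into 6 observed variables, plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 6))
clean = latent @ mixing
noisy = clean + 0.05 * rng.normal(size=clean.shape)

scores, explained, denoised = pca_reduce(noisy, k=2)

# Two of six dimensions capture nearly all of the variability ...
print("variance captured by 2 of 6 components:", explained[:2].sum())

# ... and discarding the remaining components removes part of the noise.
err_before = np.linalg.norm(noisy - clean)
err_after = np.linalg.norm(denoised - clean)
print("error vs. clean data before/after:", err_before, err_after)
```

The noise reduction seen here depends on the signal really being low-dimensional; when that assumption fails, truncating components discards signal as well as noise.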
8.5.2 Prediction via Data Mining
While a fundamental tenet in materials science is to establish
structure-property relationships, it is the life sciences and organic chemistry
communities that have formally introduced the concept of quantitative
structure-activity (or structure-property) relationships (QSAR or QSPR), as
discussed in Section 8.3. Unlike classical materials science approaches, which
relate structure and function through physically based models, QSARs are derived
from a model-independent approach, sometimes referred to as soft modeling.
These data-driven dimensionality reduction techniques help to guide links
between structure and properties. The partial least squares (PLS) technique
expresses a dependent variable (target property) in terms of linear combinations
of the principal components. The PLS method can be applied to rationalize
the materials attributes relevant to a material's function or property. This
permits one to develop explicit quantitative relationships that identify the
relative contributions of the different data descriptors, combined linearly,
to the final property.
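To make the idea concrete, here is a minimal sketch of single-response PLS (often called PLS1) using the NIPALS procedure, written with NumPy. It is one standard way to compute a PLS model, not necessarily the procedure used in the studies cited in this section. The function name pls1_fit, the three descriptors, and the synthetic target are assumptions for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS (PLS1) via NIPALS.

    Returns regression coefficients and an intercept that apply to the raw
    (uncentered) descriptors, so that y is approximated by X @ coef + intercept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean             # centered working copies
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))    # weight vectors
    P = np.zeros((n_features, n_components))    # X loadings
    q = np.zeros(n_components)                  # y loadings
    for a in range(n_components):
        w = Xk.T @ yk                           # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                              # latent score for this component
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)                # deflate X
        yk = yk - q[a] * t                      # deflate y
        W[:, a], P[:, a] = w, p
    coef = W @ np.linalg.solve(P.T @ W, q)      # coefficients on original descriptors
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic example: three hypothetical descriptors, two of them nearly collinear.
rng = np.random.default_rng(2)
n = 60
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = x0 + 0.01 * rng.normal(size=n)             # almost a copy of x0
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 - 1.0 * x1 + 0.05 * rng.normal(size=n)  # hypothetical target property

coef, intercept = pls1_fit(X, y, n_components=2)
y_hat = X @ coef + intercept
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("coefficients per descriptor:", coef)
print("R^2 with 2 latent components:", r2)
```

The entries of coef play the role of the relative contributions of the descriptors described above. Because two of the descriptors here are nearly collinear, their contribution is shared between them, so individual coefficients should be read with care when descriptors are strongly correlated. Coefficients are directly comparable only when the descriptors are on a common scale, as they approximately are in this synthetic example.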
For instance, Suh and Rajan [85] explored the attributes used in electronic
structure calculations and their influence on predicting bulk modulus. Using
PLS, a QSAR was developed relating bulk modulus with a variety of electronic