variation in the Gamma statistic value.
This approach was questioned by Kisi
[ 17 ], who claimed that the input T has more influence on evaporation modeling than
the input W. Kisi [ 17 ] argued that the models with the inputs of W, T, RH, Ed may
perform better than the model with three inputs. This chapter fully agrees with the
argument of Kisi [ 17 ] that data-driven models should be carefully established with
proper comparison. Otherwise, the model results may be in conflict with the physics
of the examined results. Thus, this chapter has evaluated various model structures
and training lengths with the help of state-of-the-art data selection approaches and
comprehensive modeling (LLR and ANN).
Further studies in this chapter highlighted that the best model structure is neither
[W, RH, Ed] nor [W, RH, Ed, T]. Analysis with entropy theory confirmed that the
input combination [W, T, Ed] was better than the other two combinations in terms
of possible transferable inherent information in the data to the model. Entropy
theory identified the daily saturation vapor pressure deficit (Ed) as the most
influential data series among the available inputs. This finding was in line with the
assessment of AIC, BIC, and traditional cross correlation analysis. The entropy
study showed that the relative importance of inputs is Ed > T > W > RH. However,
entropy analysis with an increase in data points failed to identify the training data
length within the available range of data. Entropy theory has suggested that the
available data length is insufficient to make the best data-based model which
represents the complete information in the phenomenon. Analysis with AIC and BIC
has identified the best combination as [W, T, Ed], with the lowest information
criterion values. However, the training data length identification results were
unsatisfactory.
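As a rough illustration of how candidate input combinations can be ranked by information criteria, the sketch below uses the common Gaussian least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n). The chapter does not state which form it used, so this is an assumption. The function name `aic_bic`, the parameter counts, and the residual sums of squares are hypothetical placeholders, not results from the study; only the sample size 2,327 is taken from the text.

```python
import math

def aic_bic(rss, n, k):
    """Return (AIC, BIC) for a least-squares fit, up to an additive constant.

    rss: residual sum of squares, n: number of data points,
    k: number of fitted parameters (assumed Gaussian errors).
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)

# Made-up RSS values and parameter counts, for illustration only.
n_points = 2327
candidates = {
    "[W, RH, Ed]": (260.0, 4),
    "[W, T, Ed]": (240.0, 4),
    "[W, T, RH, Ed]": (239.9, 5),
}

scores = {name: aic_bic(rss, n_points, k) for name, (rss, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

With these placeholder numbers the fourth input reduces the RSS too little to offset the extra-parameter penalty, which is the kind of trade-off both criteria are designed to expose.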
The AIC value showed a sudden decrease after the 3,500th data point,
whereas the BIC value was relatively stable. Data splitting analysis agreed with the
AIC and BIC findings, giving low error values at the 3,500 data point. Even though
the suggestions made by AIC/BIC and GT disagree, the chapter adopted a training
data length of 2,327 (as suggested by GT) because the GT analysis was performed
on randomized data. To check the reliability of this [W, T, Ed] input
structure, comprehensive modeling was performed with LLR and ANN models
on all 15 input scenarios. The modeling with LLR showed that the [1101] model
([W, T, Ed]) performed best, followed by the [1110] model ([W, T, RH]). This
result matches the inputs suggested by GT. It is also
interesting to note that the model with four inputs ([1111] model) gave a better
performance during training, but its overall performance was much lower than that of
several models with three or two inputs because of its poor performance during validation.
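The 15 input scenarios and their binary labels can be enumerated mechanically. The sketch below assumes the mask positions follow the order W, T, RH, Ed, which is inferred from the labels [1101] = [W, T, Ed] and [1110] = [W, T, RH] in the text; the function name `input_scenarios` is hypothetical.

```python
from itertools import product

# Assumed input order, inferred from the mask labels used in the text.
INPUTS = ("W", "T", "RH", "Ed")

def input_scenarios(names=INPUTS):
    """Map each non-empty binary mask (e.g. '1101') to its list of inputs."""
    scenarios = {}
    for bits in product((0, 1), repeat=len(names)):
        if not any(bits):
            continue  # skip the empty combination '0000'
        mask = "".join(str(b) for b in bits)
        scenarios[mask] = [name for name, b in zip(names, bits) if b]
    return scenarios

scenarios = input_scenarios()
print(len(scenarios))     # 15
print(scenarios["1101"])  # ['W', 'T', 'Ed']
print(scenarios["1110"])  # ['W', 'T', 'RH']
```

Each entry could then be passed to a model-fitting routine so that training and validation errors are recorded per mask and compared side by side.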
The usefulness of controlled experiments [ 2 ] to evaluate the effectiveness of input
selection is fully explored in this case study. The improper input structure with the
lowest Gamma value, which gives results contrasting with the physics, could be
connected to the distribution of the Gamma statistic values. More research is
needed in this area to ascertain how much uncertainty is associated with the
Gamma statistic values. This uncertainty could be evaluated by analyzing PDFs
and CDFs of the distribution of Gamma values on different ensembles of data points.
This chapter urges researchers to focus more on those aspects and to avoid
unreliable analyses that can lead to pitfalls in Gamma statistic evaluation.
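One possible way to examine that uncertainty is to compute the Gamma statistic on many random subsets of the data and inspect the empirical distribution of the results. The sketch below is a minimal brute-force version of the Gamma test (near-neighbour averages of squared input distances and half squared output differences, with the regression intercept taken as the statistic). It is not the chapter's implementation; the function names, the neighbour count p, the subset sizes, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def gamma_statistic(X, y, p=10):
    """Intercept of the gamma-vs-delta regression over the p nearest neighbours."""
    n = len(X)
    if n <= p:
        raise ValueError("need more points than neighbours")
    delta = [0.0] * p  # mean squared input distance to the k-th neighbour
    gamma = [0.0] * p  # mean half squared output difference to the k-th neighbour
    for i in range(n):
        near = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        for k in range(p):
            dist2, j = near[k]
            delta[k] += dist2 / n
            gamma[k] += 0.5 * (y[j] - y[i]) ** 2 / n
    # Ordinary least-squares line gamma = slope * delta + intercept.
    md, mg = sum(delta) / p, sum(gamma) / p
    sxx = sum((d - md) ** 2 for d in delta)
    slope = sum((d - md) * (g - mg) for d, g in zip(delta, gamma)) / sxx
    return mg - slope * md

def gamma_ensemble(X, y, subset_size, n_subsets=20, p=5, seed=0):
    """Sorted Gamma statistics computed on random subsets of the data."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_subsets):
        idx = rng.sample(range(len(X)), subset_size)
        values.append(gamma_statistic([X[i] for i in idx], [y[i] for i in idx], p))
    return sorted(values)

def empirical_cdf(sorted_values):
    """Pairs (value, cumulative probability) for an already sorted sample."""
    n = len(sorted_values)
    return [(v, (rank + 1) / n) for rank, v in enumerate(sorted_values)]

# Synthetic stand-in data: two inputs, smooth response plus Gaussian noise.
rng = random.Random(42)
X_demo = [(rng.random(), rng.random()) for _ in range(120)]
y_demo = [math.sin(3 * a) + b + rng.gauss(0.0, 0.1) for a, b in X_demo]

ensemble = gamma_ensemble(X_demo, y_demo, subset_size=60)
cdf = empirical_cdf(ensemble)
print(ensemble[0], ensemble[-1])  # spread of the statistic across subsets
```

The spread between the smallest and largest values in `ensemble` gives a first impression of how much a single Gamma value could vary with the choice of data points; a histogram of the same values would approximate the PDF.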