approach. However, the author is aware that real data rarely satisfy all of the assumptions; the method is therefore applied even though some of its assumptions may not hold at every point.
First, an analysis was conducted to choose the form of the regression equation. Two candidate forms were investigated: a straight line and a power function (Equations 2.4.1 and 2.4.2). The choice was made by comparing the strength of the correlation between sediment fill and catchment area with that between their log-transformed values (Equation 2.4.3). A power relationship is adopted when the log-transformed values show the higher correlation; otherwise, a linear model is chosen.
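The model-form check described above can be sketched in Python as follows. This is a minimal illustration, not the author's procedure (the study used Excel). The helper names `pearson_r` and `choose_model` are hypothetical. The decision rule reads "high" as "higher in absolute value than the untransformed correlation", because the text gives no numeric threshold. The data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def choose_model(area, fill):
    """Return 'power' if the log10-transformed data correlate more strongly than the raw data.

    Assumption: "high" is read as |r_log| > |r_raw|; the text gives no numeric threshold.
    All values must be positive so that the logarithms are defined.
    """
    r_raw = pearson_r(area, fill)
    r_log = pearson_r([math.log10(a) for a in area], [math.log10(f) for f in fill])
    model = "power" if abs(r_log) > abs(r_raw) else "linear"
    return model, r_raw, r_log

# Hypothetical illustration data (not from the study).
area = [1, 10, 100, 1000, 10000]
print(choose_model(area, [2 * a ** 0.5 for a in area])[0])  # power-law data -> power
print(choose_model(area, [3 + 0.5 * a for a in area])[0])   # straight-line data -> linear
```

Comparing absolute values keeps the rule usable if a relationship happened to be negative. In practice the scatter plots should be inspected as well, since two correlations can be close.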
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n \qquad (2.4.1)$$
where $n$ is the number of observations, $x_i$ is the independent variable (catchment area), $y_i$ is the dependent variable (sediment yield, i.e. fill), $\beta_0$ and $\beta_1$ are the two parameters (intercept and slope), $\varepsilon_i$ is an error term, and the subscript $i$ indexes a particular observation.
$$y_i = \alpha x_i^{\beta} \qquad (2.4.2)$$
where $\alpha$ and $\beta$ are the coefficient and exponent of the equation, respectively, and $y_i$ and $x_i$ are as defined above.
$$\log y_i = \log \alpha + \beta \log x_i \qquad (2.4.3)$$
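With a single predictor, the least-squares estimates that a regression tool returns for Equation 2.4.3 have a closed form. The block below writes them out. It assumes base-10 logarithms and an additive error term $\varepsilon_i$ in log space; the text states neither explicitly.

```latex
\log y_i = \log\alpha + \beta \log x_i + \varepsilon_i
\;\Longleftrightarrow\;
y_i = \alpha\, x_i^{\beta}\, 10^{\varepsilon_i}

\hat\beta = \frac{\sum_{i=1}^{n} (\log x_i - \overline{\log x})(\log y_i - \overline{\log y})}
                 {\sum_{i=1}^{n} (\log x_i - \overline{\log x})^{2}},
\qquad
\log\hat\alpha = \overline{\log y} - \hat\beta\,\overline{\log x},
\qquad
\hat\alpha = 10^{\log\hat\alpha}
```

On the original scale the error is multiplicative. If the log-space errors are symmetric, back-transformed predictions $\hat\alpha x^{\hat\beta}$ therefore estimate the median of $y$ for a given $x$ rather than its mean. For this reason a retransformation bias correction is sometimes applied.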
Second, the parameter values were estimated with the Regression Analysis Tool in Excel 2007, using 70% of the data set where applicable. The data could be split only where the sample size was adequate for estimating the two parameters (i.e., α and β) of the equations above. As recommended by Statsoft (2011), at least 10 to 20 times as many observations (cases, respondents) as variables should be used to obtain stable estimates of the regression line and replicable results.

Among other outputs, the tool reports the following:
- the t statistic (a measure of how extreme a statistical estimate is);
- the p-value (a measure of the evidence against the null hypothesis, H0, of no change or no effect);
- the confidence interval (a range that, at a stated confidence level, is expected to contain the true value of the parameter; the 95% level was used here);
- the degrees of freedom, df (the number of independent values that remain free to vary once the parameters have been estimated);
- the standardized residual (the observed minus the predicted value, divided by the square root of the residual mean square);
- the coefficient of determination, R² (the square of the product-moment correlation between the two variables, which expresses the proportion of variation they share);
- Multiple R (the positive square root of R², useful in multiple regression for describing the overall relationship between the variables);
- the standard error (the standard deviation of the sampling distribution of an estimate, such as a mean or a regression coefficient).

The developed equations were then validated against the independent data set (the remaining 30%), where appropriate.
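The calibration, validation and summary statistics described above can be sketched in standard-library Python as follows. This is a minimal illustration under stated assumptions, not a reproduction of Excel's Regression Analysis Tool:
- The data are hypothetical.
- `MIN_OBS_PER_PARAM = 10` takes the lower end of the 10 to 20 guideline.
- The split is random with a fixed seed.
- The validation metric (RMSE of log10 predictions) is an assumption, because the text does not name one.

The model is fitted in log space, as in Equation 2.4.3, so `b0` estimates log α and `b1` estimates β. p-values and 95% confidence intervals additionally need the Student t distribution with `df` degrees of freedom. For example, `scipy.stats.t.ppf(0.975, df)` gives the two-sided critical value, so that a coefficient's interval is its estimate plus or minus that value times its standard error. The sketch omits this to stay dependency-free.

```python
import math
import random

MIN_OBS_PER_PARAM = 10  # assumption: lower end of the 10-20 observations-per-parameter guideline
N_PARAMS = 2            # alpha and beta (b0 = log10(alpha), b1 = beta)

def split_data(pairs, train_frac=0.7, seed=0):
    """Randomly split (x, y) pairs into a calibration and a validation subset."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    k = round(train_frac * len(shuffled))
    return shuffled[:k], shuffled[k:]

def regression_summary(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x with common summary statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    sse = sum(r * r for r in resid)
    df = n - 2                          # residual degrees of freedom
    se = math.sqrt(sse / df)            # standard error of the regression
    se_b0 = se * math.sqrt(1 / n + mx ** 2 / sxx)
    se_b1 = se / math.sqrt(sxx)
    r2 = 1 - sse / syy
    return {
        "b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,
        "df": df, "r2": r2, "multiple_r": math.sqrt(r2), "std_error": se,
        "std_resid": [r / se for r in resid],  # residual / sqrt(residual mean square)
    }

# Hypothetical data, NOT from the study: fill = 3 * area**0.6 with +/-10% deterministic noise.
area = [1.5 ** k for k in range(1, 21)]
fill = [3 * a ** 0.6 * (1 + 0.1 * math.sin(k)) for k, a in enumerate(area, start=1)]
pairs = list(zip(area, fill))

if len(pairs) >= MIN_OBS_PER_PARAM * N_PARAMS:
    calib, valid = split_data(pairs)    # 70% calibration, 30% validation
else:
    calib, valid = pairs, []            # too few observations: calibrate on all data

s = regression_summary([math.log10(a) for a, _ in calib],
                       [math.log10(f) for _, f in calib])
alpha, beta = 10 ** s["b0"], s["b1"]

# Hypothetical validation metric: RMSE of log10 predictions on the held-out subset.
rmse_log = None
if valid:
    errs = [math.log10(f) - (s["b0"] + s["b1"] * math.log10(a)) for a, f in valid]
    rmse_log = math.sqrt(sum(e * e for e in errs) / len(errs))

print(f"alpha={alpha:.3f} beta={beta:.3f} R2={s['r2']:.3f} RMSE(log10)={rmse_log}")
```

Holding out the validation subset before fitting keeps the check independent of the calibration. With fewer than 20 observations the sketch simply skips validation, mirroring the "where appropriate" qualification in the text.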
3. Results and discussion
3.1 Selected regression model
Based on the correlation analysis described in Section 2.4 and a qualitative inspection of the scatter plots (Figs. 3.1a,b), the power function was chosen as the best regression model for this study. It can be seen from the plots that the strength of