Developing Sediment Yield Prediction Equations for Small Catchments in Tanzania - Advances in Data, Methods, Models and Their Applications in Geoscience


approach. However, the author is aware that actual data rarely satisfy these assumptions. That is, the method is used even though the assumptions do not necessarily hold at every point.

First, an analysis was conducted to choose the form of the regression equation. Two candidate forms were investigated: a straight line and a power function (Equations 2.4.1 and 2.4.2). This was done by comparing the strength of the correlation between sediment fill and catchment area with that between the corresponding log-transformed values (Equation 2.4.3). A power relationship is adopted when the log-transformed values are the more strongly correlated; otherwise, a linear model is chosen.

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n$    (2.4.1)

where $n$ is the number of observations, $x_i$ is the independent variable (catchment area), $y_i$ is the dependent variable (sediment yield, expressed as fill), $\beta_0$ and $\beta_1$ are the two parameters (intercept and slope), $\varepsilon_i$ is the error term, and the subscript $i$ indexes a particular observation.

$y_i = \alpha x_i^{\beta}$    (2.4.2)

where $\alpha$ and $\beta$ are the coefficient and exponent of the equation, respectively, and $y_i$ and $x_i$ are as defined above.

$\log y_i = \log \alpha + \beta \log x_i$    (2.4.3)
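The comparison of correlations described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure as the author carried it out: the helper name `choose_model_form` and the sample values are hypothetical, and "high" correlation is interpreted here as simply the larger of the two Pearson coefficients in absolute value.

```python
import numpy as np

def choose_model_form(area, sediment):
    """Hypothetical helper: compare Pearson r of raw and log-transformed data."""
    area = np.asarray(area, dtype=float)
    sediment = np.asarray(sediment, dtype=float)
    # Correlation between the raw variables (supports the linear form, Eq. 2.4.1)
    r_linear = np.corrcoef(area, sediment)[0, 1]
    # Correlation between the log-transformed variables (supports the power form, Eq. 2.4.3)
    r_log = np.corrcoef(np.log10(area), np.log10(sediment))[0, 1]
    # Assumed decision rule: pick the form with the stronger correlation
    form = "power" if abs(r_log) > abs(r_linear) else "linear"
    return form, r_linear, r_log

# Made-up values that follow y = 2 * x**0.5 exactly (not data from the study)
area = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 100.0])
sediment = 2.0 * area ** 0.5

form, r_linear, r_log = choose_model_form(area, sediment)
print(form, round(r_linear, 3), round(r_log, 3))
```

Both variables must be strictly positive for the logarithms to be defined, which holds for catchment areas and sediment fill volumes.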

Second, the parameter values were estimated with Excel 2007's Regression Analysis Tool using 70% of the data set, where applicable. Splitting the data was possible only in cases where the sample size was adequate for estimating the two parameters (i.e., α and β) presented above. As recommended by StatSoft (2011), at least 10 to 20 times as many observations (cases, respondents) as variables should be used for stable estimates of the regression line and replicability of the results.

The tool outputs, among others: the t statistic (a measure of how extreme a statistical estimate is relative to its standard error); the p-value (a measure of how much evidence there is against the null hypothesis, H0, of no change or no effect); the confidence interval (a range of values expected to contain the true parameter at a stated confidence level; 95% is the level most commonly used); the degrees of freedom, df (the number of independent values that remain free to vary once the parameters have been estimated); the standardized residual (the observed minus the predicted value, divided by the square root of the residual mean square); the coefficient of determination, R² (the square of the product-moment correlation between the two variables, expressing the proportion of variation they share); Multiple R (the positive square root of R², useful in multivariate regression for describing the relationship between the variables); and the standard error (the standard deviation of the sampling distribution of an estimate, such as a mean).

The developed equations were validated using the independent remainder of the data set (30%), where appropriate.
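The calibration and validation steps can be sketched as follows. The study used Excel; this sketch swaps in NumPy's least-squares fit on the log-transformed values (Equation 2.4.3) purely for illustration. The data are synthetic, the random 70/30 split is an assumption (the text does not say how the split was drawn), and the helper names `fit_power_law` and `r_squared` are hypothetical.

```python
import numpy as np

def fit_power_law(x, y):
    """Estimate alpha and beta in y = alpha * x**beta by least squares on log10 values."""
    beta, log_alpha = np.polyfit(np.log10(x), np.log10(y), 1)
    return 10.0 ** log_alpha, beta

def r_squared(y_obs, y_pred):
    """Coefficient of determination between observed and predicted values."""
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data only: y = 3 * x**0.7 with multiplicative noise (not from the study)
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=40)
y = 3.0 * x ** 0.7 * np.exp(rng.normal(0.0, 0.05, size=40))

# Assumed random split: 70% for calibration, 30% for validation
idx = rng.permutation(x.size)
n_cal = int(round(0.7 * x.size))
cal, val = idx[:n_cal], idx[n_cal:]

# Calibrate on the 70% subset
alpha, beta = fit_power_law(x[cal], y[cal])

# Validate on the held-out 30%, scoring in log space to match the fitted model
y_val_pred = alpha * x[val] ** beta
r2_val = r_squared(np.log10(y[val]), np.log10(y_val_pred))
print(f"alpha={alpha:.3f}, beta={beta:.3f}, validation R2={r2_val:.3f}")
```

With small samples, the estimates from such a split can vary noticeably from one random draw to another, which is consistent with the text's caution that splitting was done only where the sample size allowed.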

3. Results and discussions

3.1 Selected regression model

Based on the correlation analysis described in Section 2.4 above and a qualitative analysis of the scatter plots (Figs. 3.1a, b) below, the power function was chosen as the best regression model for this study. It can be seen from the plots that the strength of
