A single tree-based method may not offer information on the influence of many of the factors on the behavior of the component. Ensemble tree-based algorithms are strong methods that overcome this limitation.
The survival trees method is highly popular among the tree-based methods. This method is useful for identifying factors that may influence a failure event and the mileage or time to an event of interest. Survival trees do not require any distributional assumptions and tend to be resistant to the influence of outliers. When a single tree framework is used, the data are split by only a subset of the factors and the rest are disregarded because of the tree's stopping conditions, e.g. the minimum number of observations in a terminal node. Therefore, a single tree-based method may not offer information on the influence of many of the factors on the behavior of the component. In order to overcome this limitation, a new ensemble approach is proposed.
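As a concrete illustration of the single-tree case, the sketch below grows one conditional inference survival tree with the ctree function from the party package. The data set (veteran, shipped with the survival package) and the stopping parameters are illustrative choices only, not values recommended by the text.

```r
# Sketch: a single conditional inference survival tree.
# The veteran data and the stopping parameters below are illustrative only.
library(party)      # ctree(), ctree_control()
library(survival)   # Surv(), veteran data

single_tree <- ctree(
  Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
  data     = veteran,
  controls = ctree_control(minsplit  = 40,   # do not split nodes with fewer observations
                           minbucket = 20))  # minimum observations in a terminal node

# Only the factors actually used for splitting appear in the fitted tree;
# the remaining covariates are disregarded once the stopping rules are met.
print(single_tree)
plot(single_tree)   # terminal nodes are displayed as Kaplan-Meier curves
```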
Random Survival Forests (RSF) is a method for the analysis of right-censored survival data. Both Random Forest (RF) and RSF are very efficient algorithms for analyzing large multidimensional datasets. However, due to their random nature they are not always intuitive and comprehensible to the user, and different trees in the forest might yield conflicting interpretations. In contrast to RF, the cforest function in R creates random forests from unbiased classification trees based on a conditional inference framework.
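A minimal sketch of both forest approaches is given below: rfsrc from the randomForestSRC package is a widely used implementation of RSF, and cforest from the party package fits the conditional inference forest mentioned above. The data set and the tuning values (ntree, mtry) are placeholders for illustration, not settings prescribed by the text.

```r
# Sketch: two forest approaches for right-censored data.
# Dataset and tuning values (ntree, mtry) are illustrative placeholders.
library(survival)        # Surv(), veteran data
library(randomForestSRC) # rfsrc(): Random Survival Forest
library(party)           # cforest(), cforest_unbiased(): conditional inference forest

set.seed(123)

# Random Survival Forest
rsf_fit <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 500)
print(rsf_fit)           # forest summary, including the out-of-bag error rate

# Conditional inference forest built from unbiased trees
cif_fit <- cforest(Surv(time, status) ~ trt + celltype + karno + diagtime + age,
                   data     = veteran,
                   controls = cforest_unbiased(ntree = 500, mtry = 3))
# Out-of-bag predictions (median survival times for a censored response)
head(predict(cif_fit, OOB = TRUE))
```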
As in classification trees, the splitting criterion in survival trees is crucial to the success of the algorithm. Bou-Hamad et al. (2011) provide a very detailed comparison of splitting criteria. Most of the existing algorithms use statistical tests for choosing the best split. One possible approach is to use the logrank statistic to compare the two groups formed by the child nodes. The chosen split is the one with the largest significant test statistic value. The use of the logrank test leads to a split which assures the best separation of the median survival times in the two child nodes. Another option is to use the likelihood ratio statistic (LRS) under an assumed model to measure the dissimilarity between the two child nodes. A further option is to use the Kolmogorov-Smirnov statistic to compare the survival curves of the two nodes. Some researchers suggest selecting the split based on residuals obtained from fitting a model: the degree of randomness of the residuals is quantified and the split that appears the least random is selected. The party package in R provides a set of tools for training survival trees. Section 10.3 presents a walk-through guide for building Regression Trees in R.
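To make the logrank splitting criterion described above concrete, the sketch below scores a single candidate split by hand: the observations are divided into the two groups that would form the child nodes and compared with the logrank test (survdiff in the survival package). The cut-point on the Karnofsky score is an arbitrary example, not one chosen by the text.

```r
# Sketch: scoring one candidate split with the logrank statistic.
# The cut-point (Karnofsky score >= 60) is arbitrary, chosen for illustration.
library(survival)   # Surv(), survdiff(), veteran data

vet <- veteran
vet$node <- ifelse(vet$karno >= 60, "left child", "right child")

lr <- survdiff(Surv(time, status) ~ node, data = vet)
lr$chisq                                        # logrank chi-square statistic
pchisq(lr$chisq, df = 1, lower.tail = FALSE)    # corresponding p-value

# A splitting algorithm would repeat this for every factor and every cut-point,
# then choose the split with the largest (significant) test statistic.
```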