Modeling methods
In the next few sections, we will cover the following important analytical methods:
• Decision trees (classification)
• Association rules (unsupervised learning)
• Linear and logistic regression
• Naive Bayesian classifier (classification)
• K-means clustering (unsupervised learning)
• Text analysis
Decision trees
Decision trees are an example of a classification technique. Here, we classify data in a
tree format using the data's features or attributes. Because a decision tree depicts each
decision path and the possible outcome of each path, it can be used to identify the best
strategy for reaching a goal.
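To make the tree format concrete, here is a minimal Python sketch of a tree that has already been built. The attributes `income` and `age`, their values, and the classes `buyer`/`non-buyer` are hypothetical and chosen only for illustration. Each internal node tests one attribute, and classifying a record means following the matching branch until a leaf is reached.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted at the end of a path


@dataclass
class Node:
    attribute: str  # attribute tested at this node
    branches: dict  # attribute value -> subtree (Node or Leaf)


Tree = Union[Node, Leaf]


def classify(tree: Tree, record: dict) -> str:
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


# Hypothetical tree: attribute names and values are illustrative only.
buyer_tree = Node("income", {
    "high": Leaf("buyer"),
    "low": Node("age", {
        "young": Leaf("non-buyer"),
        "old": Leaf("buyer"),
    }),
})

print(classify(buyer_tree, {"income": "low", "age": "old"}))  # buyer
```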
In decision trees, we start by testing an attribute and splitting the data based on that
attribute's values.
• We then continue the process on each resulting subset, testing another attribute.
• We can build multiple decision trees for the same problem.
• The size and efficiency of the tree depend heavily on which attributes we choose to
split on.
• We also need termination criteria, such as the following:
• One obvious criterion is that all the records at the node belong to one
class, so there is nothing left to split.
• A significant majority of the records belong to a single class (say, if 99
percent of the records are buyers, we can stop).
• The segment contains only one or a very small number of records.
• The improvement is not substantial enough to warrant making the split.
If we do not terminate at the right place, we might overfit the data.
• We can read a decision tree as a set of rules. The tests along each branch (a path
from the root to a leaf) are connected by "and", and multiple branches leading to the
same outcome are connected by "or".
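The construction steps and stopping rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the book's algorithm. Attributes are assumed to be categorical. The split measure is assumed to be Gini impurity, since the text does not name one (entropy-based information gain is a common alternative). The parameters `majority`, `min_records`, and `min_gain` are hypothetical thresholds standing in for the termination criteria. The helper `to_rules` reads the finished tree as rules in the and/or form described above.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label belongs to the same class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(records, labels, attributes,
               majority=0.99, min_records=2, min_gain=0.01):
    """Return either a class label (leaf) or (attribute, {value: subtree})."""
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    # Termination: pure node, large majority, min_records or fewer records,
    # or no attributes left to test.
    if (len(counts) == 1 or top_count / len(labels) >= majority
            or len(labels) <= min_records or not attributes):
        return top_label
    # Try each attribute; keep the split that reduces impurity the most.
    best_attr, best_gain, best_parts = None, 0.0, None
    for attr in attributes:
        parts = {}  # attribute value -> (records, labels) in that branch
        for rec, lab in zip(records, labels):
            recs, labs = parts.setdefault(rec[attr], ([], []))
            recs.append(rec)
            labs.append(lab)
        weighted = sum(len(branch_labels) / len(labels) * gini(branch_labels)
                       for _, branch_labels in parts.values())
        gain = gini(labels) - weighted
        if gain > best_gain:
            best_attr, best_gain, best_parts = attr, gain, parts
    # Termination: no split improves impurity by at least min_gain.
    if best_attr is None or best_gain < min_gain:
        return top_label
    remaining = [a for a in attributes if a != best_attr]
    return (best_attr, {value: build_tree(recs, labs, remaining,
                                          majority, min_records, min_gain)
                        for value, (recs, labs) in best_parts.items()})


def to_rules(tree, conditions=()):
    """One rule per root-to-leaf path; the tests on a path are joined by "and"."""
    if not isinstance(tree, tuple):  # leaf: a class label
        return [(" and ".join(conditions), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += to_rules(subtree, conditions + (f"{attr} = {value}",))
    return rules


# Hypothetical training data, for illustration only.
records = [
    {"income": "high", "age": "young"}, {"income": "high", "age": "old"},
    {"income": "low", "age": "young"}, {"income": "low", "age": "old"},
    {"income": "low", "age": "young"}, {"income": "low", "age": "old"},
]
labels = ["buyer", "buyer", "non-buyer", "buyer", "non-buyer", "buyer"]

tree = build_tree(records, labels, ["income", "age"])
rules = to_rules(tree)
for cond, label in rules:
    print(f"if {cond} then {label}")
# if age = young and income = high then buyer
# if age = young and income = low then non-buyer
# if age = old then buyer

# Paths ending in the same class are joined by "or".
print(" or ".join(f"({c})" for c, lab in rules if lab == "buyer"))
# (age = young and income = high) or (age = old)
```

Raising `min_gain` or lowering `majority` makes the tree stop growing earlier, giving a smaller tree that is less likely to overfit. Suitable values depend on the data.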