MATLAB File Help: cv.RTrees
cv.RTrees

Random Trees

The class implements the random forest predictor.

Random Trees

Random trees were introduced by [BreimanCutler]. The algorithm can deal with both classification and regression problems. Random trees is a collection (ensemble) of tree predictors, called a forest further in this section (the term was also introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that received the majority of votes. In case of a regression, the classifier response is the average of the responses over all the trees in the forest.
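The two aggregation rules above can be sketched in a few lines. This is an illustrative Python sketch, not part of the toolbox API; `tree_predictions` stands for the per-tree outputs that the forest collects internally.

```python
from collections import Counter

def forest_predict_class(tree_predictions):
    # Classification: each tree casts one vote; the majority class wins.
    votes = Counter(tree_predictions)
    return votes.most_common(1)[0][0]

def forest_predict_regression(tree_predictions):
    # Regression: the forest response is the average of all tree outputs.
    return sum(tree_predictions) / len(tree_predictions)
```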

All the trees are trained with the same parameters but on different training sets. These sets are generated from the original training set using the bootstrap procedure: for each training set, you randomly select the same number of vectors as in the original set (=N). The vectors are chosen with replacement. That is, some vectors will occur more than once and some will be absent. At each node of each trained tree, not all the variables are used to find the best split, but a random subset of them. A new subset is generated for each node; however, its size is fixed for all the nodes and all the trees. It is a training parameter, set to sqrt(#variables) by default. None of the built trees are pruned.
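The bootstrap draw and the per-node feature subset can be sketched as follows. This is an illustrative Python sketch under the defaults described above (subset size sqrt(#variables)); the function names are hypothetical and do not belong to the toolbox.

```python
import math
import random

def bootstrap_sample(n, rng):
    # Draw N indices with replacement: some indices appear several times,
    # others not at all (the absent ones form this tree's out-of-bag set).
    return [rng.randrange(n) for _ in range(n)]

def random_feature_subset(n_vars, rng, subset_size=None):
    # A fresh subset is drawn at every node, but its size is fixed for
    # all nodes and all trees; by default it is sqrt(#variables).
    if subset_size is None:
        subset_size = max(1, round(math.sqrt(n_vars)))
    return rng.sample(range(n_vars), subset_size)
```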

In random trees there is no need for any accuracy estimation procedures, such as cross-validation or bootstrap, or a separate test set to get an estimate of the training error. The error is estimated internally during the training. When the training set for the current tree is drawn by sampling with replacement, some vectors are left out (so-called oob (out-of-bag) data). The size of oob data is about N/3. The classification error is estimated by using this oob-data as follows:

  1. Get a prediction for each vector, which is oob relative to the i-th tree, using the very i-th tree.
  2. After all the trees have been trained, for each vector that has ever been oob, find the class-winner for it (the class that has got the majority of votes in the trees where the vector was oob) and compare it to the ground-truth response.
  3. Compute the classification error estimate as a ratio of the number of misclassified oob vectors to all the vectors in the original data. In case of regression, the oob-error is computed as the sum of squared differences between the predicted and the true responses of the oob vectors, divided by the total number of vectors.
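The three steps above amount to the following computation. This is an illustrative Python sketch of the classification case only; `oob_votes[i]` stands for the list of class votes sample i received from the trees for which it was out-of-bag (an assumption about how the votes are gathered, not a toolbox structure).

```python
from collections import Counter

def oob_classification_error(oob_votes, truth):
    # Step 2: for each vector that was ever oob, find the class-winner
    # among the votes cast by the trees where it was oob.
    # Step 3: the error is the ratio of misclassified oob vectors to
    # the total number of vectors in the original data.
    wrong = 0
    for i, y_true in enumerate(truth):
        if not oob_votes[i]:
            continue  # this vector was never oob; it casts no vote
        winner = Counter(oob_votes[i]).most_common(1)[0][0]
        if winner != y_true:
            wrong += 1
    return wrong / len(truth)
```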

References

[BreimanCutler]:

Leo Breiman and Adele Cutler: http://www.stat.berkeley.edu/users/breiman/RandomForests/

[1]:

Machine Learning, Wald I, July 2002. http://stat-www.berkeley.edu/users/breiman/wald2002-1.pdf

[2]:

Looking Inside the Black Box, Wald II, July 2002. http://stat-www.berkeley.edu/users/breiman/wald2002-2.pdf

[3]:

Software for the Masses, Wald III, July 2002. http://stat-www.berkeley.edu/users/breiman/wald2002-3.pdf

[4]:

And other articles from the web site http://www.stat.berkeley.edu/users/breiman/RandomForests/cc_home.htm

See also
Class Details
Superclasses handle
Sealed false
Construct on load false
Constructor Summary
RTrees Creates/trains a new Random Trees model 
Property Summary
ActiveVarCount The size of the randomly selected subset of features at each tree 
CVFolds If `CVFolds > 1` then the algorithm prunes the built decision tree 
CalculateVarImportance Whether to compute variables importance. 
MaxCategories Cluster possible values of a categorical variable into 
MaxDepth The maximum possible depth of the tree. 
MinSampleCount If the number of samples in a node is less than this parameter then 
Priors The array of a priori class probabilities, sorted by the class label 
RegressionAccuracy Termination criteria for regression trees. 
TermCriteria The termination criteria that specifies when the training algorithm 
TruncatePrunedTree If true then pruned branches are physically removed from the tree. 
Use1SERule If true then a pruning will be harsher. 
UseSurrogates If true then surrogate splits will be built. 
id Object ID 
Method Summary
  addlistener Add listener for event. 
  calcError Computes error on the training or test dataset 
  clear Clears the algorithm state 
  delete Destructor 
  empty Returns true if the algorithm is empty 
  eq == (EQ) Test handle equality. 
  findobj Find objects matching specified conditions. 
  findprop Find property of MATLAB handle object. 
  ge >= (GE) Greater than or equal relation for handles. 
  getDefaultName Returns the algorithm string identifier 
  getNodes Returns all the nodes 
  getRoots Returns indices of root nodes 
  getSplits Returns all the splits 
  getSubsets Returns all the bitsets for categorical splits 
  getVarCount Returns the number of variables in training samples 
  getVarImportance Returns the variable importance array 
  gt > (GT) Greater than relation for handles. 
  isClassifier Returns true if the model is a classifier 
  isTrained Returns true if the model is trained 
Sealed   isvalid Test handle validity. 
  le <= (LE) Less than or equal relation for handles. 
  load Loads algorithm from a file or a string 
  lt < (LT) Less than relation for handles. 
  ne ~= (NE) Not equal relation for handles. 
  notify Notify listeners of event. 
  predict Predicts response(s) for the provided sample(s) 
  save Saves the algorithm parameters to a file or a string 
  train Trains the Random Trees model