randomForest {RFO} | R Documentation |
Classification with Random Forest
Description
randomForest implements Breiman's random forest algorithm (based on
Breiman and Cutler's original Fortran code) for classification.
Usage
## S3 method for class 'formula'
randomForest(formula, data = NULL, ..., subset,
na.action = na.fail)
## Default S3 method:
randomForest(x, y, ntree = 500,
mtry = floor(sqrt(ncol(x))),
replace = TRUE, classwt = NULL, cutoff,
sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),
nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
maxnodes = NULL, na.action = na.fail, internal = FALSE, ...)
## S3 method for class 'randomForest'
print(x, ...)
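As a rough illustration of how the default expressions in the signature above evaluate, the following base-R sketch computes them by hand for a toy input. The variable names (mtry_default and so on) are illustrative only and are not part of the package; the expressions themselves are copied from the usage above.

```r
# Toy predictors: 100 rows, 5 columns, with a two-level factor response.
x <- matrix(runif(5e2), 100)
y <- gl(2, 50)
replace <- TRUE

# Default mtry: floor(sqrt(5)) = 2 candidate variables per split.
mtry_default <- floor(sqrt(ncol(x)))

# Default sampsize: all rows when sampling with replacement,
# otherwise ceiling(.632 * nrow(x)).
sampsize_default <- if (replace) nrow(x) else ceiling(.632 * nrow(x))
sampsize_no_replace <- ceiling(.632 * nrow(x))

# Default nodesize: 1 when y is a factor (classification), 5 otherwise.
nodesize_default <- if (!is.null(y) && !is.factor(y)) 5 else 1
```

With 100 rows and 5 columns this gives mtry 2, sampsize 100 (or 64 without replacement), and nodesize 1.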
Arguments
data |
an optional data frame containing the variables in the model.
By default the variables are taken from the environment which
|
subset |
an index vector indicating which rows should be used. (NOTE: If given, this argument must be named.) |
na.action |
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.) |
x , formula |
a data frame or a matrix of predictors, or a formula
describing the model to be fitted (for the
|
y |
A response vector of |
ntree |
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. |
mtry |
Number of variables randomly sampled as candidates at each split. |
replace |
Should sampling of cases be done with or without replacement? |
classwt |
Priors of the classes. Need not add up to one. |
cutoff |
A vector of length equal to number of classes. The ‘winning’ class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is 1/k where k is the number of classes (i.e., majority vote wins). |
sampsize |
Size(s) of sample to draw. |
nodesize |
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). |
maxnodes |
Maximum number of terminal nodes trees in the forest
can have. If not given, trees are grown to the maximum possible
(subject to limits by |
internal |
For internal test only. |
... |
optional parameters to be passed to the low level function
|
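The cutoff rule described above (the winning class maximizes the ratio of vote proportion to cutoff) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's internal code: the vote proportions are made-up numbers, pick_class is a hypothetical helper, and resolving ties by taking the first class is an assumption of this sketch.

```r
# Made-up vote proportions for 3 observations over 3 classes.
votes <- matrix(c(0.50, 0.30, 0.20,
                  0.40, 0.35, 0.25,
                  0.10, 0.50, 0.40),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("a", "b", "c")))

# Hypothetical helper: pick the class with the largest votes/cutoff ratio.
# The default cutoff is 1/k for each of the k classes.
pick_class <- function(votes, cutoff = rep(1 / ncol(votes), ncol(votes))) {
  ratio <- sweep(votes, 2, cutoff, "/")  # divide each column by its cutoff
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

default_pred <- pick_class(votes)                            # plain majority vote
skewed_pred  <- pick_class(votes, cutoff = c(0.6, 0.2, 0.2)) # penalize class "a"
```

With the default cutoff the first two observations go to class "a"; raising the cutoff for "a" to 0.6 moves them to class "b", which shows how cutoff can be used to trade off errors between classes.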
Value
An object of class randomForest, which is a list with the
following components:
call |
the original call to |
type |
|
classes |
the classes of the target. |
ntree |
number of trees grown. |
mtry |
number of predictors sampled for splitting at each node. |
forest |
a list that contains the entire forest. |
cutoff |
the cutoff vector used to build the model. |
ncat |
the number of levels of the attributes. |
attr.names |
the names of the attributes. |
xlevels |
the levels of the attributes. |
Note
For large data sets, especially those with a large number of variables,
calling randomForest via the formula interface is not advised:
there may be too much overhead in handling the formula.
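One way to avoid the formula interface is to split the data into a predictor object and a response vector up front and pass those to the default method. The sketch below uses only base R and the built-in iris data; the final call is shown as a comment because it assumes the package is already loaded.

```r
# Separate predictors and response without constructing a formula.
x <- iris[, setdiff(names(iris), "Species")]
y <- iris$Species

# These objects match the x and y arguments of the default method, e.g.:
# rf <- randomForest(x, y)
```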
Author(s)
Lei Zhang lei.c.zhang@oracle.com, Andy Liaw andy_liaw@merck.com and Matthew Wiener matthew_wiener@merck.com, based on original Fortran code by Leo Breiman and Adele Cutler.
References
Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.
Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf.
See Also
Examples
## Classification:
##data(iris)
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris)
print(iris.rf)
## "x" can be a matrix instead of a data frame:
set.seed(17)
x <- matrix(runif(5e2), 100)
y <- gl(2, 50)
(myrf <- randomForest(x, y))
(predict(myrf, x))
## Grow no more than 4 terminal nodes per tree:
rf <- randomForest(Species ~ ., data=iris, maxnodes=4, ntree=30)