Deck 5: Random Forests

Question
Which is an "ensemble" method?

A) Classification trees
B) Regression trees
C) Random forests
D) None of the above
Question
Which is NOT a use of random forests?

A) prediction of outcomes
B) partitioning variance into between- and within-group components
C) organizing observations into classes
D) selection of the most important predictor variables
Question
With random forests, categorical variables may be used as independent or dependent variables and may be coded as characters or as numbers.
Question
Which is a characteristic of random forests?

A) They are nonparametric
B) They are nonlinear
C) They are robust against multicollinearity
D) All of the above
Question
The same random forest model with the randomForest command will always return the same solution on each run of the model.
Question
In random forest solutions, there is no way to print a graph showing the "typical" tree.
Question
Fill-In. Developing a random forest model with a training subset of the data and validating it with an OOB (out-of-bag) subset is called ___________________________________
Question
The randomForest command is ...

A) Part of the "randomForest" package and needs explicit installation in R.
B) Part of R's built-in "stats" package and needs no explicit installation.
C) Part of R's built-in "graphics" package and needs no explicit installation.
D) Part of R's built-in "base" package and needs no explicit installation.
Question
What does the "mtry" option do in random forests?

A) Specifies the number of iterations in the solution
B) Specifies the number of trees to use in the forest
C) Specifies the number of predictors to consider at each split in a tree
D) There is no such option
Question
What will increasing the size of the "minbucket" option do in random forests?

A) No effect
B) Increase the number of trees used for the solution
C) Trees in the forest will be more complex (more splits)
D) Trees in the forest will be less complex (fewer splits)
Question
If RFclass is an object created by the randomForest command, what is RFclass$predicted?

A) A variable called "predicted" in the data frame used to create RFclass
B) An automatically generated element of RFclass containing predictions of the outcome variable
C) A variable added to RFclass after using the predict command
D) A variable added to the data frame used to create RFclass after using the predict command
Question
If RFclass is an output object created by the randomForest command, typing plot(RFclass) will generate an error rate plot. What is this plot used for?

A) Used to determine the needed size of minbucket
B) Used to determine the needed size of mtry
C) Used to determine the needed size of ntree
D) None of the above
Question
What does a "confusion table" show?

A) Overlap of independent variables with each other
B) Overlap of independent variables with the dependent variable
C) The error rate in a regression random forest model
D) The error rate in a classification random forest model
Question
In random forest classification models, what is a metric for variable importance?

A) MeanDecreaseAccuracy
B) MeanDecreaseGini
C) Both A and B
D) Neither A nor B
Question
If RFclass is an object created by a randomForest classification model, what is RFclass$proximity?

A) An element with coefficients showing the average frequency that pairs of observations follow the same path and wind up in the same terminal node.
B) An element with coefficients showing the intercorrelation of variables in the model
C) An element with coefficients showing Euclidean distance between observations
D) An element with coefficients showing Euclidean distance between predictors
Question
What is the primary model performance metric for random forest regression models?

A) Accuracy
B) Gini Index
C) R-Squared
D) Mean square error
Question
Is this a true statement: "The more trees in a random forest solution, the better the model performance."
Question
Fill-In. What is the name of the package which can return up to seven different variable importance criteria for a random forest model, or importance by a combination of criteria?
Question
The "party" package implements conditional inference trees through its ctree and cforest functions. What is the distinguishing aspect of a cforest solution compared to a randomForest solution?

A) Conditional inference trees use partial correlations as splitting criteria.
B) Conditional inference trees use factor analysis to form clusters.
C) Conditional inference trees use p values from significance tests as splitting criteria.
D) Conditional inference trees do not incorporate a random element.
1
Which is an "ensemble" method?

A) Classification trees
B) Regression trees
C) Random forests
D) None of the above
C
"Ensemble" means "a group of." In this context it refers to random forest procedures basing their estimates on a large group of decision trees, in contrast to classification and regression trees, which each rely on a single tree.
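As a rough illustration of the ensemble idea, the base R sketch below combines the class votes of five trees by majority rule. The vote matrix is invented for illustration (it is not output from the randomForest command), and simple majority voting is assumed as the aggregation rule for classification.

```r
# Hypothetical class votes: one row per tree, one column per observation.
# These values are made up for illustration, not produced by a fitted forest.
votes <- rbind(
  tree1 = c("yes", "no",  "yes", "no"),
  tree2 = c("yes", "no",  "no",  "no"),
  tree3 = c("no",  "no",  "yes", "yes"),
  tree4 = c("yes", "yes", "yes", "no"),
  tree5 = c("no",  "no",  "yes", "no")
)

# For each observation (column), pick the class predicted by the most trees.
majority_vote <- function(v) names(which.max(table(v)))
ensemble_pred <- unname(apply(votes, 2, majority_vote))

print(ensemble_pred)
```

With two classes and an odd number of trees, ties cannot occur in this sketch; a real implementation needs a tie-breaking rule.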
2
Which is NOT a use of random forests?

A) prediction of outcomes
B) partitioning variance into between- and within-group components
C) organizing observations into classes
D) selection of the most important predictor variables
B
While random forests do A, C, and D, they do not partition variance into between- and within-group components, as is done in multilevel modeling. Also, although regression forests report mean square error (MSE), they do not generate an R-square that can be interpreted, as in OLS regression, as the percent of variance explained. Rather, they generate apparent and cross-validated R-square values, which differ in meaning from the OLS R-square.
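To make the MSE and R-square distinction concrete, here is a small base R sketch. The observed values and the predictions are hypothetical, and the formula 1 - MSE / Var(y) follows the "pseudo R-squared" described in the randomForest documentation for regression forests; the variance is assumed here to use divisor n.

```r
# Hypothetical observed outcomes and out-of-bag predictions (invented values).
y    <- c(3, 5, 7, 9)
pred <- c(4, 5, 6, 10)

# Mean square error of the predictions.
mse <- mean((y - pred)^2)

# Variance of y with divisor n (an assumption of this sketch).
var_y <- mean((y - mean(y))^2)

# Pseudo R-squared: share of outcome variance not left over as prediction error.
pseudo_rsq <- 1 - mse / var_y

print(c(mse = mse, pseudo_rsq = pseudo_rsq))
```

Because the predictions come from held-out cases rather than from a least-squares fit, this quantity can be negative when the predictions are worse than simply using the mean of y, which is one reason it should not be read as an OLS R-square.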
3
With random forests, categorical variables may be used as independent or dependent variables and may be coded as characters or as numbers.
True