Essay
To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below:
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the sale price using a regression tree. Use Sale Price as the output variable and all the other variables as input variables. In Step 2 of XLMiner's Regression Tree procedure, be sure to Normalize input data, to set the Maximum #splits for input variables to 59, to set the Minimum #records in a terminal node to 1, and specify Using Best prune tree as the scoring option. In Step 3 of XLMiner's Regression Tree procedure, set the maximum number of levels to 7. Generate the Full tree and Best pruned tree.
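The 50/30/20 partition described above can be sketched outside XLMiner as follows. This is a minimal illustration in plain Python, not XLMiner's own procedure: the function name `partition_indices`, the seed value, and the use of row indices in place of the actual home records are all assumptions made for the example.

```python
import random

def partition_indices(n_records, fractions=(0.5, 0.3, 0.2), seed=12345):
    """Shuffle record indices and split them into training, validation, and test sets.

    The seed is an arbitrary illustrative value, not a setting taken from XLMiner.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)

    n_train = round(fractions[0] * n_records)
    n_valid = round(fractions[1] * n_records)

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test

# 120 homes, as in the problem statement
train_idx, valid_idx, test_idx = partition_indices(120)
print(len(train_idx), len(valid_idx), len(test_idx))  # 60 36 24
```

With 120 records, the three sets contain 60, 36, and 24 homes, and every home falls into exactly one set.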
a. In terms of number of decision nodes, compare the size of the full tree to the size of the best pruned tree.
b. What is the root mean squared error (RMSE) of the best pruned tree on the validation data and on the test data?
c. What is the average error on the validation data and test data? What does this suggest?
d. By examining the best pruned tree, what are the critical variables in predicting the sale price of a home?
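The two error measures asked about in parts b and c can be sketched as follows. This is a minimal illustration, assuming the error for each home is defined as actual price minus predicted price; the four price pairs are made-up toy values, not the housing data or the answer to the question.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def average_error(actual, predicted):
    """Mean residual (actual minus predicted); its sign indicates bias."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical toy prices (in thousands), for illustration only
actual = [200.0, 150.0, 310.0, 275.0]
predicted = [190.0, 160.0, 300.0, 285.0]

print(rmse(actual, predicted))           # 10.0
print(average_error(actual, predicted))  # 0.0
```

In this toy case the average error is zero even though every prediction is off by 10, because over- and under-predictions cancel. Under the sign convention assumed here, a positive average error means the tree underpredicts on average and a negative one means it overpredicts, while RMSE measures the typical size of the errors regardless of direction.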
Correct Answer:

a. There are 59 decision nodes in the full t...