Mathematics
Mathematics, 07.04.2020 23:21, GaudySky

Predicting Prices of Used Cars (Regression Trees). The file ToyotaCorolla. xls contains the data on used cars (Toyota Corolla) on sale during late summer of 2004 in The Netherlands. It has 1436 records containing details on 38 attributes, including Price, Age, Kilometers, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications. (The example in Section 9.8 is a subset of this dataset).

Data Preprocessing: Split the data into training (50%), validation (30%), and test (20%) datasets, using the Partitioning Variable in column AN (in the data partitioning menu choose "use partition variable" and select Partitioning Variable.).

a. (7 points) Run a regression tree (RT) using the Prediction menu, manual tree option, in Data Mining with the output variable Price and input variables Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfg_Guarantee, Guarantee_Period, Airco, Automatic_Airco, CD Player, Powered_Windows, Sport_Model, and Tow_Bar. Place Fuel_Type in "Categorical Variables" pane. Keep the Records in Terminal Nodes to 1, other limits set to 100, and the scoring option to Full Tree, to make the run least restrictive.

Which appear to be the three or four most important car specifications for predicting the car’s price? [HINT - use the "Feature Importance" option]
Make a note of the RMSE for the training data.
Now re-run the model, this time using the pruning option and Best Pruned Tree. Comment on (a) the error in the training and validation data and (b) the error in the training data compared to the full tree results.

b. (6 points) Let us explore the effect of turning the price variable into a categorical variable. First, create a new variable that categorizes price into 20 bins, leaving other options at their defaults (use Transform > Transform Continuous Data > Bin). Now repartition the data keeping Binned Price instead of Price. Run a classification tree (CT) using the Classification menu of XLMiner with the same set of input variables as in the RT, and with Binned Price as the output variable. Set the minimum number of records in a terminal node to 1, and the other limits at their maximums. Enable pruning, and choose Best Pruned Tree for display and scoring.

Compare the tree generated by the CT with the one generated by the RT. Are they different? (Look at structure, the top predictors, size of tree, etc.) Why?

answer
Answers: 1

Other questions on the subject: Mathematics

image
Mathematics, 21.06.2019 13:00, Victoriag2626
The actual length of side t is 0.045 cm. use the scale drawing to find the actual side length of w. a) 0.06 cm b) 0.075 cm c) 0.45 cm d) 0.75 cm
Answers: 3
image
Mathematics, 21.06.2019 18:30, rheamskeorsey33
Acoin bank containing only dimes and quarters has 12 more dimes than quarters. the total value of the coins is $11. how many quarters and dimes are in the coin bank?
Answers: 1
image
Mathematics, 21.06.2019 19:00, ksiandua07
65% of students in your school participate in at least one after school activity. if there are 980 students in you school, how many do not participate in an after school activity?
Answers: 1
image
Mathematics, 21.06.2019 21:30, maxB0846
Write an equation of the line that passes through the point (2, 3) and is perpendicular to the line x = -1. a) y = 1 b) y = 3 c) y = 0 eliminate d) y = -3
Answers: 1
Do you know the correct answer?
Predicting Prices of Used Cars (Regression Trees). The file ToyotaCorolla. xls contains the data on...

Questions in other subjects:

Konu
Biology, 20.09.2020 07:01