Computers and Technology

We will practice building a machine learning algorithm using a new dataset, iris, that provides multiple predictors for us to use to train. To start, we will remove the setosa species and focus on the versicolor and virginica iris species using the following code:
library(caret)
data(iris)
iris <- iris[-which(iris$Species=='setosa') , ]
y <- iris$Species
The following questions all involve work with this dataset.
1. First let us create an even split of the data into train and test partitions using createDataPartition() from the caret package. The code with a missing line is given below:
# set.seed(2) # if using R 3.5 or earlier
set.seed(2, sample.kind="Rounding") # if using R 3.6 or later
# line of code
test <- iris[test_index, ]
train <- iris[-test_index, ]
2. Which code should be used in place of # line of code above?
a. test_index <- createDataPartition(y, times=1, p=0.5)
b. test_index <- sample(2, length(y), replace=FALSE)
c. test_index <- createDataPartition(y, times=1, p=0.5, list=FALSE)
d. test_index <- rep(1, length(y))
Note: for this question, you may ignore any warning message generated by the code. If you have R 3.6 or later, you should always use the sample.kind argument in set.seed for this course.
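To see what the list argument changes in the options above, the following sketch compares the object that createDataPartition() returns with and without list=FALSE. It assumes the caret package is installed. The names as_list and as_matrix are illustrative only, and plain set.seed(2) is used for brevity, so the sampled rows may differ from the course's.

```r
library(caret)

data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

set.seed(2)  # the sample.kind = "Rounding" variant from the exercise is omitted here

# Default (list = TRUE): a list holding one integer vector of row positions.
# suppressWarnings() hides the warning caret may raise about the empty setosa level.
as_list <- suppressWarnings(createDataPartition(y, times = 1, p = 0.5))
str(as_list)

# list = FALSE: a one-column integer matrix, usable directly as a row index.
as_matrix <- suppressWarnings(
  createDataPartition(y, times = 1, p = 0.5, list = FALSE)
)
str(as_matrix)

test  <- iris[as_matrix, ]
train <- iris[-as_matrix, ]
c(test = nrow(test), train = nrow(train))
```

Because the partition is stratified on y, each half should contain roughly the same number of versicolor and virginica rows.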
3. Next we will figure out the single feature in the dataset that yields the greatest overall accuracy when predicting species. You can use the code from the introduction and from Q1 to start your analysis.
Using only the train iris dataset, for each feature, perform a simple search to find the cutoff that produces the highest accuracy, predicting virginica if greater than the cutoff and versicolor otherwise. Use the seq function over the range of each feature by intervals of 0.1 for this search. Which feature produces the highest accuracy?
a. Sepal.Length
b. Sepal.Width
c. Petal.Length
d. Petal.Width
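The cutoff search described in this question could be sketched as follows. This is a minimal illustration, not the course's reference solution: the helper best_cutoff() is a hypothetical name, and the train/test split is a base-R stand-in for createDataPartition() so the block runs without caret. The rows selected, and therefore the accuracies, will generally differ from those obtained with the exercise's seed and split.

```r
data(iris)
iris <- droplevels(iris[iris$Species != "setosa", ])

# Assumption: a base-R stratified 50/50 split standing in for createDataPartition().
set.seed(2)
test_index <- unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(idx) sample(idx, length(idx) / 2)
))
test  <- iris[test_index, ]
train <- iris[-test_index, ]

# Hypothetical helper: scan cutoffs in steps of 0.1 over the range of x and
# return the cutoff with the highest accuracy, predicting virginica above it.
best_cutoff <- function(x, y) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accuracy <- sapply(cutoffs, function(cutoff) {
    y_hat <- ifelse(x > cutoff, "virginica", "versicolor")
    mean(y_hat == y)
  })
  c(cutoff = cutoffs[which.max(accuracy)], accuracy = max(accuracy))
}

features <- names(train)[1:4]
results <- sapply(features, function(f) best_cutoff(train[[f]], train$Species))
results                                    # one column per feature
names(which.max(results["accuracy", ]))    # feature with the highest training accuracy
```

When several cutoffs tie for the best accuracy, which.max() keeps the first (smallest) one; that tie-breaking choice is part of the sketch, not something the exercise specifies.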
4. For the feature selected in Q3, use the optimal cutoff value from the training data to calculate overall accuracy in the test data. What is the overall accuracy?
Notice that we had an overall accuracy greater than 96% in the training data, but the overall accuracy was lower in the test data. This can happen often if we overtrain. In fact, it could be the case that a single feature is not the best choice. For example, a combination of features might be optimal. Using a single feature and optimizing the cutoff as we did on our training data can lead to overfitting.
Given that we know the test data, we can treat it like we did our training data to see if the same feature with a different cutoff will optimize our predictions. Repeat the analysis in Q3 but this time using the test data instead of the training data. Which feature best optimizes our overall accuracy when using the test set?
a. Sepal.Length
b. Sepal.Width
c. Petal.Length
d. Petal.Width
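One way to carry out both parts of this question is sketched below: first apply the cutoff learned on the training data to the test data, then repeat the search on the test data itself for comparison. The helpers accuracy_at() and best_cutoff() are hypothetical names, and the split is again a base-R stand-in for createDataPartition(), so the numbers it prints are not expected to match the course's.

```r
data(iris)
iris <- droplevels(iris[iris$Species != "setosa", ])

# Assumption: base-R stratified 50/50 split in place of createDataPartition().
set.seed(2)
test_index <- unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(idx) sample(idx, length(idx) / 2)
))
test  <- iris[test_index, ]
train <- iris[-test_index, ]

# Accuracy of the rule "virginica if x > cutoff, versicolor otherwise".
accuracy_at <- function(x, y, cutoff) {
  mean(ifelse(x > cutoff, "virginica", "versicolor") == y)
}

# Cutoff (in steps of 0.1 over the range of x) with the highest accuracy.
best_cutoff <- function(x, y) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accuracy <- sapply(cutoffs, function(k) accuracy_at(x, y, k))
  cutoffs[which.max(accuracy)]
}

features <- names(train)[1:4]

# Part 1: choose feature and cutoff on train, then evaluate on test.
train_acc <- sapply(features, function(f) {
  accuracy_at(train[[f]], train$Species, best_cutoff(train[[f]], train$Species))
})
best_feature <- names(which.max(train_acc))
cutoff <- best_cutoff(train[[best_feature]], train$Species)
test_acc <- accuracy_at(test[[best_feature]], test$Species, cutoff)

# Part 2: repeat the search using the test data instead of the training data.
test_opt_acc <- sapply(features, function(f) {
  accuracy_at(test[[f]], test$Species, best_cutoff(test[[f]], test$Species))
})

best_feature
test_acc
test_opt_acc
```

Comparing test_acc with train_acc[best_feature] shows how much of the training accuracy carries over to unseen data, which is the overfitting concern raised above.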
5. Now we will perform some exploratory data analysis on the data.
plot(iris, pch=21, bg=iris$Species)
Notice that Petal.Length and Petal.Width in combination could potentially be more informative than either feature alone. Optimize the cutoffs for Petal.Length and Petal.Width separately in the train dataset by using the seq function with increments of 0.1. Then, report the overall accuracy when applied to the test dataset by creating a rule that predicts virginica if Petal.Length is greater than the length cutoff OR Petal.Width is greater than the width cutoff, and versicolor otherwise. What is the overall accuracy for the test data now?
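The combined rule could be sketched like this. As before, best_cutoff() is a hypothetical helper and the split is a base-R stand-in for createDataPartition(), so the resulting accuracy is illustrative rather than the expected answer.

```r
data(iris)
iris <- droplevels(iris[iris$Species != "setosa", ])

# Assumption: base-R stratified 50/50 split in place of createDataPartition().
set.seed(2)
test_index <- unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(idx) sample(idx, length(idx) / 2)
))
test  <- iris[test_index, ]
train <- iris[-test_index, ]

# Cutoff (in steps of 0.1 over the range of x) with the highest accuracy,
# predicting virginica above the cutoff and versicolor otherwise.
best_cutoff <- function(x, y) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accuracy <- sapply(cutoffs, function(k) {
    mean(ifelse(x > k, "virginica", "versicolor") == y)
  })
  cutoffs[which.max(accuracy)]
}

# Optimize each cutoff separately on the training data.
length_cutoff <- best_cutoff(train$Petal.Length, train$Species)
width_cutoff  <- best_cutoff(train$Petal.Width, train$Species)

# Combined rule on the test data: virginica if EITHER measurement exceeds its cutoff.
y_hat <- ifelse(
  test$Petal.Length > length_cutoff | test$Petal.Width > width_cutoff,
  "virginica", "versicolor"
)
combined_acc <- mean(y_hat == test$Species)
combined_acc
```

The OR rule can only label more flowers as virginica than either single-feature rule does, so it trades some versicolor errors for fewer missed virginica cases.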
