Computers and Technology

Consider an MDP with 3 states, A, B and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP; instead, we are given samples of what an agent actually experiences when it interacts with the environment (although we do know that the agent never remains in the same state after taking an action). In this problem, instead of first estimating the transition and reward functions, we will directly estimate the Q function using Q-learning. Assume the discount factor γ is 0.5 and the Q-learning step size α is 0.5.

Our current Q function, Q(s, a):

                     A        B        C
Clockwise          1.501   -0.451    2.73
Counterclockwise   3.153   -6.055    2.133

The agent encounters the following samples, in order:

s           a                  s'    r
A           Counterclockwise   C     8.0
(missing)   Counterclockwise   A     0.0

The start state s of the second sample is missing from the source. Because an action never leaves the agent in the same state, it must be B or C.
Provide the Q-values for all pairs of (state, action) after both samples have been accounted for.
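
The update involved can be sketched in Python. This is a minimal illustration, not an official solution: it assumes the standard Q-learning rule Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max over a' of Q(s', a')), and it assumes the flattened table reads as Clockwise = (1.501, -0.451, 2.73) and Counterclockwise = (3.153, -6.055, 2.133) for states A, B, C. The names `q_update`, `q0`, `q1` and `after_second_sample` are made up for this sketch. Since the start state of the second sample is not given in the text, it is left as a parameter.

```python
GAMMA = 0.5   # discount factor from the problem statement
ALPHA = 0.5   # Q-learning step size from the problem statement
ACTIONS = ("Clockwise", "Counterclockwise")

def q_update(q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    """Return a copy of q after one Q-learning update on the sample (s, a, s_next, r)."""
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    new_q = dict(q)
    new_q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return new_q

# Current Q-values, as read from the table (assumed reading of the garbled source).
q0 = {
    ("A", "Clockwise"): 1.501, ("B", "Clockwise"): -0.451, ("C", "Clockwise"): 2.73,
    ("A", "Counterclockwise"): 3.153, ("B", "Counterclockwise"): -6.055,
    ("C", "Counterclockwise"): 2.133,
}

# Sample 1: (A, Counterclockwise, C, 8.0).
# Only Q(A, Counterclockwise) changes; it becomes approximately 6.259.
q1 = q_update(q0, "A", "Counterclockwise", 8.0, "C")

def after_second_sample(start_state):
    """Apply sample 2, (start_state, Counterclockwise, A, 0.0), on top of q1.

    start_state is an assumption supplied by the caller: it is missing in the
    source and can only be "B" or "C".
    """
    return q_update(q1, start_state, "Counterclockwise", 0.0, "A")

if __name__ == "__main__":
    for state in ("B", "C"):
        q2 = after_second_sample(state)
        print(state, {k: round(v, 5) for k, v in q2.items()})
```

Each sample changes exactly one entry of the table, so the remaining four Q-values stay at their original values whichever start state the second sample has.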

