Physics, 17.07.2019 17:20, kaitlyn0123
Gradient descent: Consider the code example discussed in class. The prediction function is ŷ_i = 1 + w·x_i and the loss function is l(w) = Σ_i (ŷ_i(w) − t_i)^2, where t represents the vector of observed targets and x represents the vector of observed features.
(a) Derive the gradient and Hessian of l(w) with respect to w.
(b) Implement them and re-run the example, playing around with the step size and starting values. Do you see how much work it took to get Newton's method to converge to something sensible?
(c) Modify the code to give you stochastic gradient descent. Try this with different mini-batch sizes and starting values to get a feel for how it works, particularly the stability of the algorithm with respect to these hyper-parameters.
Answers: 3
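The three answers are not reproduced here, but a minimal NumPy sketch of what parts (a)-(c) might look like is shown below. It assumes the squared-error loss written in the question and substitutes made-up data for x and t, since the class's original code is not available; the step size, batch size, starting value, and function names are only illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in data (assumption: the class example's actual data is not shown here).
x = rng.uniform(-2.0, 2.0, size=100)
t = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

def predict(w, x):
    # Prediction function y_hat_i = 1 + w * x_i.
    return 1.0 + w * x

def loss(w, x, t):
    # Assumed squared-error loss l(w) = sum_i (y_hat_i(w) - t_i)^2.
    return np.sum((predict(w, x) - t) ** 2)

# (a) For this loss, the gradient and Hessian with respect to the scalar w are
#     dl/dw    = 2 * sum_i (1 + w*x_i - t_i) * x_i
#     d2l/dw2  = 2 * sum_i x_i^2   (does not depend on w)
def grad(w, x, t):
    return 2.0 * np.sum((predict(w, x) - t) * x)

def hessian(w, x, t):
    return 2.0 * np.sum(x ** 2)

# (b) Newton's method: w <- w - grad(w) / hessian(w).
def newton(w0, x, t, n_steps=10):
    w = w0
    for _ in range(n_steps):
        w = w - grad(w, x, t) / hessian(w, x, t)
    return w

# (c) Stochastic gradient descent over shuffled mini-batches.
def sgd(w0, x, t, step_size=0.005, batch_size=10, n_epochs=50):
    w = w0
    n = len(x)
    for _ in range(n_epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            w = w - step_size * grad(w, x[batch], t[batch])
    return w

print("Newton's method:", newton(w0=-5.0, x=x, t=t))
print("SGD:            ", sgd(w0=-5.0, x=x, t=t))

Because this loss is quadratic in w, Newton's method lands on the minimizer in a single step, whereas SGD's behavior depends visibly on the step size, batch size, and starting value, which is the point of part (c).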
Physics, 22.06.2019 12:30, mommer2019
Consider a system with two masses that are moving away from each other. Why will the kinetic energy differ depending on whether the frame of reference is that of a stationary observer or that of one of the masses?
Answers: 1
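One way to see it, using made-up numbers: let both masses be 1 kg and let each move away from their common starting point at 1 m/s as measured by a stationary observer. In the observer's frame the total kinetic energy is (1/2)(1 kg)(1 m/s)^2 + (1/2)(1 kg)(1 m/s)^2 = 1 J. In the frame of the first mass, that mass is at rest and the other recedes at 2 m/s, so the total is (1/2)(1 kg)(0)^2 + (1/2)(1 kg)(2 m/s)^2 = 2 J. Kinetic energy depends on the square of velocity, and velocities are themselves frame-dependent, so the total kinetic energy differs between frames.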