A fun project about math. If you're confused, try clicking the "What is this?" button. Gradient descent is used by data scientists everywhere, especially in artificial intelligence.
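In case "gradient descent" is new to you: it just means repeatedly stepping downhill along the gradient until the slope is tiny. Below is a minimal Python sketch of that idea (not the project's Scratch code; the example function, starting point, and step size are made up for illustration, and only the stop-when-the-gradient-is-below-epsilon rule comes from the project).

```python
import numpy as np

def gradient_descent(grad, start, step_size=0.1, epsilon=0.001, max_iters=10000):
    """Plain gradient descent: keep stepping downhill until the gradient
    is smaller than epsilon in every dimension."""
    x = np.array(start, dtype=float)
    for i in range(max_iters):
        g = grad(x)
        if np.all(np.abs(g) < epsilon):   # stop once every component is < 0.001
            return x, i
        x -= step_size * g                # step in the direction of steepest descent
    return x, max_iters

# Made-up example: f(x, y) = x^2 + 3y^2, gradient (2x, 6y), minimum at the origin.
grad_f = lambda p: np.array([2 * p[0], 6 * p[1]])
minimum, iters = gradient_descent(grad_f, start=[4.0, -3.0])
print(minimum, iters)
```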
Sections: Credits, Controls, Comparison

== Credits ==
1). Wikipedia. It's such a great resource!
2). TensorFlow for the parameter values!
3). Scratch, as always! :D

== Controls ==
They're mostly on screen. Press W to toggle slow gradient descent. Click to start gradient descent at the mouse pointer.

== Comparison ==
All algorithms are run until the gradient is smaller than epsilon (0.001) in each dimension.

Running the algorithms on equation #1 shows that RMSProp often converges about 4 times faster than regular gradient descent with the optimal step size. Considering RMSProp was originally created for equations that are roughly quadratic near their minimums, this makes sense.

Running the algorithms on equation #2, RMSProp converges at about the same rate as regular gradient descent. On the downside, RMSProp starts to crack: in a few situations it falls into a "jitter", i.e., it jumps around the origin but never converges, even with smaller step sizes. On equation #3, RMSProp suffers even more from jitters.

Overall, Adam with the recommended TensorFlow parameter values converges more slowly than regular gradient descent, which is confusing. However, decreasing Beta1 (to ~0.5) works great on equations #1 and #3, avoiding RMSProp's binary jitter. Compared to regular gradient descent with a step size just below divergence, Adam is also faster. However, it appears that too low a Beta1 (<0.3) can cause jitters. With that in mind, RMSProp also seems to get more jitters with a lower Beta1. (A rough sketch of both update rules follows below.)
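For anyone who wants to see how the two adaptive methods differ from plain gradient descent, here is a rough Python sketch of both update rules, using the same stop-when-every-gradient-component-is-below-epsilon rule as the project. The hyperparameter values and the test function below are assumptions for illustration, not the project's exact equations or TensorFlow's defaults.

```python
import numpy as np

def rmsprop(grad, start, step_size=0.01, decay=0.9, eps=1e-8,
            epsilon=0.001, max_iters=100000):
    """RMSProp: scale each step by a running RMS of recent gradients."""
    x = np.array(start, dtype=float)
    v = np.zeros_like(x)                      # running average of squared gradients
    for i in range(max_iters):
        g = grad(x)
        if np.all(np.abs(g) < epsilon):       # same stopping rule as the project
            return x, i
        v = decay * v + (1 - decay) * g**2
        x -= step_size * g / (np.sqrt(v) + eps)
    return x, max_iters

def adam(grad, start, step_size=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
         epsilon=0.001, max_iters=100000):
    """Adam: RMSProp-style scaling plus a momentum term controlled by beta1."""
    x = np.array(start, dtype=float)
    m = np.zeros_like(x)                      # first moment (momentum)
    v = np.zeros_like(x)                      # second moment (squared gradients)
    for i in range(1, max_iters + 1):
        g = grad(x)
        if np.all(np.abs(g) < epsilon):
            return x, i
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**i)            # bias correction
        v_hat = v / (1 - beta2**i)
        x -= step_size * m_hat / (np.sqrt(v_hat) + eps)
    return x, max_iters

# Example run on an assumed quadratic bowl, f(x, y) = x^2 + 3y^2 (not one of
# the project's equations).
grad_f = lambda p: np.array([2 * p[0], 6 * p[1]])
print(rmsprop(grad_f, [4.0, -3.0]))
print(adam(grad_f, [4.0, -3.0], beta1=0.5))   # lower Beta1, as discussed above
```

Lowering beta1 in the sketch simply makes Adam's momentum forget old gradients faster, which is one way to picture why a moderate value (~0.5) helped here while a very low value started to jitter.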