A fun project about math. If you're confused, try clicking the "What is this?" button. Gradient descent is used by data scientists everywhere, especially in artificial intelligence.
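In case "gradient descent" is new to you: it just means repeatedly stepping downhill along the gradient until the slope is tiny. Below is a minimal Python sketch of that idea (not the project's Scratch code; the example function, starting point, and step size are made up for illustration, and only the stop-when-the-gradient-is-below-epsilon rule comes from the project).

```python
import numpy as np

def gradient_descent(grad, start, step_size=0.1, epsilon=0.001, max_iters=10000):
    """Plain gradient descent: keep stepping downhill until the gradient
    is smaller than epsilon in every dimension."""
    x = np.array(start, dtype=float)
    for i in range(max_iters):
        g = grad(x)
        if np.all(np.abs(g) < epsilon):   # stop once every component is < 0.001
            return x, i
        x -= step_size * g                # step in the direction of steepest descent
    return x, max_iters

# Made-up example: f(x, y) = x^2 + 3y^2, gradient (2x, 6y), minimum at the origin.
grad_f = lambda p: np.array([2 * p[0], 6 * p[1]])
minimum, iters = gradient_descent(grad_f, start=[4.0, -3.0])
print(minimum, iters)
```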
Sections: Credits, Controls, Comparison

== Credits ==
1). Wikipedia. It's such a great resource!
2). TensorFlow for the parameter values!
3). Scratch, as always! :D

== Controls ==
They're mostly on screen. Press W to toggle slow gradient descent. Click to start gradient descent at the mouse pointer.

== Comparison ==
All algorithms are run until the gradient is smaller than epsilon (0.001) in each dimension.

Running the algorithms on equation #1 shows that RMSProp often converges about 4 times faster than regular gradient descent with the optimal step size. Considering RMSProp was originally created for equations that are roughly quadratic near their minimums, this makes sense.

Running the algorithms on equation #2, RMSProp converges at about the same rate as regular gradient descent. On the downside, RMSProp starts to crack: in a few situations it falls into a "jitter", i.e., it jumps around the origin but never converges, even with smaller step sizes. On equation #3, RMSProp suffers even more from jitters.

Overall, Adam with the recommended TensorFlow parameter values converges more slowly than regular gradient descent, which is confusing. However, decreasing Beta1 (to ~0.5) works great on equations #1 and #3, avoiding RMSProp's binary jitter. Compared to regular gradient descent with a step size just below divergence, Adam is also faster. However, it appears that too low a Beta1 (<0.3) can cause jitters. With that in mind, RMSProp also seems to get more jitters with a lower Beta1. (A rough sketch of both update rules follows below.)
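For anyone who wants to see how the two adaptive methods differ from plain gradient descent, here is a rough Python sketch of both update rules, using the same stop-when-every-gradient-component-is-below-epsilon rule as the project. The hyperparameter values and the test function below are assumptions for illustration, not the project's exact equations or TensorFlow's defaults.

```python
import numpy as np

def rmsprop(grad, start, step_size=0.01, decay=0.9, eps=1e-8,
            epsilon=0.001, max_iters=100000):
    """RMSProp: scale each step by a running RMS of recent gradients."""
    x = np.array(start, dtype=float)
    v = np.zeros_like(x)                      # running average of squared gradients
    for i in range(max_iters):
        g = grad(x)
        if np.all(np.abs(g) < epsilon):       # same stopping rule as the project
            return x, i
        v = decay * v + (1 - decay) * g**2
        x -= step_size * g / (np.sqrt(v) + eps)
    return x, max_iters

def adam(grad, start, step_size=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
         epsilon=0.001, max_iters=100000):
    """Adam: RMSProp-style scaling plus a momentum term controlled by beta1."""
    x = np.array(start, dtype=float)
    m = np.zeros_like(x)                      # first moment (momentum)
    v = np.zeros_like(x)                      # second moment (squared gradients)
    for i in range(1, max_iters + 1):
        g = grad(x)
        if np.all(np.abs(g) < epsilon):
            return x, i
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**i)            # bias correction
        v_hat = v / (1 - beta2**i)
        x -= step_size * m_hat / (np.sqrt(v_hat) + eps)
    return x, max_iters

# Example run on an assumed quadratic bowl, f(x, y) = x^2 + 3y^2 (not one of
# the project's equations).
grad_f = lambda p: np.array([2 * p[0], 6 * p[1]])
print(rmsprop(grad_f, [4.0, -3.0]))
print(adam(grad_f, [4.0, -3.0], beta1=0.5))   # lower Beta1, as discussed above
```

Lowering beta1 in the sketch simply makes Adam's momentum forget old gradients faster, which is one way to picture why a moderate value (~0.5) helped here while a very low value started to jitter.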