laGrenouille 2 hours ago [-]
Great visualizations. Really enjoyed having a well-written example where mathematical proofs directly help with understanding a practical application.
I wonder what would happen with this analysis if a momentum term were added to the gradient descent. It seems that it would fix the specific failure modes in the examples, but I wonder if there's a corresponding mathematical way of categorizing what kinds of functions can(not) be quickly optimized with GD + momentum.
xuzhenpeng 4 hours ago [-]
The animation is very good and makes the article easy to understand.
Guestmodinfo 3 hours ago [-]
We studied it in our preparation for college entrance exams in India, though the level of detail the article goes into is exhaustive. I thought this might be common, or almost common, knowledge.
We used to call it the sandwich theorem.