Optimization in Machine Learning: Chapter 2, Gradient Descent (2)
2022/4/29 6:13:00
\(\large \bf{Theorem }\ 2.7:\)
\(\text{Let }f:\mathbb{R}^d\rightarrow\mathbb{R}\text{ be convex and differentiable with a global minimum }x^*.\text{ Suppose }f\text{ is smooth with parameter }L.\text{ Choosing stepsize }\gamma = \frac{1}{L},\text{ gradient descent yields:}\)
\[\begin{align} f(x_T)-f(x^*)\leq \frac{L}{2T}||x_0-x^*||^2,\quad T>0 \end{align} \]
\(\large\bf Proof:\)
\(\text{Since }f\text{ is differentiable and smooth, Lemma 2.6 (sufficient decrease) gives:}\)
\[\begin{align} f(x_{t+1})\leq f(x_t)-\frac{1}{2L}||\nabla f(x_t)||^2 \end{align} \]
\(\text{Therefore:}\)
\[\begin{align} \frac{1}{2L}||g_t||^2\leq f(x_t)-f(x_{t+1}) \end{align} \]
\(\text{Now we sum up:}\)
\[\begin{align} \frac{1}{2L}\sum_{t=0}^{T-1}||g_t||^2&\leq \sum_{t=0}^{T-1}[f(x_t)-f(x_{t+1})]\\ &=f(x_0)-f(x_T) \end{align} \]
\(\text{With }\gamma = 1/L,\text{ the vanilla analysis from the previous part gives:}\)
\[\begin{align} \sum_{t=0}^{T-1}[f(x_t)-f(x^*)]\leq \frac{\gamma}{2}\sum_{t=0}^{T-1}||g_t||^2+\frac{1}{2\gamma}||x_0-x^*||^2 \end{align} \]
\(\text{Combining (5) and (6):}\)
\[\begin{align} \sum_{t=0}^{T-1}[f(x_t)-f(x^*)]&\leq \frac{\gamma}{2}\sum_{t=0}^{T-1}||g_t||^2+\frac{1}{2\gamma}||x_0-x^*||^2 \\ &\leq f(x_0)-f(x_T)+\frac{1}{2\gamma}||x_0-x^*||^2 \end{align} \]
\(\text{Hence:}\)
\[\begin{align} \sum_{t=1}^{T}[f(x_t)-f(x^*)]&\leq \frac{1}{2\gamma}||x_0-x^*||^2\\ &=\frac{L}{2}||x_0-x^*||^2 \end{align} \]
\(\text{As a result, since the sufficient decrease (2) implies }f(x_T)\leq f(x_t)\text{ for all }t\leq T:\)
\[\begin{align} T\cdot (f(x_T)-f(x^*))&\leq \sum_{t=1}^T[f(x_t)-f(x^*)]\\ &\leq \frac{L}{2}||x_0-x^*||^2 \end{align} \]
\[\begin{align} \Rightarrow f(x_T)-f(x^*)\leq \frac{L}{2T}||x_0-x^*||^2 \end{align} \]
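To make the \(O(1/T)\) rate of Theorem 2.7 concrete, here is a minimal numerical sketch (my own illustration, not part of the original notes; the matrix \(A\), vector \(b\), and all variable names are hypothetical). It runs gradient descent with \(\gamma = 1/L\) on a convex, smooth, but not strongly convex least-squares objective and checks the bound \(f(x_T)-f(x^*)\leq \frac{L}{2T}||x_0-x^*||^2\) at every step.

```python
import numpy as np

# Hypothetical demo of Theorem 2.7: gradient descent with gamma = 1/L on
# f(x) = 0.5 * ||Ax - b||^2, which is convex and L-smooth with
# L = lambda_max(A^T A), but not strongly convex here (A is rank-deficient).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
A[:, -1] = A[:, 0]                      # duplicate a column: rank-deficient A
b = A @ rng.standard_normal(10)         # b in range(A), so min f = 0

L = np.linalg.eigvalsh(A.T @ A).max()   # smoothness parameter
gamma = 1.0 / L
x_star = np.linalg.pinv(A) @ b          # one global minimizer (least-norm)
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2

x0 = np.zeros(10)
x = x0.copy()
for T in range(1, 201):
    x = x - gamma * A.T @ (A @ x - b)   # gradient step with g_t = A^T(Ax - b)
    bound = L / (2 * T) * np.linalg.norm(x0 - x_star) ** 2
    assert f(x) - f(x_star) <= bound + 1e-12   # the O(1/T) guarantee
print("after 200 steps: gap =", f(x) - f(x_star), "bound =", bound)
```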
1. Smooth and strongly convex functions: \(O(\log(1/\epsilon))\) steps
\(\text{First-order method: a method that only uses gradient information to minimize }f.\)
\(\large\bf Definition\ 2.9:\)
\(\text{Strongly convex function: a differentiable }f:\mathbb{R}^d\rightarrow\mathbb{R}\text{ is strongly convex with parameter }\mu>0\text{ if}\)
\[\begin{align} f(y)\geq f(x)+\nabla f(x)^T(y-x)+\frac{\mu}{2}||y-x||^2,\quad \forall x,y\in\mathbb{R}^d \end{align} \]
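As a quick sanity check of Definition 2.9 (a sketch of my own, not from the notes): for the quadratic \(f(x)=\frac{1}{2}x^TQx\) with \(Q\) positive definite, \(f\) is strongly convex with \(\mu=\lambda_{\min}(Q)\), and the inequality can be verified numerically on random pairs of points.

```python
import numpy as np

# Numerical check of Definition 2.9 on a quadratic (illustrative example):
# f(x) = 0.5 * x^T Q x is strongly convex with mu = lambda_min(Q).
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = M @ M.T + np.eye(5)                 # symmetric positive definite
mu = np.linalg.eigvalsh(Q).min()        # strong convexity parameter

f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lower = f(x) + grad(x) @ (y - x) + 0.5 * mu * np.linalg.norm(y - x) ** 2
    assert f(y) >= lower - 1e-9         # Definition 2.9 holds
print("strong convexity inequality verified on 1000 random pairs")
```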
\(\large \bf Lemma\ 2.10:\)
\(\text{If }f\text{ is strongly convex with parameter }\mu>0,\text{ then }f\text{ is }\bf{strictly\ convex\ and\ has\ a\ unique\ global\ minimum.}\)
\(\text{Assume that }f\text{ is strongly convex with parameter }\mu.\text{ From the vanilla analysis:}\)
\[\begin{align} g_t^T(x_t-x^*)&=\nabla f(x_t)^T(x_t-x^*)\\ &\geq f(x_t)-f(x^*)+\frac{\mu}{2}||x_t-x^*||^2 \end{align} \]
\(\text{Hence:}\)
\[\begin{align} f(x_t)-f(x^*)&\leq \frac{1}{2\gamma}[\gamma^2||g_t||^2+||x_t-x^*||^2-||x_{t+1}-x^*||^2]-\frac{\mu}{2}||x_t-x^*||^2 \end{align} \]
\(\text{Rewrite it as:}\)
\[\begin{align} ||x_{t+1}-x^*||^2\leq 2\gamma [f(x^*)-f(x_t)]+\gamma^2||g_t||^2+(1-\mu\gamma)||x_t-x^*||^2 \end{align} \]
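Before stating the theorem, inequality (18) can be sanity-checked along actual gradient descent iterates; the following sketch (my own toy instance, not from the notes) does this on a strongly convex quadratic.

```python
import numpy as np

# Check inequality (18) along gradient descent iterates on a strongly
# convex quadratic f(x) = 0.5 * x^T Q x - b^T x (illustrative instance).
rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
Q = M @ M.T + np.eye(8)
b = rng.standard_normal(8)

f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
x_star = np.linalg.solve(Q, b)          # unique minimizer (Lemma 2.10)
mu = np.linalg.eigvalsh(Q).min()
L = np.linalg.eigvalsh(Q).max()
gamma = 1.0 / L

x = rng.standard_normal(8)
for _ in range(100):
    g = grad(x)
    x_next = x - gamma * g
    lhs = np.linalg.norm(x_next - x_star) ** 2
    rhs = (2 * gamma * (f(x_star) - f(x)) + gamma**2 * g @ g
           + (1 - mu * gamma) * np.linalg.norm(x - x_star) ** 2)
    assert lhs <= rhs + 1e-9            # inequality (18)
    x = x_next
print("inequality (18) holds along all 100 iterates")
```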
\(\large\bf{Theorem\ 2.12:}\)
\(\text{Let }f:\mathbb{R}^d\rightarrow\mathbb{R}\text{ be convex and differentiable. Suppose }f\text{ is smooth with parameter }L\text{ and strongly convex with parameter }\mu>0.\text{ Choosing stepsize }\gamma = \frac{1}{L},\)
\(\text{gradient descent with arbitrary }x_0\text{ satisfies the following two properties:}\)
\((i)\)
\[\begin{align} ||x_{t+1}-x^*||^2\leq (1-\frac{\mu}{L})||x_t-x^*||^2,\quad t\geq 0 \end{align} \]
\(\large\bf Proof:\)
\(\text{By smoothness (sufficient decrease with }\gamma=1/L\text{, as in (2)), we know:}\)
\[\begin{align} 2\gamma[f(x^*)-f(x_t)]\leq 2\gamma[f(x_{t+1})-f(x_t)]\leq -\gamma^2||g_t||^2 \end{align} \]
\(\text{Combining this with (18), we get:}\)
\[\begin{align} ||x_{t+1}-x^*||^2&\leq -\gamma^2||g_t||^2+\gamma^2||g_t||^2+(1-\mu\gamma)||x_t-x^*||^2\\ &\leq (1-\frac{\mu}{L})||x_t-x^*||^2 \end{align} \]
\((ii)\)
\[\begin{align} f(x_T)-f(x^*)\leq \frac{L}{2}(1-\frac{\mu}{L})^T||x_0-x^*||^2 \end{align} \]
\(\large\bf Proof:\)
\(\text{From smoothness, using }\nabla f(x^*)=0\text{ and applying (i) }T\text{ times:}\)
\[\begin{align} f(x_T)-f(x^*)&\leq \nabla f(x^*)^T(x_T-x^*)+\frac{L}{2}||x_T-x^*||^2\\ &=\frac{L}{2}||x_T-x^*||^2\\ &\leq \frac{L}{2}(1-\frac{\mu}{L})^T||x_0-x^*||^2 \end{align} \]
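To illustrate Theorem 2.12 (again a toy instance of my own choosing): on a smooth and strongly convex quadratic, the squared distance to \(x^*\) contracts by the factor \(1-\mu/L\) at every step, so reaching \(\epsilon\)-accuracy needs only \(O(\log(1/\epsilon))\) iterations.

```python
import numpy as np

# Illustration of Theorem 2.12: with gamma = 1/L, gradient descent on a
# smooth, strongly convex quadratic converges linearly (O(log(1/eps)) steps).
rng = np.random.default_rng(3)
M = rng.standard_normal((10, 10))
Q = M @ M.T + np.eye(10)
b = rng.standard_normal(10)

f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
x_star = np.linalg.solve(Q, b)
mu, L = np.linalg.eigvalsh(Q).min(), np.linalg.eigvalsh(Q).max()
gamma, rho = 1.0 / L, 1.0 - mu / L      # contraction factor from (i)

x0 = rng.standard_normal(10)
x, T = x0.copy(), 50
for t in range(1, T + 1):
    x = x - gamma * grad(x)
    # property (i), applied t times
    assert (np.linalg.norm(x - x_star) ** 2
            <= rho ** t * np.linalg.norm(x0 - x_star) ** 2 + 1e-12)

# property (ii): function gap bound after T steps
assert f(x) - f(x_star) <= 0.5 * L * rho ** T * np.linalg.norm(x0 - x_star) ** 2 + 1e-12
print("gap after", T, "steps:", f(x) - f(x_star))
```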