Prediction Models
A prediction model is used to predict future events or unknown data. It learns the patterns in historical data and uses them to infer future outcomes.
Regression Models
A regression model is a statistical model of the relationship between a dependent variable (the target) and one or more independent variables (the features). It is typically used to predict continuous values.
Simple Linear Regression
Suppose I have two samples, (1, 2) and (2, 3). For a line y = kx + d, it is obvious that k and d are both 1:
$$y = x + 1$$
Now suppose I add a third sample, (3, 6), which does not satisfy $y = x + 1$. How, then, can the model predict where a fourth sample is most likely to appear?
The model must minimize the total distance to the three existing samples along the y-axis. (Note that this is not the distance perpendicular to the fitted line: y is the value the model predicts from the feature x, so the closer the prediction along the y-axis, the more accurate the model.)
Since distance is non-negative, we can use the squared distance instead. Suppose that for each sample with input feature $x_i$, the true value is $y_i$ and the model's prediction is $\hat y_i$. Our task is then to find values of k and d that minimize the sum of the (squared) y-axis distances over all samples:
$$\arg\min_{k,d} \sum_{i=1}^m (y_i - \hat y_i)^2$$
Since the feature value $x_i$ is the same for both $y_i$ and $\hat y_i$, we can substitute $\hat y_i = kx_i + d$ to get
$$\arg\min_{k,d} \sum_{i=1}^m (y_i - kx_i - d)^2$$
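Before deriving the closed form, the objective can be checked numerically. A minimal plain-Python sketch (the function name `sse` is mine) that evaluates the sum of squared y-axis distances for a candidate k and d on the three samples above:

```python
# Sum of squared y-axis distances for a candidate line y = k*x + d.
samples = [(1, 2), (2, 3), (3, 6)]

def sse(k, d):
    return sum((y - (k * x + d)) ** 2 for x, y in samples)

print(sse(1, 1))     # y = x + 1 misses (3, 6) by 2, so the error is 2**2 = 4
print(sse(2, -1/3))  # a better line: error is 2/3, about 0.667
```

Minimizing this function over all (k, d) pairs is exactly what the derivation below does analytically.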
Treat the sum of squared errors to be minimized as a function of two variables, f(k, d).
(1) Partial derivative with respect to d
Let $f(k,d) = \sum_{i=1}^m (y_i - kx_i - d)^2$. Take the partial derivative with respect to d (writing the optimal value as $d_0$) and set it to zero. By the chain rule, each squared term contributes a factor of $-2$:
$$\frac{\partial f}{\partial d}\Big|_{d=d_0} = \sum_{i=1}^m -2(y_i - kx_i - d_0) = 0$$
Dividing both sides by $-2$:
$$\sum_{i=1}^m (y_i - kx_i - d_0) = 0$$
Divide both sides by the sample count m to obtain averages:
$$\frac{\sum_{i=1}^m (y_i - kx_i - d_0)}{m} = 0$$
$$\frac{\sum_{i=1}^m y_i - \sum_{i=1}^m kx_i - \sum_{i=1}^m d_0}{m} = 0$$
Noting that $\sum_{i=1}^m d_0 = md_0$, each term divided by m becomes a mean:
$$\overline y - \overline x k - d_0 = 0$$
$$d_0 = \overline y - \overline x k$$
(2) Partial derivative with respect to k
Again take $f(k,d) = \sum_{i=1}^m (y_i - k_0x_i - d_0)^2$ (writing the optimal values as $k_0$ and $d_0$), and differentiate with respect to $k_0$.
First expand the square in the summand, grouping it as $\left((y_i - d_0) - k_0x_i\right)^2$:
$$(y_i - k_0x_i - d_0)^2 = (y_i - d_0)^2 - 2(y_i - d_0)(k_0x_i) + (k_0x_i)^2$$
Differentiate each term with respect to $k_0$:
$$\frac{\partial (y_i - d_0)^2}{\partial k_0} = 0$$
$$\frac{\partial \left(-2(y_i - d_0)(k_0x_i)\right)}{\partial k_0} = -2(y_i - d_0)x_i$$
$$\frac{\partial (k_0x_i)^2}{\partial k_0} = 2k_0x_i^2$$
Combining the terms gives
$$\frac{\partial f}{\partial k_0} = \sum_{i=1}^m \left(-2(y_i - d_0)x_i + 2k_0x_i^2\right)$$
Set the partial derivative to zero:
$$\sum_{i=1}^m \left(-2(y_i - d_0)x_i + 2k_0x_i^2\right) = 0$$
Divide both sides by 2:
$$\sum_{i=1}^m \left(-(y_i - d_0)x_i + k_0x_i^2\right) = 0$$
$$-\sum_{i=1}^m (y_i - d_0)x_i + \sum_{i=1}^m k_0x_i^2 = 0$$
$$\sum_{i=1}^m (y_i - d_0)x_i = \sum_{i=1}^m k_0x_i^2$$
Substitute $d_0 = \overline y - \overline x k_0$:
$$\sum_{i=1}^m (y_i - \overline y + \overline x k_0)x_i = \sum_{i=1}^m k_0x_i^2$$
Expand the parentheses:
$$\sum_{i=1}^m (y_i - \overline y)x_i + \sum_{i=1}^m \overline x k_0x_i = \sum_{i=1}^m k_0x_i^2$$
Move the $k_0$ terms to one side:
$$\sum_{i=1}^m k_0x_i^2 - \sum_{i=1}^m \overline x k_0x_i = \sum_{i=1}^m (y_i - \overline y)x_i$$
$$k_0\sum_{i=1}^m x_i^2 - k_0\sum_{i=1}^m \overline x x_i = \sum_{i=1}^m (y_i - \overline y)x_i$$
$$k_0\left(\sum_{i=1}^m x_i^2 - \overline x\sum_{i=1}^m x_i\right) = \sum_{i=1}^m (y_i - \overline y)x_i$$
$$k_0\sum_{i=1}^m (x_i^2 - \overline x x_i) = \sum_{i=1}^m (y_i - \overline y)x_i$$
Solve for $k_0$, and keep an eye on the denominator over the next few steps:
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)x_i}$$
In the denominator, rewrite the trailing factor as $x_i = x_i - \overline x + \overline x$:
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)(x_i - \overline x + \overline x)}$$
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m \left((x_i - \overline x)^2 + \overline x(x_i - \overline x)\right)}$$
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)^2 + \overline x\sum_{i=1}^m (x_i - \overline x)}$$
Now notice that the sum $\sum_{i=1}^m (x_i - \overline x)$ in the denominator is always zero (deviations from the mean cancel out), so
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)^2}$$
Put another way, this shows the original denominator $\sum_{i=1}^m (x_i - \overline x)x_i$ equals the new denominator $\sum_{i=1}^m (x_i - \overline x)^2$: inside such a sum, $x_i$ can be replaced by $(x_i - \overline x)$ without changing the value. The same trick works in the numerator, because $\sum_{i=1}^m (y_i - \overline y)\overline x = \overline x\sum_{i=1}^m (y_i - \overline y) = 0$. Replacing $x_i$ with $(x_i - \overline x)$ in the numerator gives
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)(x_i - \overline x)}{\sum_{i=1}^m (x_i - \overline x)^2}$$
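The replacement trick used above is easy to verify numerically. A small plain-Python sketch, using the three samples from earlier, showing that substituting $(x_i - \overline x)$ for $x_i$ changes neither the numerator nor the denominator:

```python
# Verify that replacing x_i with (x_i - x_bar) changes neither the
# numerator nor the denominator of the slope formula.
xs = [1, 2, 3]
ys = [2, 3, 6]
m = len(xs)
x_bar = sum(xs) / m
y_bar = sum(ys) / m

num_a = sum((y - y_bar) * x for x, y in zip(xs, ys))            # with x_i
num_b = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))  # with x_i - x_bar
den_a = sum((x - x_bar) * x for x in xs)                        # with x_i
den_b = sum((x - x_bar) ** 2 for x in xs)                       # with x_i - x_bar

print(num_a, num_b)  # both 4.0
print(den_a, den_b)  # both 2.0
```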
This completes the derivation. The regression model's slope k and intercept d are summarized below; these formulas are also known as the least-squares formulas:
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)(x_i - \overline x)}{\sum_{i=1}^m (x_i - \overline x)^2}$$
$$d_0 = \overline y - \overline x k_0$$
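Putting the two formulas together, a minimal plain-Python sketch (the function name `fit_line` is mine, not from any library) that fits the three samples from the text and predicts where a fourth sample might land:

```python
def fit_line(samples):
    """Least-squares fit of y = k*x + d using the closed-form formulas."""
    m = len(samples)
    x_bar = sum(x for x, _ in samples) / m
    y_bar = sum(y for _, y in samples) / m
    k = sum((y - y_bar) * (x - x_bar) for x, y in samples) \
        / sum((x - x_bar) ** 2 for x, _ in samples)
    d = y_bar - x_bar * k
    return k, d

k, d = fit_line([(1, 2), (2, 3), (3, 6)])
print(k, d)       # k = 2.0, d = -0.333... : the line y = 2x - 1/3
print(k * 4 + d)  # prediction at x = 4: about 7.67
```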
To put it another way: as new samples come in, we can train on them and work backwards to the prediction model's optimal parameters k and d. If you think about it, some of the probability models covered earlier also have their own particular parameters, and continually adjusting those parameter values makes their predictions more accurate.
Although this derivation is fairly simple, it took me two days to work it out. Time to pop the champagne.