Note: $SST$ = total sum of squares, $SSE$ = sum of squared errors, and $SSR$ = regression sum of squares. The equation in the title is often written as:

$$SST = SSR + SSE$$
Pretty simple question, but I'm looking for an intuitive explanation. Intuitively, it seems to me that $SST \geq SSE + SSR$ would make more sense. For example, suppose point $i$ has the $y$-value $y_i = 5$, which corresponds to $\hat{y}_i = 3$, where $\hat{y}_i$ is the corresponding point on the regression line. Furthermore, suppose the mean $y$-value of the dataset is $\bar{y} = 0$. Then for this particular point $i$, $SST = (5 - 0)^2 = 25$, while $SSE = (5 - 3)^2 = 4$ and $SSR = (3 - 0)^2 = 9$. Obviously $9 + 4 < 25$. Wouldn't this result generalize to the whole dataset? I don't get it.
regression
least-squares
r-squared
Cam
Answers:
Adding and subtracting $\hat{y}_i$ gives

$$\sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2.$$

So we need to show that

$$\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = 0. \tag{a}$$
Actually, I think (a) is easier to show in matrix notation for general multiple regression, of which the single-variable case is a special case.
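A minimal sketch of that matrix argument, under the assumption that the design matrix $X$ contains a column of ones $\mathbf{1}$ (i.e., an intercept is included), with $H = X(X^\top X)^{-1}X^\top$ the hat matrix (notation not from the original answer):

$$\hat{y} = X\hat{\beta} = Hy, \qquad e = y - \hat{y} = (I - H)y.$$

Since $H$ is the orthogonal projection onto the column space of $X$, we have $X^\top e = 0$; in particular $\mathbf{1}^\top e = 0$ (the residuals sum to zero) and $\hat{y}^\top e = 0$. Therefore

$$\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = e^\top (\hat{y} - \bar{y}\mathbf{1}) = e^\top \hat{y} - \bar{y}\, e^\top \mathbf{1} = 0,$$

which is exactly condition (a).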
(1) Intuition for why SST = SSR + SSE
When we try to explain the total variation in $Y$ ($SST$) with one explanatory variable $X$, there are exactly two sources of variability: the variability captured by $X$ (the regression sum of squares) and the variability not captured by $X$ (the sum of squared errors). Hence $SST = SSR + SSE$ (an exact equality).
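In symbols (same definitions as in the proofs below, with $\hat{y}_i$ the fitted value and $\bar{y}$ the sample mean):

$$SST = \sum_{i=1}^n (y_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2.$$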
(2) Geometric intuition
Please see the first few pictures here (especially the third): https://sites.google.com/site/modernprogramevaluation/variance-and-bias
Some of the total variation in the data (the distance from a data point to $\bar{Y}$) is captured by the regression line (the distance from the regression line to $\bar{Y}$), and the rest is error (the distance from the point to the regression line). There is no room left over for $SST$ to be greater than $SSE + SSR$.
(3) The problem with your illustration
You can't look at SSE and SSR in a pointwise fashion. For a particular point, the residual may be large, so that there is more error than explanatory power from X. For other points, however, the residual will be small, so that the regression line explains a lot of the variability. They balance out, and ultimately $SST = SSR + SSE$. Of course this is not rigorous, but you can find proofs like the one above.
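As a quick numeric illustration of this balancing, here is a small sketch (the data and variable names are made up for illustration, not taken from the question or answer):

```python
import numpy as np

# Made-up toy data, purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Ordinary least-squares fit with an intercept.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst_i = (y - y.mean()) ** 2        # pointwise total variation
sse_i = (y - y_hat) ** 2           # pointwise squared residual
ssr_i = (y_hat - y.mean()) ** 2    # pointwise explained variation

# Pointwise, SST_i and SSE_i + SSR_i generally differ ...
print(np.allclose(sst_i, sse_i + ssr_i))                    # False for this data
# ... but the totals agree exactly.
print(np.isclose(sst_i.sum(), sse_i.sum() + ssr_i.sum()))   # True
```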
Also notice that the regression is not defined for a single point:

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2},$$

and you can see that the denominator would be zero, making the estimate undefined.
Hope this helps.
--Ryan M.
When an intercept is included in the linear regression (so that the sum of the residuals is zero), $SST = SSE + SSR$.
Proof:

$$\begin{aligned}
SST &= \sum_{i=1}^n (y_i - \bar{y})^2 \\
&= \sum_{i=1}^n (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^n (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \\
&= SSE + SSR + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
\end{aligned}$$
We just need to prove that the last term is equal to $0$:
$$\begin{aligned}
\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) &= \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)(\beta_0 + \beta_1 x_i - \bar{y}) \\
&= (\beta_0 - \bar{y})\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) + \beta_1 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)\, x_i
\end{aligned}$$
In least-squares regression, the sum of the squared errors is minimized.
$$SSE = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$$
Take the partial derivative of $SSE$ with respect to $\beta_0$ and set it to zero:

$$\frac{\partial\, SSE}{\partial \beta_0} = \sum_{i=1}^n 2(y_i - \beta_0 - \beta_1 x_i)(-1) = 0$$
So
$$\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0$$
Take the partial derivative of $SSE$ with respect to $\beta_1$ and set it to zero:

$$\frac{\partial\, SSE}{\partial \beta_1} = \sum_{i=1}^n 2(y_i - \beta_0 - \beta_1 x_i)(-x_i) = 0$$
So
$$\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)\, x_i = 0$$
Hence,
$$\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = (\beta_0 - \bar{y})\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) + \beta_1 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)\, x_i = 0$$
$$SST = SSE + SSR + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = SSE + SSR$$
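As a quick sanity check of the intercept condition, here is a sketch with made-up numbers (the data and helper name `decompose` are illustrative, not from the answer), comparing a fit with an intercept against a regression through the origin, where the residuals need not sum to zero:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 2.0, 5.0, 4.0, 7.0])

def decompose(y, y_hat):
    """Return (SST, SSE, SSR) for the fitted values y_hat."""
    sst = np.sum((y - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)
    return sst, sse, ssr

# Fit WITH an intercept: residuals sum to zero, so SST = SSE + SSR holds.
X_with = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X_with, y, rcond=None)[0]
sst, sse, ssr = decompose(y, X_with @ beta)
print(np.isclose(sst, sse + ssr))    # True

# Fit WITHOUT an intercept (through the origin): the identity generally fails.
b = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
sst, sse, ssr = decompose(y, b * x)
print(np.isclose(sst, sse + ssr))    # False for this data
```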
This is just the Pythagorean theorem!
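Spelled out in vector notation (a sketch; $\mathbf{1}$ denotes the all-ones vector, notation not used in the answers above): when an intercept is included, the residual vector $y - \hat{y}$ is orthogonal to $\hat{y} - \bar{y}\mathbf{1}$, so the squared lengths add:

$$\underbrace{\lVert y - \bar{y}\mathbf{1} \rVert^2}_{SST} \;=\; \underbrace{\lVert y - \hat{y} \rVert^2}_{SSE} \;+\; \underbrace{\lVert \hat{y} - \bar{y}\mathbf{1} \rVert^2}_{SSR}.$$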