I would like to know whether it makes sense to examine plots of the residuals against the dependent variable when I have a univariate regression. If it does make sense, what does a strong, linear, increasing correlation between the residuals (on the y-axis) and the estimated values of the dependent variable (on the x-axis) mean?
regression
residuals
Luigi
Answers:
Suppose you have the regression $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\beta_1 \approx 0$. Then $y_i - \beta_0 \approx \epsilon_i$. The higher the $y$ value, the larger the residual. By contrast, a plot of the residuals against $x$ should show no systematic relationship. Also, the predicted value $\hat{y}_i$ should be approximately $\hat{\beta}_0$, the same for every observation. If all predicted values are roughly the same, they should be uncorrelated with the errors.
The plot tells me that $x$ and $y$ are essentially unrelated (of course, there are better ways to show this). Let us know if your coefficient $\hat{\beta}_1$ is not close to 0.
As a better diagnostic, plot the residuals against the predicted wage or against the $x$ value. You should not observe any distinguishable pattern in these plots.
If you want a little R demonstration, here you go:
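As a stand-in illustration in Python (standard library only, and not the answerer's R code), the sketch below simulates data whose true slope is 0 and compares the residuals with $y$ and with $x$. The seed, sample size, intercept, and noise level are arbitrary assumptions, and the helpers `mean`, `corr`, and `ols` are hypothetical names introduced only for this example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

def ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

random.seed(1)  # arbitrary seed, only for reproducibility
n = 500
x = [random.uniform(0, 10) for _ in range(n)]
y = [5.0 + 0.0 * xi + random.gauss(0, 1) for xi in x]  # true slope is 0

b0, b1 = ols(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

r_resid_y = corr(resid, y)                  # close to 1 when the slope is near 0
r_resid_x = corr(resid, x)                  # zero up to rounding error
fitted_spread = max(fitted) - min(fitted)   # small: predictions are nearly constant

print("corr(residuals, y):", round(r_resid_y, 3))
print("corr(residuals, x):", round(r_resid_x, 3))
print("range of fitted values:", round(fitted_spread, 3))
```

With a slope near zero the fitted values barely move, so the residuals are close to $y$ minus a constant, which is why they line up almost perfectly with $y$ while staying uncorrelated with $x$.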
Assuming that the estimated model is correctly specified ...
So the scatter-plot of residuals against predicted dependent variable should show no correlation.
But!
The matrix $I - P_X$ is a projection matrix: its eigenvalues are 0 or 1, so it is positive semidefinite. Hence $\sigma^2 (I - P_X)$ has non-negative values on the diagonal, and the scatter plot of residuals against the original dependent variable should show a positive correlation.
As far as I know, Gretl produces by default the graph of residuals against the original dependent variable (not the predicted one!).
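The algebra behind both statements can be sketched as follows. The notation is an assumption, since the answer does not spell it out: $P_X = X(X^\top X)^{-1}X^\top$ is the hat matrix of a fixed design matrix $X$, $e$ is the residual vector, and the errors satisfy $\operatorname{Var}(y) = \sigma^2 I$.

```latex
\begin{aligned}
\hat{y} &= P_X\,y, \qquad e = y - \hat{y} = (I - P_X)\,y, \\
\operatorname{Cov}(e, \hat{y}) &= (I - P_X)\,\operatorname{Var}(y)\,P_X^\top
  = \sigma^2 (I - P_X)\,P_X = 0, \\
\operatorname{Cov}(e, y) &= (I - P_X)\,\operatorname{Var}(y)
  = \sigma^2 (I - P_X).
\end{aligned}
```

The second line uses the fact that $P_X$ is symmetric and idempotent, so $(I - P_X)P_X = 0$. In the third line, the $i$-th diagonal entry is $\sigma^2(1 - h_{ii})$, where $h_{ii}$ is the $i$-th diagonal entry of $P_X$ and lies between 0 and 1, so each residual has a non-negative covariance with its own observed value.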
Is it possible you are confusing fitted/predicted values with the actual values?
As @gung and @biostat have said, you hope there is no relationship between fitted values and residuals. On the other hand, finding a linear relationship between the actual values of the dependent/outcome variable and the residuals is to be expected and is not particularly informative.
Added to clarify the previous sentence: it is not that just any linear relationship between the residuals and the actual values of the outcome is to be expected. For low measured values of Y, the predicted values of Y from a useful model will tend to be higher than the actual measured values, and vice versa.
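A minimal Python sketch (standard library only) of this point, under assumed simulation settings: the seed, sample size, and coefficients are arbitrary, and all variable names are hypothetical. It checks that the residuals are uncorrelated with the fitted values but positively correlated with the actual values, and that the residuals (actual minus predicted) are negative on average for the lowest observed Y and positive for the highest. For least squares with an intercept, the correlation between residuals and actual values equals $\sqrt{1 - R^2}$.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

random.seed(7)  # arbitrary seed, only for reproducibility
n = 400
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 + 1.0 * xi + random.gauss(0, 1) for xi in x]  # a useful but noisy model

# Ordinary least squares for y = b0 + b1 * x
mx, my = mean(x), mean(y)
b1 = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
b0 = my - b1 * mx
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]  # actual minus predicted

# Coefficient of determination
ss_res = sum(r ** 2 for r in resid)
ss_tot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

r_resid_fitted = corr(resid, fitted)  # zero up to rounding error
r_resid_y = corr(resid, y)            # equals sqrt(1 - r2)

# Average residual in the lowest and highest quarter of the observed y values
order = sorted(range(n), key=lambda i: y[i])
low_mean = mean([resid[i] for i in order[: n // 4]])
high_mean = mean([resid[i] for i in order[-(n // 4):]])

print("corr(residuals, fitted):", round(r_resid_fitted, 3))
print("corr(residuals, y):", round(r_resid_y, 3), "sqrt(1 - R^2):", round((1 - r2) ** 0.5, 3))
print("mean residual, lowest quarter of y:", round(low_mean, 3))
print("mean residual, highest quarter of y:", round(high_mean, 3))
```

The better the model fits, the weaker the correlation between residuals and actual values becomes, but it only vanishes when the fit is perfect.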
The answers offered are giving me some ideas about what's going on here. I do believe there may have been some mistakes made by accident. See if the following story makes sense: To start, I think there is probably a strong relationship between X & Y in the data (here's some code and a plot):
But by mistake Y was predicted just from the mean. Compounding this, the residuals from the mean only model are plotted against X, even though what was intended was to plot against the fitted values (code & plot):
We can fix this by fitting the appropriate model and plotting the residuals from that (code & plot):
This seems like just the kind of goof-up I made when I was starting out.
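The story above can be sketched numerically in Python (standard library only). This is not the original code, which is not shown here: the data-generating values, the seed, and all names are assumptions made for illustration, and correlations stand in for the plots.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

random.seed(3)  # arbitrary seed, only for reproducibility
n = 300
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]  # strong X-Y relationship

# The mistake: predict Y from its mean only, then compare those residuals with X
y_bar = mean(y)
resid_mean_only = [yi - y_bar for yi in y]
r_wrong = corr(resid_mean_only, x)  # same as corr(y, x), so close to 1 here

# The fix: fit y = b0 + b1 * x and compare the residuals with the fitted values
mx = mean(x)
b1 = sum((p - mx) * (q - y_bar) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
b0 = y_bar - b1 * mx
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
r_right = corr(resid, fitted)  # zero up to rounding error

print("mean-only residuals vs X:", round(r_wrong, 3))
print("regression residuals vs fitted:", round(r_right, 3))
```

Subtracting a constant from Y does not change its correlation with X, so the residuals of a mean-only model inherit the whole X-Y relationship.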
This graph indicates that the model you fitted is not good. As @gung said in the first comment on the main question, there should be no relationship between the predicted response and the residuals.
"An analyst should expect a regression model to err in predicting a response in a random fashion; the model should predict values higher than actual and lower than actual with equal probability." See this.
I would recommend first plotting the response against the independent variable to see the relationship between them. It might be reasonable to add polynomial terms to the model.
Isn't this what happens if there is no relationship between the X and Y variables? From looking at this graph, it appears you are essentially predicting Y with its mean.
I think the OP plotted residuals vs. the original response variable (not the fitted response from the model). I see plots like this all the time, with nearly the exact same pattern. Make sure you plot residuals vs. fitted values, as I'm not sure what meaningful inference you could draw from residuals vs. original Y. But I could certainly be wrong.