Why is the product of the bivariate regression coefficients of the

11

There is a regression model $Y = a + bX$ with $a = 1.6$ and $b = 0.4$, which has a correlation coefficient of $r = 0.60302$.

If $X$ and $Y$ are then switched around and the equation becomes $X = c + dY$, where $c = 0.4545$ and $d = 0.9091$, it also has an $r$ value of $0.60302$.

I am hoping someone can explain why $(d\times b)^{0.5}$ is also $0.60302$.

correlation regression-coefficients Mike

17

$b = r \; \text{SD}_y / \text{SD}_x$ and $d = r \; \text{SD}_x / \text{SD}_y$, so $b\times d = r^2$.

Many statistics textbooks touch on this; I like Freedman et al., Statistics. See also here and this Wikipedia article.
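
As a quick arithmetic check using only the rounded coefficients quoted in the question (no new data are assumed):

$\sqrt{b \times d} = \sqrt{0.4 \times 0.9091} = \sqrt{0.36364} \approx 0.6030$

This agrees with the reported $r = 0.60302$ up to the rounding of $d$.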

Karl

10

Take a look at Thirteen Ways to Look at the Correlation Coefficient; ways 3, 4, and 5 in particular will be of most interest to you.

Rodgers, J. L., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), 59-66.

Neugierig

2

This should probably have been a comment. Note that the link had gone dead; I have updated it and provided a full citation. Can you elaborate on this or provide additional information, so that it remains valuable even if the link breaks again?

Gung - Reinstate Monica

2

The Rodgers & Nicewander article is summarized on our site at stats.stackexchange.com/q/70969/22228.

whuber

3

$\DeclareMathOperator{\Cov}{Cov}$ $\DeclareMathOperator{\Corr}{Corr}$ $\DeclareMathOperator{\SD}{SD}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\sgn}{sgn}$ $\DeclareMathOperator{\nsum}{\sum_{i=1}^{n}}$

Recall that many introductory texts define

$S_{xy} = \nsum (x_i - \bar x)(y_i - \bar y)$

Then, by setting $y$ as $x$, we have $S_{xx} = \nsum (x_i - \bar x)^2$ and similarly $S_{yy} = \nsum (y_i - \bar y)^2$.

Formulas for the correlation coefficient $r$, the slope of the $y$-on-$x$ regression (your $b$) and the slope of the $x$-on-$y$ regression (your $d$) are often given as:

$\begin{align} r &= \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} \tag{1} \\ \hat \beta_{y\text{ on }x} &= \frac{S_{xy}}{S_{xx}} \tag{2} \\ \hat \beta_{x\text{ on }y} &= \frac{S_{xy}}{S_{yy}} \tag{3} \end{align}$

Then multiplying $(2)$ and $(3)$ clearly gives the square of $(1)$:

$\hat \beta_{y\text{ on }x} \cdot \hat \beta_{x\text{ on }y} = \frac{S_{xy}^2}{S_{xx}S_{yy}} = r^2$
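
The identity can also be checked numerically. The following is a minimal sketch in Python; the data set and the helper name `sums_of_squares` are made up for illustration and are not the data behind the question.

```python
import math

def sums_of_squares(xs, ys):
    """Return (S_xx, S_yy, S_xy): centered sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy

# Made-up illustrative data (an assumption, not taken from the question).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
r = s_xy / math.sqrt(s_xx * s_yy)  # equation (1)
b_yx = s_xy / s_xx                 # equation (2): slope of y on x
b_xy = s_xy / s_yy                 # equation (3): slope of x on y

# The product of the two slopes and r squared agree up to floating-point rounding.
print(b_yx * b_xy, r ** 2)
```

Any data set with non-constant $x$ and $y$ should give the same agreement, since $S_{xy}$ is shared by all three formulas.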

Alternatively, the numerators and denominators of the fractions in $(1)$, $(2)$ and $(3)$ are often divided by $n$ or $(n-1)$, so that things are framed in terms of sample or estimated variances and covariances. For instance, from $(1)$, the estimated correlation coefficient is just the estimated covariance, scaled by the estimated standard deviations:

$\begin{align} r &= \widehat \Corr(X,Y) = \frac{\widehat \Cov(X,Y)}{\widehat{\SD(X)}\widehat{\SD(Y)}} \tag{4} \\ \hat \beta_{y\text{ on }x} &= \frac{\widehat \Cov(X,Y)}{\widehat{\Var(X)}} \tag{5} \\ \hat \beta_{x\text{ on }y} &= \frac{\widehat \Cov(X,Y)}{\widehat{\Var(Y)}} \tag{6} \end{align}$

We then immediately find from multiplying $(5)$ and $(6)$ that

$\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y} = \frac{\widehat \Cov(X,Y)^2}{\widehat{\Var(X)}\widehat{\Var(Y)}} = \left( \frac{\widehat \Cov(X,Y)}{\widehat{\SD(X)}\widehat{\SD(Y)}} \right)^2 = r^2$

We might instead have rearranged $(4)$ to write the covariance as a "scaled-up" correlation:

$\widehat \Cov(X,Y) = r\cdot \widehat{\SD(X)} \widehat{\SD(Y)} \tag{7}$

Then by substituting $(7)$ into $(5)$ and $(6)$ we could rewrite the regression coefficients as $\hat \beta_{y\text{ on }x} = r \frac{\widehat{\SD(Y)}}{\widehat{\SD(X)}}$ and $\hat \beta_{x\text{ on }y} = r \frac{\widehat{\SD(X)}}{\widehat{\SD(Y)}}$. Multiplying these together would also produce $r^2$, and this is @Karl's solution. Writing the slopes in this way helps explain how we can see the correlation coefficient as a standardized regression slope.
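
A short Python sketch of this variance-and-covariance form is given below. It assumes a made-up data set and a hypothetical helper `moments`, and it illustrates that the choice of divisor ($n$ or $n-1$) cancels out of $r$ and of both slopes.

```python
import math

def moments(xs, ys, ddof):
    """Return (cov, sd_x, sd_y) using the divisor n - ddof (ddof = 0 or 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    div = n - ddof
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / div
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / div)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / div)
    return cov, sd_x, sd_y

# Made-up illustrative data (an assumption, not taken from the question).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

results = {}
for ddof in (0, 1):
    cov, sd_x, sd_y = moments(xs, ys, ddof)
    r = cov / (sd_x * sd_y)   # equation (4)
    b_yx = cov / sd_x ** 2    # equation (5)
    b_xy = cov / sd_y ** 2    # equation (6)
    results[ddof] = (r, b_yx, b_xy)
    # Each slope is the correlation rescaled by a ratio of standard deviations.
    assert math.isclose(b_yx, r * sd_y / sd_x)
    assert math.isclose(b_xy, r * sd_x / sd_y)

# Both divisors give the same (r, b_yx, b_xy) up to floating-point rounding.
print(results)
```

If both variables were standardized first, the two standard deviations would equal one and each slope would reduce to $r$ itself.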

Finally, note that in your case $r = \sqrt{bd} =\sqrt{\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y}}$, but this was because your correlation was positive. If your correlation were negative, you would have to take the negative root.

To work out whether your correlation is positive or negative, you simply need to look at the sign (plus or minus) of your regression coefficient; it doesn't matter whether you look at the $y$-on-$x$ or the $x$-on-$y$ slope, as their signs will be the same. So you can use the formula:

$r = \sgn(\hat \beta_{y\text{ on }x}) \sqrt{\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y}}$

where $\sgn$ is the signum function, i.e. it is $+1$ if the slope is positive and $-1$ if the slope is negative.
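
As a sketch of how the signed formula might be applied in code, the Python function below recovers $r$ from the two slopes. The function name `correlation_from_slopes` is hypothetical; the first call uses the rounded slopes quoted in the question, and the second flips both signs to illustrate the negative case.

```python
import math

def correlation_from_slopes(b_yx, b_xy):
    """Recover r from the two regression slopes, restoring its sign."""
    product = b_yx * b_xy
    if product < 0:
        # Both slopes share the sign of S_xy, so this cannot happen for
        # slopes fitted to the same data.
        raise ValueError("slopes fitted to the same data cannot differ in sign")
    # copysign attaches the sign of the y-on-x slope to the square root.
    return math.copysign(math.sqrt(product), b_yx)

r_question = correlation_from_slopes(0.4, 0.9091)    # slopes b and d from the question
r_negative = correlation_from_slopes(-0.4, -0.9091)  # hypothetical negative case
print(r_question, r_negative)
```

Because $d$ is quoted to only four decimal places, the first result matches the reported $0.60302$ only approximately.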

Silverfish

1

You might find this answer of mine to be of interest even though it does not explicitly address the question asked here.

Dilip Sarwate
