Why does inverting a covariance matrix yield partial correlations between random variables?

I have heard that partial correlations between random variables can be found by inverting the covariance matrix and taking the appropriate cells from the resulting precision matrix (this fact is mentioned in http://en.wikipedia.org/wiki/Partial_correlation, but without proof).

Why is this the case?

covariance covariance-matrix linear-algebra partial-correlation matrix-inverse michal

If you want to obtain, in a cell, a partial correlation that controls for all the other variables, the last paragraph here may shed some light.

TTNPHNS

Answers:

When a multivariate random variable $(X_1,X_2,\ldots,X_n)$ has a nondegenerate covariance matrix $\mathbb{C} = (\gamma_{ij}) = (\text{Cov}(X_i,X_j))$, the set of all real linear combinations of the $X_i$ forms an $n$-dimensional real vector space with basis $E=(X_1,X_2,\ldots, X_n)$ and a nondegenerate inner product given by

$\langle X_i,X_j \rangle = \gamma_{ij}\ .$

Its dual basis with respect to this inner product, $E^{*} = (X_1^{*},X_2^{*}, \ldots, X_n^{*})$, is uniquely defined by the relationships

$\langle X_i^{*}, X_j \rangle = \delta_{ij}\ ,$

the Kronecker delta (equal to $1$ when $i=j$ and $0$ otherwise).

The dual basis is of interest here because the partial correlation of $X_i$ and $X_j$ is obtained as the correlation between the part of $X_i$ that is left after projecting it into the space spanned by all the other vectors (let's simply call it its "residual", $X_{i\circ}$) and the comparable part of $X_j$, its residual $X_{j\circ}$. Yet $X_i^{*}$ is a vector that is orthogonal to all vectors besides $X_i$ and has positive inner product with $X_i$, whence $X_{i\circ}$ must be some non-negative multiple of $X_i^{*}$, and likewise for $X_j$. Let us therefore write

$X_{i\circ} = \lambda_i X_i^{*},\ X_{j\circ} = \lambda_j X_j^{*}$

for positive real numbers $\lambda_i$ and $\lambda_j$.

The partial correlation is the normalized dot product of the residuals, which is unchanged by rescaling:

$\rho_{ij\circ} = \frac{\langle X_{i\circ}, X_{j\circ} \rangle}{\sqrt{\langle X_{i\circ}, X_{i\circ} \rangle\langle X_{j\circ}, X_{j\circ} \rangle}} = \frac{\lambda_i\lambda_j\langle X_{i}^{*}, X_{j}^{*} \rangle}{\sqrt{\lambda_i^2\langle X_{i}^{*}, X_{i}^{*} \rangle\lambda_j^2\langle X_{j}^{*}, X_{j}^{*} \rangle}} = \frac{\langle X_{i}^{*}, X_{j}^{*} \rangle}{\sqrt{\langle X_{i}^{*}, X_{i}^{*} \rangle\langle X_{j}^{*}, X_{j}^{*} \rangle}}\ .$

(In either case the partial correlation will be zero whenever the residuals are orthogonal, whether or not they are nonzero.)
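The claim that each residual is a positive multiple of the corresponding dual basis vector can be checked numerically. The following is a minimal sketch, assuming a made-up $4 \times 4$ covariance matrix `C`; the helper names `inner` and `residual_coeffs` are hypothetical and only serve this illustration. Linear combinations of the $X_k$ are represented by their coefficient vectors in the basis $E$, and it anticipates the fact, derived further down in this answer, that the coefficients of the dual basis are the rows of $\mathbb{C}^{-1}$.

```python
import numpy as np

# Hypothetical nondegenerate covariance matrix C = (gamma_ij); values are made up.
C = np.array([
    [4.0, 1.2, 0.8, 0.5],
    [1.2, 3.0, -0.5, 0.3],
    [0.8, -0.5, 2.0, 0.4],
    [0.5, 0.3, 0.4, 1.5],
])
n = C.shape[0]


def inner(u, v):
    """Inner product <u, v> of two linear combinations given by coefficient vectors."""
    return u @ C @ v


def residual_coeffs(i):
    """Coefficients (in the basis E) of X_i minus its projection onto the other X_k."""
    others = [k for k in range(n) if k != i]
    # The projection coefficients b solve C[others, others] b = C[others, i].
    b = np.linalg.solve(C[np.ix_(others, others)], C[others, i])
    r = np.zeros(n)
    r[i] = 1.0
    r[others] = -b
    return r


# Row i of the inverse holds the coefficients of the dual vector X_i^*.
B = np.linalg.inv(C)

for i in range(n):
    r = residual_coeffs(i)
    lam = inner(r, r)  # lambda_i, the squared length of the residual
    # The residual should coincide with lam * X_i^*, with lam > 0.
    print(i, lam, np.max(np.abs(r - lam * B[i])))
```

In this sketch the scale factor `lam` is the squared length of the residual, which is why it is positive whenever the covariance matrix is nondegenerate.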

We need to find the inner products of the dual basis elements. To this end, expand the dual basis elements in terms of the original basis $E$:

$X_i^{*} = \sum_{j=1}^n \beta_{ij} X_j\ .$

Then by definition

$\delta_{ik} = \langle X_i^{*}, X_k \rangle = \sum_{j=1}^n \beta_{ij}\langle X_j, X_k \rangle = \sum_{j=1}^n \beta_{ij}\gamma_{jk}\ .$

In matrix notation with $\mathbb{I} = (\delta_{ij})$ the identity matrix and $\mathbb{B} = (\beta_{ij})$ the change-of-basis matrix, this states

$\mathbb{I} = \mathbb{BC}\ .$

That is, $\mathbb{B} = \mathbb{C}^{-1}$, exactly what the Wikipedia article asserts. Because $\langle X_i^{*}, X_j^{*} \rangle = \sum_{k=1}^n \beta_{jk}\langle X_i^{*}, X_k \rangle = \beta_{ji} = \beta_{ij}$ (the matrix $\mathbb{B}$ is symmetric, being the inverse of the symmetric matrix $\mathbb{C}$), the preceding formula for the partial correlation gives

$\rho_{ij\circ} = \frac{\beta_{ij}}{\sqrt{\beta_{ii} \beta_{jj}}} = \frac{\mathbb{C}^{-1}_{ij}}{\sqrt{\mathbb{C}^{-1}_{ii} \mathbb{C}^{-1}_{jj}}}\ .$
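As a quick numerical illustration of this last step, here is a small NumPy sketch. It assumes a made-up covariance matrix `C` (the values are not from the answer) and checks that the Gram matrix of the dual basis, $\mathbb{B}\mathbb{C}\mathbb{B}^{T}$, is again $\mathbb{B}$, before forming the normalized entries $\beta_{ij}/\sqrt{\beta_{ii}\beta_{jj}}$. The variable names `gram_dual` and `R` are chosen for this example only.

```python
import numpy as np

# Hypothetical nondegenerate covariance matrix; the numbers are made up.
C = np.array([
    [4.0, 1.2, 0.8, 0.5],
    [1.2, 3.0, -0.5, 0.3],
    [0.8, -0.5, 2.0, 0.4],
    [0.5, 0.3, 0.4, 1.5],
])

# beta_ij: coefficients of the dual basis elements in terms of the basis E.
B = np.linalg.inv(C)

# Duality relations <X_i^*, X_k> = delta_ik, i.e. B C = I.
duality = B @ C

# Gram matrix of the dual basis: <X_i^*, X_j^*> = sum_{k,l} beta_ik gamma_kl beta_jl.
gram_dual = B @ C @ B.T

# Normalized inner products beta_ij / sqrt(beta_ii * beta_jj).
d = np.sqrt(np.diag(B))
R = B / np.outer(d, d)

print(np.allclose(duality, np.eye(4)), np.allclose(gram_dual, B))
print(np.round(R, 3))
```

The off-diagonal entries of `R` are the correlations between residuals as defined in this answer, where each variable is projected onto all of the others; the answers below discuss how the sign relates to the formula on Wikipedia.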

whuber

+1, great answer. But why do you call this dual basis "dual basis with respect to this inner product" -- what does "with respect to this inner product" exactly mean? It seems that you use the term "dual basis" as defined here mathworld.wolfram.com/DualVectorSpace.html in the second paragraph ("Given a vector space basis $v_1, ..., v_n$ for $V$ there exists a dual basis...") or here en.wikipedia.org/wiki/Dual_basis, and it's independent of any scalar product.

amoeba says Reinstate Monica

@amoeba There are two kinds of duals. The (natural) dual of any vector space $V$ over a field $R$ is the set of linear functions $\phi:V\to R$, called $V^*$. There is no canonical way to identify $V^*$ with $V$, even though they have the same dimension when $V$ is finite-dimensional. Any inner product $\gamma$ corresponds to such a map $g:V\to V^*$, and vice versa, via $g(v)(w)=\gamma(v,w).$ (Nondegeneracy of $\gamma$ ensures $g$ is a vector space isomorphism.) This gives a way to view elements of $V$ as if they were elements of the dual $V^*$--but it depends on $\gamma$.

whuber

@mpettis Those dots were hard to notice. I have replaced them with small open circles to make the notation easier to read. Thanks for pointing this out.

whuber

@Andy Ron Christensen's Plane Answers to Complex Questions might be the sort of thing you are looking for. Unfortunately, his approach makes (IMHO) undue reliance on coordinate arguments and calculations. In the original introduction (see p. xiii), Christensen explains that's for pedagogical reasons.

whuber

@whuber, Your proof is awesome. I wonder whether any book or article contains such a proof so that I can cite.

Harry

Here is a proof with just matrix calculations.

I appreciate the answer by whuber. It is very insightful about the math behind the scenes. However, it is still not trivial to see how to use his answer to obtain the minus sign in the formula stated in the Wikipedia article (Partial_correlation#Using_matrix_inversion):

$\rho_{X_iX_j\cdot \mathbf{V} \setminus \{X_i,X_j\}} = - \frac{p_{ij}}{\sqrt{p_{ii}p_{jj}}}$

To get this minus sign, here is a different proof I found in "Graphical Models", Lauritzen 1995, page 130. It uses only some matrix calculations.

The key is the following matrix identity:

$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} E^{-1} & -E^{-1}G \\ -FE^{-1} & D^{-1}+FE^{-1}G \end{pmatrix}$

where $E = A - BD^{-1}C$, $F = D^{-1}C$ and $G = BD^{-1}$.
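The identity is easy to sanity-check numerically. The sketch below assumes an arbitrary, randomly generated symmetric positive definite $5 \times 5$ matrix `M` (so that both `M` and its lower-right block are invertible) and compares the block formula with `np.linalg.inv`. The block sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary symmetric positive definite test matrix, so M and its D block are invertible.
M = rng.normal(size=(5, 5))
M = M @ M.T + 5.0 * np.eye(5)

# Partition M into blocks [[A, B], [C, D]] with a 2x2 upper-left block.
A, B = M[:2, :2], M[:2, 2:]
C, D = M[2:, :2], M[2:, 2:]

D_inv = np.linalg.inv(D)
E = A - B @ D_inv @ C      # Schur complement of D in M
F = D_inv @ C
G = B @ D_inv
E_inv = np.linalg.inv(E)

# Assemble the right-hand side of the identity block by block.
block_inverse = np.block([
    [E_inv, -E_inv @ G],
    [-F @ E_inv, D_inv + F @ E_inv @ G],
])

print(np.allclose(block_inverse, np.linalg.inv(M)))
```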

Write down the covariance matrix as

$\Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}$

where $\Omega_{11}$ is the covariance matrix of $(X_i, X_j)$ and $\Omega_{22}$ is the covariance matrix of $\mathbf{V} \setminus \{X_i, X_j \}$.

Let $P = \Omega^{-1}$. Similarly, write down $P$ as

$P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}$

By the key matrix identity,

$P_{11}^{-1} = \Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}$

We also know that $\Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}$ is the covariance matrix of $(X_i, X_j) | \mathbf{V} \setminus \{X_i, X_j\}$ (from Multivariate_normal_distribution#Conditional_distributions). The partial correlation is therefore

$\rho_{X_iX_j\cdot \mathbf{V} \setminus \{X_i,X_j\}} = \frac{[P_{11}^{-1}]_{12}}{\sqrt{[P_{11}^{-1}]_{11}[P_{11}^{-1}]_{22}}}.$

I use the notation that the $(k,l)$th entry of the matrix $M$ is denoted by $[M]_{kl}$.

By the inversion formula for a 2-by-2 matrix,

$\begin{pmatrix} [P_{11}^{-1}]_{11} & [P_{11}^{-1}]_{12} \\ [P_{11}^{-1}]_{21} & [P_{11}^{-1}]_{22} \\ \end{pmatrix} = P_{11}^{-1} = \frac{1}{\text{det} P_{11}} \begin{pmatrix} [P_{11}]_{22} & -[P_{11}]_{12} \\ -[P_{11}]_{21} & [P_{11}]_{11} \\ \end{pmatrix}$

Therefore,

$\rho_{X_iX_j\cdot \mathbf{V} \setminus \{X_i,X_j\}} = \frac{[P_{11}^{-1}]_{12}}{\sqrt{[P_{11}^{-1}]_{11}[P_{11}^{-1}]_{22}}} = \frac{- \frac{1}{\text{det}P_{11}}[P_{11}]_{12}}{\sqrt{\frac{1}{\text{det}P_{11}}[P_{11}]_{22}\frac{1}{\text{det}P_{11}}[P_{11}]_{11}}} = \frac{-[P_{11}]_{12}}{\sqrt{[P_{11}]_{22}[P_{11}]_{11}}}$

which is exactly what the Wikipedia article asserts.
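The chain of equalities above can be reproduced numerically. This is a minimal sketch, assuming a made-up $4 \times 4$ covariance matrix `Omega`; the function names `partial_corr_schur` and `partial_corr_precision` are hypothetical. It computes the partial correlation once from the conditional covariance $\Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}$ and once from the precision matrix with the minus sign.

```python
import numpy as np

# Hypothetical covariance matrix of four variables; the numbers are made up.
Omega = np.array([
    [4.0, 1.2, 0.8, 0.5],
    [1.2, 3.0, -0.5, 0.3],
    [0.8, -0.5, 2.0, 0.4],
    [0.5, 0.3, 0.4, 1.5],
])
P = np.linalg.inv(Omega)  # precision matrix


def partial_corr_schur(Omega, i, j):
    """Partial correlation of X_i, X_j from the conditional covariance (Schur complement)."""
    pair = [i, j]
    rest = [k for k in range(Omega.shape[0]) if k not in pair]
    O11 = Omega[np.ix_(pair, pair)]
    O12 = Omega[np.ix_(pair, rest)]
    O22 = Omega[np.ix_(rest, rest)]
    # Omega_11 - Omega_12 Omega_22^{-1} Omega_21, using Omega_21 = Omega_12^T.
    cond = O11 - O12 @ np.linalg.solve(O22, O12.T)
    return cond[0, 1] / np.sqrt(cond[0, 0] * cond[1, 1])


def partial_corr_precision(P, i, j):
    """Partial correlation read off the precision matrix, with the minus sign (i != j)."""
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])


for i in range(4):
    for j in range(i + 1, 4):
        print(i, j, partial_corr_schur(Omega, i, j), partial_corr_precision(P, i, j))
```

The two columns printed for each pair should agree up to floating-point error; the precision-matrix version is only meant for $i \neq j$.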

Po C.

If we let i=j, then rho_ii V\{X_i, X_i} = -1. How do we interpret those diagonal elements in the precision matrix?

Jason

Good point. The formula is only valid for $i \neq j$. From the proof, the minus sign comes from the 2-by-2 matrix inversion. It would not appear if $i = j$.

Po C.

So the diagonal numbers can't be associated with partial correlation. What do they represent? They are not just inverses of the variances, are they?

Jason

This formula is valid for $i \neq j$. It is meaningless for $i = j$.

Po C.

Note that the sign of the answer actually depends on how you define partial correlation. There is a difference between regressing $X_i$ and $X_j$ on the other $n - 1$ variables separately vs. regressing $X_i$ and $X_j$ on the other $n - 2$ variables together. Under the second definition, let the correlation between residuals $\epsilon_i$ and $\epsilon_j$ be $\rho$ . Then the partial correlation of the two (regressing $\epsilon_i$ on $\epsilon_j$ and vice versa) is $-\rho$ .

This explains the confusion in the comments above, as well as on Wikipedia. The second definition is used universally from what I can tell, so there should be a negative sign.

I originally posted an edit to the other answer, but made a mistake - sorry about that!
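To make the sign difference concrete, here is a small simulation sketch under one reading of the two definitions above. The data, the seed, and the helper names `residual` and `corr` are made up for illustration. Definition 1 regresses $X_i$ and $X_j$ each on all $n - 1$ other variables; definition 2 regresses both on the remaining $n - 2$ variables. Both residual correlations are compared with the scaled entry of the inverse sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up covariance used only to generate correlated sample data.
C = np.array([
    [4.0, 1.2, 0.8, 0.5],
    [1.2, 3.0, -0.5, 0.3],
    [0.8, -0.5, 2.0, 0.4],
    [0.5, 0.3, 0.4, 1.5],
])
X = rng.normal(size=(500, 4)) @ np.linalg.cholesky(C).T
X = X - X.mean(axis=0)  # center the columns so no intercept is needed


def residual(target, regressors):
    """Least-squares residual of column `target` regressed on the `regressors` columns."""
    Z = X[:, regressors]
    coef, *_ = np.linalg.lstsq(Z, X[:, target], rcond=None)
    return X[:, target] - Z @ coef


def corr(u, v):
    """Correlation of two centered vectors."""
    return (u @ v) / np.sqrt((u @ u) * (v @ v))


i, j = 0, 1
# Definition 1: each variable is regressed on all n - 1 others (including its partner).
r_first = corr(residual(i, [1, 2, 3]), residual(j, [0, 2, 3]))
# Definition 2: both variables are regressed on the n - 2 remaining variables.
r_second = corr(residual(i, [2, 3]), residual(j, [2, 3]))

P = np.linalg.inv(np.cov(X, rowvar=False))
scaled = P[i, j] / np.sqrt(P[i, i] * P[j, j])

print(r_first, r_second, scaled)
```

In this sketch the first residual correlation matches the scaled precision entry without a minus sign, while the second matches it with the minus sign, which is the convention in the Wikipedia formula.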

Johnny Ho