Density of the normal distribution as the number of dimensions increases

The question I want to ask is this: how does the proportion of samples within 1 SD of the mean of a normal distribution change as the number of variates increases?

(Almost) everyone knows that in a one-dimensional normal distribution, 68% of samples can be found within 1 standard deviation of the mean. What about in 2, 3, 4, ... dimensions? I know it gets less... but by how much (precisely)? It would be handy to have a table showing the figures for 1, 2, 3 ... 10 dimensions, as well as 1, 2, 3 ... 10 SDs. Can anyone point to such a table?

A little more context: I have a sensor which provides data on up to 128 channels. Each channel is subject to (independent) electrical noise. When I sense a calibration object, I can average a sufficient number of measurements and get a mean value across the 128 channels, along with 128 individual standard deviations.

BUT... when it comes to the individual instantaneous readings, the data does not respond so much like 128 individual readings as it does like a single reading of an (up to) 128-dimensional vector quantity. Certainly this is the best way to treat the few critical readings we take (typically 4-6 of the 128).

I want to get a feel for what is "normal" variation and what is an "outlier" in this vector space. I'm sure I've seen a table like the one I described that would apply to this kind of situation; can anyone point to one?

normal-distribution multivariate-analysis omatai

Please, can I have empirical answers only? I don't understand most mathematical notation.

Omatai

Let us take $X = (X_1,\dots,X_d) \sim N(0,I)$: each $X_i$ is normal $N(0,1)$ and the $X_i$ are independent. I guess that is what you mean by higher dimensions.

You would say that $X$ is within 1 sd of the mean if $\|X\| < 1$ (the distance between $X$ and its mean is less than 1). Now $\|X\|^2 = X_1^2 +\cdots+X_d^2\sim \chi^2(d)$, so this happens with probability $P( \xi < 1 )$ where $\xi\sim\chi^2(d)$. You can find this in good chi-square tables...
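For readers who prefer an empirical check of this argument, here is a minimal R sketch that simulates standard normal vectors and compares the observed fraction with $\|X\| < 1$ against the chi-square CDF. The seed, the sample size `n`, the dimension `d`, and all variable names are arbitrary choices for illustration, not values taken from the question.

```r
# Monte Carlo check that P(||X|| < 1) = P(chi^2_d < 1) for X ~ N(0, I_d).
set.seed(42)   # arbitrary seed, only for reproducibility
d <- 3         # dimension (illustrative choice)
n <- 1e5       # number of simulated vectors (illustrative choice)

# Each row is one draw of X = (X_1, ..., X_d) with independent N(0, 1) entries.
x <- matrix(rnorm(n * d), nrow = n, ncol = d)

# Squared Euclidean distance of each draw from the mean (the origin).
sq_dist <- rowSums(x^2)

empirical <- mean(sq_dist < 1)   # observed fraction within 1 sd
exact     <- pchisq(1, df = d)   # chi-square CDF evaluated at 1^2

cat("empirical:", empirical, " exact:", exact, "\n")
```

The two numbers should agree up to simulation noise; changing `d` shows how quickly the fraction shrinks.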

Here are a few values:

$\begin{array}{ll} d& P(\xi < 1)\\ 1 & 0.68\\ 2 & 0.39 \\ 3 & 0.20 \\ 4 & 0.090 \\ 5 & 0.037 \\ 6 & 0.014 \\ 7 & 0.0052 \\ 8 & 0.0018\\ 9 & 0.00056\\ 10& 0.00017\\ \end{array}$

And for 2 sd:

$\begin{array}{ll} d & P(\xi < 4)\\ 1 & 0.95\\ 2 & 0.86\\ 3 & 0.74\\ 4 & 0.59\\ 5 & 0.45\\ 6 & 0.32\\ 7 & 0.22\\ 8 & 0.14\\ 9 & 0.089\\ 10 & 0.053\\ \end{array}$

You can obtain these values in R with commands like pchisq(1, df=1:10), pchisq(4, df=1:10), etc.
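The same call extends to the full table requested in the question (1 to 10 dimensions by 1 to 10 SDs). The sketch below is one way to build it; the names `dims`, `sds`, `coverage`, and `radius_95` are made up for illustration, and the 95% level in the second part is an assumed example threshold, not something specified by the asker.

```r
# Fraction of a d-dimensional standard normal within k sd of the mean,
# i.e. P(chi^2_d < k^2), for d = 1..10 (rows) and k = 1..10 (columns).
dims <- 1:10
sds  <- 1:10
coverage <- outer(dims, sds, function(d, k) pchisq(k^2, df = d))
dimnames(coverage) <- list(d = dims, sd = sds)
print(signif(coverage, 2))

# Inverse question: the radius (in sd units) that contains 95% of the mass.
# 128 is the channel count from the question; 95% is an assumed level.
radius_95 <- sqrt(qchisq(0.95, df = c(1, 2, 10, 128)))
print(round(radius_95, 2))
```

The first two columns of `coverage` should reproduce the two tables above. For real sensor data, each channel would first have to be centred and scaled by its own mean and standard deviation for the $N(0,I)$ assumption to apply.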

Post Scriptum: As cardinal pointed out in the comments, one can estimate the asymptotic behaviour of these probabilities. The CDF of a $\chi^2(d)$ variable is

$F_d(x) = P(d/2,x/2) = {\gamma(d/2, x/2) \over \Gamma(d/2)}$

where $\gamma(s,y) = \int_0^y t^{s-1} e^{-t} \mathrm d t$ is the incomplete $\gamma$-function and, classically, $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t} \mathrm d t$. When $s$ is an integer,

$P(s,y) = e^{-y} \sum_{k=s}^\infty {y^k \over k!},$

and $P(s,y) \sim {y^s \over s!} e^{-y}$ for large $s$. Applying this when $d$ is even gives

$P(\xi < x) = P(d/2,x/2) \sim {1 \over (d/2)!} \left({x\over 2}\right)^{d/2} e^{-x/2} \sim {1\over\sqrt{\pi d}}e^{{1\over 2}(d-x)} \left({x\over d}\right)^{d\over 2}$

for large even $d$, the last equivalence using Stirling's formula. From this formula we see that the asymptotic decay is very fast as $d$ increases.
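To see how good the approximation is in practice, the following R sketch compares the exact chi-square CDF with the leading Poisson-tail term and with its Stirling form, for a few even dimensions. The chosen values of `x` and `d` and the variable names are illustrative assumptions only.

```r
# Compare the exact chi-square CDF with its asymptotic forms for even d.
x <- 1                      # squared radius (1 sd); illustrative choice
d <- c(2, 10, 50, 100)      # even dimensions; illustrative choice
s <- d / 2

exact <- pchisq(x, df = d)

# Leading term (x/2)^s * exp(-x/2) / s!, computed on the log scale
# so that large factorials do not overflow.
leading <- exp(s * log(x / 2) - x / 2 - lgamma(s + 1))

# The same term after replacing s! by Stirling's approximation.
stirling <- exp((d - x) / 2 + (d / 2) * log(x / d)) / sqrt(pi * d)

print(data.frame(d, exact, leading, stirling, ratio = leading / exact))
```

The `ratio` column should move towards 1 as `d` grows, which is what the asymptotic equivalence claims.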

Elvis

Welcome to our site, Elvis! Nice answer. (+1)

whuber

(+1) Good answer. Here are a couple of suggestions for your consideration: (1) It might help to make explicit what $\xi$ is for clarity's sake, (2) briefly give an intuitive argument for the choice you've made for the meaning of "one standard deviation" in this context and why it is even well-defined in the first place, and (3) add a statement regarding the growth of this quantity as a function of $d$. (The OP asks for only "empirical" answers, but other readers might appreciate a small mathematical addendum.)

cardinal

Thank you for your comments. I didn't think this answer would receive much attention! It is true that this is a nice form of the curse of dimensionality... @cardinal concerning (3), I don't know any asymptotic equivalent of the incomplete gamma function when the first parameter goes to infinity with the second held fixed; this is not easy! A rough upper bound could be derived; I may write that later.

Elvis

Regarding (3), to avoid a computation, you can employ the following argument: Let $d$ be even and such that $d = 2 k$. Note that $Z_i = X_{2i-1}^2 + X_{2i}^2$ is an $\mathrm{Exp}(1/2)$ random variable. So $\|X\|^2 = \sum_{i=1}^k Z_i$. But then $\|X\|^2$ is just the time until the $k$th renewal of a Poisson process with rate 1/2. So $\mathbb P(\|X\|^2 < 1 ) = \mathbb P( N_{1/2}(0,1) \geq k) = e^{-1/2} \sum_{x=k}^\infty 2^{-x}/x!$. The tail of the Poisson is dominated by the leading term, so $\mathbb P(\|X\|^2 < 1) \sim e^{-1/2} 2^{-k} / \Gamma(k+1)$ as $d\to\infty$. (Again: $k = d/2$.)

cardinal

Part of the point of the foregoing comment is that we get an exact answer for all even $d$. Also, using Stirling's approximation, we get that $\mathbb P(\|X\|^2 < 1 ) \sim e^{-1/2} 2^{-k} / \Gamma(k+1) \sim e^{(d-1)/2} d^{-(d+1)/2} / \sqrt{\pi}$.

cardinal
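The exact identity for even $d$ described in these comments can be checked numerically. The R sketch below compares the Poisson tail probability with the chi-square CDF; the range of `k` and the variable names are arbitrary choices for illustration.

```r
# For even d = 2k, P(||X||^2 < 1) equals P(N >= k) with N ~ Poisson(1/2).
k <- 1:5                 # half-dimensions; illustrative choice
d <- 2 * k

# ppois(q, lower.tail = FALSE) returns P(N > q), so q = k - 1 gives P(N >= k).
poisson_tail <- ppois(k - 1, lambda = 1 / 2, lower.tail = FALSE)
chisq_cdf    <- pchisq(1, df = d)

print(cbind(d, poisson_tail, chisq_cdf))
```

The two columns should coincide up to floating-point error, matching the even-$d$ rows of the 1 sd table in the answer.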
