Understanding the Beta conjugate prior in Bayesian inference about a frequency

The following is an excerpt from Bolstad's Introduction to Bayesian Statistics.

For all you experts out there this may be trivial, but I don't understand how the author concludes that we don't have to do any integration to calculate the posterior probability for some value of $\pi$. I understand the second expression, which is the proportionality, and where all the terms come from (likelihood x prior). I also understand that we don't have to worry about the denominator, since only the numerator is directly proportional. But moving on to the third equation, aren't we forgetting about the denominator of Bayes' rule? Where did it go? And the value computed by the Gamma functions, isn't that a constant? Don't constants cancel out in Bayes' theorem?

distributions bayesian beta-distribution conjugate-prior Jenna Maiz

There is only one possible constant, namely the one that turns the function into a probability density.

Xi'an

Answers:

The point is that we know what the posterior is proportional to, and we do not need to do the integration to get the (constant) denominator, because we recognize that a distribution with a probability density function proportional to $x^{\alpha-1} \times (1-x)^{\beta-1}$ (such as the posterior) is a beta distribution. Since the normalizing constant for such a beta pdf is $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$, we get the posterior pdf without integration. And yes, the normalizing constant in Bayes' theorem is a constant (given the observed data and the assumed prior), just like the normalizing constant for the posterior density.
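The claim about the normalizing constant can be checked numerically. The following is only a minimal sketch using the Python standard library; the helper names and the values $\alpha = 3$, $\beta = 5$ are illustrative assumptions, not taken from the book. It integrates the kernel $x^{\alpha-1}(1-x)^{\beta-1}$ with a simple midpoint rule and compares the reciprocal of the result with the Gamma-function expression.

```python
import math

def beta_kernel(x, alpha, beta):
    """Unnormalized beta density x^(alpha-1) * (1-x)^(beta-1)."""
    return x ** (alpha - 1) * (1 - x) ** (beta - 1)

def integrate_midpoint(func, n_steps=100_000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / n_steps
    return h * sum(func((i + 0.5) * h) for i in range(n_steps))

def gamma_constant(alpha, beta):
    """Closed-form constant Gamma(a+b) / (Gamma(a) * Gamma(b))."""
    return math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))

# Illustrative (assumed) parameter values.
alpha, beta = 3.0, 5.0

# Constant obtained the hard way: one over the integral of the kernel.
numeric = 1.0 / integrate_midpoint(lambda x: beta_kernel(x, alpha, beta))

# Constant obtained by recognizing the beta form.
closed_form = gamma_constant(alpha, beta)

print(numeric, closed_form)
```

Both numbers should agree up to numerical integration error, which is the sense in which the integration can be skipped once the beta form is recognized.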

Björn

The setup

You have this model:

$\begin{align*} p & \, \sim \, \text{beta}(\alpha, \beta) \\ x \, | \, p & \, \sim \, \text{binomial}(n, p) \end{align*}$

The corresponding densities are

$\begin{equation*} f(p) = \frac{1}{B(\alpha, \beta)} p^{\alpha - 1} (1 - p)^{\beta - 1} \end{equation*}$

$\begin{equation*} g(x \, | \, p) = {n \choose x} p^x (1 - p)^{n - x} \end{equation*}$

and in particular note that

$\begin{equation*} \frac{1}{B(\alpha, \beta)} = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}. \end{equation*}$

The implicit version

Now, the posterior distribution is proportional to the prior $f$ multiplied by the likelihood $g$. We can ignore constants (i.e. factors that do not depend on $p$), yielding:

$\begin{align*} h(p \, | \, x) & \propto f(p) g(x \, | \, p) \\ & \propto p^{\alpha - 1} (1 - p)^{\beta - 1} p^x (1 - p)^{n - x} \\ & = p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1}. \end{align*}$

This has the shape of a beta density with parameters $\alpha + x$ and $\beta + n - x$, and we know the normalizing constant for a beta distribution with those parameters: $1 / B(\alpha + x, \beta + n - x)$. In terms of gamma functions,

$\begin{equation*} \frac{1}{B(\alpha + x, \beta + n - x)} = \frac{\Gamma(n + \alpha + \beta)}{\Gamma(\alpha + x)\Gamma(\beta + n - x)}. \end{equation*}$

So we can go beyond proportionality without any extra work and write down the equality directly:

$\begin{equation*} h(p \, | \, x) = \frac{\Gamma(n + \alpha + \beta)}{\Gamma(\alpha + x)\Gamma(\beta + n - x)} p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1}. \end{equation*}$

So one can use knowledge of the structure of a beta distribution to easily recover an expression for the posterior, rather than going through some messy integration and the like.

This reaches the full posterior by implicitly cancelling the normalizing constants of the joint distribution, which can be confusing.
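In code, the implicit route amounts to nothing more than adding counts to the prior parameters. Here is a minimal sketch using only the standard library; the function names and the numbers ($\alpha = \beta = 2$, $n = 10$, $x = 7$) are assumptions made for illustration.

```python
import math

def posterior_params(alpha, beta, n, x):
    """Beta-binomial conjugate update: returns (alpha + x, beta + n - x)."""
    return alpha + x, beta + n - x

def beta_pdf(p, a, b):
    """Beta(a, b) density at p, with the constant written via gamma functions."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)

# Illustrative (assumed) prior and data: Beta(2, 2) prior, 7 successes in 10 trials.
a_post, b_post = posterior_params(alpha=2.0, beta=2.0, n=10, x=7)
print(a_post, b_post)  # 9.0 5.0

# Posterior density at p = 0.5, obtained with no integration at all.
print(beta_pdf(0.5, a_post, b_post))
```

No integral appears anywhere: the normalizing constant comes straight from the known form of the beta density.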

The explicit version

You can also work things out procedurally, which can be clearer.

It is not actually that much longer. Note that we can express the joint distribution as

$\begin{align*} f(p)g(x \, | \, p) = \frac{1}{B(\alpha, \beta)}{n \choose x} p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1} \end{align*}$

and the marginal distribution of $x$ as

$\begin{align*} \int_{0}^{1}f(p)g(x \, | \, p)dp & = \frac{1}{B(\alpha, \beta)}{n \choose x} \int_{0}^{1} p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1} dp \\ & = \frac{1}{B(\alpha, \beta)}{n \choose x} \frac{\Gamma(\alpha + x)\Gamma(\beta + n - x)}{\Gamma(\alpha + \beta + n)} \end{align*}$

Bayes' theorem then gives the posterior as

$\begin{align*} h(p \, | \, x) & = \frac{f(p) g(x \, | \, p)}{\int_{0}^{1}f(p) g(x \, | \, p)dp} \\ & = \frac{\frac{1}{B(\alpha, \beta)}{n \choose x} p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1}}{\frac{1}{B(\alpha, \beta)}{n \choose x} \frac{\Gamma(\alpha + x)\Gamma(\beta + n - x)}{\Gamma(\alpha + \beta + n)}} \\ & = \frac{\Gamma(n + \alpha + \beta)}{\Gamma(\alpha + x)\Gamma(\beta + n - x)} p^{\alpha + x - 1} (1 - p)^{\beta + n - x - 1} \end{align*}$

which is the same thing we got previously.
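The explicit route can also be followed numerically. The sketch below is an illustration under assumed values ($\alpha = 2$, $\beta = 3$, $n = 10$, $x = 4$) with hypothetical helper names: it builds the joint density, integrates it over $p$ with a midpoint rule to get the marginal, divides, and compares the result with the closed-form beta posterior.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def joint(p, x, n, alpha, beta):
    """Joint density f(p) * g(x | p): beta prior times binomial likelihood."""
    prior = p ** (alpha - 1) * (1 - p) ** (beta - 1) / beta_fn(alpha, beta)
    likelihood = math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return prior * likelihood

def marginal(x, n, alpha, beta, n_steps=20_000):
    """Midpoint-rule approximation of the integral of the joint over p in [0, 1]."""
    h = 1.0 / n_steps
    return h * sum(joint((i + 0.5) * h, x, n, alpha, beta) for i in range(n_steps))

def posterior_explicit(p, x, n, alpha, beta):
    """Posterior via Bayes' theorem: joint divided by the (numerical) marginal."""
    return joint(p, x, n, alpha, beta) / marginal(x, n, alpha, beta)

def posterior_closed_form(p, x, n, alpha, beta):
    """Posterior read off as a Beta(alpha + x, beta + n - x) density."""
    a, b = alpha + x, beta + n - x
    return p ** (a - 1) * (1 - p) ** (b - 1) / beta_fn(a, b)

# Illustrative (assumed) values.
alpha, beta, n, x, p = 2.0, 3.0, 10, 4, 0.3
explicit = posterior_explicit(p, x, n, alpha, beta)
closed = posterior_closed_form(p, x, n, alpha, beta)
print(explicit, closed)
```

The two values should match up to integration error, because the factor $\frac{1}{B(\alpha, \beta)}{n \choose x}$ appears in both the joint and the marginal and drops out of the ratio.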

jtobin

General Remarks

To make the answer given by @Björn a bit more explicit and at the same time more general, we should remember that we arrived at Bayes' theorem from

$p(\theta|X) \times p(X) = p(X,\theta)=p(X|\theta)\times p(\theta)$

$\implies p(\theta|X) = \frac{p(X|\theta)\times p(\theta)}{p(X)}$ (Bayes' theorem)

where $X$ represents the observed data and $\theta$ our unknown parameter we would like to make probabilistic inferences about -- in the question's case the parameter is an unknown frequency $\pi$ . Let's not worry for now whether we are talking about vectors or scalars to keep it simple.

Marginalization in the continuous case leads to

$p(X) = \int_{-\infty}^{+\infty}{p(X,\theta)d\theta}=\int_{-\infty}^{+\infty}{p(X|\theta)\times p(\theta)d\theta}$

where the joint distribution $p(X,\theta)$ equals $\text{likelihood} \times \text{prior}$ as we have seen above. The marginal $p(X)$ is a constant with respect to $\theta$: once the parameter has been integrated out, it depends only on the observed data and the fixed prior parameters.

Therefore we can reformulate the Bayes Theorem as

$p(\theta|X) = \text{Const.} \times p(X|\theta)\times p(\theta)$ with $\text{Const.} = \frac{1}{p(X)} = \frac{1}{\int{p(X|\theta)\times p(\theta)d\theta}}$

and thus arrive at the usual proportionality form of Bayes Theorem.

Application to the problem at hand

Now we are ready to simply plug in what we know, since $\text{likelihood} \times \text{prior}$ in the question's case is of the form

$p(X,\theta)= p(X|\theta)\times p(\theta) = A \cdot \theta^{\,a + y - 1}(1-\theta)^{b + n - y - 1} = A\cdot \theta^{\,a' - 1}(1-\theta)^{b' - 1}$

where $a' = a+y$ , $b' = b+n-y$ and where $A = \frac{1}{B(a,b)}\binom{n}{y}$ collects the constant terms from the binomial likelihood and the beta prior.

We can now use the answer given by @Björn to find that this integrates to the Beta function $B(a',b')$ times the collection of constant terms $A$ so that

$p(X) = A\cdot\int_0^1{\theta^{\,a' - 1}(1-\theta)^{b' - 1}d\theta}=A\cdot B(a',b')$

$\implies p(\theta|X) = \frac{A\cdot\theta^{\,a' - 1}(1-\theta)^{b' - 1}}{A\cdot B(a',b')}=\frac{\theta^{\,a' - 1}(1-\theta)^{b' - 1}}{B(a',b')}$

Note that any constant term in the joint distribution will always cancel out, since it appears in both the numerator and the denominator (cf. the answer given by @jtobin), so we really do not have to bother with it.
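The cancellation can be illustrated numerically. In this minimal sketch (the grid size, the values $a' = 4$, $b' = 8$ and the constant $A = 1234.5$ are arbitrary assumptions), the same kernel is normalized once as is and once multiplied by $A$; the resulting densities coincide up to floating-point rounding.

```python
def normalized_density(kernel, n_points=10_000):
    """Normalize kernel over [0, 1] on a midpoint grid; returns (grid, density)."""
    h = 1.0 / n_points
    grid = [(i + 0.5) * h for i in range(n_points)]
    values = [kernel(t) for t in grid]
    total = h * sum(values)  # numerical stand-in for p(X)
    return grid, [v / total for v in values]

# Illustrative (assumed) posterior parameters a' and b'.
a_post, b_post = 4.0, 8.0

def kernel(theta):
    """Parameter-dependent part of likelihood times prior."""
    return theta ** (a_post - 1) * (1 - theta) ** (b_post - 1)

def scaled_kernel(theta):
    """Same kernel multiplied by an arbitrary constant A."""
    A = 1234.5
    return A * kernel(theta)

_, density_plain = normalized_density(kernel)
_, density_scaled = normalized_density(scaled_kernel)

# Largest pointwise difference between the two normalized densities.
max_gap = max(abs(u - v) for u, v in zip(density_plain, density_scaled))
print(max_gap)
```

Whatever value is chosen for $A$, it multiplies both the kernel and its integral, so it cannot affect the normalized result.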

Thus we recognize that our posterior distribution is in fact a beta distribution where we can simply update the prior's parameters $a' = a+y$ and $b' = b+n-y$ to arrive at the posterior. This is why the beta distributed prior is called a conjugate prior.
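As a worked instance of this update (the numbers are chosen purely for illustration and do not come from the book), take a uniform prior $a = b = 1$ and suppose we observe $y = 3$ successes in $n = 10$ trials:

$\begin{align*} a' & = 1 + 3 = 4, \qquad b' = 1 + 10 - 3 = 8, \\ B(4, 8) & = \frac{\Gamma(4)\Gamma(8)}{\Gamma(12)} = \frac{3! \, 7!}{11!} = \frac{1}{1320}, \\ p(\theta|X) & = 1320 \, \theta^{3} (1-\theta)^{7}. \end{align*}$

The posterior mean is then $a'/(a'+b') = 4/12 = 1/3$, using the standard formula for the mean of a beta distribution.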

gwr

This reasoning is similar to jtobin's implicit version. We only look at the parts of likelihood times prior that contain the parameter and collect everything else in the normalization constant. Thus we treat integration only as a final step, which is legitimate because the constants cancel out, as jtobin has shown in his explicit version.

gwr