Laplace smoothing and Dirichlet prior

11

The Wikipedia article on Laplace smoothing (or additive smoothing) says, from a Bayesian point of view:

This corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter $\alpha$ as a prior.

I am confused about how this is actually true. Could someone help me understand how these two things are equivalent?

Thank you!

bayesian smoothing dirichlet-distribution laplace-smoothing DanielX2010

10

Sure. This is essentially the observation that the Dirichlet distribution is a conjugate prior for the multinomial distribution. This means that they have the same functional form. The article mentions it, but I just want to stress that this follows from the multinomial sampling model. So, here we go...

The observation is about the posterior, so let us introduce some data $x$, which are counts of $K$ distinct items. We observe $N = \sum_{i=1}^K x_i$ samples in total. We assume that $x$ is drawn from an unknown distribution $\pi$ (on which we place a $\mathrm{Dir}(\alpha)$ prior on the $K$-simplex).

The posterior probability of $\pi$ given $\alpha$ and the data $x$ is

$p(\pi | x, \alpha) = p(x | \pi) p(\pi|\alpha)$

The likelihood $p(x|\pi)$ is the multinomial distribution. Let us now write out the densities:

$p(x|\pi) = \frac{N!}{x_1!\cdots x_K!} \pi_1^{x_1} \cdots \pi_K^{x_K}$

and

$p(\pi|\alpha) = \frac{1}{\mathrm{B}(\alpha)} \prod_{i=1}^K \pi_i^{\alpha - 1}$

where $\mathrm{B}(\alpha) = \frac{\Gamma(\alpha)^K}{\Gamma(K\alpha)}$. Multiplying, we find that

$p(\pi|\alpha,x) = p(x | \pi) p(\pi|\alpha) \propto \prod_{i=1}^K \pi_i^{x_i + \alpha - 1}.$

In other words, the posterior is also Dirichlet, with parameters $x_i + \alpha$. The question was about the posterior mean. Since the posterior is Dirichlet, we can apply the formula for the mean of a Dirichlet distribution (each parameter divided by the sum of all parameters) to find:

$E[\pi_i | \alpha, x] = \frac{x_i + \alpha}{N + K\alpha}.$
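As a quick numerical check of this identity, here is a minimal Python sketch. The function names and the example counts are made up for illustration. It assumes a symmetric prior with a single $\alpha$ and uses exact fractions so that the two computations can be compared exactly.

```python
from fractions import Fraction


def laplace_smoothed(counts, alpha=1):
    """Additive smoothing: (x_i + alpha) / (N + K * alpha)."""
    alpha = Fraction(alpha)
    n_total = sum(counts)   # N, total number of observations
    k = len(counts)         # K, number of categories
    return [(x + alpha) / (n_total + k * alpha) for x in counts]


def dirichlet_posterior_mean(counts, alpha=1):
    """Mean of Dir(x_1 + alpha, ..., x_K + alpha): each parameter over their sum."""
    alpha = Fraction(alpha)
    posterior_params = [x + alpha for x in counts]
    total = sum(posterior_params)
    return [a / total for a in posterior_params]


# Hypothetical counts for K = 3 categories, N = 4 observations.
counts = [3, 0, 1]
smoothed = laplace_smoothed(counts, alpha=1)
posterior = dirichlet_posterior_mean(counts, alpha=1)
print(smoothed)
print(smoothed == posterior)
```

With $\alpha = 1$ the category that was never observed still gets probability $1/7$ rather than $0$, which is the usual motivation for the smoothing.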

Hope that helps!

ja

Shouldn't it be $p(\pi | \alpha, x) = p(x | \pi)p(\pi | \alpha)/p(x | \alpha),$ rather than $p(\pi | \alpha, x) = p(x | \pi)p(\pi | \alpha)?$ The two sides are proportional as functions of $\pi$, but writing an equality is not true I think.

michal

I was confused about this for a long time, and I want to share my realization. The people who motivate Laplace smoothing via the Dirichlet prior are using the posterior mean, not the MAP estimate. For simplicity, consider the Beta distribution (the simplest case of the Dirichlet). The posterior mean is

$\frac{\alpha + n_{success}}{\alpha + \beta + n_{success} + n_{failures}}$

whereas the MAP is

$\frac{\alpha + n_{success} - 1}{\alpha + \beta + n_{success} + n_{failures} - 2}$.

So if someone says that $\alpha = \beta = 1$ corresponds to adding 1 to the numerator and 2 to the denominator, it's because they are using the posterior mean.
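To make the difference concrete, here is a small Python sketch of the two estimators. The function names and the example counts are hypothetical, and it assumes integer counts and integer hyperparameters so that exact fractions can be used.

```python
from fractions import Fraction


def beta_posterior_mean(n_success, n_failures, alpha=1, beta=1):
    """Mean of the Beta(alpha + n_success, beta + n_failures) posterior."""
    return Fraction(alpha + n_success,
                    alpha + beta + n_success + n_failures)


def beta_posterior_map(n_success, n_failures, alpha=1, beta=1):
    """Mode of the Beta posterior, using the formula (a - 1) / (a + b - 2)."""
    a = alpha + n_success
    b = beta + n_failures
    # The mode formula needs a, b >= 1 and a + b > 2; otherwise the mode
    # is not unique or does not lie where the formula says.
    if a < 1 or b < 1 or a + b == 2:
        raise ValueError("posterior has no unique mode given by this formula")
    return Fraction(a - 1, a + b - 2)


# Hypothetical data: 3 successes, 1 failure, uniform prior alpha = beta = 1.
mean_estimate = beta_posterior_mean(3, 1)
map_estimate = beta_posterior_map(3, 1)
print(mean_estimate, map_estimate)
```

Under the uniform prior the MAP coincides with the unsmoothed frequency $3/4$, while the posterior mean gives the Laplace-smoothed value $(3+1)/(4+2) = 2/3$.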

RMurphy

0

As a side note, I would like to add another point to the above derivation, although it does not really concern the main question. Since we are talking about Dirichlet priors on the multinomial distribution, I thought it worth mentioning what form the likelihood function takes if we treat the probabilities $\pi$ as nuisance variables and integrate them out.

As correctly pointed out by sydeulissie, $p(\pi | \alpha, x)$ is proportional to $\prod_{i=1}^{K} \pi_i^{x_i+\alpha-1}$. Now I would like to calculate $p(x|\alpha)$.

$p(x | \alpha) = \int p(x | \pi)\, p(\pi|\alpha)\, \mathrm{d}\pi_1\, \mathrm{d}\pi_2 \cdots \mathrm{d}\pi_K$

Using an integral identity for gamma functions, we have (omitting the multinomial coefficient $\frac{N!}{x_1! \cdots x_K!}$, which does not depend on $\alpha$):

$p(x|\alpha) = \frac{\Gamma(K\alpha)}{\Gamma(N + K\alpha)} \prod_{i=1}^{K} \frac{\Gamma(x_i + \alpha)}{\Gamma(\alpha)}$

The marginal likelihood derived above offers a more robust way of handling categorical data in cases where the sample size $N$ is not large.
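For illustration, the closed form can be evaluated numerically with a short Python sketch. The function name and the example counts are my own, not from the answer. It works in log space with `math.lgamma` to avoid overflow in the gamma functions, and it computes the expression exactly as written above, that is, the probability of one particular ordered sequence of observations, without a multinomial coefficient.

```python
from math import exp, lgamma


def log_marginal_likelihood(counts, alpha):
    """log p(x | alpha) for a symmetric Dirichlet prior with parameter alpha.

    Evaluates Gamma(K a) / Gamma(N + K a) * prod_i Gamma(x_i + a) / Gamma(a)
    in log space.
    """
    k = len(counts)         # K, number of categories
    n_total = sum(counts)   # N, total number of observations
    result = lgamma(k * alpha) - lgamma(n_total + k * alpha)
    for x in counts:
        result += lgamma(x + alpha) - lgamma(alpha)
    return result


# Hypothetical example: two categories, the first one observed twice.
p = exp(log_marginal_likelihood([2, 0], alpha=1.0))
print(p)
```

For this example the value agrees with chaining the Laplace-smoothed predictive probabilities: $\frac{0+1}{0+2} \cdot \frac{1+1}{1+2} = \frac{1}{3}$.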

omidi