Likelihood vs. conditional distribution for Bayesian analysis


We can write Bayes' theorem as

$$p(\theta \mid x) = \frac{f(X \mid \theta)\, p(\theta)}{\int_{\theta} f(X \mid \theta)\, p(\theta)\, d\theta}$$

where $p(\theta \mid x)$ is the posterior, $f(X \mid \theta)$ is the conditional distribution, and $p(\theta)$ is the prior.

or

$$p(\theta \mid x) = \frac{L(\theta \mid x)\, p(\theta)}{\int_{\theta} L(\theta \mid x)\, p(\theta)\, d\theta}$$

where $p(\theta \mid x)$ is the posterior, $L(\theta \mid x)$ is the likelihood function, and $p(\theta)$ is the prior.

My questions are:

  1. Why is Bayesian analysis done using the likelihood function rather than the conditional distribution?
  2. Can you say in words what the difference is between the likelihood and the conditional distribution? I know that the likelihood is not a probability distribution and that $L(\theta \mid x) \propto f(X \mid \theta)$.
kzoo
There is no difference! The likelihood is the conditional distribution $f(X \mid \theta)$ up to a proportionality constant, and that is all that matters.
kjetil b halvorsen
The parameter $\Theta$ has prior density $p_{\Theta}(\theta)$. If $\theta$ is the realized value of $\Theta$ while $x$ is the observed value of a random variable $X$, then the value of the likelihood function $L(\theta \mid x)$ is exactly $f(x \mid \theta)$, the value of the conditional density $f_{X \mid \Theta}(x \mid \Theta = \theta)$ of $X$. The difference is that $\int f_{X \mid \Theta}(x \mid \Theta = \theta)\, dx = 1$ for every realization $\theta$ of $\Theta$, whereas, as a function of $\theta$ (with $x$ fixed), $L(\theta \mid x)$ is not a density: in general, $\int L(\theta \mid x)\, d\theta \neq 1$.
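To make this concrete, here is a minimal numeric sketch. The Bernoulli model is my own illustrative choice, not part of the original comment: the same expression sums to one over $x$ but does not integrate to one over $\theta$.

```python
# Bernoulli model: f(x|theta) = theta^x * (1 - theta)^(1 - x), x in {0, 1}.
# The model choice is an illustrative assumption, not from the original comment.
from scipy.integrate import quad

def f(x, theta):
    """Conditional mass of X given Theta = theta."""
    return theta**x * (1 - theta)**(1 - x)

theta = 0.3
# As a function of x (theta fixed), the conditional distribution sums to 1:
print(f(0, theta) + f(1, theta))                  # 1.0

# As a function of theta (x fixed), the likelihood need not integrate to 1:
x_obs = 1
area, _ = quad(lambda t: f(x_obs, t), 0.0, 1.0)
print(area)                                       # 0.5, not 1
```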
Dilip Sarwate

Answers:


Suppose you have random variables $X_1, \dots, X_n$ (whose values will be observed in your experiment) that are conditionally independent given $\Theta = \theta$, with conditional densities $f_{X_i \mid \Theta}(\,\cdot \mid \theta)$, for $i = 1, \dots, n$. This is your (postulated) statistical (conditional) model, and the conditional densities express, for each possible value $\theta$ of the (random) parameter $\Theta$, your uncertainty about the values of the $X_i$'s, *before* you have access to any real data. With the help of the conditional densities you can, for example, compute conditional probabilities like

$$P\{X_1 \in B_1, \dots, X_n \in B_n \mid \Theta = \theta\} = \int_{B_1 \times \dots \times B_n} \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta)\, dx_1 \dots dx_n,$$

for every $\theta$.

After you have access to an actual sample $(x_1, \dots, x_n)$ of values (realizations) of the $X_i$'s observed in one run of your experiment, the situation changes: there is no longer any uncertainty about the observables $X_1, \dots, X_n$. Suppose that the random $\Theta$ takes values in a parameter space $\Pi$. Now, for those known (fixed) values $(x_1, \dots, x_n)$, you define a function

$$L_{x_1, \dots, x_n} : \Pi \to \mathbb{R}$$

by

$$L_{x_1, \dots, x_n}(\theta) = \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta).$$
Note that $L_{x_1, \dots, x_n}$, known as the "likelihood function", is a function of $\theta$. In this "after you have data" situation, the likelihood $L_{x_1, \dots, x_n}$ contains, for the particular conditional model that we are considering, all the information about the parameter $\Theta$ contained in this particular sample $(x_1, \dots, x_n)$. In fact, it happens that $L_{x_1, \dots, x_n}$ is a sufficient statistic for $\Theta$.
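As a concrete sketch of this "after-sample" object, assuming a $\text{N}(\theta, 1)$ model as a stand-in for the generic $f_{X_i \mid \Theta}$ (the data values below are made up):

```python
# Build the likelihood function L_{x_1,...,x_n} from fixed observed data.
# The N(theta, 1) model and the data are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def make_likelihood(x_sample):
    """Given fixed data x_1..x_n, return L_{x_1,...,x_n} as a function of theta."""
    x_sample = np.asarray(x_sample)
    def L(theta):
        # Product of the conditional densities, evaluated at the fixed data.
        return np.prod(norm.pdf(x_sample, loc=theta, scale=1.0))
    return L

x_obs = [1.2, 0.7, 1.9]        # hypothetical realizations of X_1, X_2, X_3
L = make_likelihood(x_obs)     # an "after-sample" object: a function of theta alone
print(L(1.0), L(2.0))
```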

Answering your question: to understand the differences between the concepts of conditional density and likelihood, keep in mind their mathematical definitions (which are clearly different: they are different mathematical objects, with different properties), and also remember that the conditional density is a "pre-sample" object/concept, while the likelihood is an "after-sample" one. I hope that all this also helps you to see why Bayesian inference (using your way of putting it, which I don't think is ideal) is done "using the likelihood function and not the conditional distribution": the goal of Bayesian inference is to compute the posterior distribution, and to do so we condition on the observed (known) data.

Zen
I think Zen is correct when he says that the likelihood and the conditional probability are different. In the likelihood function, θ is not a random variable, so it is different from a conditional probability.
Martine

Proportionality is used to simplify analysis

Bayesian analysis is generally done via an even simpler statement of Bayes' theorem, where we work only in terms of proportionality with respect to the parameter of interest. For a standard IID model with sampling density f(X|θ) we can express this as:

$$p(\theta \mid \mathbf{x}) \propto L_\mathbf{x}(\theta)\, p(\theta) \qquad \qquad L_\mathbf{x}(\theta) \propto \prod_{i=1}^n f(x_i \mid \theta).$$

This statement of Bayesian updating works in terms of proportionality with respect to the parameter θ. It uses two proportionality simplifications: one in the use of the likelihood function (proportional to the sampling density) and one in the posterior (proportional to the product of likelihood and prior). Since the posterior is a density function (in the continuous case), the norming rule then sets the multiplicative constant that is required to yield a valid density (i.e., to make it integrate to one).

This use of proportionality has the advantage of allowing us to ignore any multiplicative elements of the functions that do not depend on the parameter θ. This tends to simplify the problem by allowing us to sweep away unnecessary parts of the mathematics and get simpler statements of the updating mechanism. This is not a mathematical requirement (since Bayes' rule works in its non-proportional form too), but it makes things simpler for our tiny animal brains.
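To see the proportional scheme in action, here is a minimal grid sketch (the model, prior, and data are illustrative assumptions, not part of the original answer): multiply likelihood and prior pointwise, ignore all constants along the way, and let the norming step fix the scale at the end.

```python
# Grid approximation of proportional Bayesian updating.
# Model N(theta, 1), prior N(0, 1), and the data are illustrative assumptions.
import numpy as np
from scipy.stats import norm

theta_grid = np.linspace(-3.0, 3.0, 2001)
dtheta = theta_grid[1] - theta_grid[0]
x_obs = np.array([0.8, 1.1, 0.4])

# Unnormalized posterior: likelihood times prior (any constants in theta that
# we drop here are irrelevant; they cancel in the normalization below).
log_like = np.array([norm.logpdf(x_obs, loc=t, scale=1.0).sum() for t in theta_grid])
log_prior = norm.logpdf(theta_grid, loc=0.0, scale=1.0)
unnorm = np.exp(log_like + log_prior)

# The norming rule: rescale so the result integrates (numerically) to one.
posterior = unnorm / (unnorm.sum() * dtheta)
print(posterior.sum() * dtheta)   # ~1.0
```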

An applied example: Consider an IID model with observed data $X_1, \dots, X_n \sim \text{IID N}(\theta, 1)$. To facilitate our analysis we define the first two sample moments $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$ and $\bar{\bar{x}} = \frac{1}{n} \sum_{i=1}^n x_i^2$. For this model we have the sampling density:

$$\begin{aligned}
f(\mathbf{x} \mid \theta) &= \prod_{i=1}^n f(x_i \mid \theta) = \prod_{i=1}^n \text{N}(x_i \mid \theta, 1) \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} \exp\Big( -\tfrac{1}{2} (x_i - \theta)^2 \Big) = (2\pi)^{-n/2} \exp\Big( -\tfrac{1}{2} \sum_{i=1}^n (x_i - \theta)^2 \Big) \\
&= (2\pi)^{-n/2} \exp\Big( -\tfrac{n}{2} (\theta^2 - 2\bar{x}\theta + \bar{\bar{x}}) \Big) \\
&= (2\pi)^{-n/2} \exp\Big( -\tfrac{n\bar{\bar{x}}}{2} \Big) \exp\Big( -\tfrac{n}{2} (\theta^2 - 2\bar{x}\theta) \Big).
\end{aligned}$$

Now, we can work directly with this sampling density if we want to. But notice that the first two terms in this density are multiplicative constants that do not depend on θ. It is annoying to have to keep track of these terms, so let's just get rid of them, so we have the likelihood function:

$$L_\mathbf{x}(\theta) = \exp\Big( -\frac{n}{2} (\theta^2 - 2\bar{x}\theta) \Big).$$

That simplifies things a little bit, since we don't have to keep track of an additional term. Now, we could apply Bayes' rule using its full equation version, including the integral denominator. But again, this would require us to keep track of another annoying multiplicative constant that does not depend on θ (more annoying because we have to solve an integral to get it). So let's just apply Bayes' rule in its proportional form. Using the conjugate prior $\theta \sim \text{N}(0, \lambda_0)$, parameterized here by a known precision parameter $\lambda_0 > 0$, we get the following result (by completing the square):

$$\begin{aligned}
p(\theta \mid \mathbf{x}) &\propto L_\mathbf{x}(\theta)\, p(\theta) \\
&= \exp\Big( -\tfrac{n}{2} (\theta^2 - 2\bar{x}\theta) \Big)\, \text{N}(\theta \mid 0, \lambda_0) \\
&\propto \exp\Big( -\tfrac{n}{2} (\theta^2 - 2\bar{x}\theta) \Big) \exp\Big( -\tfrac{\lambda_0}{2} \theta^2 \Big) \\
&= \exp\Big( -\tfrac{1}{2} (n\theta^2 - 2n\bar{x}\theta + \lambda_0\theta^2) \Big) \\
&= \exp\Big( -\tfrac{1}{2} \big( (n+\lambda_0)\theta^2 - 2n\bar{x}\theta \big) \Big) \\
&= \exp\Big( -\tfrac{n+\lambda_0}{2} \Big( \theta^2 - \tfrac{2n\bar{x}}{n+\lambda_0}\theta \Big) \Big) \\
&\propto \exp\Big( -\tfrac{n+\lambda_0}{2} \Big( \theta - \tfrac{n}{n+\lambda_0}\bar{x} \Big)^2 \Big) \\
&\propto \text{N}\Big( \theta \,\Big|\, \tfrac{n}{n+\lambda_0}\bar{x},\; n+\lambda_0 \Big).
\end{aligned}$$

So, from this working we can see that the posterior distribution is proportional to a normal density. Since the posterior must be a density, this implies that the posterior is that normal density:

$$p(\theta \mid \mathbf{x}) = \text{N}\Big( \theta \,\Big|\, \frac{n}{n+\lambda_0}\bar{x},\; n+\lambda_0 \Big).$$

Hence, we see that a posteriori the parameter θ is normally distributed with posterior mean and variance given by:

$$\mathbb{E}(\theta \mid \mathbf{x}) = \frac{n}{n+\lambda_0}\bar{x}, \qquad \mathbb{V}(\theta \mid \mathbf{x}) = \frac{1}{n+\lambda_0}.$$

Now, the posterior distribution we have derived has a constant of integration out the front of it (which we can find easily by looking up the form of the normal distribution). But notice that we did not have to worry about this multiplicative constant - all our working removed (or brought in) multiplicative constants whenever this simplified the mathematics. The same result can be derived while keeping track of the multiplicative constants, but this is a lot messier.
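As a quick sanity check of these closed-form expressions, one can compare them against a brute-force grid normalization (the data below are simulated, purely for illustration):

```python
# Numeric check of the conjugate-posterior formulas E = n*xbar/(n+lambda_0),
# V = 1/(n+lambda_0). Data and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
lambda_0, n = 2.0, 50
x = rng.normal(1.5, 1.0, size=n)      # simulated N(theta=1.5, 1) sample
xbar = x.mean()

post_mean = n * xbar / (n + lambda_0)
post_var = 1.0 / (n + lambda_0)

# Brute force: normalize likelihood * prior on a grid and take moments.
theta = np.linspace(post_mean - 1.0, post_mean + 1.0, 4001)
log_unnorm = -0.5 * ((x[:, None] - theta) ** 2).sum(axis=0) - 0.5 * lambda_0 * theta**2
w = np.exp(log_unnorm - log_unnorm.max())
w /= w.sum()                          # probability weights on the grid
grid_mean = (w * theta).sum()
grid_var = (w * (theta - grid_mean) ** 2).sum()

print(post_mean, grid_mean)           # agree to several decimals
print(post_var, grid_var)             # agree to several decimals
```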

Reinstate Monica

I think Zen's answer really tells you how the likelihood function and the joint density of values of random variables differ conceptually. Still, mathematically, as a function of both the $x_i$'s and $\theta$, they are the same, and in that sense the likelihood can be looked at as a probability density. The difference you point to in the formula for the Bayes posterior distribution is just a notational difference. But the subtlety of the difference is nicely explained in Zen's answer.

This issue has come up in other questions discussed on this site regarding the likelihood function. The comments by kjetil and Dilip also seem to support what I am saying.

Michael R. Chernick