We can write Bayes' theorem as
$$p(\theta|x) = \frac{p(x|\theta)\,p(\theta)}{\int p(x|\theta)\,p(\theta)\,d\theta}$$
where $p(\theta|x)$ is the posterior, $p(x|\theta)$ is the conditional distribution, and $p(\theta)$ is the prior,
or as
$$p(\theta|x) = \frac{L(\theta|x)\,p(\theta)}{\int L(\theta|x)\,p(\theta)\,d\theta}$$
where $p(\theta|x)$ is the posterior, $L(\theta|x)$ is the likelihood function, and $p(\theta)$ is the prior.
My questions are:
- Why is Bayesian analysis done using the likelihood function and not the conditional distribution?
- Can you say in words what the difference is between the likelihood and the conditional distribution? I know that the likelihood is not a probability distribution and that $L(\theta|x) \propto p(x|\theta)$.
Tags: bayesian, likelihood
Answers:
Suppose you have random variables $X_1, \ldots, X_n$ (whose values will be observed in your experiment) that are conditionally independent given $\Theta = \theta$, with conditional densities $f_{X_i \mid \Theta}(\,\cdot \mid \theta)$, for $i = 1, \ldots, n$. This is your (postulated) statistical (conditional) model, and the conditional densities express, for each possible value $\theta$ of the (random) parameter $\Theta$, your uncertainty about the values of the $X_i$'s *before* you have access to any real data. With the help of the conditional densities you can, for example, compute conditional probabilities like
$$P\{X_1 \in B_1, \ldots, X_n \in B_n \mid \Theta = \theta\} = \int_{B_1 \times \cdots \times B_n} \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta)\, dx_1 \cdots dx_n,$$
for each $\theta$.

After you have access to an actual sample $(x_1, \ldots, x_n)$ of values (realizations) of the $X_i$'s observed in one run of your experiment, the situation changes: there is no longer any uncertainty about the observables $X_1, \ldots, X_n$. Suppose the random $\Theta$ takes values in a parameter space $\Pi$. Now, for those known (fixed) values $(x_1, \ldots, x_n)$, you define a function
$$L_{x_1, \ldots, x_n} : \Pi \to \mathbb{R}, \qquad L_{x_1, \ldots, x_n}(\theta) = \prod_{i=1}^n f_{X_i \mid \Theta}(x_i \mid \theta),$$
which is known as the likelihood function and is a function of $\theta$ alone.
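To make the "pre-sample" versus "after-sample" distinction concrete, here is a small Python sketch (mine, not part of the original answer; the normal model and the sample values are illustrative assumptions). It evaluates the same conditional density in both roles: as a density in $x$ for a fixed $\theta$, and as a likelihood in $\theta$ for fixed observed data.

```python
import numpy as np
from scipy.stats import norm

# Illustrative conditional model (an assumption for this sketch):
# X_i | Theta = theta ~ N(theta, 1), independent given theta.

theta_fixed = 2.0
x_grid = np.linspace(-4, 8, 1201)
dx = x_grid[1] - x_grid[0]

# "Pre-sample" object: the conditional density of one X_i given theta.
# As a function of x it integrates to one.
density_in_x = norm.pdf(x_grid, loc=theta_fixed, scale=1.0)
print(density_in_x.sum() * dx)        # ~ 1.0

# "After-sample" object: the likelihood of theta for fixed observed data.
x_obs = np.array([1.8, 2.4, 1.9, 2.7, 2.1])   # hypothetical realizations
theta_grid = np.linspace(-4, 8, 1201)
dtheta = theta_grid[1] - theta_grid[0]
likelihood = np.array([norm.pdf(x_obs, loc=t, scale=1.0).prod() for t in theta_grid])

# As a function of theta it need not integrate to one: it is not a density in theta.
print(likelihood.sum() * dtheta)      # generally != 1.0
```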
Answering your question, to understand the differences between the concepts of conditional density and likelihood, keep in mind their mathematical definitions (which are clearly different: they are different mathematical objects, with different properties), and also remember that the conditional density is a "pre-sample" object/concept, while the likelihood is an "after-sample" one. I hope that all this also helps you to answer why Bayesian inference (using your way of putting it, which I don't think is ideal) is done "using the likelihood function and not the conditional distribution": the goal of Bayesian inference is to compute the posterior distribution, and to do so we condition on the observed (known) data.
Proportionality is used to simplify analysis
Bayesian analysis is generally done via an even simpler statement of Bayes' theorem, where we work only in terms of proportionality with respect to the parameter of interest. For a standard IID model with sampling density $f(\mathbf{x}|\theta)$ we can express this as:
$$p(\theta|\mathbf{x}) \propto L_{\mathbf{x}}(\theta) \, p(\theta), \qquad L_{\mathbf{x}}(\theta) \propto f(\mathbf{x}|\theta).$$
This statement of Bayesian updating works in terms of proportionality with respect to the parameter $\theta$. It uses two proportionality simplifications: one in the use of the likelihood function (proportional to the sampling density) and one in the posterior (proportional to the product of likelihood and prior). Since the posterior is a density function (in the continuous case), the norming rule then sets the multiplicative constant that is required to yield a valid density (i.e., to make it integrate to one).
This use of proportionality has the advantage of allowing us to ignore any multiplicative elements of the functions that do not depend on the parameter $\theta$. This tends to simplify the problem by allowing us to sweep away unnecessary parts of the mathematics and get simpler statements of the updating mechanism. This is not a mathematical requirement (since Bayes' rule works in its non-proportional form too), but it makes things simpler for our tiny animal brains.
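As a rough numerical illustration of this proportional workflow (a sketch of mine, not part of the answer; the data, prior, and grid below are made up), one can carry the unnormalised product of likelihood and prior through the whole computation and normalise only once at the end:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data, prior, and grid, chosen only to illustrate the workflow.
x_obs = np.array([0.9, 1.4, 1.1, 0.7])
theta = np.linspace(-4, 6, 2001)
dtheta = theta[1] - theta[0]

# Keep only the theta-dependent parts: log-likelihood plus log-prior,
# i.e. the log of the unnormalised posterior.
log_like = np.array([norm.logpdf(x_obs, loc=t, scale=1.0).sum() for t in theta])
log_prior = norm.logpdf(theta, loc=0.0, scale=2.0)
log_post_unnorm = log_like + log_prior

# Normalise once, at the very end, so the result integrates to one.
post_unnorm = np.exp(log_post_unnorm - log_post_unnorm.max())
posterior = post_unnorm / (post_unnorm.sum() * dtheta)
print(posterior.sum() * dtheta)       # ~ 1.0
```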
An applied example: Consider an IID model with observed data $X_1, \ldots, X_n \sim \text{IID N}(\theta, 1)$. To facilitate our analysis we define the statistics $\bar{x} = \tfrac{1}{n}\sum_{i=1}^n x_i$ and $\bar{\bar{x}} = \tfrac{1}{n}\sum_{i=1}^n x_i^2$, which are the first two sample moments. For this model we have sampling density:
$$f(\mathbf{x}|\theta) = \prod_{i=1}^n \text{N}(x_i|\theta, 1) = \frac{1}{(2\pi)^{n/2}} \exp\Big(-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2\Big) = \frac{1}{(2\pi)^{n/2}} \exp\Big(-\frac{n\bar{\bar{x}}}{2}\Big) \exp\Big(-\frac{n}{2}(\theta^2 - 2\bar{x}\theta)\Big).$$
Now, we can work directly with this sampling density if we want to. But notice that the first two terms in this density are multiplicative constants that do not depend on $\theta$. It is annoying to have to keep track of these terms, so let's just get rid of them, leaving the likelihood function:
$$L_{\mathbf{x}}(\theta) = \exp\Big(-\frac{n}{2}(\theta^2 - 2\bar{x}\theta)\Big).$$
That simplifies things a little bit, since we don't have to keep track of an additional term. Now, we could apply Bayes' rule using its full equation-version, including the integral denominator. But again, this requires us to keep track of another annoying multiplicative constant that does not depend on $\theta$ (more annoying because we have to solve an integral to get it). So let's just apply Bayes' rule in its proportional form. Using the conjugate prior $\theta \sim \text{N}(0, \lambda_0)$, with some known precision parameter $\lambda_0 > 0$, we get the following result (by completing the square):
$$\begin{aligned}
p(\theta|\mathbf{x}) &\propto L_{\mathbf{x}}(\theta) \, p(\theta) \\
&\propto \exp\Big(-\frac{n}{2}(\theta^2 - 2\bar{x}\theta)\Big) \exp\Big(-\frac{\lambda_0}{2}\theta^2\Big) \\
&= \exp\Big(-\frac{1}{2}\big((n+\lambda_0)\theta^2 - 2n\bar{x}\theta\big)\Big) \\
&\propto \exp\Big(-\frac{n+\lambda_0}{2}\Big(\theta - \frac{n}{n+\lambda_0}\bar{x}\Big)^2\Big).
\end{aligned}$$
So, from this working we can see that the posterior distribution is proportional to a normal density. Since the posterior must be a density, this implies that the posterior is that normal density:
$$p(\theta|\mathbf{x}) = \sqrt{\frac{n+\lambda_0}{2\pi}} \exp\Big(-\frac{n+\lambda_0}{2}\Big(\theta - \frac{n}{n+\lambda_0}\bar{x}\Big)^2\Big).$$
Hence, we see that a posteriori the parameter $\theta$ is normally distributed, with posterior mean and variance given by:
$$\mathbb{E}(\theta|\mathbf{x}) = \frac{n}{n+\lambda_0}\bar{x}, \qquad \mathbb{V}(\theta|\mathbf{x}) = \frac{1}{n+\lambda_0}.$$
Now, the posterior distribution we have derived has a constant of integration out the front of it (which we can find easily by looking up the form of the normal distribution). But notice that we did not have to worry about this multiplicative constant - all our working removed (or brought in) multiplicative constants whenever this simplified the mathematics. The same result can be derived while keeping track of the multiplicative constants, but this is a lot messier.
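As a quick numerical check of this closed-form result (my own sketch, not from the answer; the data are simulated and $\lambda_0$ is an arbitrary choice), one can compare the conjugate-update formulas for the posterior mean and variance against a brute-force grid posterior:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, lam0 = 20, 2.0                    # lam0 = prior precision (arbitrary choice)
x = rng.normal(1.5, 1.0, size=n)     # simulated N(theta, 1) data with theta = 1.5
xbar = x.mean()

# Closed-form posterior moments from the derivation above.
post_mean = n * xbar / (n + lam0)
post_var = 1.0 / (n + lam0)

# Brute-force grid posterior: likelihood times N(0, 1/lam0) prior, normalised numerically.
theta = np.linspace(-2.0, 5.0, 4001)
dtheta = theta[1] - theta[0]
log_post = np.array([norm.logpdf(x, loc=t, scale=1.0).sum() for t in theta])
log_post += norm.logpdf(theta, loc=0.0, scale=1.0 / np.sqrt(lam0))
w = np.exp(log_post - log_post.max())
w /= w.sum() * dtheta

print(post_mean, (theta * w).sum() * dtheta)                    # should agree closely
print(post_var, ((theta - post_mean) ** 2 * w).sum() * dtheta)  # should agree closely
```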
I think Zen's answer really explains how the likelihood function and the joint density of values of random variables differ conceptually. Still, mathematically, as a function of both the $x_i$'s and $\theta$ they are the same, and in that sense the likelihood can be looked at as a probability density. The difference you point to in the formula for the Bayes posterior distribution is just a notational difference. But the subtlety of the difference is nicely explained in Zen's answer.
This issue has come up in other questions discussed on this site regarding the likelihood function. Also other comments by kjetil and Dilip seem to support what I am saying.