Is this a correct way to continually update a probability using Bayes' theorem?

Let's say I'm trying to find the probability that a person's favorite ice cream flavor is vanilla.

I know that the person also enjoys horror movies.

I want to find the probability that the person's favorite ice cream is vanilla, given that they enjoy horror movies.

I know the following:

$5\%$ of people choose vanilla as their favorite ice cream flavor. (This is my $P(A)$.)
$10\%$ of people whose favorite is vanilla ice cream also love horror movies. (This is my $P(B|A)$.)
$1\%$ of people whose favorite is not vanilla ice cream also love horror movies. (This is my $P(B|\lnot A)$.)

So I calculate it like this:

$P(A|B)=\frac{0.05\times0.1}{(0.05 \times 0.1)+(0.01 \times(1-0.05))}$

I find that $P(A|B) = 0.3448$ (rounded to the nearest ten-thousandth). There is a $34.48\%$ chance that vanilla is a horror-movie fan's favorite ice cream flavor.
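
As a quick check of the arithmetic, this single update can be sketched in Python. The function name `bayes_update` and its parameter names are illustrative assumptions, not part of the original question; the numbers are the ones stated above.

```python
def bayes_update(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) via Bayes' theorem."""
    numerator = prior * p_b_given_a
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = numerator + p_b_given_not_a * (1 - prior)
    return numerator / evidence


# P(A) = 0.05, P(B|A) = 0.10, P(B|not A) = 0.01
posterior = bayes_update(prior=0.05, p_b_given_a=0.10, p_b_given_not_a=0.01)
print(round(posterior, 4))  # 0.3448
```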

But then I learn that the person has seen a horror movie in the last 30 days. Here's what I know:

$34.48\%$ is the updated posterior probability that vanilla is the person's favorite ice cream flavor; it is the $P(A)$ in this next problem.
$20\%$ of people whose favorite is vanilla ice cream have seen a horror movie in the last 30 days.
$5\%$ of people whose favorite is not vanilla ice cream have seen a horror movie in the last 30 days.

This gives:

$\frac{0.3448\times0.2}{(0.3448\times0.2)+(0.05\times(1-0.3448))} = 0.6779$ when rounded.

Now I believe there is a $67.79\%$ chance that the horror-movie fan's favorite ice cream flavor is vanilla, given that they have seen a horror movie in the last 30 days.

But wait, there's more. I have also learned that the person owns a cat.

Here's what I know:

$67.79\%$ is the updated posterior probability that vanilla is the person's favorite ice cream flavor; it is the $P(A)$ in this next problem.
$40\%$ of people whose favorite is vanilla ice cream also own cats.
$10\%$ of people whose favorite is not vanilla ice cream also own cats.

This gives:

$\frac{0.6779\times0.4}{(0.6779\times 0.4)+(0.1\times(1-0.6779))} = 0.8938$ when rounded.
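
The three chained steps can be reproduced with a short Python sketch. The helper names `bayes_update` and `chain` are assumptions made for illustration. The sketch only checks the arithmetic and says nothing about whether chaining the updates is justified. It also shows that rounding each intermediate posterior to four decimals, as done above, shifts the last digit slightly compared with carrying full precision.

```python
def bayes_update(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) via Bayes' theorem."""
    numerator = prior * p_b_given_a
    return numerator / (numerator + p_b_given_not_a * (1 - prior))


# (P(B|A), P(B|not A)) for each piece of evidence, in the order it arrives.
evidence = [
    (0.10, 0.01),  # loves horror movies
    (0.20, 0.05),  # saw a horror movie in the last 30 days
    (0.40, 0.10),  # owns a cat
]


def chain(prior, evidence, round_each_step):
    """Feed each posterior back in as the next prior; return all posteriors."""
    p = prior
    history = []
    for p_b_a, p_b_not_a in evidence:
        p = bayes_update(p, p_b_a, p_b_not_a)
        if round_each_step:
            p = round(p, 4)  # mimic the hand calculation in the question
        history.append(round(p, 4))
    return history


print(chain(0.05, evidence, round_each_step=True))   # [0.3448, 0.6779, 0.8938]
print(chain(0.05, evidence, round_each_step=False))  # [0.3448, 0.678, 0.8939]
```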

My question basically boils down to this: Am I correctly updating probability using Bayes' theorem? Am I getting anything else wrong in my methods?

probability bayes user1626730

love = favorite? you're not posting degrees of loving. if you love it, it is your favorite. clarify if needed.

generic_user

Good point. I changed "love" to "favorite." It's not grammatically correct, but it's less wordy than saying "choose vanilla for their favorite ice cream flavor." I hope that clears things up.

user1626730

Answers:

This is not correct. Sequential updating of this type only works when the information you are receiving sequentially is independent (e.g. iid observations of a random variable). If each observation is not independent, as in this case, you need to consider the joint probability distribution. The correct way to update would be to go back to the prior, find the joint probability that someone loves horror movies, has seen a horror movie in the last 30 days, and owns a cat given that they do or do not choose vanilla as their favorite ice cream flavor, and then update in a single step.

Updating sequentially like this when your data are not independent will rapidly drive your posterior probability much higher or lower than it ought to be.
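
A minimal Python sketch of the difference, using the first two pieces of evidence from the question. The marginal likelihoods are the question's own numbers. The joint likelihoods are NOT given anywhere in the question: the value `p_seen_given_love = 0.80` is a purely hypothetical assumption chosen for illustration, as are the function and variable names.

```python
def posterior(prior, lik_a, lik_not_a):
    """P(A | data) from P(A), P(data | A) and P(data | not A)."""
    num = prior * lik_a
    return num / (num + (1 - prior) * lik_not_a)


prior = 0.05

# Marginal likelihoods taken from the question.
p_love_a, p_love_not_a = 0.10, 0.01  # loves horror movies
p_seen_a, p_seen_not_a = 0.20, 0.05  # saw a horror movie in the last 30 days

# Sequential updating implicitly assumes the joint likelihood factorizes:
# P(love, seen | A) = P(love | A) * P(seen | A), and likewise for not A.
naive = posterior(prior, p_love_a * p_seen_a, p_love_not_a * p_seen_not_a)

# HYPOTHETICAL joint likelihoods (not from the question): suppose 80% of
# horror lovers saw a horror movie recently, regardless of ice cream taste.
p_seen_given_love = 0.80
joint_a = p_love_a * p_seen_given_love          # P(love, seen | A)
joint_not_a = p_love_not_a * p_seen_given_love  # P(love, seen | not A)
single_step = posterior(prior, joint_a, joint_not_a)

print(round(naive, 4))        # 0.678
print(round(single_step, 4))  # 0.3448
```

Under this assumed dependence, learning that a known horror lover recently watched a horror movie adds no information about ice cream preference, so the single-step posterior stays at the value obtained from the first piece of evidence alone, while the sequential calculation roughly doubles it.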

Jonathan Christensen

What do you mean by "when the information you are receiving sequentially is independent"? If you mean "independent of the event you're trying to predict," do you know how I can tell if the info I'm getting is independent?

user1626730

Conditionally independent given the event you are trying to predict. If they were independent of the event you're trying to predict then they wouldn't do you any good. As for how you can tell--you have to think about what your data is. In this case, whether someone has watched a horror film in the last 30 days is clearly not independent of whether they love horror films.

Jonathan Christensen

When you say "conditionally independent," I'm guessing you mean that each P(B) (i.e., horror-movie-loving, cat-ownership) aren't related to one another? If so, wouldn't the cat-ownership variable be independent of the horror-movie-loving?

user1626730

Yes, you can make an argument that cat-ownership is independent of horror-movie-loving. It's not, necessarily, though--e.g., maybe women are both more likely to love cats and less likely to love horror movies.

Jonathan Christensen

Hm, I'm not quite sure what you mean by adding in that bit about women and cats. Could you explain further, please?

user1626730