A Technical Lemma

I'm not sure how intuitive this is, but the main technical result underlying your statement of the Halmos-Savage theorem is the following:

Lemma. Let $\mu$ be a $\sigma$-finite measure on $(S, \mathcal{A})$. Suppose that $\aleph$ is a collection of measures on $(S, \mathcal{A})$ such that $\nu \ll \mu$ for every $\nu \in \aleph$. Then there exist a sequence of nonnegative numbers $\{c_i\}_{i=1}^\infty$ and a sequence of elements of $\aleph$, $\{\nu_i\}_{i=1}^\infty$, such that $\sum_{i=1}^\infty c_i = 1$ and $\nu \ll \sum_{i=1}^\infty c_i \nu_i$ for every $\nu \in \aleph$.

This is taken verbatim from Theorem A.78 in Schervish's Theory of Statistics (1995). There he attributes it to Lehmann's Testing Statistical Hypotheses (1986) (link to the third edition), where the result is in turn attributed to Halmos and Savage themselves (see Lemma 7). Another good reference is Shao's Mathematical Statistics (second edition, 2003), in which the relevant results are Lemma 2.1 and Theorem 2.2.
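As a toy illustration of what the lemma asserts (not taken from any of the references above), consider a finite sample space. There, $\nu \ll \mu$ just means that $\mu$ puts positive mass on every point where $\nu$ does. The sketch below assumes measures are represented as dictionaries from points to masses; the helper names `dominates` and `mixture` and the three example distributions are made up for illustration.

```python
from fractions import Fraction

def dominates(dominating, dominated):
    """On a finite set: True if `dominated` << `dominating` (pmfs as dicts point -> mass)."""
    return all(dominating.get(x, 0) > 0 for x, p in dominated.items() if p > 0)

def mixture(weights, pmfs):
    """Convex combination sum_i weights[i] * pmfs[i], returned as a dict point -> mass."""
    out = {}
    for w, pmf in zip(weights, pmfs):
        for x, p in pmf.items():
            out[x] = out.get(x, 0) + w * p
    return out

# A hypothetical family of three distributions on {0, 1, 2, 3}.
family = [
    {0: Fraction(1)},
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {2: Fraction(1, 3), 3: Fraction(2, 3)},
]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # positive, sum to 1

p_star = mixture(weights, family)
print(all(dominates(p_star, nu) for nu in family))
```

No single member of this family dominates the other two, but the mixture with strictly positive weights does, because its support is the union of the supports. The content of the lemma is that a countable mixture always suffices, even when the family is uncountable.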

The lemma above says that if you start with a family of measures dominated by a $\sigma$-finite measure, then you can in fact replace the dominating measure by a countable convex combination of measures from within the family. Schervish writes before stating Theorem A.78:

"In statistical applications, we will often have a class of measures, each of which is absolutely continuous with respect to a single $\sigma$-finite measure. It would be nice if the single dominating measure were in the original class or could be constructed from the class. The following theorem addresses this problem."

A Concrete Example

Suppose we measure a quantity $X$ which we believe to be uniformly distributed on the interval $[0, \theta]$ for some unknown $\theta > 0$. In this statistical problem, we are implicitly considering the set $\mathcal{P}$ of Borel probability measures on $\mathbb{R}$ consisting of the uniform distributions on all intervals of the form $[0, \theta]$. That is, if $\lambda$ denotes Lebesgue measure and, for $\theta > 0$, $P_\theta$ denotes the $\operatorname{Uniform}([0, \theta])$ distribution (i.e.,

$P_\theta(A) = \frac{1}{\theta} \lambda(A \cap [0, \theta]) = \int_A \frac{1}{\theta} \mathbf{1}_{[0, \theta]}(x) \, dx$

for every Borel $A \subseteq \mathbb{R}$), then we simply have

$\mathcal{P} = \{P_\theta : \theta > 0\}.$

This is the set of candidate distributions for our measurement $X$.

The family $\mathcal{P}$ is clearly dominated by Lebesgue measure $\lambda$ (which is $\sigma$-finite), so the lemma above (with $\aleph = \mathcal{P}$) guarantees the existence of a sequence $\{c_i\}_{i=1}^\infty$ of nonnegative numbers summing to $1$ and a sequence $\{Q_i\}_{i=1}^\infty$ of uniform distributions in $\mathcal{P}$ such that

$P_\theta \ll \sum_{i=1}^\infty c_i Q_i$

for every $\theta > 0$. In this example, we can construct such sequences explicitly!

First, let $(\theta_i)_{i=1}^\infty$ be an enumeration of the positive rational numbers (this can be done explicitly), and let $Q_i = P_{\theta_i}$ for each $i$. Next, let $c_i = 2^{-i}$, so that $\sum_{i=1}^\infty c_i = 1$. I claim that this choice of $\{c_i\}_{i=1}^\infty$ and $\{Q_i\}_{i=1}^\infty$ works.
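One explicit enumeration of the positive rationals is the Calkin-Wilf sequence, generated by the recurrence $q_{n+1} = 1/(2\lfloor q_n \rfloor - q_n + 1)$ starting from $q_1 = 1$. The sketch below uses it as an assumed choice. The argument here works with any enumeration, and the function name `positive_rationals` is only illustrative.

```python
from fractions import Fraction
from itertools import islice

def positive_rationals():
    """Yield the Calkin-Wilf sequence, which lists every positive rational exactly once."""
    q = Fraction(1)
    while True:
        yield q
        floor_q = q.numerator // q.denominator  # integer part of q
        q = 1 / (2 * floor_q - q + 1)

# theta_1, ..., theta_8 and the matching weights c_i = 2^{-i}
thetas = list(islice(positive_rationals(), 8))
weights = [Fraction(1, 2**i) for i in range(1, 9)]

print([str(t) for t in thetas])
print(sum(weights))  # partial sum 1 - 2^{-8}
```

The partial sums of the weights are $1 - 2^{-n}$, so the full sequence of weights sums to $1$ as required.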

To see this, fix $\theta > 0$ and let $A$ be a Borel subset of $\mathbb{R}$ such that $\sum_{i=1}^\infty c_i Q_i(A) = 0$. We need to show that $P_\theta(A) = 0$. Since $\sum_{i=1}^\infty c_i Q_i(A) = 0$ and each summand is nonnegative, it follows that $c_i Q_i(A) = 0$ for each $i$. Moreover, since each $c_i$ is positive, it follows that $Q_i(A) = 0$ for each $i$. That is, for all $i$ we have

$Q_i(A) = P_{\theta_i}(A) = \frac{1}{\theta_i} \lambda(A \cap [0, \theta_i]) = 0.$

Since each $\theta_i$ is positive, it follows that $\lambda(A \cap [0, \theta_i]) = 0$ for each $i$.

Now choose a subsequence $\{\theta_{i_k}\}_{k=1}^\infty$ of $\{\theta_i\}_{i=1}^\infty$ which decreases to $\theta$ from above (this can be done since $\mathbb{Q}$ is dense in $\mathbb{R}$). Then $A \cap [0, \theta_{i_k}] \downarrow A \cap [0, \theta]$ as $k \to \infty$, and these sets have finite Lebesgue measure, so by continuity of measure from above we conclude that

$\lambda(A \cap [0, \theta]) = \lim_{k \to \infty} \lambda(A \cap [0, \theta_{i_k}]) = 0,$

and so $P_\theta(A) = 0$. This proves the claim.

Thus, in this example we were able to explicitly construct a countable convex combination of probability measures from our dominated family which still dominates the entire family. The Lemma above guarantees that this can be done for any dominated family (at least as long as the dominating measure is $\sigma$ -finite).
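To make the construction concrete, the mixture $\sum_i 2^{-i} P_{\theta_i}$ has Lebesgue density $x \mapsto \sum_i 2^{-i} \theta_i^{-1} \mathbf{1}_{[0, \theta_i]}(x)$. The sketch below evaluates a truncation of that sum. It assumes a particular enumeration whose first eight terms are hard-coded in `THETAS`; that list and the function name are illustrative choices, not part of the argument above.

```python
from fractions import Fraction

# Assumed first eight terms theta_1, ..., theta_8 of an enumeration of the
# positive rationals (hard-coded for illustration).
THETAS = [Fraction(1), Fraction(1, 2), Fraction(2), Fraction(1, 3),
          Fraction(3, 2), Fraction(2, 3), Fraction(3), Fraction(1, 4)]

def truncated_mixture_density(x, thetas=THETAS):
    """Density at x of sum_i 2^{-i} Uniform([0, theta_i]), using only the listed thetas."""
    total = Fraction(0)
    for i, theta in enumerate(thetas, start=1):
        if 0 <= x <= theta:
            total += Fraction(1, 2**i) / theta
    return total

print(truncated_mixture_density(Fraction(5, 2)))  # only theta_7 = 3 contributes
print(truncated_mixture_density(Fraction(5)))     # no listed theta reaches 5
```

The zero at $x = 5$ is an artifact of truncation: a finite combination whose largest parameter is $3$ cannot dominate $P_5$. With the full enumeration, every $x \ge 0$ lies below some rational $\theta_i$, so the density is strictly positive on $[0, \infty)$, which is why countably many terms are needed here.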

The Halmos-Savage Theorem

So now on to the Halmos-Savage Theorem (for which I will use slightly different notation than in the question due to personal preference). Given the Halmos-Savage Theorem, the Fisher-Neyman factorization theorem is just one application of the Doob-Dynkin lemma and the chain rule for Radon-Nikodym derivatives away!

Halmos-Savage Theorem. Let $(\mathcal{X}, \mathcal{B}, \mathcal{P})$ be a dominated statistical model (meaning that $\mathcal{P}$ is a set of probability measures on $\mathcal{B}$ and there is a $\sigma$-finite measure $\mu$ on $\mathcal{B}$ such that $P \ll \mu$ for all $P \in \mathcal{P}$). Let $T : (\mathcal{X}, \mathcal{B}) \to (\mathcal{T}, \mathcal{C})$ be a measurable function, where $(\mathcal{T}, \mathcal{C})$ is a standard Borel space. Then the following are equivalent:

1. $T$ is sufficient for $\mathcal{P}$ (meaning that there is a probability kernel $r : \mathcal{B} \times \mathcal{T} \to [0, 1]$ such that $r(B, T)$ is a version of $P(B \mid T)$ for all $B \in \mathcal{B}$ and $P \in \mathcal{P}$).

2. There exists a sequence $\{c_i\}_{i=1}^\infty$ of nonnegative numbers such that $\sum_{i=1}^\infty c_i = 1$ and a sequence $\{P_i\}_{i=1}^\infty$ of probability measures in $\mathcal{P}$ such that $P \ll P^*$ for all $P \in \mathcal{P}$, where $P^* = \sum_{i=1}^\infty c_i P_i$, and for each $P \in \mathcal{P}$ there exists a $T$-measurable version of $dP/dP^*$.

Proof. By the lemma above, we may immediately replace $\mu$ by $P^* = \sum_{i=1}^\infty c_i P_i$ for some sequence $\{c_i\}_{i=1}^\infty$ of nonnegative numbers such that $\sum_{i=1}^\infty c_i = 1$ and a sequence $\{P_i\}_{i=1}^\infty$ of probability measures in $\mathcal{P}$ .

(1. implies 2.) Suppose $T$ is sufficient. Then we must show that there are $T$ -measurable versions of $dP/dP^*$ for all $P \in \mathcal{P}$ . Let $r$ be the probability kernel in the statement of the theorem. For each $A \in \sigma(T)$ and $B \in \mathcal{B}$ we have

$\begin{aligned} P^*(A \cap B) &= \sum_{i=1}^\infty c_i P_i(A \cap B) \\ &= \sum_{i=1}^\infty c_i \int_A P_i(B \mid T) \, dP_i \\ &= \sum_{i=1}^\infty c_i \int_A r(B, T) \, dP_i \\ &= \int_A r(B, T) \, dP^*. \end{aligned}$

Thus $r(B, T)$ is a version of $P^*(B \mid T)$ for all $B \in \mathcal{B}$.

For each $P \in \mathcal{P}$ , let $f_P$ denote a version of the Radon-Nikodym derivative $dP/dP^*$ on the measurable space $(\mathcal{X}, \sigma(T))$ (so in particular $f_P$ is $T$ -measurable). Then for all $B \in \mathcal{B}$ and $P \in \mathcal{P}$ we have

$\begin{aligned} P(B) &= \int_{\mathcal{X}} P(B \mid T) \, dP \\ &= \int_{\mathcal{X}} r(B, T) \, dP \\ &= \int_{\mathcal{X}} r(B, T) f_P \, dP^* \\ &= \int_{\mathcal{X}} P^*(B \mid T) f_P \, dP^* \\ &= \int_{\mathcal{X}} E_{P^*}[\mathbf{1}_B f_P \mid T] \, dP^* \\ &= \int_B f_P \, dP^*. \end{aligned}$

Thus in fact $f_P$ is a $T$-measurable version of $dP/dP^*$ on $(\mathcal{X}, \mathcal{B})$. This proves that the first condition of the theorem implies the second.

(2. implies 1.) Suppose one can choose a $T$-measurable version $f_P$ of $dP/dP^*$ for each $P \in \mathcal{P}$. For each $B \in \mathcal{B}$, let $r(B, t)$ denote a particular version of $P^*(B \mid T = t)$ (i.e., $r(B, t)$ is a function such that $r(B, T)$ is a version of $P^*(B \mid T)$). Since $(\mathcal{T}, \mathcal{C})$ is a standard Borel space, we may choose $r$ in a way that makes it a probability kernel (see, e.g., Theorem B.32 in Schervish's Theory of Statistics (1995)). We will show that $r(B, T)$ is a version of $P(B \mid T)$ for any $P \in \mathcal{P}$ and any $B \in \mathcal{B}$. Thus, let $A \in \sigma(T)$ and $B \in \mathcal{B}$ be given. Then for all $P \in \mathcal{P}$ we have

$\begin{aligned} P(A \cap B) &= \int_A \mathbf{1}_B f_P \, dP^* \\ &= \int_A E_{P^*}[\mathbf{1}_B f_P \mid T] \, dP^* \\ &= \int_A P^*(B \mid T) f_P \, dP^* \\ &= \int_A r(B, T) f_P \, dP^* \\ &= \int_A r(B, T) \, dP. \end{aligned}$

This shows that $r(B, T)$ is a version of $P(B \mid T)$ for any $P \in \mathcal{P}$ and any $B \in \mathcal{B}$, and the proof is done.
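The step from this theorem to the Fisher-Neyman factorization, mentioned before the theorem statement, can be sketched as follows. This is only a sketch under the assumptions of the theorem, with $T$ sufficient; the names $g_P$ and $h$ are introduced here for illustration.

```latex
% Assumptions: T is sufficient, P^* is as in condition 2 of the theorem,
% f_P is a T-measurable version of dP/dP^*, and \mu is the \sigma-finite
% dominating measure. The names g_P and h are illustrative.
\begin{aligned}
f_P &= g_P \circ T
  && \text{(Doob-Dynkin lemma, since $f_P$ is $T$-measurable)} \\
h &= \frac{dP^*}{d\mu}
  && \text{(exists since each $P_i \ll \mu$, hence $P^* \ll \mu$)} \\
\frac{dP}{d\mu} &= \frac{dP}{dP^*} \cdot \frac{dP^*}{d\mu}
  = (g_P \circ T) \, h \quad \mu\text{-a.e.}
  && \text{(chain rule for Radon-Nikodym derivatives)}
\end{aligned}
```

Here $h$ does not depend on $P$, and $P$ enters the density only through the function $g_P$ of the statistic $T$, which is the factorized form.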

Summary. The important technical result underlying the Halmos-Savage theorem as presented here is the fact that a dominated family of probability measures is actually dominated by a countable convex combination of probability measures from that family. Given that result, the rest of the Halmos-Savage theorem is mostly just manipulations with basic properties of Radon-Nikodym derivatives and conditional expectations.

Artem Mavrin

Intuitive understanding of the Halmos-Savage theorem
