A meta-analysis includes a number of studies, all of which reported a P value greater than 0.05. Can the overall meta-analysis report a P value less than 0.05? Under what circumstances?
(I am fairly sure the answer is yes, but I would like a reference or explanation.)
statistical-significance
meta-analysis
combining-p-values
Harvey Motulsky
Answers:
In theory, yes...
The results of individual studies may be insignificant, but viewed together they may be significant.
In theory you can proceed by treating the result $y_i$ of study $i$ like any other random variable.
Let $y_i$ be some random variable (e.g. the estimate from study $i$). Then if the $y_i$ are independent and $E[y_i]=\mu$, you can consistently estimate the mean with the simple average
$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} y_i.$$
Adding more assumptions, let $\sigma_i^2$ be the variance of estimate $y_i$. Then you can efficiently estimate $\mu$ with inverse-variance weighting:
$$\hat{\mu} = \frac{\sum_{i=1}^{N} y_i/\sigma_i^2}{\sum_{i=1}^{N} 1/\sigma_i^2}.$$
In either of these cases, $\hat{\mu}$ may be statistically significant at some confidence level even if the individual estimates are not.
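As a minimal numerical sketch (the four estimates and standard errors below are made up for illustration), each study can fail to reach p < 0.05 on its own while the inverse-variance-weighted pooled estimate is clearly significant:

```python
# Minimal sketch with made-up numbers: four studies, each individually
# non-significant, pooled by fixed-effect inverse-variance weighting.
import numpy as np
from scipy import stats

y  = np.array([0.30, 0.25, 0.35, 0.28])   # study estimates y_i
se = np.array([0.18, 0.17, 0.20, 0.18])   # their standard errors sigma_i

# Each study on its own: two-sided p-values, all above 0.05
z_i = y / se
p_i = 2 * stats.norm.sf(np.abs(z_i))
print(p_i)                                 # roughly [0.10, 0.14, 0.08, 0.12]

# Inverse-variance (fixed-effect) pooled estimate
w       = 1 / se**2
mu_hat  = np.sum(w * y) / np.sum(w)
se_pool = np.sqrt(1 / np.sum(w))
z_pool  = mu_hat / se_pool
p_pool  = 2 * stats.norm.sf(abs(z_pool))
print(mu_hat, se_pool, p_pool)             # pooled p is about 0.001
```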
BUT there may be big problems, issues to be cognizant of...
If $E[y_i]\neq\mu$ then the meta-analysis may not converge to $\mu$ (i.e., the mean of the meta-analysis is an inconsistent estimator).
For example, if there's a bias against publishing negative results, this simple meta-analysis may be horribly inconsistent and biased! It would be like estimating the probability that a coin flip lands heads by only observing the flips where it didn't land tails!
Combining the two problems (a biased selection of which results get reported, and errors that are correlated across studies) can be especially bad.
For example, the meta-analysis of averaging polls together tends to be more accurate than any individual poll. But averaging polls together is still vulnerable to correlated error. Something that has come up in past elections is that young exit poll workers may tend to interview other young people rather than old people. If all the exit polls make the same error, then you have a bad estimate which you may think is a good estimate (the exit polls are correlated because they use the same approach to conduct exit polls and this approach generates the same error).
Undoubtedly people more familiar with meta-analysis may come up with better examples, more nuanced issues, more sophisticated estimation techniques, etc., but this gets at some of the most basic theory and some of the bigger problems. If the different studies make independent, random errors, then meta-analysis may be incredibly powerful. If the error is systematic across studies (e.g. everyone undercounts older voters), then the average of the studies will also be off. If you underestimate how correlated studies or their errors are, you effectively overestimate your aggregate sample size and underestimate your standard errors.
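A rough sketch of that last point, under the simplifying assumption that all study errors share a common pairwise correlation $\rho$ (the numbers are hypothetical):

```python
# Sketch of the correlated-error problem: N study estimates, each with
# variance sigma^2, and a common pairwise error correlation rho
# (all numbers hypothetical).
import numpy as np

N, sigma, rho = 20, 1.0, 0.5

# Naive SE of the average, pretending the estimates are independent
se_naive = sigma / np.sqrt(N)

# True SE under equicorrelation: Var(mean) = sigma^2/N + (1 - 1/N)*rho*sigma^2
se_true = np.sqrt(sigma**2 / N + (1 - 1 / N) * rho * sigma**2)

# Effective number of independent studies implied by the true variance
n_eff = sigma**2 / se_true**2

print(se_naive, se_true, n_eff)   # ~0.22 vs ~0.72; N_eff ~ 1.9 rather than 20
```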
There are also all kinds of practical issues of consistent definitions etc...
Yes. Suppose you have $N$ p-values from $N$ independent studies.
Fisher's test
(EDIT - in response to @mdewey's useful comment below, it is relevant to distinguish between different meta tests. I spell out the case of another meta test mentioned by mdewey below)
The classical Fisher meta test (see Fisher (1932), "Statistical Methods for Research Workers") uses the statistic
$$F = -2\sum_{i=1}^{N}\ln(p_i),$$
which follows a $\chi^2_{2N}$ distribution under the null that all $N$ individual null hypotheses are true.
Let $\chi^2_{2N}(1-\alpha)$ denote the $(1-\alpha)$-quantile of that null distribution.
Suppose all p-values are equal to $c$, where, possibly, $c>\alpha$. Then $F=-2N\ln(c)$ and $F>\chi^2_{2N}(1-\alpha)$ when
$$c < \exp\left(-\frac{\chi^2_{2N}(1-\alpha)}{2N}\right).$$
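A quick numerical check of this threshold, with hypothetical p-values and using `scipy.stats.combine_pvalues` for Fisher's method: with $N=10$ and $\alpha=0.05$ the cutoff is $\exp(-\chi^2_{20}(0.95)/20)\approx 0.21$, so ten studies that each report $p=0.15$ already combine to a significant result.

```python
# Fisher's method: ten hypothetical studies, each with p = 0.15 (> 0.05),
# combine to an aggregate p-value below 0.05.
import numpy as np
from scipy import stats

N = 10
alpha = 0.05
print(np.exp(-stats.chi2.ppf(1 - alpha, 2 * N) / (2 * N)))   # threshold ~0.208

p = np.full(N, 0.15)
F = -2 * np.sum(np.log(p))                  # F = -2 * sum(log p_i)
print(F, stats.chi2.sf(F, df=2 * N))        # F ~ 37.9, combined p ~ 0.009

# Same result via the built-in helper
stat, pval = stats.combine_pvalues(p, method='fisher')
print(stat, pval)
```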
Of course, what the meta statistic tests is "only" the "aggregate" null that all individual nulls are true, which is to be rejected as soon as only one of the $N$ nulls is false.
EDIT:
Here is a plot of the "admissible" p-values against $N$, which confirms that $c$ grows in $N$, although it seems to level off at $c\approx 0.36$.
I found an upper bound for the quantiles of the $\chi^2$ distribution which suggests that the threshold indeed converges to $e^{-1}\approx 0.368$ as $N\to\infty$ (note that $\chi^2_{2N}(1-\alpha)/(2N)\to 1$).
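The computation behind that plot is easy to reproduce (assuming $\alpha=0.05$, as above):

```python
# Largest common p-value c that Fisher's method still rejects at alpha = 0.05,
# as a function of the number of studies N: c_N = exp(-chi2_{2N}(0.95) / (2N)).
import numpy as np
from scipy import stats

alpha = 0.05
for N in (2, 5, 10, 50, 100, 1000):
    c_N = np.exp(-stats.chi2.ppf(1 - alpha, df=2 * N) / (2 * N))
    print(N, round(c_N, 3))
# N=10 -> ~0.21, N=100 -> ~0.31, N=1000 -> ~0.35; the limit is exp(-1) ~ 0.368
```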
Inverse Normal test (Stouffer et al., 1949)
The test statistic is given by
$$Z = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\Phi^{-1}(p_i),$$
where $\Phi^{-1}$ is the standard normal quantile function; at the one-sided 5% level the aggregate null is rejected if $Z<-1.645$.
More specifically, $Z<-1.645$ if $c<\Phi(-1.645/\sqrt{N})$, which tends to $\Phi(0)=0.5$ from below as $N\to\infty$.
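A small sketch with hypothetical p-values: for $N=10$, $\Phi(-1.645/\sqrt{10})\approx 0.3015$, so ten one-sided p-values of 0.30 are just enough to tip the combined test below 5%.

```python
# Inverse-normal (Stouffer) combination of ten hypothetical p-values of 0.30.
import numpy as np
from scipy import stats

p = np.full(10, 0.30)

Z = np.sum(stats.norm.ppf(p)) / np.sqrt(len(p))
print(Z, stats.norm.cdf(Z))                # Z ~ -1.66, combined p ~ 0.049

# scipy's helper uses the mirror-image convention (norm.isf, reject for large
# positive Z) and returns the same combined p-value.
stat, pval = stats.combine_pvalues(p, method='stouffer')
print(stat, pval)
```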
The answer to this depends on what method you use for combining $p$-values. Other answers have considered some of these, but here I focus on one method for which the answer to the original question is no.
The minimum-$p$ method, also known as Tippett's method, is usually described in terms of a rejection at the $\alpha^*$ level of the null hypothesis. Define $p_{[1]} \le p_{[2]} \le \dots \le p_{[k]}$ to be the ordered $p$-values of the $k$ primary studies. Tippett's method then rejects the overall null hypothesis if
$$p_{[1]} < 1 - (1-\alpha^*)^{1/k}.$$
It is easy to see that, since the $k$th root of a number less than unity is closer to unity, $(1-\alpha^*)^{1/k} > 1-\alpha^*$ and hence the critical value $1-(1-\alpha^*)^{1/k}$ is smaller than $\alpha^*$; the overall result will therefore be non-significant unless $p_{[1]}$ is already less than $\alpha^*$.
It is possible to work out the critical value: for example, if we have ten primary studies, each with a $p$-value of 0.05, so as close to significant as can be, then the overall significance level $\alpha^*$ would have to be about 0.40 before the result were declared significant (equivalently, the combined $p$-value is $1-0.95^{10}\approx 0.40$). The method can be seen as a special case of Wilkinson's method, which uses $p_{[r]}$ for $1\le r\le k$, and in fact for this particular set of primary studies even $r=2$ is not significant ($p\approx 0.09$).
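These numbers are easy to verify directly (the Tippett combined p-value, plus a binomial tail probability for Wilkinson's $r=2$ case):

```python
# Tippett (minimum-p) combination of ten p-values equal to 0.05, and the
# Wilkinson r = 2 variant, both for the worked example above.
import numpy as np
from scipy import stats

p = np.full(10, 0.05)
k = len(p)

# Tippett: combined p-value = 1 - (1 - p_[1])^k
print(1 - (1 - p.min())**k)                # ~0.401, not significant

# Wilkinson, r = 2: P(at least 2 of k null p-values <= 0.05)
print(stats.binom.sf(1, k, 0.05))          # ~0.086, i.e. the p ~ 0.09 quoted above
```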
L. H. C. Tippett's method is described in his book The Methods of Statistics (1st edition, 1931) and Wilkinson's method in the article "A statistical consideration in psychological research".