I am examining median survival with Kaplan-Meier in different states for one type of cancer. There are quite large differences between the states. How can I compare the median survival of all states and determine which ones differ significantly from the mean median survival across the whole country?
multiple-comparisons
survival
Mischa
Answers:
One thing to bear in mind with the Kaplan-Meier survival curve is that it is fundamentally descriptive, not inferential. It is just a function of the data, with an incredibly flexible model behind it. This is a strength, because it means there are practically no assumptions that could be violated, but also a weakness, because it is hard to generalise from, and it fits "noise" as well as "signal". If you want to draw an inference, you basically have to introduce something unknown that you want to know.
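To make "just a function of the data" concrete, here is a minimal pure-Python sketch of how a Kaplan-Meier median is computed from one state's follow-up times. It assumes the common convention that the median is the smallest time at which the estimated survival drops to 0.5 or below (some software interpolates instead); the name km_median and the toy data are illustrative only.

```python
from collections import Counter


def km_median(times, events):
    """Kaplan-Meier median survival time for one group.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns the smallest time at which the KM curve reaches 0.5 or below,
    or None if the median is not reached.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    surv = 1.0
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - d / at_risk
            if surv <= 0.5:
                return t
        at_risk -= exits[t]
    return None


# Toy data (not real): months of follow-up, 1 = died, 0 = censored.
print(km_median([1, 2, 2, 3, 4, 5], [1, 0, 1, 1, 0, 1]))  # prints 3
```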
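One way to compare the median survival times is to make the following assumptions:

1. I have an estimate $t_i$ of the median survival time for each of the $N$ states, given by the Kaplan-Meier curve.
2. I expect the true median survival time $T_i$ to equal this estimate: $E[T_i \mid t_i] = t_i$.
3. I am certain that the true median survival time is positive: $\Pr(T_i > 0) = 1$.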
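The "most conservative" way to use these assumptions is the principle of maximum entropy. So you get:

$$p(T_i \mid t_i) = K e^{-\lambda T_i}$$

where $K$ and $\lambda$ are chosen so that the PDF is normalised and its expected value is $t_i$. Now we have:

$$1 = \int_0^\infty p(T_i \mid t_i)\,dT_i = K\left[-\frac{e^{-\lambda T_i}}{\lambda}\right]_{T_i=0}^{\infty} = \frac{K}{\lambda} \implies K = \lambda$$

$$t_i = E[T_i \mid t_i] = \int_0^\infty T_i\,\lambda e^{-\lambda T_i}\,dT_i = \frac{1}{\lambda} \implies \lambda = \frac{1}{t_i}$$

$$p(T_i \mid t_i) = \frac{1}{t_i}\exp\left(-\frac{T_i}{t_i}\right)$$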
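A quick numerical sanity check of this derivation, as a sketch: integrating the maximum-entropy density $p(T_i \mid t_i) = \frac{1}{t_i}e^{-T_i/t_i}$ with a simple midpoint rule should give total probability 1 and mean $t_i$. The value t_i = 14 is a hypothetical state median, and truncating the integral at $50\,t_i$ is our assumption (the neglected tail mass is $e^{-50}$).

```python
import math


def maxent_density(T, t_i):
    """Maximum-entropy density on T > 0 with mean t_i: (1 / t_i) * exp(-T / t_i)."""
    return math.exp(-T / t_i) / t_i


def integrate(f, upper, n=200_000):
    """Midpoint-rule approximation of the integral of f over [0, upper]."""
    h = upper / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


t_i = 14.0        # hypothetical Kaplan-Meier median for one state (months)
upper = 50 * t_i  # truncation point; the tail beyond it is negligible
total = integrate(lambda T: maxent_density(T, t_i), upper)
mean = integrate(lambda T: T * maxent_density(T, t_i), upper)
print(total, mean)
```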
So you have a probability distribution for each state.

Treating the states as independent, these give a joint probability distribution of:

$$p(T_1,\dots,T_N \mid t_1,\dots,t_N) = \prod_{i=1}^N \frac{1}{t_i}\exp\left(-\frac{T_i}{t_i}\right)$$
Now it sounds like you want to test the hypothesis $H_0: T_1 = T_2 = \dots = T_N = \bar{t}$, where $\bar{t} = \frac{1}{N}\sum_{i=1}^N t_i$ is the mean median survival time. The severe alternative hypothesis to test against is the "every state is a unique and beautiful snowflake" hypothesis $H_A: T_1 = t_1, \dots, T_N = t_N$, because this is the most likely alternative and thus represents the information lost in moving to the simpler hypothesis (a "minimax" test). The measure of evidence against the simpler hypothesis is given by the odds ratio:

$$O(H_A \mid H_0) = \frac{p(T_1=t_1,\dots,T_N=t_N \mid t_1,\dots,t_N)}{p(T_1=\bar{t},\dots,T_N=\bar{t} \mid t_1,\dots,t_N)} = \frac{\prod_{i=1}^N \frac{1}{t_i}e^{-1}}{\prod_{i=1}^N \frac{1}{t_i}\exp\left(-\frac{\bar{t}}{t_i}\right)} = \exp\left[N\left(\frac{\bar{t}}{t_{harm}} - 1\right)\right]$$
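where

$$t_{harm} = \left[\frac{1}{N}\sum_{i=1}^N \frac{1}{t_i}\right]^{-1}$$

is the harmonic mean. Because the arithmetic mean is never smaller than the harmonic mean, $\bar{t}/t_{harm} \ge 1$, so the odds always favour the perfect fit, but not by much if the median survival times are reasonably close. Further, this gives you a direct way to state the evidence from this particular hypothesis test: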
Assumptions 1-3 give maximum odds of $O(H_A \mid H_0):1$ against equal median survival times across all states.
Combine this with a decision rule, loss function, utility function, etc. which says how advantageous it is to accept the simpler hypothesis, and you've got your conclusion!
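The closed form can be checked numerically. This sketch (toy medians, hypothetical function names) computes $O(H_A \mid H_0)$ both from $\exp[N(\bar{t}/t_{harm} - 1)]$, with $t_{harm}$ the harmonic mean of the $t_i$, and directly as a ratio of the joint densities $\prod_i \frac{1}{t_i}\exp(-T_i/t_i)$ evaluated at the two hypotheses. The two numbers should agree up to rounding error.

```python
import math


def log_joint(T, t):
    """log p(T_1..T_N | t_1..t_N): independent max-ent exponentials with means t_i."""
    return sum(-math.log(ti) - Ti / ti for Ti, ti in zip(T, t))


def odds_HA_vs_H0(t):
    """Closed form O(H_A | H_0) = exp(N * (t_bar / t_harm - 1))."""
    n = len(t)
    t_bar = sum(t) / n
    t_harm = n / sum(1.0 / ti for ti in t)
    return math.exp(n * (t_bar / t_harm - 1.0))


t = [11.0, 14.0, 9.5, 17.0, 12.5]  # hypothetical state medians in months
t_bar = sum(t) / len(t)
# Direct route: density at H_A (T_i = t_i) over density at H_0 (T_i = t_bar).
direct = math.exp(log_joint(t, t) - log_joint([t_bar] * len(t), t))
print(odds_HA_vs_H0(t), direct)
```

Identical medians give odds of exactly 1 (up to floating point), and the odds grow as the medians spread out.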
There is no limit to the number of hypotheses you can test and compute similar odds for. Just change $H_0$ to specify a different set of possible "true values". You could do "significance testing" by choosing the hypothesis as:

$$H_{S,i}: T_i = t_i, \quad T_j = T \text{ for all } j \neq i$$
Verbally, this hypothesis says "state $i$ has a different median survival time, but all other states are the same". Then redo the odds-ratio calculation above with it. Be careful, though, about which alternative hypothesis you test against: several, for example $H_0$ or $H_A$ above, are "reasonable" in the sense that each might be a question you are interested in answering (and they will generally give different answers).
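The same machinery handles any hypothesis that fixes the vector of true medians. In this sketch, $H_{S,i}$ sets $T_i = t_i$ and gives all other states one common value; choosing that common value as the mean of the other states' estimates is our assumption, since it is not pinned down above. The function names and toy medians are hypothetical.

```python
import math


def log_joint(T, t):
    """log p(T_1..T_N | t_1..t_N) under independent max-ent exponentials."""
    return sum(-math.log(ti) - Ti / ti for Ti, ti in zip(T, t))


def odds(hyp_a, hyp_b, t):
    """O(hyp_a | hyp_b): ratio of joint densities at two hypothesised medians."""
    return math.exp(log_joint(hyp_a, t) - log_joint(hyp_b, t))


def hyp_state_differs(i, t):
    """H_{S,i}: T_i = t_i and every other state shares one common value.

    ASSUMPTION: the common value is the mean of the other states' estimates.
    """
    others = [tj for j, tj in enumerate(t) if j != i]
    common = sum(others) / len(others)
    return [t[i] if j == i else common for j in range(len(t))]


t = [11.0, 14.0, 9.5, 17.0, 12.5]  # hypothetical state medians in months
H0 = [sum(t) / len(t)] * len(t)    # all states equal to the mean
HA = list(t)                        # every state takes its own estimate
for i in range(len(t)):
    HS = hyp_state_differs(i, t)
    # Evidence for "state i differs" over H_0, and for H_A over H_{S,i}.
    print(i, odds(HS, H0, t), odds(HA, HS, t))
```

Because the odds are ratios of the same joint density, they chain: $O(H_A \mid H_{S,i})\,O(H_{S,i} \mid H_0) = O(H_A \mid H_0)$.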
Now, one thing that has been overlooked here is correlation between states: this structure assumes that knowing the median survival time in one state tells you nothing about the median survival time in another. While this may seem "bad", it is not too difficult to improve on, and the calculations above are good initial results that are easy to compute.
Adding connections between states will change the probability models, and you will effectively see some "pooling" of the median survival times. One way to incorporate correlations into the analysis is to split each true median survival time into two components, a "common part" or "trend" $T$ and an "individual part" $U_i$:

$$T_i = T + U_i$$

Then constrain the individual parts $U_i$ to average zero over all units, with an unknown variance $\sigma^2$ that is integrated out using a prior describing what you know about the individual variability before observing the data (or the Jeffreys prior if you know nothing, and a half-Cauchy prior on $\sigma$ if the Jeffreys prior causes problems).
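To see what the decomposition $T_i = T + U_i$ implies before any data arrive, here is a prior-simulation sketch with the $U_i$ centred to average zero. It does not fit the model (that would need, for example, MCMC); it only draws plausible vectors of true medians. Normal $U_i$, a half-Cauchy prior on their standard deviation, and the trend and scale values are all our assumptions. Note that normal $U_i$ do not by themselves keep $T_i$ positive.

```python
import math
import random


def sample_half_cauchy(scale, rng):
    """One draw from a half-Cauchy(0, scale) prior via its inverse CDF."""
    return scale * math.tan(math.pi * rng.random() / 2.0)


def sample_true_medians(n_states, trend, sigma_scale, rng):
    """Prior draw of T_i = T + U_i with the U_i constrained to average zero.

    ASSUMPTION: U_i are normal with standard deviation sigma, and sigma has a
    half-Cauchy(0, sigma_scale) prior.
    """
    sigma = sample_half_cauchy(sigma_scale, rng)
    u = [rng.gauss(0.0, sigma) for _ in range(n_states)]
    u_bar = sum(u) / n_states
    u = [ui - u_bar for ui in u]  # enforce the average-zero constraint
    return sigma, [trend + ui for ui in u]


rng = random.Random(2024)
for _ in range(3):
    sigma, medians = sample_true_medians(5, trend=12.0, sigma_scale=2.0, rng=rng)
    print(round(sigma, 2), [round(m, 1) for m in medians])
```

The heavy tail of the half-Cauchy occasionally produces a large $\sigma$, which is what lets the data, rather than the prior, decide how much pooling happens.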
I thought I'd just add that you might be interested in quantile regression with censoring. Bottai & Zhang (2010) proposed "Laplace regression", which can do exactly this; you can find a PDF of it here. There is a Stata package for it; it has not yet been ported to R, although the quantreg package in R has a function for censored quantile regression, crq, that could be an option.
I think the approach is very interesting and might be much more intuitive to patients than hazard ratios. Knowing, for instance, that 50% of patients on the drug survive 2 months longer than those who don't take it, while the side effects force you to spend 1-2 months in hospital, might make the choice of treatment much easier.
First, I would visualize the data: calculate confidence intervals and standard errors for the median survival in each state, then show the CIs in a forest plot and the medians with their SEs in a funnel plot.
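As a sketch of this first step, the snippet below computes a percentile-bootstrap CI and a bootstrap SE for each state's Kaplan-Meier median; the printed rows are what you would draw as a forest plot. Bootstrapping is our swapped-in choice, not something prescribed here, and resamples whose median is not reached are simply dropped, which can bias the interval when many patients are censored. Function names and toy data are hypothetical.

```python
import random
from collections import Counter


def km_median(times, events):
    """Smallest time at which the Kaplan-Meier curve is <= 0.5 (None if never)."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)
    at_risk, surv = len(times), 1.0
    for t in sorted(exits):
        if deaths.get(t, 0):
            surv *= 1.0 - deaths[t] / at_risk
            if surv <= 0.5:
                return t
        at_risk -= exits[t]
    return None


def bootstrap_median_ci(times, events, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI and bootstrap SE for the KM median.

    Resamples patients with replacement and drops resamples in which the
    median is not reached. Returns (lower, upper, se), or None if fewer than
    two resamples reach the median.
    """
    rng = random.Random(seed)
    n = len(times)
    meds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        m = km_median([times[k] for k in idx], [events[k] for k in idx])
        if m is not None:
            meds.append(m)
    if len(meds) < 2:
        return None
    meds.sort()
    lower = meds[int((1 - level) / 2 * len(meds))]
    upper = meds[min(len(meds) - 1, int((1 + level) / 2 * len(meds)))]
    mean = sum(meds) / len(meds)
    se = (sum((m - mean) ** 2 for m in meds) / (len(meds) - 1)) ** 0.5
    return lower, upper, se


# Toy data for two hypothetical states: (months of follow-up, death indicators).
data = {
    "State A": ([3, 5, 6, 8, 9, 11, 12, 14, 15, 18, 20, 24], [1] * 10 + [0, 0]),
    "State B": ([2, 3, 4, 4, 6, 7, 9, 10, 12, 13, 16, 19], [1] * 11 + [0]),
}
for state, (times, events) in data.items():
    print(state, km_median(times, events), bootstrap_median_ci(times, events))
```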
The “mean median survival across the country” is a quantity estimated from the data, so it carries uncertainty of its own and you cannot treat it as a sharp reference value in significance testing. Another difficulty with the mean-of-all approach is that when you compare a state's median to it, you are comparing that median to a quantity that already includes it as a component. So it is easier to compare each state with all other states combined, which can be done by performing a log-rank test (or one of its alternatives) for each state.
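A minimal pure-Python sketch of the per-state comparison: each state is tested against all other states pooled, using the standard two-group log-rank statistic with the hypergeometric variance and a 1-degree-of-freedom chi-square p-value. In practice you would use a survival library; the names and toy data here are hypothetical.

```python
import math


def logrank_one_vs_rest(times, events, in_group):
    """Two-group log-rank test of in_group patients vs everyone else.

    Returns (chi2, p) with p from a chi-square distribution with 1 df.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected deaths in the group
    var = 0.0
    for s in event_times:
        n = n1 = d = d1 = 0
        for t, e, g in zip(times, events, in_group):
            if t >= s:  # still at risk just before time s
                n += 1
                n1 += g
                if t == s and e:
                    d += 1
                    d1 += g
        if n < 2:
            continue
        o_minus_e += d1 - d * n1 / n
        var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    p = math.erfc(math.sqrt(chi2 / 2.0))  # P(chi2_1 > x) = P(|Z| > sqrt(x))
    return chi2, p


def state_pvalues(times, events, states):
    """One log-rank p value per state, comparing it with all other states."""
    return {
        label: logrank_one_vs_rest(times, events, [s == label for s in states])[1]
        for label in sorted(set(states))
    }


# Toy data (not real): months of follow-up, death indicator, state label.
times = [3, 5, 6, 8, 9, 11, 2, 3, 4, 4, 6, 7, 10, 14, 15, 18, 20, 24]
events = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
states = ["A"] * 6 + ["B"] * 6 + ["C"] * 6
print(state_pvalues(times, events, states))
```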
(Edit after reading the answer by probabilityislogic: the log-rank test does compare survival in two (or more) groups, but it is not strictly the median that it compares. If you are sure it is the median you want to compare, you may rely on his equations, or use resampling here too.)
You tagged your question [multiple-comparisons], so I assume you also want to adjust (increase) your p-values in such a way that, if you see at least one adjusted p-value below 5%, you can conclude that “median survival across states is not equal” at the 5% significance level. You could use generic, overly conservative methods like Bonferroni (which corresponds to multiplying every p-value by the number of states), but the optimal correction scheme takes the correlations between the p-values into account. I assume you don't want to build any a priori knowledge into the correction scheme, so I will discuss a scheme where the adjustment multiplies each p-value by the same constant C.
As I don't know how to derive a formula for the optimal multiplier C, I would use resampling. Under the null hypothesis the survival characteristics are the same across all states, so you can permute the state labels of the cancer cases and recalculate the medians and p-values. After obtaining many resampled vectors of state p-values, I would numerically find the multiplier C below which fewer than 95% of the vectors contain no significant p-values and above which more than 95% do. As long as that range still looks wide, I would repeatedly increase the number of resamples by an order of magnitude.
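A sketch of this resampling scheme, assuming one-versus-rest log-rank p-values per state and treating an adjusted value $C p < 0.05$ as significant. A permuted vector has no significant value exactly when $C \cdot \min_i p_i \ge 0.05$, so the threshold multiplier is $0.05$ divided by the 5% quantile of the permuted minimum p-values. The names, the toy data, and the number of permutations are hypothetical choices.

```python
import math
import random


def logrank_p(times, events, in_group):
    """Log-rank p value (1 df) for in_group vs all other patients."""
    o_minus_e = var = 0.0
    for s in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [(t == s and e, g) for t, e, g in zip(times, events, in_group) if t >= s]
        n = len(at_risk)
        if n < 2:
            continue
        n1 = sum(g for _, g in at_risk)
        d = sum(1 for died, _ in at_risk if died)
        d1 = sum(1 for died, g in at_risk if died and g)
        o_minus_e += d1 - d * n1 / n
        var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return math.erfc(math.sqrt(chi2 / 2.0))


def state_pvalues(times, events, states):
    """One-vs-rest log-rank p value for each state, in sorted label order."""
    return [logrank_p(times, events, [s == lab for s in states]) for lab in sorted(set(states))]


def permutation_multiplier(times, events, states, n_perm=1000, alpha=0.05, seed=0):
    """Common multiplier C such that, in about (1 - alpha) of label permutations,
    no adjusted p value C * p falls below alpha."""
    rng = random.Random(seed)
    shuffled = list(states)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute state labels across patients (null)
        min_ps.append(min(state_pvalues(times, events, shuffled)))
    min_ps.sort()
    q = min_ps[int(alpha * n_perm)]  # empirical alpha-quantile of the minimum p
    return alpha / max(q, 1e-300)


# Toy data (not real): three hypothetical states with 12 patients each.
rng = random.Random(42)
states = [s for s in "ABC" for _ in range(12)]
times = [round(rng.expovariate(1 / 12), 1) + 0.1 for _ in states]
events = [1 if rng.random() < 0.8 else 0 for _ in states]
C = permutation_multiplier(times, events, states, n_perm=200)
observed = state_pvalues(times, events, states)
print("C =", round(C, 2), "vs Bonferroni", len(set(states)))
print([min(1.0, C * p) for p in observed])
```

Rerunning with `n_perm` ten times larger, as suggested above, shows whether C has stabilised.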