Comparing logistic coefficients on models with different dependent variables?

14

This is a follow-up to the question I asked a couple of days ago. I think it raises a different issue, so I have posted it as a new question.

The question is: can I compare the magnitude of coefficients across models with different dependent variables? For example, using a single sample, say I want to know whether the economy is a stronger predictor of votes for the House of Representatives or for President. In this case, my two dependent variables would be the House vote (1 for Democrat, 0 for Republican) and the presidential vote (1 for Democrat, 0 for Republican). My independent variable is the economy. I would expect a statistically significant result for both offices, but how do I assess whether it has a "bigger" effect on one than on the other? This may not be a particularly interesting example, but I am curious whether there is a way to compare. I know one can't just look at the "size" of the coefficient. So, is comparing coefficients on models with different dependent variables possible? And if so, how can it be done?

If any of this doesn't make sense, let me know. All advice and comments are welcome.

regression logistic Ejs

2

How do you know that one can't just look at the "size" of the coefficient?

am

I've merged your two accounts. You still need to register, as indicated in the FAQ. (@onestop Thx for pointing out the duplicate.)

chl

Based on the answers to my previous question, I assumed that I could not compare the effect of predictors across models by examining the coefficients. Are things different for my example above?

Ejs

2

Starting a bounty - this seems like an important question with three very different answers, none of which has a single vote. We can do better. Andy W's paper link on this related question seems relevant.

Matt Parker

4

The short answer is "yes, you can" - but you should compare the maximum likelihood estimates (MLEs) of the "big model" with all covariates in either model.

This is a "quasi-formal" way to get probability theory to answer your question.

In the example, $Y_{1}$ and $Y_{2}$ are the same type of variable (fractions/percentages), so they are comparable. I will assume that you use the same model for both. So we have two models:

$M_{1}:Y_{1i}\sim Bin(n_{1i},p_{1i})$

$log\left(\frac{p_{1i}}{1-p_{1i}}\right)=\alpha_{1}+\beta_{1}X_{i}$

$M_{2}:Y_{2i}\sim Bin(n_{2i},p_{2i})$

$log\left(\frac{p_{2i}}{1-p_{2i}}\right)=\alpha_{2}+\beta_{2}X_{i}$
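As a rough illustration of how the two models above could be fitted (a sketch, not part of the original answer), the Python below simulates binary outcomes, i.e. the special case $n_{1i}=n_{2i}=1$, and finds the MLEs by Newton-Raphson using only the standard library. The names `fit_logistic` and `score_and_information`, the simulated data and the "true" coefficients are all assumptions made for the example; in practice a standard GLM routine would do this step.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_and_information(a, b, x, y):
    """Score vector and Fisher information for logit(p_i) = a + b * x_i."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(a + b * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def fit_logistic(x, y, iters=25):
    """Newton-Raphson MLE; returns (alpha_hat, beta_hat, var_beta)."""
    a = b = 0.0
    for _ in range(iters):
        (g0, g1), (i00, i01, i11) = score_and_information(a, b, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: theta <- theta + I^{-1} * score
        a += (i11 * g0 - i01 * g1) / det
        b += (i00 * g1 - i01 * g0) / det
    _, (i00, i01, i11) = score_and_information(a, b, x, y)
    # (beta, beta) entry of the inverse information matrix at the MLE
    var_beta = i00 / (i00 * i11 - i01 * i01)
    return a, b, var_beta


# Simulated data (assumed values): one predictor, two binary outcomes.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y1 = [1 if rng.random() < sigmoid(-0.2 + 0.8 * xi) else 0 for xi in x]
y2 = [1 if rng.random() < sigmoid(0.1 + 0.3 * xi) else 0 for xi in x]

a1, b1, v1 = fit_logistic(x, y1)
a2, b2, v2 = fit_logistic(x, y2)
print(f"beta1_hat = {b1:.3f} (var {v1:.4f}), beta2_hat = {b2:.3f} (var {v2:.4f})")
```

The slope estimates and their variances are the only quantities from each fit that are needed for the comparison of $\beta_{1}$ and $\beta_{2}$.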

So you have the hypothesis you want to assess:

$H_{0}:\beta_{1}>\beta_{2}$

And you have some data $\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n}$ and some prior information $I$ (such as the use of a logistic model). So you calculate the probability:

$P=Pr(H_0|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I)$

Now $H_0$ does not depend on the actual value of any of the regression parameters, so they must be removed by marginalising.

$P=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(H_0,\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$

The hypothesis simply restricts the range of integration, so we have:

$P=\int_{-\infty}^{\infty} \int_{\beta_{2}}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$

Because the probability is conditional on the data, it will factor into the two separate posteriors for each model

$Pr(\alpha_{1},\beta_{1}|\{Y_{1i},X_{i},Y_{2i}\}_{i=1}^{n},I)Pr(\alpha_{2},\beta_{2}|\{Y_{2i},X_{i},Y_{1i}\}_{i=1}^{n},I)$

Now because there are no direct links between $Y_{1i}$ and $\alpha_{2},\beta_{2}$, only indirect links through $X_{i}$, which is known, $Y_{1i}$ will drop out of the conditioning in the second posterior. The same holds for $Y_{2i}$ in the first posterior.

From standard logistic regression theory, and assuming uniform prior probabilities, the posterior for the parameters is approximately bivariate normal with mean equal to the MLEs and variance equal to the inverse of the information matrix, denoted by $V_{1}$ and $V_{2}$, which do not depend on the parameters, only on the MLEs. So you have straightforward normal integrals with a known variance matrix. $\alpha_{j}$ marginalises out with no contribution (as would any other "common variable") and we are left with the usual result (I can post the details of the derivation if you want, but it's pretty "standard" stuff):

$P=\Phi\left(\frac{\hat{\beta}_{1,MLE}-\hat{\beta}_{2,MLE}}{\sqrt{V_{1:\beta,\beta}+V_{2:\beta,\beta}}}\right)$

Where $\Phi()$ is just the standard normal CDF. This is the usual comparison of normal means test. But note that this approach requires the use of the same set of regression variables in each. In the multivariate case with many predictors, if you have different regression variables, the integrals will become effectively equal to the above test, but from the MLEs of the two betas from the "big model" which includes all covariates from both models.
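The normal-means comparison can be sketched in a few lines of Python. This is only an illustration under the answer's assumptions (independent, approximately normal posteriors); the function name `prob_beta1_greater` and the numeric inputs are hypothetical. The numerator is written as $\hat{\beta}_{1}-\hat{\beta}_{2}$ so that the result is the probability of $H_{0}:\beta_{1}>\beta_{2}$; swapping the order gives the complementary probability.

```python
from math import sqrt
from statistics import NormalDist


def prob_beta1_greater(b1, var1, b2, var2):
    """Approximate Pr(beta_1 > beta_2 | data) under independent normal posteriors."""
    z = (b1 - b2) / sqrt(var1 + var2)
    return NormalDist().cdf(z)


# Hypothetical inputs: each model's slope estimate and its variance
# (the beta-beta entry of that model's covariance matrix).
p = prob_beta1_greater(b1=0.80, var1=0.05 ** 2, b2=0.30, var2=0.05 ** 2)
print(f"Pr(beta1 > beta2 | data) ~= {p:.4f}")
```

A value near 0.5 means the data say little about which coefficient is larger; values near 0 or 1 indicate a clear ordering.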

probabilityislogic

3

Why not? The models are estimating how much 1 unit of change in any model predictor will influence the probability of "1" for the outcome variable. I'll assume the models are the same-- that they have the same predictors in them. The most informative way to compare the relative magnitudes of any given predictor in the 2 models is to use the models to calculate (either deterministically or better by simulation) how much some meaningful increment of change (e.g., +/- 1 SD) in the predictor affects the probabilities of the respective outcome variables--& compare them! You'll want to determine confidence intervals for the two estimates as well, so you can satisfy yourself that the difference is "significant," practically & statistically.
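One way the simulation route could look is sketched below: draw the intercept and slope from a normal distribution centred on the estimates, compute the change in the predicted probability for each draw, and read the interval off the simulated values. The function names, the coefficient estimates and the covariance matrices are all made up for illustration, and the predictor is assumed to be standardized so that -1 to +1 is a move of +/- 1 SD.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_first_difference(a_hat, b_hat, cov, x_lo, x_hi, draws=10000, seed=1):
    """Draw (alpha, beta) from N(estimates, cov) and return the sorted simulated
    changes in Pr(Y = 1) when x moves from x_lo to x_hi."""
    (vaa, vab), (_, vbb) = cov
    # Cholesky factor of the 2x2 covariance matrix
    l11 = math.sqrt(vaa)
    l21 = vab / l11
    l22 = math.sqrt(vbb - l21 * l21)
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = a_hat + l11 * z1
        b = b_hat + l21 * z1 + l22 * z2
        diffs.append(sigmoid(a + b * x_hi) - sigmoid(a + b * x_lo))
    return sorted(diffs)


def summarize(diffs):
    """Median and central 95% interval of the sorted simulated differences."""
    n = len(diffs)
    return diffs[n // 2], diffs[int(0.025 * n)], diffs[int(0.975 * n)]


# Hypothetical estimates and covariance matrices for the two outcome models.
house = simulate_first_difference(-0.2, 0.8, [[0.003, 0.0002], [0.0002, 0.004]], -1.0, 1.0)
pres = simulate_first_difference(0.1, 0.3, [[0.002, -0.0001], [-0.0001, 0.003]], -1.0, 1.0, seed=2)
for name, d in (("House", house), ("President", pres)):
    med, lo, hi = summarize(d)
    print(f"{name}: change in Pr(Y=1) = {med:.3f} [{lo:.3f}, {hi:.3f}]")
```

With more predictors the same idea applies: draw the full coefficient vector, hold the other predictors at chosen values, and vary only the predictor of interest.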

dmk38

Thanks dmk38, very useful. Some follow-up points/questions: is this what is often meant when referring to varying the variable of interest (the economy from bad to good for example) while holding all control variables at their means? What do you mean by deterministically? How do I determine the confidence intervals around the probabilities?

Ejs

2

Consult the King. He will not disappoint. King, G., Tomz, M., & Wittenberg, J. (2000). Making the Most of Statistical Analyses: Improving Interpretation and Presentation. Am. J. Pol. Sci., 44(2), 347-361.

dmk38

2

I assume that by "my independent variable is the economy" you're using shorthand for some specific predictor.

At one level, I see nothing wrong with making a statement such as

X predicts Y1 with an odds ratio of _ and a 95% confidence interval of [ _ , _ ] while X predicts Y2 with an odds ratio of _ and a 95% confidence interval of [ _ , _ ].

@dmk38's recent suggestions look very helpful in this regard.

You might also want to standardize the coefficients to facilitate comparison.
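For illustration, the odds ratios and Wald intervals for such a statement can be computed from each model's coefficient and standard error as below. The helper `odds_ratio_ci` and all numbers are hypothetical, and multiplying the coefficient by the predictor's standard deviation is just one common way to standardize, assumed here for the sketch.

```python
import math


def odds_ratio_ci(beta, se, scale=1.0, z=1.96):
    """Odds ratio and Wald 95% CI for a `scale`-unit increase in the predictor
    (scale is assumed positive; use the predictor's SD for a per-SD odds ratio)."""
    est = math.exp(beta * scale)
    lo = math.exp((beta - z * se) * scale)
    hi = math.exp((beta + z * se) * scale)
    return est, lo, hi


# Hypothetical coefficients and standard errors for X in the two models,
# with an assumed standard deviation of X of 1.5.
sd_x = 1.5
for outcome, beta, se in (("Y1", 0.53, 0.04), ("Y2", 0.20, 0.04)):
    raw = odds_ratio_ci(beta, se)
    std = odds_ratio_ci(beta, se, scale=sd_x)
    print(f"{outcome}: OR per unit = {raw[0]:.2f} [{raw[1]:.2f}, {raw[2]:.2f}], "
          f"OR per SD = {std[0]:.2f} [{std[1]:.2f}, {std[2]:.2f}]")
```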

At another level, beware of taking inferential statistics (standard errors, p-values, CIs) literally when your sample constitutes a nonrandom sample of the population of years to which you might want to generalize.

rolando2

Yes, 'the economy' is shorthand for perceptions of national economic conditions. Does the same advice apply when other predictors (controls) are included in the model?

Ejs

@Ejs - I'm afraid there's no short answer to your last question. You're getting into what it means to assess relationships when using statistical control - a fabulously intricate topic worthy of extensive study. You're also probably getting into the topic of variable selection, which is a big one as well. Imho the best source for the committed student of these topics is Pedhazur's amazon.com/Multiple-regression-behavioral-research-Pedhazur/…

rolando2

1

Let us say the interest lies in comparing two groups of people: those with $X_{1} = 1$ and those with $X_{1} = 0$ .

The exponential of $\beta_{1}$ , the corresponding coefficient, is interpreted as the ratio of the odds of success for those with $X_{1} = 1$ over the odds of success for those with $X_{1} = 0$ , conditional on the other variables in the model.
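A quick numeric check of this interpretation, using made-up values for the intercept, $\beta_{1}$ and the contribution of the other variables: whatever value the other terms take, the ratio of the two odds comes out equal to $\exp(\beta_{1})$.

```python
import math


def odds(p):
    return p / (1.0 - p)


def prob(alpha, beta1, x1, other=0.0):
    """Pr(success) when logit(p) = alpha + beta1 * x1 + other, where `other`
    stands for the combined contribution of the remaining variables."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta1 * x1 + other)))


alpha, beta1 = -0.5, 0.7  # hypothetical values
ratios = []
for other in (-1.0, 0.0, 2.0):
    ratio = odds(prob(alpha, beta1, 1, other)) / odds(prob(alpha, beta1, 0, other))
    ratios.append(ratio)
    print(f"other = {other:+.1f}: odds ratio = {ratio:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```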

So, if you have two models with different dependent variables, then the interpretation of $\beta_{1}$ changes, since it is not conditioned upon the same set of variables. As a consequence, the comparison is not direct...

ocram

Does this have any implications for rolando2's suggestion?

Ejs

@Ejs. Do you refer to the standardisation step? By the way, does my answer help? Have I misunderstood the question?

ocram
