How to keep time-invariant variables in a fixed effects model

I have data on the employees of a large Italian company over ten years and would like to see how the gender gap between men and women has changed over time. To this end, I run pooled OLS:

$y_{it} = X'_{it}\beta + \delta {\rm male}_i + \sum^{10}_{t=1}\gamma_t d_t + \varepsilon_{it}$

where $y$ is log earnings per year, $X_{it}$ contains covariates that vary by individual and time, $d_t$ are year dummies, and ${\rm male}_i$ equals one if a worker is male and zero otherwise.

Now I am concerned that some of the covariates may be correlated with unobserved fixed effects. However, if I use the fixed effects (within) estimator or first differences, I lose the gender dummy, because this variable does not change over time. I do not want to use the random effects estimator, because I often hear people say that it makes assumptions that are very unrealistic and probably do not hold.

Is there a way to keep the gender dummy and control for fixed effects at the same time? If there is, do I need to cluster the errors or take care of other problems when running hypothesis tests on the gender variable?

Tags: hypothesis-testing, estimation, econometrics, fixed-effects-model

user42263

Answers:

There are a few ways to keep the gender dummy in a fixed effects regression.

Within Estimator
Suppose you have a model similar to your pooled OLS model,

$y_{it} = \beta_1 + \sum^{10}_{t=2} \beta_t d_t + \gamma_1 (male_i) + \sum^{10}_{t=2} \gamma_t (d_t \cdot male_i) + X'_{it}\theta + c_i + \epsilon_{it}$

where the variables are as before. Note that $\beta_1$ and $\beta_1 + \gamma_1 (male_i)$ cannot be identified, because the within estimator cannot distinguish them from the fixed effect $c_i$. Since $\beta_1$ is the intercept for the base year $t=1$, $\gamma_1$ is the gender effect on earnings in that period. What we can identify in this case are $\gamma_2, \dots, \gamma_{10}$, because they are interacted with your year dummies; they measure the differences in the partial effect of your gender variable relative to the first period. This means that if you observe an increase in $\gamma_2, \dots, \gamma_{10}$ over time, this indicates a widening of the earnings gap between men and women.
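A minimal NumPy sketch on simulated data can make this concrete. Everything in it (the data-generating process, the parameter values, the helper `within`) is an assumption made for illustration, and it assumes a balanced panel so that demeaning is a simple reshape. It shows that the within transformation wipes out $male_i$ but keeps the interactions $d_t \cdot male_i$, whose coefficients recover $\gamma_2, \dots, \gamma_{10}$.

```python
import numpy as np

# Hypothetical simulated panel (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
N, T = 2000, 10
male = rng.integers(0, 2, size=N).astype(float)   # time-invariant gender dummy
c = rng.normal(size=N) + 0.5 * male               # fixed effect c_i, correlated with gender
x = rng.normal(size=(N, T)) + c[:, None]          # covariate correlated with c_i

# Assumed DGP: gap of 0.10 in year 1, widening by 0.02 per year,
# so gamma_t (t >= 2) = 0.02 * (t - 1) in the notation above.
year_effect = np.linspace(0.0, 0.3, T)
gap = 0.10 + 0.02 * np.arange(T)
y = (year_effect[None, :] + gap[None, :] * male[:, None] + 0.5 * x
     + c[:, None] + rng.normal(scale=0.05, size=(N, T)))

# Long format, ordered by individual and then year.
year = np.tile(np.arange(T), N)                   # 0 = base year t = 1
male_long = np.repeat(male, T)
d = (year[:, None] == np.arange(1, T)[None, :]).astype(float)  # d_2, ..., d_10
X = np.column_stack([d, d * male_long[:, None], x.ravel()])

def within(a):
    """Subtract each individual's time average (balanced panel assumed)."""
    a3 = a.reshape(N, T, -1)
    return (a3 - a3.mean(axis=1, keepdims=True)).reshape(N * T, -1)

# male_i itself is wiped out by the within transformation ...
print(np.abs(within(male_long)).max())            # prints 0.0
# ... but the interactions d_t * male_i survive.
coef, *_ = np.linalg.lstsq(within(X), within(y.ravel()).ravel(), rcond=None)
gamma_hat = coef[T - 1:2 * (T - 1)]               # gamma_2, ..., gamma_10
theta_hat = coef[-1]                              # coefficient on x
print(np.round(gamma_hat, 3))
```

With these assumed values, `gamma_hat` should be close to 0.02, 0.04, ..., 0.18, i.e. a steadily widening gap relative to year 1.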

First-Difference Estimator
If you want to know the overall effect of the gender difference over time, you can try the following model:

$y_{it} = \beta_1 + \sum^{10}_{t=2} \beta_t d_t + \gamma (t\cdot male_i) + X'_{it}\theta + c_i + \epsilon_{it}$

where the variable $t = 1, 2, \dots, 10$ is interacted with the time-invariant gender dummy. Now if you take first differences, $\beta_1$ and $c_i$ drop out and you get

$y_{it} - y_{i(t-1)} = \sum^{10}_{t=2} \beta_t \Delta d_t + \gamma (t\cdot male_i - [(t-1)male_i]) + (X'_{it}-X'_{i(t-1)})\theta + \epsilon_{it}-\epsilon_{i(t-1)}$

where $\Delta d_t$ is the first difference of the dummy $d_t$. Then

$\gamma(t\cdot male_i - [(t-1)male_i]) = \gamma[(t - (t-1))\cdot male_i] = \gamma (male_i)$

and you can identify $\gamma$, the change in the gender earnings difference from one period to the next. So the final regression equation is:

$\Delta y_{it} = \sum_{t=2}^{10}\beta_t \Delta d_t + \gamma(male_i) + \Delta X'_{it}\theta + \Delta \epsilon_{it}$

and you get your effect of interest. The nice thing is that this is easily implemented in any statistical software, but you lose one time period.
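Here is a minimal NumPy sketch of the first-difference approach on simulated data. The data-generating process and all parameter values are assumptions for illustration: the gap is assumed to widen by exactly $\gamma = 0.02$ per year. It regresses $\Delta y_{it}$ on the differenced year dummies $\Delta d_2, \dots, \Delta d_{10}$, on $male_i$ and on $\Delta x_{it}$, so the time-invariant dummy survives differencing only because it entered the level equation as $t \cdot male_i$.

```python
import numpy as np

# Hypothetical simulated panel (illustrative assumptions only).
rng = np.random.default_rng(1)
N, T = 2000, 10
male = rng.integers(0, 2, size=N).astype(float)
c = rng.normal(size=N) + 0.5 * male               # fixed effect correlated with gender
x = rng.normal(size=(N, T)) + c[:, None]          # covariate correlated with c_i

t = np.arange(1, T + 1)                           # t = 1, ..., 10
beta = np.concatenate([[0.0], rng.normal(scale=0.1, size=T - 1)])  # beta_1 absorbed in the intercept
gamma, theta = 0.02, 0.5                          # assumed: gap widens by 0.02 per year
y = (1.0 + beta[None, :] + gamma * t[None, :] * male[:, None]
     + theta * x + c[:, None] + rng.normal(scale=0.1, size=(N, T)))

# Year dummies d_2, ..., d_10 for every (i, t): shape (N, T, T - 1).
d = np.broadcast_to((t[:, None] == t[None, 1:]).astype(float), (N, T, T - 1))

# First differences along the time axis: one period is lost.
dy = np.diff(y, axis=1).ravel()
dd = np.diff(d, axis=1).reshape(-1, T - 1)        # Delta d_2, ..., Delta d_10
dx = np.diff(x, axis=1).ravel()

# After differencing, gamma * (t * male_i) becomes gamma * male_i.
X = np.column_stack([dd, np.repeat(male, T - 1), dx])
coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
gamma_hat, theta_hat = coef[T - 1], coef[T]
print(round(gamma_hat, 3), round(theta_hat, 3))
```

Note that the intercept $\beta_1$ and $c_i$ never enter the differenced regression, and that the estimate on `male` is interpreted as the yearly change in the gap, not as its level.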

Hausman-Taylor Estimator
This estimator distinguishes between regressors that you can assume to be uncorrelated with the fixed effect $c_i$ and those that are potentially correlated with it. It further distinguishes between time-varying and time-invariant variables. Let the subscript $1$ denote variables that are uncorrelated with $c_i$ and $2$ those that are correlated with it, and suppose your gender variable is the only time-invariant variable. The Hausman-Taylor estimator then applies the random effects transformation:

$\tilde{y}_{it} = \tilde{X}'_{1it}\beta_1 + \tilde{X}'_{2it}\beta_2 + \gamma (\widetilde{male}_{i2}) + \tilde{c}_i + \tilde{\epsilon}_{it}$

where the tilde denotes the transformed variable, e.g. $\tilde{X}_{1it} = X_{1it} - \hat{\theta}_i \overline{X}_{1i}$, where $\hat{\theta}_i$ is the weight used in the random effects transformation and $\overline{X}_{1i}$ is the time average for individual $i$. This is not the usual random effects estimator, because the variables in group $2$ are correlated with $c_i$ and therefore need instruments. For $\tilde{X}_{2it}$ the instrument is $X_{2it} - \overline{X}_{2i}$. The same is done for the time-invariant variables: if you specify the gender variable as potentially correlated with the fixed effect, it is instrumented with $\overline{X}_{1i}$. Hence the number of time-varying variables in group $1$ must be at least as large as the number of time-invariant variables in group $2$.
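For reference, the two ingredients described above can be written out for a balanced panel with $T$ periods, where $\hat{\theta}_i$ is the same for every $i$. The symbols $\hat{\sigma}^2_{\epsilon}$ and $\hat{\sigma}^2_{c}$ (estimated variances of $\epsilon_{it}$ and $c_i$), $k_1$ and $g_2$ are notation introduced here; this is the textbook form, not a full derivation. If gender is instead treated as uncorrelated with $c_i$, it serves as its own instrument.

```latex
% Random effects (quasi-demeaning) weight in a balanced panel
\hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}^2_{\epsilon}}{\hat{\sigma}^2_{\epsilon} + T\,\hat{\sigma}^2_{c}}},
\qquad \tilde{y}_{it} = y_{it} - \hat{\theta}\,\bar{y}_{i}

% Instruments for the transformed equation
\text{instruments: } \; X_{1it} - \overline{X}_{1i}, \quad X_{2it} - \overline{X}_{2i}, \quad \overline{X}_{1i}

% Order condition: k_1 = number of variables in X_1,
% g_2 = number of time-invariant variables correlated with c_i
k_1 \geq g_2
```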

All of this might sound a little complicated but there are canned packages for this estimator. For instance, in Stata the corresponding command is xthtaylor. For further information on this method you could read Cameron and Trivedi (2009) "Microeconometrics Using Stata". Otherwise you can just stick with the two previous methods which are a bit easier.

Inference
For your hypothesis tests there is not much to consider beyond what you would need to do anyway in a fixed effects regression. You need to take care of autocorrelation in the errors, for example by clustering on the individual ID variable. This allows for an arbitrary correlation structure within clusters (individuals), which deals with autocorrelation, while errors are still assumed to be independent across individuals. For a reference, see again Cameron and Trivedi (2009).
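A minimal NumPy sketch of cluster-robust (sandwich) standard errors, clustering on an individual ID. The function name `cluster_robust_se` and the simulated data are hypothetical; the finite-sample factor $G/(G-1)\cdot(n-1)/(n-k)$ is one common choice, and software packages differ in the exact correction they apply. The demo uses a person-level shock so that errors are correlated over time, which is exactly the situation where conventional standard errors on a time-invariant regressor such as gender are too small.

```python
import numpy as np

def cluster_robust_se(X, resid, groups):
    """Cluster-robust (sandwich) standard errors for OLS coefficients.

    Allows arbitrary error correlation within each cluster in `groups`
    (e.g. an individual ID), assuming independence across clusters.
    """
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    labels = np.unique(groups)
    meat = np.zeros((k, k))
    for g in labels:
        idx = groups == g
        s = X[idx].T @ resid[idx]                  # cluster score vector
        meat += np.outer(s, s)
    G = len(labels)
    adj = G / (G - 1) * (n - 1) / (n - k)          # common finite-sample correction
    return np.sqrt(np.diag(adj * bread @ meat @ bread))

# Hypothetical demo: a person-level shock makes errors correlated over time.
rng = np.random.default_rng(2)
G, T = 500, 10
ids = np.repeat(np.arange(G), T)
male = rng.integers(0, 2, size=G)[ids].astype(float)
y = 1.0 + 0.1 * male + rng.normal(size=G)[ids] + rng.normal(size=G * T)

X = np.column_stack([np.ones(G * T), male])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma2 = resid @ resid / (G * T - X.shape[1])
se_iid = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
se_cluster = cluster_robust_se(X, resid, ids)
print(se_iid[1], se_cluster[1])   # the clustered SE is expected to be noticeably larger here
```

The same sandwich formula can be applied to the within-transformed or first-differenced regressions above by passing their design matrix and residuals.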

Andy

Another way to keep the gender dummy is Mundlak's (1978) approach for a fixed effects model with time-invariant variables. Mundlak's approach posits that the individual effect $c_i$ can be projected onto the individual (group) means of the time-varying variables, which allows the gender dummy to stay in the model.
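Written out in the notation of the question, Mundlak's device replaces the fixed effect with a linear projection on the individual means $\bar{X}_i$. The coefficient vector $\pi$ and the remainder $a_i$ are notation introduced here; the key assumption is that $a_i$ is uncorrelated with all regressors, including $male_i$. In a balanced panel, estimating the augmented equation by random effects gives the same $\hat{\beta}$ as the within estimator, while $\delta$ remains identified.

```latex
c_i = \bar{X}'_i \pi + a_i, \qquad \bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{it}

y_{it} = X'_{it}\beta + \delta\, male_i + \bar{X}'_i \pi + \sum_{t=1}^{10}\gamma_t d_t + a_i + \varepsilon_{it}
```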

Mundlak, Y. 1978: On the pooling of time series and cross section data. Econometrica 46:69-85.

emeryville

Another method is to estimate the time-invariant coefficients in a second-stage equation, using the mean residual as the dependent variable.

First, estimate the model with FE. This gives you estimates of $\beta$ and $\gamma_{t}$. For simplicity, let's ignore the year effects. Define the residual $\hat{u}_{it}$ as:

$\hat{u}_{it} \equiv y_{it} - X_{it}\hat{\beta}$

The linear predictor $\bar{u}_{i}$ is the time average of these residuals:

$\bar{u}_{i} \equiv \frac{\sum_{t=1}^{T}\hat{u}_{it}}{T} = \bar{y}_{i} - \bar{x}_{i}\hat{\beta}$

Now consider the following second-stage equation:

$\bar{u}_{i} = \delta male_{i} + c_{i}$

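Assume that gender is uncorrelated with the unobserved factors $c_{i}$. Then the OLS estimator of $\delta$ is unbiased and time-consistent, that is, $\bar{u}_i$ converges to $\delta male_i + c_i$ as $T \rightarrow \infty$ (because $c_i$ then acts as the error term of the second-stage regression, pinning down $\delta$ itself also requires a large number of individuals $N$).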

To see this, substitute the original model into $\bar{u}_{i}$:

$\bar{u}_{i} = \bar{x}_{i}\beta - \bar{x}_{i}\hat{\beta} + \delta male_{i} + c_{i} + \frac{\sum_{t=1}^{T}\epsilon_{it}}{T}$

The expectation of this estimator is:

$E(\bar{u}_{i}) = \bar{x}_{i}\beta - \bar{x}_{i}E(\hat{\beta}) + \delta male_{i} + E(c_{i}) + \frac{\sum_{t=1}^{T}E(\epsilon_{it})}{T}$

If the assumptions for FE unbiasedness hold, $\hat{\beta}$ is an unbiased estimator of $\beta$, and $E(\epsilon_{it}) = 0$. Thus:

$E(\bar{u}_{i}) = \delta male_{i} + E(c_{i})$

That is, our predictor is an unbiased estimator of the time-invariant components of the model.

Regarding consistency, the probability limit of this predictor is:

$p \lim\limits_{T \rightarrow \infty} \bar{u}_{i} = p \lim\limits_{T \rightarrow \infty} \left( \bar{x}_{i}\beta\right) - p \lim\limits_{T \rightarrow \infty} \left(\bar{x}_{i}\hat{\beta}\right) + p \lim\limits_{T \rightarrow \infty} \delta male_{i} + p \lim\limits_{T \rightarrow \infty} c_{i} + p \lim\limits_{T \rightarrow \infty} \left( \frac{\sum_{t=1}^{T}\epsilon_{it}}{T}\right)$

Again, given the FE assumptions, $\hat{\beta}$ is a consistent estimator of $\beta$, and the averaged error term converges to its mean, which is zero. Therefore:

$p \lim\limits_{T \rightarrow \infty} \bar{u}_{i} = \delta male_{i} + c_{i}$

Again, our predictor is a consistent estimator of the time-invariant components of the model.
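The three steps can be sketched in NumPy on simulated data. The data-generating process and parameter values are assumptions for illustration; in particular, gender is drawn independently of $c_i$, which is the key assumption of this approach. One deviation from the equation above: the sketch adds an intercept to the second stage, since $E(c_i)$ need not be zero (it is 0.3 here on purpose).

```python
import numpy as np

# Hypothetical simulated panel; gender is drawn independently of c_i.
rng = np.random.default_rng(3)
N, T = 2000, 10
male = rng.integers(0, 2, size=N).astype(float)
c = 0.3 + 0.5 * rng.normal(size=N)                # E(c_i) != 0 on purpose
x = rng.normal(size=(N, T)) + c[:, None]          # x is correlated with c_i
beta, delta = 0.5, 0.2
y = beta * x + delta * male[:, None] + c[:, None] + rng.normal(scale=0.1, size=(N, T))

# Step 1: FE (within) estimate of beta.
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_hat = (x_w * y_w).sum() / (x_w ** 2).sum()

# Step 2: individual mean residual u_bar_i = y_bar_i - x_bar_i * beta_hat.
u_bar = y.mean(axis=1) - x.mean(axis=1) * beta_hat

# Step 3: second-stage OLS of u_bar_i on male_i (plus an intercept for E(c_i)).
Z = np.column_stack([np.ones(N), male])
(const_hat, delta_hat), *_ = np.linalg.lstsq(Z, u_bar, rcond=None)
print(round(beta_hat, 3), round(delta_hat, 3))
```

Keep in mind that the conventional second-stage standard errors ignore the fact that $\hat{\beta}$ was estimated in the first step.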

luchonacho

The Mundlak-Chamberlain device is well suited to this. It is usually referred to as the correlated random effects (CRE) model, because it uses the random effects model to obtain the fixed effects estimates for the time-varying variables while also estimating coefficients on the time-invariant variables.

In statistical software, you implement it the same way as the random effects model, but you add the individual means of all time-varying covariates as additional regressors.
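A minimal NumPy sketch of the "add the means" recipe on simulated data. The data-generating process is an assumption chosen so that Mundlak's assumption holds exactly ($c_i$ depends on $x$ only through $\bar{x}_i$). To stay in plain NumPy, the sketch estimates the augmented equation by pooled OLS rather than random effects GLS; in a balanced panel the coefficient on $x_{it}$ is then numerically identical to the within estimate, and standard errors should be clustered by individual.

```python
import numpy as np

# Hypothetical simulated panel satisfying Mundlak's assumption:
# c_i depends on the time-varying covariate only through its mean x_bar_i.
rng = np.random.default_rng(4)
N, T = 2000, 10
male = rng.integers(0, 2, size=N).astype(float)
x = rng.normal(size=(N, T)) + rng.normal(size=N)[:, None]
x_bar = x.mean(axis=1)
c = 0.8 * x_bar + 0.5 * rng.normal(size=N)        # correlated with x via x_bar
beta, delta = 0.5, 0.2
y = beta * x + delta * male[:, None] + c[:, None] + rng.normal(scale=0.1, size=(N, T))

# CRE / Mundlak regression: add the individual means of the time-varying covariates.
X = np.column_stack([np.ones(N * T), np.repeat(male, T),
                     np.repeat(x_bar, T), x.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
delta_cre, beta_cre = coef[1], coef[3]

# Within (FE) estimate of beta for comparison.
x_w = x - x_bar[:, None]
y_w = y - y.mean(axis=1, keepdims=True)
beta_fe = (x_w * y_w).sum() / (x_w ** 2).sum()
print(beta_cre, beta_fe, delta_cre)
```

The gender coefficient `delta_cre` is identified only under the assumption that gender is uncorrelated with the part of $c_i$ not explained by $\bar{x}_i$.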

Martin Paul