Does correlation assume stationarity of data?

Inter-market analysis is a method of modeling market behavior by finding relationships between different markets. Often a correlation is computed between two markets, e.g. the S&P 500 and 30-year US Treasuries. These computations are mostly based on price data, which obviously do not fit the definition of a stationary time series.

Possible solutions aside (using returns instead), is the computation of a correlation on non-stationary data even a valid statistical calculation?

Would you say that such a correlation calculation is somewhat unreliable, or just plain nonsense?

correlation stationarity Milktrader

What do you mean by "valid statistical calculation"? What exactly you mean by that matters a great deal here. Correlation is a valid calculation of the linear relationship between two sets of data. I don't see why you need stationarity. Did you mean autocorrelation?

Robin Girard

There is a new site that may suit your question better: quant.stackexchange.com. As it stands, you are clearly conflating calculation and interpretation.

mpiktas

@mpiktas, the quant community uses returns rather than prices because returns are stationary and prices are not. I am asking here for something more than an intuitive explanation of why this should be so.

Milktrader

@robin, there are several things that can make you question a statistical analysis. Sample size comes to mind, as do obvious things like manipulated data. Does non-stationarity of the data call a correlation calculation into question?

Milktrader

Not the calculation, but maybe the interpretation if the correlation is not high. If it is high, it means a high correlation (i.e. a strong linear relationship), and two non-stationary time series $(X_t)$ and $(Y_t)$ can perfectly well be highly correlated (for example when $X_t=Y_t$).

Robin Girard

Answers:

Correlation measures linear relationship. In an informal context, a relationship means something stable. When we calculate the sample correlation for stationary variables and increase the number of available data points, this sample correlation tends to the true correlation.

It can be shown that for prices, which usually are random walks, the sample correlation tends to a random variable. This means that no matter how much data we have, the result will always be different.

Note: I tried to express the mathematical intuition without the mathematics. From the mathematical point of view the explanation is very clear: sample moments of stationary processes converge in probability to constants. Sample moments of random walks converge to integrals of Brownian motion, which are random variables. Since a relationship is usually expressed as a number and not as a random variable, the reason for not calculating the correlation of non-stationary variables becomes evident.

Update: Since we are interested in the correlation of two variables, assume first that they come from a stationary process $Z_t=(X_t,Y_t)$. Stationarity implies that $EZ_t$ and $cov(Z_t,Z_{t-h})$ do not depend on $t$. So the correlation (with $D$ denoting the variance)

$$corr(X_t,Y_t)=\frac{cov(X_t,Y_t)}{\sqrt{DX_tDY_t}}$$

also does not depend on $t$, since all the quantities in the formula come from the matrix $cov(Z_t)$, which does not depend on $t$. So the calculation of the sample correlation

$$\hat{\rho}=\frac{\frac{1}{T}\sum_{t=1}^T(X_t-\bar{X})(Y_t-\bar{Y})}{\sqrt{\frac{1}{T^2}\sum_{t=1}^T(X_t-\bar{X})^2\sum_{t=1}^T(Y_t-\bar{Y})^2}}$$

makes sense, since we may have reasonable hope that the sample correlation will estimate $\rho=corr(X_t,Y_t)$. It turns out that this hope is not unfounded, since for stationary processes satisfying certain conditions we have $\hat{\rho}\to\rho$ in probability as $T\to\infty$. Furthermore, $\sqrt{T}(\hat{\rho}-\rho)\to N(0,\sigma_{\rho}^2)$ in distribution, so we can test hypotheses about $\rho$.
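The convergence of the sample correlation in the stationary case can be illustrated with a small simulation. This is only a minimal sketch and not part of the original answer: it assumes Gaussian white noise with a true correlation of 0.5 (an arbitrary choice), and the helper names `sample_corr` and `correlated_noise` are made up for illustration.

```python
import math
import random


def sample_corr(xs, ys):
    """Pearson sample correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_noise(T, rho, rng):
    """Draw T pairs (U_t, V_t) of i.i.d. Gaussian noise with corr(U_t, V_t) = rho."""
    us, vs = [], []
    for _ in range(T):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        us.append(e1)
        vs.append(rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
    return us, vs


rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
rho = 0.5               # assumed true correlation
errors = {}
for T in (100, 5000):
    # Mean absolute estimation error over 30 independent samples of length T.
    errs = []
    for _ in range(30):
        us, vs = correlated_noise(T, rho, rng)
        errs.append(abs(sample_corr(us, vs) - rho))
    errors[T] = sum(errs) / len(errs)
    print(T, round(errors[T], 4))
```

Under these assumptions the average error should shrink roughly like $1/\sqrt{T}$, which is what the limiting normal distribution above suggests.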

Now suppose that $Z_t$ is not stationary. Then $corr(X_t,Y_t)$ may depend on $t$. So when we observe a sample of size $T$, we potentially need to estimate $T$ different correlations $\rho_t$. This is of course infeasible, so in the best case we can only estimate some functional of $\rho_t$, such as the mean or the variance. But the result may not have a sensible interpretation.

Now let us examine what happens with the correlation of probably the most studied non-stationary process, the random walk. We call the process $Z_t=(X_t,Y_t)$ a random walk if $Z_t=\sum_{s=1}^t(U_s,V_s)$, where $C_t=(U_t,V_t)$ is a stationary process. For simplicity assume that $EC_t=0$. Then

$$corr(X_t,Y_t)=\frac{EX_tY_t}{\sqrt{DX_tDY_t}}=\frac{E\sum_{s=1}^tU_s\sum_{s=1}^tV_s}{\sqrt{D\sum_{s=1}^tU_sD\sum_{s=1}^tV_s}}$$

To simplify matters further, assume that $C_t=(U_t,V_t)$ is white noise. This means that all correlations $E(C_tC_{t+h})$ are zero for $h>0$. Note that this does not restrict $corr(U_t,V_t)$ to zero.

Then

$$corr(X_t,Y_t)=\frac{tEU_tV_t}{\sqrt{t^2DU_tDV_t}}=corr(U_0,V_0).$$
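The step from the sums to $tEU_tV_t$ uses only the white noise assumption. Written out (a sketch; the second summation index $r$ is introduced here for clarity and does not appear in the original):

$$EX_tY_t=\sum_{s=1}^t\sum_{r=1}^tEU_sV_r=\sum_{s=1}^tEU_sV_s=tEU_0V_0,$$

because $EU_sV_r=0$ for $s\ne r$ and, by stationarity, $EU_sV_s=EU_0V_0$. In the same way $DX_t=tDU_0$ and $DY_t=tDV_0$, so the factors of $t$ cancel in the ratio.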

So far so good: although the process is not stationary, the correlation makes sense, even if we had to make some restrictive assumptions to get there.

Now, to see what happens with the sample correlation, we need the following fact about random walks, called the functional central limit theorem:

$$\frac{1}{\sqrt{T}}Z_{[Ts]}=\frac{1}{\sqrt{T}}\sum_{t=1}^{[Ts]}C_t\to (cov(C_0))^{1/2}W_s,$$

in distribution, where $s\in[0,1]$ and $W_s=(W_{1s},W_{2s})$ is a bivariate Brownian motion (two-dimensional Wiener process). For convenience, introduce the definition $M_s=(M_{1s},M_{2s})=(cov(C_0))^{1/2}W_s$.

Again for simplicity let us define sample correlation as

$$\hat{\rho}=\frac{\frac{1}{T}\sum_{t=1}^TX_tY_t}{\sqrt{\frac{1}{T}\sum_{t=1}^TX_t^2\frac{1}{T}\sum_{t=1}^TY_t^2}}$$

Let us start with the variances. We have

$$E\frac{1}{T}\sum_{t=1}^TX_t^2=\frac{1}{T}E\sum_{t=1}^T\left(\sum_{s=1}^tU_s\right)^2=\frac{1}{T}\sum_{t=1}^Tt\sigma_U^2=\sigma_U^2\frac{T+1}{2}.$$

This goes to infinity as $T$ increases, so we hit the first problem: the sample variance does not converge. On the other hand, the continuous mapping theorem in conjunction with the functional central limit theorem gives us

$$\frac{1}{T^2}\sum_{t=1}^TX_t^2=\sum_{t=1}^T\frac{1}{T}\left(\frac{1}{\sqrt{T}}\sum_{s=1}^tU_s\right)^2\to \int_0^1M_{1s}^2ds$$

where the convergence is convergence in distribution, as $T\to \infty$.
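Both statements about the second moment can be checked numerically. The following is a rough Monte Carlo sketch under assumptions that are not in the original answer: Gaussian increments with standard deviation 2, a fixed seed, and a made-up helper name `walk_second_moment`. It compares the average of $\frac{1}{T}\sum X_t^2$ with $\sigma_U^2\frac{T+1}{2}$ and reports the spread of $\frac{1}{T^2}\sum X_t^2$ across paths.

```python
import math
import random


def walk_second_moment(T, sigma, rng):
    """Return sum_{t=1}^T X_t^2 for one random walk X_t = U_1 + ... + U_t."""
    x = 0.0
    total = 0.0
    for _ in range(T):
        x += rng.gauss(0.0, sigma)  # U_t: Gaussian white noise (assumption)
        total += x * x
    return total


rng = random.Random(1)  # arbitrary fixed seed
sigma = 2.0             # assumed standard deviation of the increments U_t
n_paths = 4000
summary = {}
for T in (10, 100):
    sums = [walk_second_moment(T, sigma, rng) for _ in range(n_paths)]
    # Monte Carlo estimate of E[(1/T) * sum X_t^2] and its theoretical value.
    mean_over_T = sum(s / T for s in sums) / n_paths
    theory = sigma ** 2 * (T + 1) / 2
    # Spread across paths of the rescaled quantity (1/T^2) * sum X_t^2.
    scaled = [s / T ** 2 for s in sums]
    m = sum(scaled) / n_paths
    sd_scaled = math.sqrt(sum((z - m) ** 2 for z in scaled) / n_paths)
    summary[T] = (mean_over_T, theory, sd_scaled)
    print(T, round(mean_over_T, 2), theory, round(sd_scaled, 2))
```

The first two printed columns should roughly agree and grow with $T$, while the spread of the rescaled quantity should stay of the same order for both values of $T$, consistent with a limit that is a random variable rather than a constant.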

Similarly we get

$$\frac{1}{T^2}\sum_{t=1}^TY_t^2\to \int_0^1M_{2s}^2ds$$

and

$$\frac{1}{T^2}\sum_{t=1}^TX_tY_t\to \int_0^1M_{1s}M_{2s}ds$$

So finally for sample correlation of our random walk we get

$$\hat{\rho}\to \frac{\int_0^1M_{1s}M_{2s}ds}{\sqrt{\int_0^1M_{1s}^2ds\int_0^1M_{2s}^2ds}}$$

in distribution as $T\to \infty$.

So although the correlation is well defined, the sample correlation does not converge towards it, as it does in the stationary case. Instead it converges to a certain random variable.
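A short simulation makes this visible. The sketch below is an illustration under stated assumptions, not part of the original argument: the two walks have independent standard Gaussian increments (so the true correlation is zero), the seed and sample sizes are arbitrary, and the function names are hypothetical. It uses the simplified sample correlation without demeaning defined above and measures how much the estimate varies across independent replications.

```python
import math
import random


def uncentered_corr(xs, ys):
    """Sample correlation without demeaning (the simplified definition)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_walk(T, rng):
    """One path X_1, ..., X_T with i.i.d. standard Gaussian increments."""
    x, path = 0.0, []
    for _ in range(T):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def spread(T, n_rep, rng):
    """Standard deviation of the sample correlation of two independent
    random walks of length T, taken over n_rep replications."""
    rs = [uncentered_corr(random_walk(T, rng), random_walk(T, rng))
          for _ in range(n_rep)]
    m = sum(rs) / n_rep
    return math.sqrt(sum((r - m) ** 2 for r in rs) / n_rep)


rng = random.Random(2)  # arbitrary fixed seed
spreads = {T: spread(T, 200, rng) for T in (100, 1000)}
for T, s in spreads.items():
    print(T, round(s, 3))
```

If the sample correlation were converging to a constant, the spread would shrink as $T$ grows; for random walks it should stay of the same order.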

mpiktas

The mathematical point of view explanation is what I was looking for. It gives me something to contemplate and explore further. Thanks.

Milktrader

This response seems to sidestep the original question: Aren't you just saying that yes, calculating correlation makes sense for stationary processes?

whuber

@whuber, I was answering the question having in mind the comment, but I reread the question again and as far as I understand the OP asks about calculation of correlation for non-stationary data. Calculation of correlation for stationary processes makes sense, all the macroeconometric analysis (VAR, VECM) relies on that.

mpiktas

I will try to clarify my question with an answer.

whuber

@whuber my take away from the answer is that a correlation based on non-stationary data yields a random variable, which may or may not be useful. Correlation based on stationary data converges to a constant. This may explain why traders are attracted to "x-day rolling correlation" because the correlated behavior is fleeting and spurious. Whether "x-day rolling correlation" is valid or useful is for another question.

Milktrader

...is the computation of correlation whose data is non-stationary even a valid statistical calculation?

Let $W$ be a discrete random walk. Pick a positive number $h$ . Define the processes $P$ and $V$ by $P(0) = 1$ , $P(t+1) = -P(t)$ if $V(t) > h$ , and otherwise $P(t+1) = P(t)$ ; and $V(t) = P(t)W(t)$ . In other words, $V$ starts out identical to $W$ but every time $V$ rises above $h$ , it switches signs (otherwise emulating $W$ in all respects).

[Figure: sample paths of $W$ and $V$ for $h=5$]

(In this figure (for $h=5$ ) $W$ is blue and $V$ is red. There are four switches in sign.)

In effect, over short periods of time $V$ tends to be either perfectly correlated with $W$ or perfectly anticorrelated with it; however, using a correlation function to describe the relationship between $V$ and $W$ wouldn't be useful (a word that perhaps more aptly captures the problem than "unreliable" or "nonsense").

Mathematica code to produce the figure:

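With[{h=5},
(* one step: flip the sign p if the previous v exceeded h, then set v = p w *)
pv[{p_, v_}, w_] := With[{q=If[v > h, -p, p]}, {q, q w}];
(* w: discrete random walk with steps in {-1, 0, 1} *)
w = Accumulate[RandomInteger[{-1,1}, 25 h^2]];
(* fold over w to obtain the sign process p and the switched walk v *)
{p,v} = FoldList[pv, {1,0}, w] // Transpose;
ListPlot[{w,v}, Joined->True]]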
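For readers without Mathematica, the same construction can be sketched in Python. This is a loose port written for illustration, not the author's code: the function names, the seed and the segment analysis are assumptions added here. It computes the correlation of $W$ and $V$ over the whole sample and within each stretch where the sign $P$ is constant.

```python
import math
import random


def sample_corr(xs, ys):
    """Pearson sample correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(h, n_steps, rng):
    """Return the paths of W, V and the sign process P."""
    w, p, v = 0, 1, 0
    ws, vs, ps = [], [], []
    for _ in range(n_steps):
        if v > h:                    # previous V exceeded h: flip the sign
            p = -p
        w += rng.choice((-1, 0, 1))  # random walk step in {-1, 0, 1}
        v = p * w
        ws.append(w)
        vs.append(v)
        ps.append(p)
    return ws, vs, ps


def segment_corrs(ws, vs, ps):
    """Correlation of W and V within each run of constant sign P."""
    corrs, start = [], 0
    for i in range(1, len(ps) + 1):
        if i == len(ps) or ps[i] != ps[start]:
            seg_w, seg_v = ws[start:i], vs[start:i]
            if len(set(seg_w)) > 1:  # skip runs in which W does not vary
                corrs.append(sample_corr(seg_w, seg_v))
            start = i
    return corrs


rng = random.Random(3)  # arbitrary fixed seed
h = 5
ws, vs, ps = simulate(h, 25 * h ** 2, rng)
overall = sample_corr(ws, vs)
within = segment_corrs(ws, vs, ps)
print(round(overall, 3), [round(r, 3) for r in within])
```

Within each run the correlation is $+1$ or $-1$ by construction, while the single full-sample number depends on where the random switches happen to fall.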

whuber

It is good that your answer points that out, but I wouldn't say the processes are correlated; I would say they are dependent. This is the point. Calculation of correlation is valid, and here it will say "no correlation", and we all know this does not mean "no dependence".

robin girard

@robin That's a good point, but I constructed this example specifically so that for potentially long periods of time these two processes are perfectly correlated. The issue is not one of dependence versus correlation but inherently is related to a subtler phenomenon: that the relationship between the processes changes at random periods. That, in a nutshell, is exactly what can happen in real markets (or at least we ought to worry that it can happen!).

whuber

@whuber yes, and this is a very good example showing that there are processes that have very high correlation for potentially long periods of time and still are not correlated at all (but highly dependent) when viewed on a larger temporal scale.

robin girard

@robin girard, I think the key here is that for non-stationary processes the theoretical correlation varies with time, whereas for stationary processes the theoretical correlation stays the same. So with the sample correlation, which is basically one number, it is impossible to capture the variation of the true correlations in the case of non-stationary processes.

mpiktas