Thursday 30 March 2017

Cointegration and Correlation: Explaining the Differences with Data Simulations in R

At the outset, let me make it clear that cointegration and correlation are two different concepts, and one does not imply the other.

Many researchers, market analysts, policymakers, and traders want to understand the association between two time-series, such as the prices of two indices, a stock price and a market index, the exchange rate and the GDP growth rate, short-term and long-term interest rates, or spot and futures prices. The association between two price series may be examined with correlation or with cointegration techniques, and the two terms are sometimes used interchangeably. But they are two distinct things. In this post, I will describe the differences between them:

  1. Number of Time-series

    A correlation coefficient measures the linear association between exactly two time-series at a time; with more than two series, we can only compute a coefficient for each pair. Cointegration, in contrast, can be examined among two or more series jointly.
  2. Understanding of Stationarity Concept

    To be cointegrated, all the time-series should be integrated of the same order (Engle and Granger, 1987). However, Autoregressive Distributed Lag (ARDL) models can be used to examine cointegration among a mix of stationary and non-stationary variables (I will not go into detail here, as it is beyond the scope of this post). By contrast, a correlation coefficient can be computed without first testing for stationarity, although correlations between non-stationary series (such as price levels) can be spurious.
  3. Measurement of the Metric

    The correlation coefficient ranges from -1 to +1, so both the strength and the direction of a correlation can be quantified. In contrast, there is no bounded measure of the strength of cointegration comparable to the correlation coefficient. A cointegration test mainly tells us whether two or more series are cointegrated or not.
  4. Period of Interest (Horizon)

    Correlation can be used for both short-run and long-run analysis, whereas cointegration is a long-run concept only.
  5. Minimum Number of Observations (Sample Size)

    Cointegration analysis needs a reasonably long sample; some research papers use at least 20 observations as a rule of thumb. Correlation, by contrast, can be computed for any sample size, with no such minimum.
  6. Time-series Method or Not

    Correlation can be computed for any set of paired observations, whether they are time-series or cross-sectional data. Cointegration, however, applies only to time-series, not to cross-sectional observations.
  7. Level of Analysis

    For financial time-series, correlation is usually computed between log returns (not price levels), whereas cointegration is always examined on price levels (typically log prices), not on returns.
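Point 7 can be illustrated with a small simulation. The following is a minimal sketch in Python using only the standard library, not the author's R code: it simulates two independent random-walk log-price series with hypothetical parameters (1,000 days, 1% daily volatility, arbitrary starting prices) and computes the Pearson correlation of the levels and of the log returns. Both coefficients lie between -1 and +1, but because the two assets are unrelated, only the return correlation behaves sensibly; the correlation of two trending, non-stationary levels can be far from zero purely by chance.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient; the result always lies in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def random_walk_log_prices(n, start, sigma, rng):
    """Log prices whose daily log returns are i.i.d. normal(0, sigma)."""
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.gauss(0.0, sigma))
    return prices

def log_returns(log_prices):
    """Daily log returns = first differences of log prices."""
    return [b - a for a, b in zip(log_prices, log_prices[1:])]

rng = random.Random(42)
# Two independent hypothetical assets: there is no true relationship.
a = random_walk_log_prices(1000, math.log(100.0), 0.01, rng)
b = random_walk_log_prices(1000, math.log(50.0), 0.01, rng)

level_corr = pearson(a, b)                             # can be far from 0 (spurious)
return_corr = pearson(log_returns(a), log_returns(b))  # should be close to 0

print(f"correlation of log price levels: {level_corr:+.2f}")
print(f"correlation of log returns:      {return_corr:+.2f}")
```

Changing the seed in `random.Random(42)` and rerunning shows how unstable the level correlation is from one simulation to the next, while the return correlation stays close to zero.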

A case of two time-series:

Having described the differences between correlation and cointegration, suppose that there are only two time-series (for example, daily prices of two assets). If the two price series are positively correlated, they tend to move in the same direction: on most days, both prices go up together or go down together. If they are cointegrated, by contrast, the two prices cannot wander off in opposite directions for very long; they eventually revert to an equilibrium relationship. This means that the two prices need not move in the same direction every day for cointegration to exist. If two price series are cointegrated, the spread between them (the difference between the two series, possibly after scaling one of them by a hedge ratio) should be stable, fluctuating around a constant or slowly evolving mean. When a deviation occurs, one or both price series adjust to bring the spread back toward its mean (the spread is mean-reverting, i.e., stationary). Pair trading, or statistical arbitrage, is used to exploit such deviations in the spread.
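The spread and pair-trading logic described above can be sketched as follows. This is a minimal Python illustration (standard library only) under stated assumptions, not a trading strategy and not the author's method: the pair is simulated so that x is a random walk and y = 1.5 * x plus a stationary AR(1) deviation; the hedge ratio is estimated by ordinary least squares (OLS); and the helper names and the z-score entry threshold of 2 are hypothetical choices. The spread is y - alpha - beta * x, which is the plain price difference only when beta = 1.

```python
import math
import random

def ols(x, y):
    """Least-squares fit y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta

def simulate_pair(n, rng, beta=1.5, phi=0.9):
    """Hypothetical cointegrated pair: x is a random walk and
    y = beta * x + u, where u is a stationary AR(1) with |phi| < 1."""
    x, y = [], []
    level, u = 100.0, 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)       # shared stochastic trend
        u = phi * u + rng.gauss(0.0, 1.0)  # mean-reverting deviation
        x.append(level)
        y.append(beta * level + u)
    return x, y

def zscore_signals(x, y, entry=2.0):
    """Spread = y - alpha - beta * x. Signal is -1 (short the spread) when
    its z-score exceeds +entry, +1 (long the spread) below -entry, else 0."""
    alpha, beta = ols(x, y)
    spread = [b - alpha - beta * a for a, b in zip(x, y)]
    mu = sum(spread) / len(spread)
    sd = math.sqrt(sum((s - mu) ** 2 for s in spread) / (len(spread) - 1))
    z = [(s - mu) / sd for s in spread]
    signals = [-1 if v > entry else (1 if v < -entry else 0) for v in z]
    return beta, spread, z, signals

rng = random.Random(7)
x, y = simulate_pair(1000, rng)
beta_hat, spread, z, signals = zscore_signals(x, y)
print(f"estimated hedge ratio: {beta_hat:.3f}")  # should be near the simulated 1.5
print("days with a trade signal:", sum(1 for s in signals if s != 0))
```

For simplicity, the z-score uses the full-sample mean and standard deviation of the spread, which introduces look-ahead bias; a realistic backtest would use rolling estimates computed only from past data.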

Overall, if two time-series are cointegrated, there exists a linear combination of them that is stationary. However, cointegration says nothing about the correlation between the series: the correlation could be positive, negative, or zero. Correlation, on the other hand, is a measure of co-movement: with positive correlation, when one price goes up (down), the other price tends to go up (down) as well. So, correlation and cointegration are very different things, and anyone analyzing time-series should keep this in mind. Two correlated time-series may or may not be cointegrated, and two cointegrated time-series may or may not be correlated.
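The claim that correlation and cointegration can each occur without the other can be checked by simulation. The sketch below (Python, standard library only; not the author's R code) builds two hypothetical pairs around the same random walk x: one whose returns have a true correlation of 0.8 but which carries its own independent random walk and so is not cointegrated, and one that is cointegrated (y = x + large i.i.d. noise) but whose returns are only weakly correlated. Cointegration is assessed with a simplified Engle-Granger two-step test: OLS of y on x, then a Dickey-Fuller regression on the residuals with no lagged differences, compared against the approximate 5% critical value of -3.34 for two series with a constant. A full analysis would use an augmented test with lag selection and proper p-values.

```python
import math
import random

def cumsum(v):
    """Running sum: turns i.i.d. shocks into a random walk."""
    out, s = [], 0.0
    for a in v:
        s += a
        out.append(s)
    return out

def diff(v):
    """First differences (returns, if v holds log prices)."""
    return [b - a for a, b in zip(v, v[1:])]

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols(x, y):
    """Least-squares fit y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta

def df_tstat(e):
    """Dickey-Fuller t-statistic: regress diff(e) on lagged e with no
    constant and no lagged differences (a simplified ADF)."""
    lag, d = e[:-1], diff(e)
    sxx = sum(a * a for a in lag)
    rho = sum(a * b for a, b in zip(lag, d)) / sxx
    s2 = sum((b - rho * a) ** 2 for a, b in zip(lag, d)) / (len(d) - 1)
    return rho / math.sqrt(s2 / sxx)

def engle_granger(x, y):
    """Step 1: OLS of y on x. Step 2: DF t-statistic on the residuals."""
    alpha, beta = ols(x, y)
    return df_tstat([b - alpha - beta * a for a, b in zip(x, y)])

EG_CRIT_5PCT = -3.34  # approx. 5% critical value, two series, with constant

rng = random.Random(2017)
n = 1000
e1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
e2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = cumsum(e1)  # shared random walk

# Pair 1: returns correlated at 0.8, but y_corr also carries its own
# random walk (cumsum of e2), so no linear combination is stationary.
y_corr = cumsum([0.8 * a + 0.6 * b for a, b in zip(e1, e2)])

# Pair 2: y_coint - x = 10 * e2 is stationary (cointegrated), but the
# large noise swamps the shared shocks, so returns are weakly correlated.
y_coint = [a + 10.0 * b for a, b in zip(x, e2)]

r_corr, t_corr = pearson(diff(x), diff(y_corr)), engle_granger(x, y_corr)
r_coint, t_coint = pearson(diff(x), diff(y_coint)), engle_granger(x, y_coint)

for name, r, t in [("correlated pair", r_corr, t_corr),
                   ("cointegrated pair", r_coint, t_coint)]:
    verdict = "cointegrated" if t < EG_CRIT_5PCT else "no evidence of cointegration"
    print(f"{name}: return corr = {r:+.2f}, EG t = {t:.2f} -> {verdict}")
```

Because any test has a false-rejection rate, the correlated pair will occasionally (for roughly 5% of seeds) appear cointegrated, while the cointegrated pair's statistic should sit far below the critical value.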

I have made some videos that explain these differences by simulating time-series in R. Please watch them to better understand these concepts and their differences: