Thursday 30 March 2017

Cointegration and Correlation: Explaining the Differences with Data Simulations in R

At the outset, let me make it clear that cointegration and correlation are two different metrics and should not be confused with each other.

Many researchers, market analysts, policymakers, and traders are interested in deciphering the association between two time-series, such as the prices of two indices, a stock and a market index, the exchange rate and the GDP growth rate, short-term and long-term interest rates, or spot and futures prices. The association between two price series may be checked by correlation or cointegration techniques. The two terms are sometimes used interchangeably, but they are distinct concepts. In this post, I will lay out the differences between them:

  1. Number of Time-series

    Correlation shows the linear association between exactly two time-series. Cointegration, on the other hand, can be examined among two or more series.
  2. Understanding of Stationarity Concept

    In order to be cointegrated, all the time-series should be integrated of the same order (Engle and Granger, 1987). (Autoregressive Distributed Lag, or ARDL, models can examine cointegration among a mix of stationary and non-stationary variables, but that is beyond the scope of this post.) In contrast, no notion of stationarity is needed to compute the correlation between two time-series.
  3. Measurement of the Metric

    The coefficient of correlation ranges from -1 to +1, so both the strength and the direction of the association can be quantified. In contrast, there is no such scale for cointegration: we can only test whether two or more series are cointegrated, not quantify the magnitude of the relationship.
  4. Period of Interest (Horizon)

    Correlation can be used for short-run or long-run analysis, but cointegration is a long-run concept only.
  5. Minimum Number of Observations (Sample Size)

    Cointegration analysis requires a reasonably long sample, at least 20 or more observations (as used in some research papers). Correlation imposes no such minimum sample-size restriction.
  6. Time-series Method or Not

    Correlation is a property of any paired observations, whether time-series or cross-sectional. Cointegration applies to time-series only, not to cross-sectional observations.
  7. Level of Analysis

    For financial time-series, correlation is computed between log returns (not price levels), whereas cointegration is always about prices in levels (log prices, not returns).

A case of two time-series:

After describing the differences between correlation and cointegration, suppose there are only two time-series, for example, the daily prices of two assets. If the two price series are positively correlated, they move in synchrony: both prices tend to go up or down together every day. If they are cointegrated, the two prices cannot wander off in opposite directions for very long; they must eventually revert to an equilibrium relationship. The prices therefore need not move in the same direction every day for cointegration to exist. If two price series are cointegrated, the spread between them (the difference between the two series) should be stationary, fluctuating around a constant (or slowly evolving) mean. If a deviation occurs, the prices will correct it and the spread will revert to its mean (the spread is assumed to be mean-reverting, i.e. stationary). Pair-trading, or statistical arbitrage, exploits such deviations in the spread.

Overall, if two time-series are cointegrated, there exists a linear combination of them that is stationary. Cointegration, however, says nothing about the correlation between the series: the correlation could be positive, negative, or absent altogether. Correlation, on the other hand, is a measure of co-movement: if one price goes up (down), the other price also goes up (down). So correlation and cointegration are very different things, and everyone working with time-series should keep this in mind. Two correlated series may or may not be cointegrated, and two cointegrated series may or may not be correlated.

I have made some videos explaining these differences by simulating time-series in R. Please watch them to understand these concepts and their differences better:
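For readers who want to try it themselves before the videos, here is a minimal R sketch of the same idea (my own sketch, assuming the tseries package for the ADF test; the videos may proceed differently). Two simulated prices that share a common trend are cointegrated, while the correlation of their returns is a separate question:

library(tseries) #For the augmented Dickey-Fuller (ADF) test
set.seed(123)
x=cumsum(rnorm(500)) #A random walk: the common stochastic trend
y=x + rnorm(500, sd=0.5) #y shares x's trend, so the spread y - x is stationary
adf.test(y - x) #A small p-value: the spread is stationary, i.e. the series are cointegrated
cor(diff(x), diff(y)) #The correlation of the returns is a separate question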

Friday 27 January 2017

The art of making lectures more engaging and entertaining by Garr Reynolds

Today, while preparing notes for my class, I came across a wonderful TEDx talk by Garr Reynolds on presentation in lectures and teaching. He says, "No More Boring Lectures!" and suggests some effective ways to engage students and make classes more interesting. In other words, classes can be interesting and engaging rather than boring. Is it really possible? It is challenging, but professors, lecturers, and teachers can do it with some extra effort. Garr has written four best-selling books on presentations, including Presentation Zen, an award-winning book that has been translated into 17 languages and is available all over the world. The key takeaways from the video are:

  1. The entertaining lectures (Use videos/activities):
    A professor or teacher can learn from a musician, a stand-up comedian, or any other artist how to make the class more engaging. In the Indian context, we can learn this from Kapil Sharma (comedian) or A. R. Rahman (musician). Give the audience what they want, in the way they want it. For instance, while teaching accounting or finance (my subjects), cite live examples from the real business world. This is interesting and also helps students prepare for interviews and placements. Give students recent updates on government policies and your insights into them (for example, the GST rollout and its implications for the share market and for different sectors). This is time-consuming, but you will rock the class. Students will love you and cite you everywhere, and your goodwill will reach other management institutes' campuses through the grapevine (even without your going there).

  2. The art of storytelling:
    Every class should begin with why the topic is important and what its real-life examples are; only then should it get into the nitty-gritty of the topic. Everything should be told as a story.

  3. More engaging:
    PPTs alone may make the class drab and boring; videos, activities, Q&As, and discussion among students can be a great saviour. As we all know, information transmission (PPTs and lectures) does not equal learning. Lectures must therefore be so engaging and entertaining that students do not want to miss your class; as gripping as a blockbuster movie.

  4. Never consider yourself an expert:
    Never consider yourself the master of a subject, because we are all just exploring it. Gaining expertise in a subject is a continuum, so you will always find people better or worse than you. Keep improving yourself and do not nurse a false ego. Sometimes our students may be better prepared than we are, so allow students to disagree in class. Make your students follow this route: Curiosity --> Explore --> Imagination --> Learning.

    Tips for better preparation and delivery of lectures can be found on Garr Reynolds' official website, and there are many similar websites we can google. All lectures can be entertaining and engaging. YES, IT IS POSSIBLE!

Friday 30 September 2016

Is sampling distribution of sample mean always NORMAL?

Hello friends,

First of all, I would like to say that I am not an expert in statistics or statistical distributions. But while conducting various hypothesis tests, I come across the terms sample mean, sampling distribution of the mean, central limit theorem (CLT), the law of large numbers, sample statistic, population parameter, and various distributions.

Whenever we use the ordinary least squares method to estimate the intercept and slope coefficients of a regression model, we assume that the sampling distributions of our sample estimates (intercept and slopes) follow a normal distribution, given that the sample size is large, i.e. more than 30 observations.

This post intends to deliver some takeaways from my learning about the CLT and sampling distributions. Here, I will simulate and show that the sampling distribution of the sample mean is approximately normal irrespective of the population distribution, using normal, exponential, and uniform population distributions. I will use RStudio, and I assume that you have a basic understanding of the interface and commands of R. Let us start, friends:



First, consider a normal distribution with mean 5 and standard deviation 6.
#The population distribution ~ N(5, 6)
#Take 100 samples of sample size 100 (n=100)
(The execution of these commands is shown in a video attached below.)

set.seed(1) #Set the seed so you can replicate the same results from resampling; use any arbitrary number
rnorm(100, 5, 6) #Take 100 random normal values from population distribution
mean(rnorm(100, 5, 6)) #Take the mean of it

#Use replicate command to repeat 100 times (You can take 1000 or 100000 samples)
#And, save the sample means in vector a
a=replicate(100, mean(rnorm(100, 5, 6)))
mean(a) #Mean of sample means in vector a
sd(a) #Approximately equals 6/sqrt(100)=0.6 [Also called standard error]
hist(a) #Make Histogram of the sampling distribution
#As per CLT, the sampling distribution ~ N(5, 6/sqrt(sample size))

#Use Jarque-Bera test for normality
library(moments) #the package for Jarque-Bera test
jarque.test(a)
###############################################
###############################################

Now consider an exponential population distribution with rate lambda = 0.2 (the arrival rate)
#The population distribution ~ Exponential(rate = 0.2)
#Its mean and standard deviation are both 1/lambda = 5
We will follow the same process here as we did in the case of the normal distribution
#Take 100 samples of sample size 100 (n=100)

set.seed(1)
a=replicate(100, mean(rexp(100, 0.2)))
mean(a) #Close to the population mean 5
sd(a) #Approximately equals 5/sqrt(100)=0.5 [Also called standard error]
hist(a)

#As per CLT, the sampling distribution ~ N(5, 5/sqrt(sample size))
#Use Jarque-Bera test for normality
jarque.test(a)
###############################################
###############################################

Now consider a UNIFORM population distribution
#The population distribution ~ Uniform(0, 1), with mean 0.5 and sd 1/sqrt(12) ≈ 0.289
#Take 100 samples of sample size 100 (n=100)

set.seed(1)
runif(100) #Take 100 uniform values, minimum 0 and maximum 1

a=replicate(100, mean(runif(100)))
mean(a) #Close to the population mean 0.5
sd(a) #Approximately (1/sqrt(12))/sqrt(100) ≈ 0.029 [Also called standard error]
hist(a)
#Use Jarque-Bera test for normality
jarque.test(a)
###############################################
###############################################

Key Takeaways

1. The sample mean is approximately ~ Normal(mu, sigma/sqrt(n))

2. We can invoke the CLT whenever the sample size is large (n > 30)

3. The sampling distribution of the sample mean is approximately normal even when the population is not normal (given a large sample size)

4. You can also check whether the statistic is consistent by increasing the sample size (from 100 observations to 500, 1,000, or more), as shown in the sketch below. As the sample size tends to infinity, the sample statistic converges to the population parameter.
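A quick sketch of takeaway 4, reusing the commands from above (the sample sizes chosen here are arbitrary):

set.seed(1)
for (n in c(100, 1000, 10000)) {
  a=replicate(100, mean(rnorm(n, 5, 6)))
  cat("n =", n, " mean of sample means =", round(mean(a), 3), " standard error =", round(sd(a), 4), "\n")
}
#The mean of the sample means stays near 5, while the standard error shrinks as n grows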
#############################################################
#############################################################
I have shown the execution of these commands in a video attached below.






Thursday 29 September 2016

What is Optimal Hedge Ratio? A Simple Explanation

Hedge Ratio
Hedge ratio (for an asset such as a commodity or a stock) comes into the picture when the asset does not have futures contracts written on it. For example, an airline company wants to hedge against fluctuations in the price of jet fuel, but no futures contract on jet fuel is available. The company can therefore hedge (cross-hedge) using the futures contract of another asset that is highly correlated with jet fuel, such as heating oil.
Another example: a farmer wants to hedge his groundnut production but, due to the unavailability of a groundnut futures contract, trades soybean futures instead. There are many similar examples from the perspective of traders, investors, farmers, manufacturing companies, and portfolio managers.

Put simply, the optimal or minimum-variance HEDGE RATIO is the beta of the returns of the asset to be hedged (jet fuel) with respect to the returns of the hedging instrument (heating oil futures). This is nothing but the correlation between the two return series multiplied by the ratio of their standard deviations:
Optimal hedge ratio = correlation(jet fuel returns, heating oil futures returns) × sd(jet fuel returns) / sd(heating oil futures returns)
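A minimal R sketch of this formula on simulated returns (the numbers and variable names are illustrative, not taken from the videos below):

set.seed(42)
futures_ret=rnorm(250, 0, 0.02) #Simulated heating oil futures returns
spot_ret=0.8*futures_ret + rnorm(250, 0, 0.01) #Simulated jet fuel returns
h=cor(spot_ret, futures_ret)*sd(spot_ret)/sd(futures_ret) #The optimal hedge ratio
h
coef(lm(spot_ret ~ futures_ret))[2] #Equivalently, the beta (slope) from regressing spot returns on futures returns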

A hedge aims to make the beta zero: if a portfolio is fully hedged, the beta of the portfolio has become zero (market movements have no effect on the portfolio's returns). A hedge ratio is called efficient when the net gain is zero, i.e., a loss in the spot market is set off by gains in the futures market, or vice versa.

The minimum-variance hedge ratio is explained in a very simple manner in MS Excel in these YouTube videos (links below):
https://www.youtube.com/watch?v=p-bBbdvy7r8
https://www.youtube.com/watch?v=NcdZtCBrJbo





Saturday 24 September 2016

GARCH Modelling

GARCH stands for Generalised Auto-Regressive Conditional Heteroscedasticity. It is a very simple technique for estimating and forecasting time-varying or conditional volatility. 
To put it simply, when higher (lower) volatility is followed by higher (lower) volatility, the return series exhibits the ARCH effect, or volatility clustering. If there is first-order serial correlation in the squared returns (use the Ljung-Box test), volatility is clustered and the ARCH effect exists. Hence, volatility is not constant but time-varying (conditional).


ARCH models were introduced by Engle (1982), and Generalised ARCH, or GARCH, was proposed by Bollerslev (1986). The plain ARCH process is rarely used nowadays because GARCH does the same job with fewer parameters. A GARCH(1,1) model is sufficient for most financial problems, as it can capture the stylized facts of past volatility patterns. The equation for GARCH(1,1) is as follows:

σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁     ... (1)

where σ²ₜ is the conditional variance at time t, εₜ₋₁ is the previous period's shock (unexpected return), ω > 0, and α, β ≥ 0.

Therefore, while estimating GARCH models we need three things:
  1. Mean equation
  2. Variance equation
  3. Distributional assumption

GARCH is a fairly complex model, but it can be estimated easily in R, EViews, or Stata.

The coefficient α indicates the reaction of volatility to unexpected returns or shocks, whereas the coefficient β shows the persistence of volatility, i.e. how long volatility takes to revert to the long-run (unconditional) variance ω / (1 – α – β). For example, with ω = 0.00001, α = 0.08, and β = 0.90, the long-run variance is 0.00001 / (1 – 0.08 – 0.90) = 0.0005, i.e. a volatility of about 2.24% per period.

Most of the time, (α + β) < 1. If (α + β) is equal to or greater than 1, the simple GARCH model cannot be used; the Integrated GARCH (IGARCH) model, which imposes α + β = 1, is used in that case.

Equation (1) represents the simplest, vanilla, symmetric GARCH model. But sometimes conditional volatility is not symmetric, meaning that positive and negative unexpected returns (or news) do not have the same impact on volatility.

Generally, it is observed that negative news has a larger impact on volatility than positive news. Bad news lowers the value of equity, so the debt/equity ratio (leverage) rises, making the company more susceptible to bankruptcy; this is termed the leverage effect. It can be captured by asymmetric GARCH (AGARCH) models. There are many asymmetric GARCH models, but the Threshold GARCH (TGARCH, or GJR-GARCH) is used most often to capture such asymmetric conditional volatility.

Prerequisite for GARCH:
The data should be stationary

Advantages:
  1. Better understand the volatility patterns of a stationary time-series
  2. Volatility Forecasting based on the GARCH models

  

The procedure to fit GARCH models is as follows (a minimal R sketch follows the list):
  1. Make a time-series plot of the squared returns. If you can see clustering in the squared returns, the ARCH effect is present.
  2. Use the Ljung-Box test on the squared returns at lag 1. If you reject the null hypothesis of no serial correlation, the ARCH effect exists.
  3. Fit the appropriate GARCH model (GARCH, TGARCH, or EGARCH).
  4. Extract the residuals and run diagnostic tests. If the residuals are IID, the model is good; otherwise, repeat the process.
  5. Use the fitted model for forecasting.
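Here is a minimal sketch of these steps using the rugarch package (one R option among several; the toy data and the choice of package are my assumptions, and the videos below may take a different route):

library(rugarch)
set.seed(1)
ret=rnorm(1000)*sqrt(rep(c(0.5, 2), each=500)) #Toy returns with volatility clustering
plot(ret^2, type="l") #Step 1: clustering is visible in the squared returns
Box.test(ret^2, lag=1, type="Ljung-Box") #Step 2: test for the ARCH effect
spec=ugarchspec(variance.model=list(model="sGARCH", garchOrder=c(1, 1)),
                mean.model=list(armaOrder=c(0, 0)))
fit=ugarchfit(spec, data=ret) #Step 3: fit a GARCH(1,1)
fit #Step 4: the printed output includes residual diagnostics
ugarchforecast(fit, n.ahead=10) #Step 5: forecast the conditional volatility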

Do it in R as shown in the video below:



Do it in Eviews as shown in the video below:




References
Alexander, C. (2001), Market Models: A Guide to Financial Data Analysis, John Wiley & Sons, available at: http://sro.sussex.ac.uk/40646/ (accessed 26 June 2014).

Bollerslev, T. (1986), “Generalized autoregressive conditional heteroskedasticity”, Journal of Econometrics, Vol. 31 No. 3, pp. 307–327.

Engle, R.F. (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation”, Econometrica, Vol. 50 No. 4, pp. 987–1007.








Four moments of distribution: Mean, Variance, Skewness, and Kurtosis

The first moment of a distribution is the MEAN, the second moment is the VARIANCE, the third is SKEWNESS, and the fourth is KURTOSIS, and so on (the first four moments are enough for most purposes).

The mean and variance are raw moments, while skewness and kurtosis are normalized/standardized moments (normalized by the standard deviation). Unlike the mean and variance, skewness and kurtosis are unit-free (dimensionless). For example, if our data are heights in inches, the mean will be in inches and the variance in square inches, but skewness and kurtosis will be unit-free. Consequently, skewness and kurtosis are unaffected by any linear change in the scale of the data (e.g., inches to centimetres).

The brief description of these four moments is shown in the table below:

[Table: a brief description of the four moments - mean, variance, skewness, and kurtosis]
Mean and variance are very popular metrics, but skewness and kurtosis, though important attributes of a distribution, are rarely discussed. Therefore, we focus on these two measures.

Skewness

A negative value of skewness implies that the distribution is negatively skewed: some extreme values in the data drag the mean to the left of the distribution. Similarly, positive skewness indicates that some extreme values drag the mean to the right. Skewness is a very useful measure when the data contain outliers. Positively and negatively skewed distributions are shown in the figure below.
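A small R illustration of skewness and kurtosis (my own sketch, reusing the moments package from the earlier post):

library(moments) #Provides skewness() and kurtosis()
set.seed(1)
x=rexp(10000, rate=0.2) #A right-skewed (positively skewed) distribution
mean(x); var(x) #Raw moments, in the units of the data
skewness(x) #Positive: the distribution is positively skewed
kurtosis(x) #Greater than 3: heavier tails than the normal distribution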

Friday 23 September 2016

ARIMA Modelling in R

Hello friends,
I have tried to keep this post very simple. ARIMA stands for Auto-Regressive Integrated Moving Average. It is a very simple technique for time-series forecasting. The terms are:
Auto-Regressive: lags of the variable itself
Integrated: the number of differencing steps required to make the series stationary
Moving Average: lags of previous information shocks (errors)
A model with p AR terms, d differences, and q MA terms is written ARIMA(p,d,q).

Different Names of ARIMA
AR model: if only AR terms are present, i.e. ARIMA(1,0,0) = AR(1)
MA model: if only MA (error) terms are present, i.e. ARIMA(0,0,1) = MA(1)
ARMA: if both are present, i.e. ARIMA(1,0,1) = ARMA(1,1)
ARIMA: if a differencing term is also included, i.e. ARIMA(1,1,1) = ARMA(1,1) with first differencing
ARIMAX: if some exogenous variables are also included

Prerequisite
The data should be stationary

Pros
  1. Better understand the time-series patterns
  2. Forecasting based on the fitted ARIMA model

Cons
ARIMA captures only linear relationships; hence, neural network models could be used if a non-linear association is found in the variables.


The procedure to fit an ARIMA model (the Box-Jenkins approach) is as follows (a minimal R sketch follows the list):
  1. Make correlograms (ACF and PACF): the PACF indicates the order of the AR terms and the ACF the order of the MA terms.
  2. Fit the model.
  3. Extract the residuals and run diagnostic tests. If the residuals are IID, the fitted model is good; otherwise, repeat the process.
  4. Use the fitted model for forecasting.
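Here is a minimal sketch of these steps in R, using the built-in AirPassengers data and the forecast package (my own choices; the video may use different examples):

library(forecast) #Provides auto.arima(), checkresiduals(), and forecast()
y=log(AirPassengers) #Built-in monthly airline passenger data
acf(diff(y)); pacf(diff(y)) #Step 1: correlograms of the differenced series
fit=auto.arima(y) #Step 2: auto.arima() picks p, d, and q automatically
checkresiduals(fit) #Step 3: Ljung-Box test and residual plots
forecast(fit, h=12) #Step 4: forecast 12 months ahead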
Please watch the video to learn how to do ARIMA modelling in R: