Friday, 30 September 2016

Is sampling distribution of sample mean always NORMAL?

Hello friends,

First of all, I would like to say that I am not an expert in statistical distributions or statistics. But while conducting various hypothesis tests, I come across the terms: sample mean, sampling distribution of the mean, central limit theorem (CLT), the law of large numbers, sample statistic, population parameter, and various distributions.

Whenever we use the ordinary least squares (OLS) method to estimate the intercept and slope coefficients of a regression model, we assume that the sampling distribution of our sample estimates (intercept and slopes) follows a normal distribution, given that our sample size is large, i.e. more than 30 observations.
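To see what such a sampling distribution looks like, here is a quick sketch of my own (not from the videos below): it simulates the sampling distribution of an OLS slope estimate and checks its shape.

#Sketch: sampling distribution of an OLS slope estimate (true slope = 3)
set.seed(1)
slopes = replicate(1000, {
  x = runif(100) #A non-normal regressor
  y = 2 + 3*x + rnorm(100) #True intercept 2, slope 3, standard normal errors
  coef(lm(y ~ x))[2] #Save the estimated slope
})
hist(slopes) #Approximately normal and centred near 3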

This post intends to deliver some takeaways from my learnings about the CLT and sampling distributions. Here, I simulate and show that the sampling distribution of the sample mean is approximately normal irrespective of the population distribution, using normal, exponential, and uniform populations. I use RStudio to explain it, and I assume that you have a basic understanding of the interface and commands of R. Let us start, friends:



First, consider a normal distribution with mean 5 and standard deviation 6 
#The population distribution ~ N (5, 6)
#Take 100 samples of sample size 100 (n=100)

set.seed(1) #Fix the seed so you can replicate the same resampling results; any arbitrary number works
rnorm(100, 5, 6) #Draw 100 random values from the normal population distribution
mean(rnorm(100, 5, 6)) #Take the mean of one such sample (note: this call draws a fresh sample)

#Use replicate command to repeat 100 times (You can take 1000 or 100000 samples)
#And, save the sample means in vector a
a=replicate(100, mean(rnorm(100, 5, 6)))
mean(a) #Mean of sample means in vector a
sd(a) #Approximately 6/sqrt(100) = 0.6 [also called the standard error]
hist(a) #Make Histogram of the sampling distribution
#As per the CLT, the sampling distribution of the mean ~ N(5, 6/sqrt(sample size))

#Use Jarque-Bera test for normality
library(moments) #the package for Jarque-Bera test
jarque.test(a) #Null hypothesis: the data are normally distributed
###############################################
###############################################

Now consider an exponential population distribution with rate lambda = 0.2 (the arrival rate)
For the exponential distribution, both the mean and the standard deviation equal 1/lambda, so here both are 1/0.2 = 5
We will follow the same process as in the case of the normal distribution
#Take 100 samples of sample size 100 (n=100)

set.seed(1)
a=replicate(100, mean(rexp(100, 0.2)))
mean(a) #Close to the population mean 5
sd(a) #Approximately 5/sqrt(100) = 0.5 [also called the standard error]
hist(a) #Make Histogram of the sampling distribution

#Invoke the CLT: the sampling distribution ~ N(5, 5/sqrt(sample size))
#Use Jarque-Bera test for normality
jarque.test(a)
###############################################
###############################################

Now consider a UNIFORM population distribution
The population distribution ~ Uniform(0, 1), with mean 0.5 and standard deviation 1/sqrt(12)
#Take 100 samples of sample size 100 (n=100)

set.seed(1)
runif(100) #Take 100 values, minimum 0 and maximum 1

a=replicate(100, mean(runif(100)))
mean(a) #Close to 0.5, the mean of Uniform(0, 1)
sd(a) #Approximately (1/sqrt(12))/sqrt(100) ≈ 0.029 [the standard error]
hist(a) #Make Histogram of the sampling distribution
#Use Jarque-Bera test for normality
jarque.test(a)
###############################################
###############################################

Key Takeaways

1. The sample mean ~ Normal(mu, sigma/sqrt(n)), approximately, for large samples

2. We can invoke the CLT whenever the sample size is large (n > 30)

3. The sampling distribution of the sample mean is approximately normal even when the population is not normal (given a large sample size)

4. You can also check whether the statistic is consistent by increasing the sample size (from 100 observations to 500, 1,000, or more). As the sample size tends to infinity, the sample statistic converges to the population parameter (see the sketch below).
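As a quick sketch of this check, using the exponential population from above, the standard error of the sample mean shrinks as n grows:

#Sketch: consistency check by increasing the sample size n
set.seed(1)
for (n in c(100, 500, 1000, 10000)) {
  a = replicate(1000, mean(rexp(n, 0.2)))
  cat("n =", n, " mean of sample means =", round(mean(a), 3),
      " standard error =", round(sd(a), 4), "\n")
}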
#############################################################
#############################################################
I have shown execution of these commands in a video attached below






Thursday, 29 September 2016

What is Optimal Hedge Ratio? A Simple Explanation

Hedge Ratio
The hedge ratio (for an asset such as a commodity or a stock) comes into the picture when the asset to be hedged has no futures contract of its own. For example, an airline company wants to hedge against the price fluctuations of jet fuel, but there is no futures contract available on jet fuel. Therefore, the company can hedge (cross-hedge) by using the futures contract of another asset which is highly correlated with jet fuel, such as heating oil.
Another example: a farmer wants to hedge his groundnut production but, due to the unavailability of a futures contract on groundnuts, can trade soybean futures instead. There could be many similar examples from the perspective of traders, investors, farmers, manufacturing companies, and portfolio managers.

Put simply, the optimal or minimum variance HEDGE RATIO is the beta of the returns of the asset to be hedged (jet fuel) relative to the returns of the asset used for the hedge (heating oil futures). This is nothing but the correlation between the two return series multiplied by the ratio of their standard deviations:
Hedge ratio = Correlation(jet fuel returns, heating oil futures returns) × std dev(jet fuel returns) / std dev(heating oil futures returns)
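As a minimal sketch with hypothetical (simulated) return series standing in for jet fuel and heating oil futures:

#Sketch: minimum variance hedge ratio from simulated returns
set.seed(1)
fut_ret = rnorm(250, 0, 0.02) #Stand-in for heating oil futures returns
spot_ret = 0.8*fut_ret + rnorm(250, 0, 0.01) #Stand-in for jet fuel returns
h = cor(spot_ret, fut_ret)*sd(spot_ret)/sd(fut_ret) #Correlation times ratio of std devs
h
coef(lm(spot_ret ~ fut_ret))[2] #Equivalently, the regression beta of spot on futures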

Hedging means driving the beta to zero: if a portfolio is fully hedged, the beta of the portfolio has become zero (market movements have no effect on the returns of the portfolio). A hedge is called efficient when the net gain is zero, i.e., a loss in the spot market is offset by the gain in the futures market, or vice versa.

The minimum variance hedge ratio has been explained in a very simple manner in MS Excel in these YouTube videos (links are given below):
https://www.youtube.com/watch?v=p-bBbdvy7r8
https://www.youtube.com/watch?v=NcdZtCBrJbo





Saturday, 24 September 2016

GARCH Modelling

GARCH stands for Generalised Auto-Regressive Conditional Heteroscedasticity. It is a widely used technique for estimating and forecasting time-varying, or conditional, volatility.
To put it simply, when higher (lower) volatility tends to be followed by higher (lower) volatility, the return series has the ARCH effect, or volatility clustering. If there is first-order serial correlation in the squared returns (use the Ljung-Box test), volatility is clustered and the ARCH effect exists. Hence, volatility is not constant but time-varying (conditional).


ARCH models were introduced by Engle (1982), and Generalised ARCH (GARCH) was proposed by Bollerslev (1986). The plain ARCH process is rarely used nowadays because GARCH can be used in its place with fewer parameters. GARCH (1,1) is sufficient for most financial problems, as it can capture the stylized facts of past volatility patterns. The equation for GARCH (1,1) is as follows:

σ²(t) = ω + α·ε²(t−1) + β·σ²(t−1)     …(1)

where σ²(t) is the conditional variance at time t and ε(t−1) is the previous period's unexpected return (shock).
Therefore, while estimating GARCH models we need three things:
  1. Mean equation
  2. Variance equation
  3. Distributional assumption

GARCH may look complex, but it can be estimated easily in R, EViews, or STATA.

The coefficient α indicates the reaction of volatility to unexpected returns (shocks), whereas the coefficient β shows the persistence of volatility, i.e. how long volatility takes to revert to its long-run level {ω / (1 – α – β)}.

Most of the time (α + β) < 1. If (α + β) = 1, the simple GARCH model is not appropriate and the Integrated GARCH (IGARCH) model is used in that case; (α + β) > 1 implies an explosive variance process.

Equation (1) represents the simplest, vanilla, symmetric GARCH model. But sometimes conditional volatility is not symmetric; that is, positive and negative unexpected returns (news) do not have the same impact on volatility.

Generally, it is observed that negative news has a greater impact on volatility than positive news. Bad news lowers the value of equity, so the Debt/Equity ratio (leverage) rises and the company becomes more susceptible to bankruptcy; this is termed the leverage effect, and it can be captured by an asymmetric GARCH model. There are many asymmetric GARCH models, but the Threshold GARCH (TGARCH or GJR-GARCH) is used most often to capture such asymmetric conditional volatility.

Prerequisite for GARCH:
The data should be stationary

Advantages:
  1. Better understand the volatility patterns of a stationary time-series
  2. Volatility Forecasting based on the GARCH models

  

The procedure to fit GARCH models is as follows; a minimal R sketch is given after the list:
  1. Make a time-series plot of the squared returns. If you can see clustering in the squared returns, the ARCH effect is present.
  2. Apply the Ljung-Box test to the squared returns at lag 1. If you can reject the null hypothesis of no serial correlation, the ARCH effect exists.
  3. Fit the appropriate GARCH model (GARCH, TGARCH, or EGARCH).
  4. Find the residuals and do diagnostic tests. If the residuals are IID, the model is adequate; otherwise, repeat the process.
  5. Use the fitted model for forecasting.
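As a minimal sketch of steps 2 to 5, assuming a return series r is already loaded (the rugarch package is one common choice; the videos below may use other tools):

library(rugarch) #One of the packages for GARCH estimation in R
#r = your stationary return series (assumed available)
Box.test(r^2, lag = 1, type = "Ljung-Box") #Step 2: reject the null => ARCH effect
#Step 3: specify and fit a vanilla GARCH(1,1) with a constant mean
spec = ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                  mean.model = list(armaOrder = c(0, 0)),
                  distribution.model = "norm")
fit = ugarchfit(spec = spec, data = r)
coef(fit) #omega, alpha1, and beta1 from equation (1)
#Step 4: diagnostics on the squared standardized residuals
Box.test(residuals(fit, standardize = TRUE)^2, lag = 1, type = "Ljung-Box")
ugarchforecast(fit, n.ahead = 10) #Step 5: forecast volatility 10 steps ahead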

Do it in R as shown in the video below:



Do it in Eviews as shown in the video below:




References
Alexander, C. (2001), Market Models: A Guide to Financial Data Analysis, John Wiley & Sons, available at: http://sro.sussex.ac.uk/40646/ (accessed 26 June 2014).

Bollerslev, T. (1986), “Generalized autoregressive conditional heteroskedasticity”, Journal of Econometrics, Vol. 31 No. 3, pp. 307–327.

Engle, R.F. (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation”, Econometrica, Vol. 50 No. 4, pp. 987–1007.








Four moments of distribution: Mean, Variance, Skewness, and Kurtosis

The first moment of a distribution is the MEAN, the second moment is the VARIANCE, the third is SKEWNESS, and the fourth is KURTOSIS, and so on (learning the first four moments is enough for most purposes).

The mean is a raw moment and the variance is a central moment, while skewness and kurtosis are normalized/standardized moments (normalized by the standard deviation). Unlike the mean and variance, skewness and kurtosis are unit-free/dimensionless. For example, if our data are heights in inches, the mean will be in inches and the variance in square inches, but skewness and kurtosis will be unit-free (not in inches). Therefore, skewness and kurtosis are not affected by any linear change in the scale of the data (say, inches to centimetres), as the sketch below shows.
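A quick sketch of this scale invariance, using the moments package (also used in an earlier post):

library(moments) #for skewness() and kurtosis()
set.seed(1)
height_in = rnorm(1000, 70, 3) #Simulated heights in inches
height_cm = height_in*2.54 #The same data in centimetres (a linear change of scale)
skewness(height_in); skewness(height_cm) #Identical: skewness is unit-free
kurtosis(height_in); kurtosis(height_cm) #Identical: kurtosis is unit-free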

The brief description of these four moments is shown in the table below:

Moment | Name | Definition | Units
1st | Mean | E[X] | Same units as the data
2nd | Variance | E[(X − mean)²] | Squared units of the data
3rd | Skewness | E[(X − mean)³] / (std dev)³ | Unit-free
4th | Kurtosis | E[(X − mean)⁴] / (std dev)⁴ | Unit-free
Mean and variance are very popular metrics, but skewness and kurtosis are rarely discussed, though they are important attributes of a distribution. Therefore, we focus on these two measures.

Skewness

If the value of the skewness is negative, the distribution is negatively skewed: there are some extreme values in the data which drag the mean to the left side of the distribution. Similarly, positive skewness indicates that there are some extreme values which drag the mean to the right side of the distribution. Skewness is a very useful measure when the data have outliers. Positively and negatively skewed distributions are shown in the figure below.
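A quick sketch of positive skew dragging the mean to the right, again with the moments package:

library(moments) #for skewness()
set.seed(1)
x = rexp(1000, 0.2) #A right-skewed (positively skewed) sample
skewness(x) #Positive
mean(x); median(x) #The extreme values drag the mean above the median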

Friday, 23 September 2016

ARIMA Modelling in R

Hello friends,
I have tried to keep it very simple. ARIMA stands for Auto-Regressive Integrated Moving Average. It is a simple and widely used technique for time-series forecasting. The terms are:
Auto-Regressive: Lags of the variable itself
Integrated: The differencing steps required to make the series stationary
Moving Average: Lags of previous information shocks
ARIMA(p, d, q), where p is the number of AR terms, d is the order of differencing, and q is the number of MA terms

Different Names of ARIMA
AR Model: If only AR terms are there, i.e. ARIMA(1,0,0) = AR (1)
MA Model: If only error terms are there, i.e. ARIMA(0,0,1) = MA (1)
ARMA: If both are there, i.e. ARIMA(1,0,1) = ARMA(1,1)
ARIMA: If differencing term is also included, i.e. ARIMA(1,1,1) = ARMA(1,1) with first differencing
ARIMAX: If some exogenous variables are also included.

Prerequisite
The data should be stationary

Pros
  1. Better understand the patterns of a stationary time-series
  2. Forecasting based on the fitted ARIMA model

Cons
It captures only linear relationships; hence, neural network models could be used if a non-linear association is found in the variables.


The procedure to fit an ARIMA model (the Box-Jenkins approach) is as follows; a short R sketch is given after the list:
  1. Make correlograms (ACF and PACF): the PACF suggests the AR terms and the ACF suggests the MA terms.
  2. Fit the model.
  3. Find the residuals and do diagnostic tests. If the residuals are IID, the fitted model is good; otherwise, repeat the process.
  4. Use the fitted model for forecasting.
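Before the video, here is a minimal base-R sketch of these steps on a simulated ARMA(1,1) series (my own illustration; the video may use other packages):

set.seed(1)
x = arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500) #A stationary series
acf(x); pacf(x) #Step 1: correlograms to suggest the AR and MA orders
fit = arima(x, order = c(1, 0, 1)) #Step 2: fit ARIMA(1,0,1) = ARMA(1,1)
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2) #Step 3: diagnostics (fitdf = 2 for the two ARMA parameters)
predict(fit, n.ahead = 10) #Step 4: forecast 10 steps ahead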
Please watch the video to learn how to do ARIMA modelling in R: