Friday 30 September 2016

Is the sampling distribution of the sample mean always NORMAL?

Hello friends,

First of all, I should say that I am not an expert in statistics or statistical distributions. But while conducting various hypothesis tests, I keep coming across terms such as sample mean, sampling distribution of the mean, central limit theorem (CLT), law of large numbers, sample statistic, population parameter, and various distributions.

Whenever we use the ordinary least squares (OLS) method to estimate the intercept and slope coefficients of a regression model, we assume that the sampling distributions of our estimates (intercept and slopes) are normal, provided the sample size is large (a common rule of thumb is more than 30 observations).
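To see this for a regression coefficient, here is a minimal simulation sketch. The model, its true intercept (2) and slope (3), the uniform regressor, and the skewed exponential errors are all hypothetical choices for illustration, not part of the original analysis. The point is only that the OLS slope estimates still look roughly normal across repeated samples even though the errors are not normal.

```r
# Hypothetical setup (not from the original post): true intercept 2, true slope 3,
# x held fixed across replications, errors from a centred exponential distribution.
set.seed(1)
n <- 100                 # sample size per regression
x <- runif(n, 0, 10)     # fixed regressor values
beta0 <- 2               # assumed true intercept
beta1 <- 3               # assumed true slope

# Repeat the experiment 1000 times and keep the OLS slope estimate each time
slopes <- replicate(1000, {
  errors <- rexp(n, rate = 1) - 1    # right-skewed errors with mean 0 and sd 1
  y <- beta0 + beta1 * x + errors
  unname(coef(lm(y ~ x))[2])         # estimated slope
})

mean(slopes)   # should be close to the true slope 3
sd(slopes)     # empirical standard error of the slope
# Theoretical standard error of the slope when the error sd is 1
1 / sqrt(sum((x - mean(x))^2))
hist(slopes)   # roughly bell-shaped despite the skewed errors
```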

This post shares some takeaways from what I have learned about the CLT and sampling distributions. Here, I simulate and discuss how the sampling distribution of the sample mean is (approximately) normal for large samples, irrespective of the population distribution. I show this using normal, exponential, and uniform population distributions. I use RStudio, and I assume that you have a basic understanding of its interface and of R commands. Let us start, friends:
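Since the same three steps (draw a sample, take its mean, repeat) are used for every population in this post, they can be wrapped in one small function. This is only a sketch: `simulate_means` is a hypothetical helper name, and 1000 replications is an assumed choice (the walkthrough itself uses 100).

```r
# Hypothetical helper (not from the original post): draw `reps` samples of size `n`
# with the generator `rgen`, and return the mean and sd of the resulting sample means.
simulate_means <- function(rgen, n = 100, reps = 1000) {
  means <- replicate(reps, mean(rgen(n)))
  c(mean = mean(means), se = sd(means))
}

set.seed(1)
res_normal <- simulate_means(function(n) rnorm(n, 5, 6))   # N(5, 6): mu = 5, sigma = 6
res_exp    <- simulate_means(function(n) rexp(n, 0.2))     # Exp(0.2): mu = sigma = 5
res_unif   <- simulate_means(function(n) runif(n))         # U(0, 1): mu = 0.5, sigma = 1/sqrt(12)

# Compare the simulated standard errors with sigma / sqrt(n) for n = 100
rbind(
  normal      = c(res_normal, theory_se = 6 / sqrt(100)),
  exponential = c(res_exp,    theory_se = 5 / sqrt(100)),
  uniform     = c(res_unif,   theory_se = (1 / sqrt(12)) / sqrt(100))
)
```

In each row, the `se` column should sit close to `theory_se`, whatever the shape of the population.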



First, consider a normal population distribution with mean 5 and standard deviation 6.
#The population distribution ~ N (5, 6)
#Take 100 samples of sample size 100 (n=100)
(I have shown execution of these commands in a video attached below)

set.seed(1) #Fix the random seed so you can replicate the same results; any arbitrary number works
x <- rnorm(100, 5, 6) #Draw one sample of 100 random values from the population N(5, 6)
mean(x) #Mean of that same sample

#Use the replicate command to repeat this 100 times (you can also take 1000 or 100000 samples)
#and save the 100 sample means in the vector a
a <- replicate(100, mean(rnorm(100, 5, 6)))
mean(a) #Mean of the sample means; should be close to 5
sd(a) #Should be close to 6/sqrt(100) = 0.6 [this is called the standard error]
hist(a) #Histogram of the sampling distribution
#As per the CLT, the sampling distribution ~ N(5, 6/sqrt(sample size))

#Use the Jarque-Bera test for normality (H0: the data are normally distributed)
library(moments) #Package providing jarque.test(); install it once with install.packages("moments")
jarque.test(a)
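As an optional cross-check (not part of the original workflow), the sample means can be standardised with the CLT values and tested with base R's `shapiro.test()`, which needs no extra package, alongside a normal Q-Q plot. This sketch regenerates `a` so that it runs on its own; its exact numbers will differ from the run above because the random draws differ.

```r
# Same setup as above: 100 sample means, each from a sample of size 100 drawn from N(5, 6)
set.seed(1)
a <- replicate(100, mean(rnorm(100, 5, 6)))

# Standardise with the theoretical mean 5 and standard error 6/sqrt(100) = 0.6
z <- (a - 5) / (6 / sqrt(100))
c(mean_z = mean(z), sd_z = sd(z))   # should be near 0 and 1

# Shapiro-Wilk test from the base stats package (H0: normality)
shapiro.test(a)

# Normal Q-Q plot: the points should lie close to the reference line
qqnorm(a)
qqline(a)
```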
###############################################
###############################################

Now consider an exponential population distribution with lambda 0.2 (the arrival rate)
The population distribution ~ Exponential(lambda), whose mean and standard deviation are both 1/lambda
So here the population has mean 5 and standard deviation 5, because 1/lambda = 1/0.2 = 5
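A quick way to confirm these two values numerically is to draw a very large sample from the population. This sketch assumes one million draws is "large enough"; both the printed mean and standard deviation should come out close to 5.

```r
# One large draw from the Exponential(rate = 0.2) population
set.seed(1)
lambda <- 0.2
pop <- rexp(1e6, rate = lambda)   # rexp() is parameterised by the rate lambda
c(mean = mean(pop), sd = sd(pop), theory = 1 / lambda)
```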
We will follow the same process here as we did for the normal distribution
#Take 100 samples of sample size 100 (n=100)

set.seed(1)
a <- replicate(100, mean(rexp(100, 0.2))) #rexp() takes the rate lambda = 0.2
mean(a) #Should be close to 1/lambda = 5
sd(a) #Should be close to 5/sqrt(100) = 0.5 [this is called the standard error]
hist(a)

#Invoke the CLT: sampling distribution ~ N(5, 5/sqrt(sample size))
#Use the Jarque-Bera test for normality
jarque.test(a)
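The exponential case is interesting because the population itself is strongly right-skewed: any exponential distribution has skewness 2, while the mean of n independent draws has skewness 2/sqrt(n), which is 0.2 for n = 100. The sketch below checks this with a hand-written moment-based skewness function (`skew` is a hypothetical helper; the moments package loaded earlier also provides `skewness()`). The 10000 replications are an assumed choice that keeps the estimate stable.

```r
# Hypothetical helper: moment-based sample skewness, so no extra package is needed
skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

set.seed(1)
pop_draws <- rexp(1e5, 0.2)                          # raw exponential values
means_100 <- replicate(10000, mean(rexp(100, 0.2)))  # sample means with n = 100

skew(pop_draws)   # close to 2, the skewness of any exponential distribution
skew(means_100)   # close to 2 / sqrt(100) = 0.2, i.e. much closer to symmetric
```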
###############################################
###############################################

Now consider a UNIFORM population distribution
The population distribution ~ Uniform(0, 1), with mean 0.5 and standard deviation 1/sqrt(12) ≈ 0.289
#Take 100 samples of sample size 100 (n=100)

set.seed(1)
runif(100) #Take 100 values, minimum 0 and maximum 1

a <- replicate(100, mean(runif(100)))
mean(a) #Should be close to 0.5
sd(a) #Should be close to (1/sqrt(12))/sqrt(100) ≈ 0.029 [the standard error]
hist(a)
#Use Jarque-Bera test for normality
jarque.test(a)
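For the uniform case the CLT values can be worked out in advance: Uniform(0, 1) has mean 0.5 and standard deviation 1/sqrt(12) ≈ 0.289, so the standard error for n = 100 is about 0.0289. This sketch (using 1000 replications, an assumed choice) overlays that normal curve on a density-scaled histogram of the sample means.

```r
# Uniform(0, 1) population: mu = 0.5, sigma = 1 / sqrt(12)
set.seed(1)
n <- 100
a <- replicate(1000, mean(runif(n)))

se_theory <- (1 / sqrt(12)) / sqrt(n)   # about 0.0289
c(mean = mean(a), sd = sd(a), se_theory = se_theory)

# Density-scaled histogram with the CLT normal curve N(0.5, se_theory) drawn on top
hist(a, freq = FALSE, main = "Sample means from Uniform(0, 1), n = 100")
curve(dnorm(x, mean = 0.5, sd = se_theory), add = TRUE, lwd = 2)
```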
###############################################
###############################################

Key Takeaways

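1. The sample mean approximately follows a normal distribution N(mu, sigma/sqrt(n)), where mu and sigma are the population mean and standard deviation

2. We can invoke the CLT whenever the sample size is large (n > 30 is a common rule of thumb)

3. The sampling distribution of the mean is approximately normal even when the population is not normal (given a large sample size and a population with finite variance)

4. You can also check whether the statistic is consistent by increasing the sample size (from 100 observations to 500 or 1000 or more). As the sample size tends to infinity, a consistent sample statistic converges (in probability) to the population parameter; for the sample mean, this is the law of large numbers.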
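Takeaway 4 can be sketched in a few lines: repeat the exponential simulation for growing sample sizes and watch the spread of the sample means shrink roughly like 5/sqrt(n). The sample sizes and the 1000 replications per size are arbitrary choices for illustration.

```r
# Exponential population with rate 0.2 (mu = sigma = 5), as in the example above
set.seed(1)
sizes <- c(100, 500, 1000, 5000)

# For each sample size n, simulate 1000 sample means and summarise them
results <- t(sapply(sizes, function(n) {
  means <- replicate(1000, mean(rexp(n, 0.2)))
  c(n = n, mean = mean(means), sd = sd(means), theory_sd = 5 / sqrt(n))
}))
results
```

The `mean` column should stay near 5 while the `sd` column falls towards 0, which is how consistency of the sample mean shows up in a simulation.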
#############################################################
#############################################################





