Cluster Robust Standard Errors (CRSE)¶

Wooyong Park
Yonsei University

1. What are cluster robust standard errors?¶

Suppose the observations can be divided into subgroups (e.g., students within schools). Clustering refers to the situation where the unobservables (i.e., the error terms) are correlated within each group.

REMARK It does not mean that the standard errors are clustered; it means that the unobservables are correlated within each group!

Generally, cluster dependence violates the assumptions of (a) homoscedasticity and (b) no correlation across errors, $$Cov(u_i, u_j) = 0\text{ for }i\neq j,$$ and leads to misleadingly small standard errors.
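A quick simulation illustrates the point (a minimal sketch with made-up parameters: each cluster shares a common unobservable shock, so errors within a cluster are correlated):

```python
import numpy as np

rng = np.random.default_rng(0)
G, n_g = 2000, 2                           # many clusters of 2 observations each
cluster_shock = rng.normal(0, 1, G)        # unobservable shared within a cluster
u = cluster_shock.repeat(n_g) + rng.normal(0, 1, G * n_g)  # u_ig = c_g + e_ig

u_mat = u.reshape(G, n_g)                  # one row per cluster
within = np.corrcoef(u_mat[:, 0], u_mat[:, 1])[0, 1]              # same cluster
across = np.corrcoef(u_mat[:, 0], np.roll(u_mat[:, 1], 1))[0, 1]  # different clusters
print(within, across)
```

With equal variances for the shared and idiosyncratic parts, the within-cluster correlation comes out around 0.5 while the across-cluster correlation is near zero.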

2. When does the clustering happen?¶

(a) Panel Study
The data are panel data in which each individual $i$ is observed for $t=1,2,...,T$.
The correlation of unobservable characteristics within each individual's observation is expected. $$Cov(u_{i,t}, u_{i, t'}) \neq 0$$

(b) Clustered Sampling
Population: Students in Bangladesh
Sampling design: 160 schools in Bangladesh $$Cov(u_{i,g}, u_{j, g}) \neq 0$$

(c) Experimental Design
If the observations are classified into groups (treatment vs. control), the design also often involves clustering and hence misleadingly small standard errors.
e.g. Difference-in-Differences and Randomized Controlled Trials

3. CRSE is often associated with Fixed Effects.¶

Similarity: Both require the researcher to choose an adequate grouping.

|      | Issues | Outcomes |
|------|--------|----------|
| FE   | Endogeneity | Biased point estimate |
| CRSE | Heteroscedasticity and autocorrelation | 1. Inefficiency 2. Biased t-value |


4. Econometrics Revisited¶

Suppose all the OLS assumptions for BLUE (plus, from here on, the Gaussian assumption) are satisfied in the model below. $$y_i = \alpha + \beta X_i + u_i$$ Why can we perform a t-test on $\hat{\beta} = \frac{\sum(X_i-\overline{X})(Y_i-\overline{Y})}{\sum(X_i-\overline{X})^2}$ in the first place?

4.1. t-test¶

$$t = \frac{\overline{X}-\mu}{\sqrt{s^2/n}}$$

where $X_i \sim N(\mu, \sigma^2)$ and $s^2\to\sigma^2$ as $n\to\infty$.

$$\hat{\beta} = \frac{\sum(X_i-\overline{X})(Y_i-\overline{Y})}{\sum(X_i-\overline{X})^2}=\frac{\sum(X_i-\overline{X})Y_i}{\sum(X_i-\overline{X})^2}$$

Since $u_i \sim N(0,\sigma^2)$ and the explanatory variables are non-stochastic, $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$.
Thus, any linear combination of the $Y_i$ also follows a normal distribution.

$\frac{\hat{\beta}}{\sqrt{Var(\hat{\beta})}}$ follows the standard normal distribution under $H_0:\beta=0$, where $$Var(\hat{\beta}) = Var\left(\frac{\sum(X_i-\overline{X})Y_i}{\sum(X_i-\overline{X})^2}\right) =\frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum Var((X_i-\overline{X})u_i) = \frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum(X_i-\overline{X})^2\sigma^2$$

Since we don't know the value of $\sigma$, we substitute an estimate based on the residual sum of squares: $$\hat{\sigma} = \sqrt{\widehat{Var}(u_i)} = \sqrt{\frac{\sum\hat{u_i}^2}{n-k}}$$

Then, the obtained $\frac{\hat{\beta}}{\sqrt{\widehat{Var}(\hat{\beta})}}$ follows the $t_{n-k}$ distribution.
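The derivation above can be replayed numerically (a sketch on simulated data; the coefficients 1.0 and 2.0 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2                                # n observations, intercept + slope
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)      # true beta = 2

xd = x - x.mean()
beta_hat = (xd * y).sum() / (xd ** 2).sum()  # OLS slope from the formula above
resid = (y - y.mean()) - beta_hat * xd       # residuals u_hat
sigma2_hat = (resid ** 2).sum() / (n - k)    # unbiased estimate of sigma^2
var_beta = sigma2_hat / (xd ** 2).sum()      # estimated Var(beta_hat)
t_stat = beta_hat / np.sqrt(var_beta)        # compared against t_{n-k}
print(beta_hat, t_stat)
```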

4.2. Heteroscedasticity Revisited¶

$Var(u_i) =\sigma_i^2$

Eicker, Huber, and White (EHW) propose a correction that removes the bias of $\widehat{Var}(\hat{\beta})$ under heteroscedasticity, although it cannot restore efficiency.

$$\hat{V}_{het}(\hat{\beta}) = \frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum(X_i-\overline{X})^2\hat{u_i}^2$$

The idea is to replace $Var((X_i-\overline{X})u_i)=(X_i-\overline{X})^2\sigma_i^2$ with $(X_i-\overline{X})^2\hat{u_i}^2$.
Put simply, the idea is to change the point at which we substitute the estimate for the parameter.
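A sketch of the EHW plug-in on simulated heteroscedastic data (the error-variance function $0.2 + x$ is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 1, n)
u = rng.normal(0, 1, n) * (0.2 + x)          # Var(u_i) grows with x: heteroscedastic
y = 1.0 + 2.0 * x + u

xd = x - x.mean()
beta_hat = (xd * y).sum() / (xd ** 2).sum()
resid = (y - y.mean()) - beta_hat * xd
ssx = (xd ** 2).sum()

v_classical = (resid ** 2).sum() / (n - 2) / ssx  # assumes one common sigma^2
v_ehw = (xd ** 2 * resid ** 2).sum() / ssx ** 2   # plugs u_hat_i^2 in per observation
print(np.sqrt(v_classical), np.sqrt(v_ehw))
```

The two estimates differ exactly because the classical formula averages the residual variance while EHW keeps it observation by observation.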

5. Derivation of Cluster Robust Standard Errors¶

5.1. Extension of EHW to Autocorrelation¶

Consider using the same approach for autocorrelation.

$$V_{cor}(\hat{\beta}) = Var(\frac{\sum(X_i-\overline{X})Y_i}{\sum(X_i-\overline{X})^2})=Var(\frac{\sum(X_i-\overline{X})u_i}{\sum(X_i-\overline{X})^2})$$$$ = \frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum_i\sum_j Cov[(X_i-\overline{X})u_i, (X_j-\overline{X})u_j]$$$$ = \frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum_i\sum_j(X_i-\overline{X})(X_j-\overline{X})E[u_iu_j]$$

Remark $Cov(X, Y) = E[XY]-E[X]E[Y]$

The obvious extension of the EHW correction to autocorrelation would be to replace $E[u_iu_j]$ with $\hat{u_i}\hat{u_j}$,
i.e. $\frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum_i\sum_j(X_i-\overline{X})(X_j-\overline{X})\hat{u_i}\hat{u_j}$.
However, this is identically zero because the OLS normal equations imply $\sum_i(X_i-\overline{X})\hat{u_i}=0$.
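This collapse is easy to verify numerically: the double sum factors into $(\sum_i(X_i-\overline{X})\hat{u_i})^2$ divided by the squared sum of squares, and the normal equations force the inner sum to zero (a sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(0, 1, n)
y = 0.5 + 1.5 * x + rng.normal(0, 1, n)

xd = x - x.mean()
beta_hat = (xd * y).sum() / (xd ** 2).sum()
resid = (y - y.mean()) - beta_hat * xd

cross = (xd * resid).sum()                  # forced to zero by the normal equations
naive = cross ** 2 / (xd ** 2).sum() ** 2   # the "full autocorrelation" plug-in
print(cross, naive)                         # both zero up to floating point error
```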

5.2. Extension of EHW is possible for clusters.¶

Although EHW isn't directly applicable to general autocorrelation (since it would yield a value of zero), it is applicable to clustered autocorrelation.
Recall that in the clustering problem, $Cov(u_i,u_j)=E[u_iu_j]=0$ if individuals $i$ and $j$ are in different clusters.

$$V_{cluster}(\hat{\beta}) = \frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum_i\sum_j(X_i-\overline{X})(X_j-\overline{X})E[u_iu_j]\mathbb{I}[\text{i, j are in the same cluster}]$$

Replacing $E[u_iu_j]$ with $\hat{u_i}\hat{u_j}$ is possible in this case.

$$\hat{V}_{cluster}(\hat{\beta}) = \frac{1}{(\sum(X_i-\overline{X})^2)^2}\sum_i\sum_j(X_i-\overline{X})(X_j-\overline{X})\hat{u_i}\hat{u_j}\mathbb{I}[\text{i, j are in the same cluster}]$$
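The double sum with the indicator collapses to a sum of per-cluster squares, $\sum_c\left(\sum_{i\in c}(X_i-\overline{X})\hat{u_i}\right)^2$, which is no longer identically zero. A sketch with simulated clustered data (the cluster count and effect sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
n, G = 200, 20
x = rng.normal(0, 1, n)
g = rng.integers(0, G, n)                          # cluster id of each observation
u = rng.normal(0, 1, G)[g] + rng.normal(0, 1, n)   # cluster shock + idiosyncratic
y = 1.0 + 2.0 * x + u

xd = x - x.mean()
beta_hat = (xd * y).sum() / (xd ** 2).sum()
resid = (y - y.mean()) - beta_hat * xd

# sum over same-cluster pairs = sum of squared per-cluster score sums
v_cluster = sum((xd[g == c] * resid[g == c]).sum() ** 2
                for c in range(G)) / (xd ** 2).sum() ** 2
print(np.sqrt(v_cluster))   # cluster robust standard error of beta_hat
```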

5.3. CRSE by Hand with Python¶

Here, the CRSE is calculated step by step with Python. Note that the estimator below is not exactly the formula illustrated above; it adds finite-sample adjustments, which are explained thoroughly in Cameron and Miller (2015).

In [ ]:
import numpy as np
In [ ]:
N = 500       #no. observations
k = 3         #no. columns
Nclusts = 50  #no. clusters
In [ ]:
X=np.hstack((np.random.random((N,k-1)), np.ones((N,1))))  #np.ones are for the constant term
y=np.random.random((N,1))
print(X)
[[0.49404476 0.88264371 1.        ]
 [0.76755593 0.73894938 1.        ]
 [0.93128279 0.27736681 1.        ]
 ...
 [0.12231961 0.30011186 1.        ]
 [0.43451876 0.17308107 1.        ]
 [0.03788844 0.47713934 1.        ]]
In [ ]:
XX_inv = np.linalg.inv(X.T.dot(X))
In [ ]:
beta = (XX_inv).dot(X.T.dot(y))
resid = y - X.dot(beta)
print(beta)
[[-0.09454155]
 [-0.02004941]
 [ 0.5563423 ]]
In [ ]:
ID = np.random.choice([x for x in range(1, Nclusts+1)], N) #vector of cluster IDs
c_list = np.unique(ID)
print(c_list)
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50]
In [ ]:
sum_XuuTX = 0
for c in range(1,Nclusts+1):
  in_cluster = (ID==c_list[c-1])
  resid_c = resid[in_cluster]
  uuT = resid_c.dot(resid_c.T)
  Xc = X[in_cluster]
  XuuTX = Xc.T.dot(uuT).dot(Xc)
  sum_XuuTX += XuuTX
In [ ]:
adj = (Nclusts/(Nclusts-1))*((N-1)/(N-k))  #degrees-of-freedom adjustment
V_beta = adj*(XX_inv.dot(sum_XuuTX).dot(XX_inv))
se_beta = np.sqrt(np.diag(V_beta))  #standard errors are the square roots of the diagonal
In [ ]:
print('beta: ', beta)
beta:  [[-0.09454155]
 [-0.02004941]
 [ 0.5563423 ]]
In [ ]:
print('CRSE: ', se_beta)

6. When and How¶

6.1. When to use CRSE? (Abadie, Athey, Imbens, and Wooldridge, 2023)¶

According to the paper, whether clustering makes a difference to the standard errors should not be the basis for deciding whether to cluster; the authors note the misconception that one should cluster whenever clustering matters. Instead, under the sampling perspective, what matters for clustering is how the sample was selected and whether there are clusters in the population of interest that are not represented in the sample. (D. McKenzie, World Bank Blogs)

  1. If there are clusters in the population of interest that are not represented in the sample, you should use CRSE.
  2. If treatment is assigned at the individual level (not at the cluster level), there is no reason to cluster.

6.2. R Code¶

Here, I present R code for computing CRSE with the fixest package.

# packages = c('tidyverse', 'fixest')
# install.packages(packages)

library(tidyverse)
library(fixest)

mtcars %>% head()

model <- feols(mpg ~ hp + wt, data = mtcars)
cl_model <- feols(mpg ~ hp + wt, cluster = ~ cyl, data = mtcars)

data1 <- tibble(coefficient = c('intercept', 'hp', 'wt'),
                type = rep('std_error', 3),
                std_error = se(model))
data2 <- tibble(coefficient = c('intercept', 'hp', 'wt'),
                type = rep('cl_error', 3),
                std_error = se(cl_model))
se_tibble <- rbind(data1, data2)

ggplot(data = se_tibble) +
  geom_col(aes(coefficient, std_error, fill = type), position = 'dodge') +
  theme_bw()

(Plot: grouped bar chart comparing the conventional and clustered standard errors for each coefficient.)