How to do a placebo simulation in difference-in-differences designs (part 1)

Marianne Bertrand, Esther Duflo, and Sendhil Mullainathan’s 2004 article “How much should we trust differences-in-differences estimates?” (published in the Quarterly Journal of Economics) outlines several tests for assessing the robustness of difference-in-differences estimates when false positives are a concern.

One recommendation is to run a placebo simulation. In a first step, the treatment indicator is randomly assigned to units in the data set. In a second step, the regressions are run again so that the main estimates can be compared with those from the placebo regressions.
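In regression terms, the idea can be written down roughly as follows. This is only a sketch: it assumes the unit fixed-effects specification with a linear time trend that the script further below estimates, and the symbols are introduced here purely for illustration.

$$
y_{it} = \alpha_i + \beta\,(\mathrm{Treat}_i \times \mathrm{Post}_t) + \gamma\,\mathrm{Post}_t + X_{it}'\delta + \theta\,t + \varepsilon_{it}
$$

Here $\alpha_i$ is the unit fixed effect (which absorbs the main effect of a time-invariant $\mathrm{Treat}_i$), $X_{it}$ are the controls, and $\beta$ is the difference-in-differences estimate. In placebo run $r$, $\mathrm{Treat}_i$ is replaced by a randomly drawn indicator $\mathrm{Placebo}_i^{(r)}$ and the model is re-estimated. If the standard errors are reliable, the placebo estimates of $\beta$ should scatter around zero, and their 95% confidence intervals should exclude zero in only roughly 5% of the runs.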

I have written a little Stata script that runs such a placebo simulation and compiles an Excel spreadsheet listing the placebo coefficient estimates along with their 95% confidence interval bounds.

Here’s that script. It assumes a panel dataset in memory whose observations take the form of unit-years (e.g., firm-years). The only thing you need to adjust for your purposes is the set of parameters at the top.
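If you want to try the script before pointing it at real data, a toy firm-year panel can be simulated first. The following is a minimal sketch, not part of the script itself: the variable names (`id`, `year`, `treated`, `post`, `age`, `y`), the panel dimensions, and the true effect of 0.5 are all illustrative assumptions.

```stata
* Toy firm-year panel for trying out the placebo script (all names are illustrative)
clear
set seed 110
set obs 200                          // 200 hypothetical units
gen id = _n
gen treated = id <= 100              // first 100 units form the treatment group
expand 10                            // 10 yearly observations per unit
bysort id: gen year = 1999 + _n      // years 2000 to 2009
gen post = year >= 2005              // treatment switches on in 2005
gen age = 20 + 40*runiform()         // an arbitrary control variable
gen y = 1 + 0.5*treated*post + 0.02*age + rnormal()   // assumed true effect: 0.5
xtset id year
```

With this data in memory, the parameters would be set to depvar "y", treatment "treated", post "post", idvar "id", timevar "year", controls "age", and treatment_groupsize "100".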

global project_folder = `"C:\Users\path to project"'
global depvar = "dependent variable"
global treatment = "treatment binary"
global post = "time binary which is 1 for observations after the treatment"
global idvar = "unit identifier variable (e.g., id)"
global timevar = "time identifier variable (e.g., years)"
global controls = "list of control variables (e.g., age)"
global seed = "110" //sets the seed for reproducible random number generation
global treatment_groupsize = "number of units in the treatment group (e.g., 100)"
global numruns = "#runs of the simulation (e.g., 60)"

**set excel headers
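*putexcel needs a file, not a folder; the file name placebo_results.xlsx is a placeholder
putexcel set "$project_folder/placebo_results.xlsx", replace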
putexcel A1=("DV Coefficient")
putexcel B1=("DV Lower CI")
putexcel C1=("DV Upper CI")
local cellcounter = 3 //row 2 holds the "true" estimate; placebo runs start in row 3
set seed $seed

*estimate "true" regression
xtset $idvar $timevar
xtreg $depvar i.$treatment##i.$post $controls $timevar, fe robust
putexcel A2=(_b[1.$treatment#1.$post])
putexcel B2=(_b[1.$treatment#1.$post] - invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])
putexcel C2=(_b[1.$treatment#1.$post] + invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])

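*tag one observation per unit so that placebo treatment is drawn at the unit level
*(the original condition, $timevar == awardm-4, referred to a variable from my own data)
egen byte unit_tag = tag($idvar)

forvalues i=1/$numruns {
	*randomly pick the placebo-treated units (randomtag is a user-written command from SSC)
	randomtag if unit_tag, count($treatment_groupsize) gen(r)
	*spread the tag to all observations of the selected units
	bys $idvar: egen placebo = max(r)
	drop r
	tab placebo
	capture xtreg $depvar i.placebo##i.$post $controls $timevar, fe robust
	if _rc!=0 {
		display "Error on run `i'"
	}
	else {
		*only write results if the regression actually ran
		estimates store result`i'
		putexcel A`cellcounter'=(_b[1.placebo#1.$post])
		putexcel B`cellcounter'=(_b[1.placebo#1.$post] - invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
		putexcel C`cellcounter'=(_b[1.placebo#1.$post] + invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
	}
	drop placebo
	local cellcounter=`cellcounter'+1
}
drop unit_tag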

In one of the next blog posts, I will show how to use this generated spreadsheet for plots of the placebo confidence intervals or simple tabulation summaries for your papers.