Ftools – One solution when Stata takes hours upon hours to complete

Annoyed that Stata took another two hours to complete your command, only for you to find out that you missed something and need to rerun it? And another two hours are gone.

Experiences like this are typical when working with larger datasets. Commands such as collapse, merge, and egen can feel like they take forever.

One way to reduce the frustration is to use the awesome ftools package provided by Sergio Correia. Ftools is a reimplementation of some of the most popular Stata data processing commands.

Currently, the following commands are available in a revised implementation:

  • egen group (now fegen group)
  • collapse (now fcollapse)
  • merge (now join)
  • levelsof (now flevelsof)
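
As a rough sketch of how these are called: fegen, fcollapse, and flevelsof mirror the syntax of the originals, while join takes from() and by() options instead of merge's using syntax. The example uses Stata's bundled auto dataset purely for illustration, and the variable names group_id and mean_price are made up for this example.

sysuse auto, clear

* fegen group: numeric identifier for each foreign/rep78 combination
fegen group_id = group(foreign rep78)

* flevelsof: distinct values of a variable, stored in a local macro
flevelsof rep78, local(rep_levels)
display "`rep_levels'"

* fcollapse: group means, saved to a temporary file for the join below
preserve
fcollapse (mean) mean_price = price, by(foreign)
tempfile means
save `means'
restore

* join: merge the group means back onto the car-level data
join mean_price, from(`means') by(foreign)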

The time savings can be immense. In my experience, merges and collapses with ftools took only roughly a third of the time they take with the regular Stata commands, an immense productivity improvement!

Now, how do you get ftools? It is available via SSC, so type the following command in your Stata command window and hit enter:
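
To check the speed-up on your own machine, a simple timing comparison along the following lines may help. This is only a sketch on simulated data: the dataset size, the variable names id and x, and the seed are arbitrary choices for illustration, and the actual speed-up will depend on your data and Stata version.

clear
set obs 2000000
set seed 12345
gen long id = ceil(runiform()*10000)
gen double x = rnormal()

timer clear

* timer 1: built-in collapse
preserve
timer on 1
collapse (mean) x, by(id)
timer off 1
restore

* timer 2: fcollapse from ftools
preserve
timer on 2
fcollapse (mean) x, by(id)
timer off 2
restore

* show elapsed seconds for both timers
timer list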

ssc install ftools

How to do a placebo simulation in difference-in-differences designs (part 1)

The 2004 article “How Much Should We Trust Differences-in-Differences Estimates?” by Marianne Bertrand, Esther Duflo, and Sendhil Mullainathan (published in the QJE) outlines several tests for assessing the robustness of difference-in-differences estimates against concerns about false positives.

One recommendation is to run a placebo simulation: in a first step, the treatment indicator is randomly assigned to observations in the data set, and in a second step, the regressions are run again so that the main estimates can be compared with those from the placebo regressions.

I have written a little Stata script that runs such a placebo simulation and compiles an Excel spreadsheet which gives the placebo coefficient estimates along with the confidence interval bounds.

Here’s that script. It assumes that a panel dataset is in use whose observations take the form of unit-years (e.g., firm-years). The only thing you need to adjust for your purposes is the set of parameters at the top.

global project_folder = `"C:\Users\path to project"'
global depvar = "dependent variable"
global treatment = "treatment binary"
global post = "time binary which is 1 for observations after the treatment"
global idvar = "unit identifier variable (e.g., id)"
global timevar = "time identifier variable (e.g., years)"
global controls = "list of control variables (e.g., age)"
global seed = "110" //sets the seed for reproducible random number generation
global treatment_groupsize = "number of observations in the treatment group (e.g., 100)"
global numruns = "#runs of the simulation (e.g., 60)"

**set excel headers (project_folder must hold the full path of the Excel file to write)
putexcel set "$project_folder", replace
putexcel A1=("DV Coefficient")
putexcel B1=("DV Lower CI")
putexcel C1=("DV Upper CI")
local cellcounter = 3
set seed $seed

*estimate "true" regression
xtset $idvar $timevar
xtreg $depvar i.$treatment##i.$post $controls $timevar, fe robust
putexcel A2=(_b[1.$treatment#1.$post])
putexcel B2=(_b[1.$treatment#1.$post] - invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])
putexcel C2=(_b[1.$treatment#1.$post] + invttail(e(df_r),0.025)*_se[1.$treatment#1.$post])

forvalues i=1/$numruns {
	*randomly tag placebo units; randomtag is from SSC (ssc install randomtag)
	*note: awardm is a variable specific to the original dataset; adjust this condition to your data
	randomtag if $timevar == awardm-4, count($treatment_groupsize) gen(r)
	bys $idvar: egen placebo = max(r)
	drop r
	tab placebo
	
	capture xtreg $depvar i.placebo##i.$post $controls $timevar, fe robust
	if _rc!=0 {
		display "Error on run `i'"
	}
	else {
		*write results only if the regression ran, so stale estimates are never exported
		putexcel A`cellcounter'=(_b[1.placebo#1.$post])
		putexcel B`cellcounter'=(_b[1.placebo#1.$post] - invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
		putexcel C`cellcounter'=(_b[1.placebo#1.$post] + invttail(e(df_r),0.025)*_se[1.placebo#1.$post])
		estimates store result`i'
	}
	drop placebo
	local cellcounter=`cellcounter'+1
}
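
The loop relies on randomtag from SSC to pick the placebo units. If you prefer to stay with built-in commands, the same idea can be sketched as follows: draw one random number per unit, sort on it, and flag the first units. This is a minimal sketch and not part of the script itself. It downloads Stata's nlswork example data via webuse, draws from all units rather than from a specific period, and the variable names first_obs, u, picked, and placebo as well as the group size of 100 are assumptions for illustration.

webuse nlswork, clear
set seed 110

* one random draw per unit, placed on one tagged observation per unit
egen byte first_obs = tag(idcode)
gen double u = runiform() if first_obs

* missing values sort last, so the first 100 rows are 100 distinct units
sort u idcode
gen byte picked = (_n <= 100)

* spread the placebo flag to all observations of the picked units
bysort idcode: egen byte placebo = max(picked)
tab placebo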


In one of the next blog posts, I will show how to use the generated spreadsheet for plots of the placebo confidence intervals or for simple tabulation summaries in your papers.