[Note for non-economist readers of Across Two Worlds, the following post may induce drowsiness and probably should not be ingested while driving a motor vehicle or operating heavy machinery.]
In development work we have a tendency to search for the elixir, the poverty intervention that will be effective anywhere and everywhere. Economists tend to view human beings and behavior through universal laws, and as such we envy medical researchers who sometimes with a single breakthrough can wipe out a terrible disease on a global scale. This is what development economists would like to do with poverty. But too often in development work we overlook the importance of diagnosis in both the implementation and study of poverty interventions. Diagnosis and the elixir lie at opposite ends of the treatment spectrum. The elixir says “this will cure what ails you”; diagnosis tries to carefully match conditions with treatments.
At the practitioner level, a failure to adequately match individual subjects with treatments that are likely to have a positive impact on them results in low-impact interventions. For researchers, it results in studies that fail to reject null hypotheses of zero impact even when a positive impact on correctly diagnosed individuals indeed exists.
Consider the sample frame within which medical researchers carry out randomized trials for, say, a new cancer treatment. The sample frame within which the trial for this new treatment will be conducted is taken not from the general population, or even from a pool of subjects with cancer, but from a pool of subjects who have been painstakingly diagnosed with the specific form of cancer for which researchers believe the particular treatment to be effective.
Our approach in development practice and development economics deviates greatly from this standard. We often 1) assume the treatment to be effective for everyone—as is often the case with many schooling inputs for children; 2) take an educated guess at a sub-group in the population where an intervention might be effective—as is often the case with clothing, animal, or other in-kind donations; or 3) allow people to mainly self-diagnose—as is the case with microfinance.
Moreover, our research agenda remains driven by asking whether an intervention is effective rather than when different interventions might be effective. The reporting of heterogeneous effects across different subject characteristics, which now commonly forms an ancillary component of randomized trials, tends to produce data-driven results, giving us little basis for causal inference or for the creation of diagnostic tools. Frankly, knowing that cash transfers happened to increase schooling more among gender x in country y in randomized trial z is not really that helpful.
Dani Rodrik’s important work on diagnostics and recent book Economics Rules highlights the dire consequences of diagnosis failure in development macroeconomics. In this work he warns the profession against adherence to various strains of economic dogma and against the temptation to view any particular approach to macroeconomic policy as an elixir for bringing about economic growth and poverty reduction across contexts. He argues that the science of diagnosis ought to be a primary instrument in the toolkit of the development macroeconomist.
Perhaps this is an area in which we in development microeconomics can learn something from our macro colleagues. As development microeconomists, our failure to properly diagnose subjects before a controlled intervention (or even to attempt to) results in a loss of statistical power (our probability of rejecting false null hypotheses of zero impact) when a significant impact actually does exist on properly diagnosed subjects.
What development economists call “targeting” is related to diagnosis, but the difference might be compared to that between a public health worker who more generally targets a vaccine at a subset of the population believed to derive greater benefit from it and a physician who diagnoses individual patients before assigning treatment.
Microfinance provides one of the clearest examples of the consequences of diagnosis failure at both the practitioner and research levels. Microfinance borrowers are typically self-diagnosed, with a second screening carried out by the lender, not typically for impact, but for the probability that the borrower will repay. (The correlation between the two is clearly not zero, but it is far from perfect.) There is no study of microfinance impact of which I am aware in which subjects have been individually diagnosed with binding credit constraints or who have been included in a randomized trial based on a measured shadow value of working capital in their enterprises. Essentially researchers have simply followed practitioners and estimated the impacts of microfinance on everyone to whom microfinance lenders seem willing to lend. This approach gives us impact measurements that swallow the diagnosis failures of practitioners, but it does not tell us whether microfinance has an impact where it is supposed to have one.
Moreover, the most well-known studies on microfinance (for example, the six studies in last year’s AEJ: Applied Economics symposium) estimate intent-to-treat (ITT) effects in part so that studies are able to capture externalities at the local level. But ITT estimations in such a context yield a sample frame that is likely to include a large fraction of subjects for whom the intervention has little chance of demonstrating the desired effect. There is a trade-off between proper diagnosis as a prelude to an intervention and the externalities yielded by the intervention that can be illustrated in a simple diagram.
Suppose that a is the average treatment effect of an intervention on the properly diagnosed, e is the externality of the intervention to all others in the treatment group (with no externality to the control), and d is the percent of the treatment group that is diagnosed correctly. In this case, the ITT is just the weighted average of these effects between the properly diagnosed and others (the misdiagnosed) and is ad + e(1 – d). To estimate the ITT with 95% confidence and 80% power, the condition must hold that ad + e(1 – d) = 2.8SE, or that the percent correctly diagnosed must be equal to d = (2.8SE – e)/(a – e), where 2.8 is the sum of the corresponding z-scores of 1.96 and 0.84 and SE is the standard error. Assuming a > e, the first derivative shows this yields a negative relationship between d and e; the second derivative shows the relationship to be concave, as shown in the figure below. This yields a continuous set of statistical power contours that illustrate the trade-off in ITT estimations between correct diagnosis and the strength of externalities within the pool of subjects exposed to the intervention. Statistically speaking, the result is quite similar to the loss in statistical power one experiences with a treatment that is targeted at a treatment group, but where very few people actually take up the treatment.
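The trade-off can be sketched in a few lines of code. The values of a and SE below are assumptions invented purely to make the shape of d(e) visible, not figures from any study:

```python
# Fraction correctly diagnosed (d) needed for 80% power at 95%
# confidence, solving a*d + e*(1 - d) = 2.8*SE for d.
a = 0.25   # assumed effect on the properly diagnosed (illustrative)
se = 0.05  # assumed standard error of the ITT estimate (illustrative)
z = 2.8    # z(0.975) + z(0.80) = 1.96 + 0.84

def d_required(e):
    """Required fraction correctly diagnosed, given externality e < a."""
    return (z * se - e) / (a - e)

for e in [0.00, 0.02, 0.04, 0.06, 0.08]:
    print(f"e = {e:.2f}  ->  d required = {d_required(e):.3f}")
```

Running this shows d_required falling as e rises, with successively larger drops, which is the negative, concave relationship traced by the power contours in the figure.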
There are a few lessons that we learn from this framework about the importance of correct diagnosis in randomized evaluations of poverty interventions.
First, if externalities are very large, proper diagnosis is not as important to statistical power (note the increasing vertical distance between statistical power contours as e becomes larger), but…
Second, unless externalities are very strong, misdiagnosis (or failure to diagnose at all) is quite costly in terms of statistical power and our ability to identify interventions that are truly effective for individuals on whom they should be effective.
Third, in ITT estimates statistical power is very sensitive to misdiagnosis when externalities are weak. Taking the case when externalities are zero, if the accuracy of diagnosis falls from 100% to 70% (which doesn’t seem so bad really), statistical power falls from 80% to 50%. If the accuracy of diagnosis falls to 40%, statistical power falls to 20%. (One wonders if 40% of subjects in the recent randomized trials of microfinance had binding credit constraints. Less microfinance, but properly diagnosed, may be more.)
Fourth, the sample size necessary to maintain statistical power increases with the inverse square of the fraction correctly diagnosed. For example, consider a simple experiment in which the dependent variable is a normalized index with mean zero and unit variance, and researchers desire a minimum detectable effect of 0.2 standard deviations. With 95% confidence, 80% power, and d = 100%, the required sample size is 196. If correct diagnosis falls from 100% to 50%, the sample size needed to reject a null hypothesis of zero effect increases fourfold, to 784.
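The power and sample-size figures in the last two points can be checked directly. This is a sketch under the same simplifications used above (zero externalities, a normal test statistic, and the simple formula n = (2.8/MDE)²); the function names are mine:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(d, z_alpha=1.96, z_total=2.8):
    """Power when a fraction d is correctly diagnosed, externalities
    are zero, and the design has 80% power at d = 1."""
    return phi(z_total * d - z_alpha)

def n_required(d, mde=0.2, z_total=2.8):
    """Sample size to detect an effect of `mde` standard deviations
    when only a fraction d is correctly diagnosed."""
    return (z_total / (mde * d)) ** 2

print(round(power(1.0), 2), round(power(0.7), 2), round(power(0.4), 2))
# 0.8 0.5 0.2
print(round(n_required(1.0)), round(n_required(0.5)))
# 196 784
```

This reproduces the 80% → 50% → 20% power decline and the 196 → 784 sample-size increase stated in the text.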
How do we incorporate diagnosis into randomized evaluations? We can build diagnosis into studies of many different kinds of interventions, including areas such as health, education, and microfinance. This can be done by using common sense and sound theory to form a diagnostic index created from a module of survey questions administered at baseline to both treatment and control groups. In the case of microfinance, for example, this index should be composed of a battery of questions that seek to identify binding credit constraints within a sample of micro-entrepreneurs, gauging aspirations (are external or internal constraints likely to be binding?), current credit access, profitability of the marginal production unit, and so forth. Diagnostic indices should be tied as closely as possible to economic theory, eschewing an approach that suggests a miscellany of moderating variables that may account for heterogeneous treatment effects. It is important to register a pre-analysis plan that includes our theory-based diagnostic and specifies where treatment is expected to realize significant effects.
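As a purely hypothetical sketch of how such an index might be assembled, the items below are invented for illustration and are not from any actual survey module; the index is simply an equally weighted average of standardized baseline items:

```python
import statistics

# Hypothetical baseline module: each list holds one item's responses
# for six micro-entrepreneurs (invented numbers for illustration).
baseline = {
    "aspiration_score": [3, 5, 2, 4, 5, 1],     # stated growth aspirations
    "denied_credit":    [1, 1, 0, 1, 0, 0],     # reported a rejected loan
    "marginal_return":  [0.30, 0.45, 0.05, 0.38, 0.50, 0.02],
}

def zscores(xs):
    """Standardize a list of responses to mean zero, unit variance."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

# Diagnostic index: equally weighted average of standardized items;
# higher values suggest a more binding credit constraint.
standardized = [zscores(v) for v in baseline.values()]
index = [sum(col) / len(col) for col in zip(*standardized)]
print(index)
```

The weights and items would in practice be pinned down by theory in the pre-analysis plan rather than chosen ad hoc as here.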
In this type of analysis, instead of regressing an impact outcome on treatment alone, where the coefficient on treatment becomes our principal focus of interest, we pre-specify adding our diagnostic index and the interaction between treatment status and the index as a moderating variable. If the coefficients on both treatment and the interaction are significant, then diagnosis is important, but the treatment also realizes significant impacts across diagnostic status. If the interaction is significant at follow-up but the treatment coefficient is insignificant, impact is contingent on proper diagnosis. If we have the reverse, the treatment is more universally effective and diagnostics are not important. If neither is significant, we cannot reject the hypothesis that the treatment has no impact on anybody in the sample frame, diagnosed or not.
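The mechanics of this specification can be illustrated with simulated data. The data-generating process here is invented purely for illustration: a treatment that works only for high-index subjects, which is the "impact contingent on diagnosis" case described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated baseline diagnostic index (standardized) and random assignment.
D = rng.standard_normal(n)
T = rng.integers(0, 2, n).astype(float)

# True model: no main treatment effect; impact rises with the index.
Y = 0.5 * T * D + rng.standard_normal(n)

# Pre-specified regression: Y on constant, T, D, and T x D.
X = np.column_stack([np.ones(n), T, D, T * D])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("treatment:  ", round(beta[1], 3))  # near zero
print("interaction:", round(beta[3], 3))  # near the true 0.5
```

An insignificant treatment coefficient alongside a significant interaction is exactly the pattern that would tell us impact is contingent on proper diagnosis.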
In recent years empirical development economics has made immense strides in estimating treatment effects from poverty interventions. But as innovations in medical diagnostics constitute critical innovations in medical research, the proper formation of increasingly accurate and elegant diagnostic tests and indices for interventions such as microfinance should also form an integral part of our research agenda.
Looking at interventions through the lens of diagnosis may help us understand, for example, why cash grants given in the context of business plan competitions have been found to be so effective (McKenzie, 2015) relative to microfinance. In the former, subjects are carefully diagnosed as to the impact that injections of working capital are likely to have on enterprise growth and income. In the latter they are not. As a result, we must be careful when comparing a study whose sample was well diagnosed to benefit from treatment with one in which the treatment was administered broadly. Our impact results will reflect both factors: the accuracy of pre-intervention diagnosis as well as the impact of the intervention on those who should benefit from it. Learning to properly utilize diagnostics in development practice and research will lead both to more efficient use of development resources and to poverty interventions with higher average impacts.
Follow AcrossTwoWorlds.net on Twitter @BruceWydick