Building Impact Evaluation into Your NGO
by Bruce Wydick and Jeff Bloem
We’re approaching an exciting new era in the relationship between development practitioners and development researchers. After decades of academic paternalism, market fundamentalism, and the more recent emphasis on randomized controlled trials that has swept through major NGOs, we may finally be arriving at something that is both more useful and more sustainable: genuine collaboration between practitioners and researchers to create highly effective non-governmental organizations.¹

But let’s be honest about where we are, which starts with understanding where we’ve been. NGOs rose to prominence in the 1970s, when a basic-needs approach sought to address the large gaps in poverty reduction left by state-led development efforts. In a quick about-face, academics then created the intellectual underpinnings of the market-fundamentalism phase that lasted through the 1980s and 1990s, which similarly failed to yield widespread reductions in poverty, especially in middle-income countries transitioning from socialism. This understandably led to a humbler phase characterized by macroeconomists licking their wounds and microeconomists simply wanting to understand “what actually works” via the use of randomized controlled trials (RCTs). Since the 2000s, academics have implemented RCTs across an enormous range of poverty interventions in microenterprise, education, health, and myriad other facets of economic development.
In response, development NGOs have tended to sort themselves into two groups: 1) those that have not embraced RCTs but feel quietly guilty about it; and 2) those that have embraced RCTs with an almost religious fervor. Organizations in the former group may feel they lack the funding to invest in learning how their work affects the people they serve. In other cases they are led by those for whom such questions take a back seat to charging ahead with programming, hoping that a curated collection of positive participant anecdotes will sway donors to keep the funds flowing.
Organizations in the latter group, in contrast, are beginning to develop a solid basis for understanding how much (and perhaps even how) they are helping the people they serve. Major organizations like Compassion International, World Vision, Catholic Relief Services, and International Care Ministries have dramatically improved their monitoring and evaluation capabilities, often in pursuit of the gold standard: randomized evidence of impact. But this dependence on running myriad RCTs in the form of “special studies,” often in partnership with academics, comes at substantial cost.
The Problem with RCTs
Please don’t misunderstand: the RCT is an exceptionally powerful tool. RCTs have been invaluable for evaluating new medicines (including the COVID-19 vaccines) and innovative development interventions alike. By randomizing a treatment, RCTs elegantly solve the fundamental evaluation challenge: they generate a valid comparison group, the control group, whose outcome, called the “counterfactual,” can then be compared to the outcome of the treated group. In a well-executed RCT, the difference between the outcomes of the treatment and control groups estimates a program’s average treatment effect.
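In standard notation (a textbook formulation, not specific to any one study), with Y the outcome and T an indicator for receiving treatment, this is

    Average Treatment Effect = E[Y | T = 1] − E[Y | T = 0],

the difference in expected outcomes between the treated (T = 1) and untreated (T = 0) groups.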
But here’s the uncomfortable truth: RCTs are typically a poor fit for evaluating established NGO programming. They are expensive, typically costing $100-$500 per participant in sub-Saharan Africa. They can be logistically nightmarish. They can also generate tremendous ill will among people struggling to make ends meet: try explaining to a control group of mothers living in destitution why their daughters (randomly) did not receive a program implemented to improve children’s lives and futures. Most importantly, randomizing treatment is judged unethical when applied to interventions that already have strong empirical support, a violation of the equipoise criterion that institutional review boards take seriously.²
Think about the RCT conundrum this way: if you genuinely believe your program transforms lives, it is hard to justify the misallocation of resources inherent in randomizing access to it. But if you don’t genuinely believe in your program’s effectiveness, why are you running the program in the first place? A recent piece in the Stanford Social Innovation Review presents other problematic features of RCTs that make them a poor fit for NGOs. It’s time for a new approach.
A Better Way Forward: Controlled Access to Treatment
CEIDS, the Collaborative for Econometrics and Integrated Development Studies, based at the University of Notre Dame, is encouraging and helping NGOs to develop a more practical, ethical, and cost-effective alternative: building quasi-experimental impact evaluation directly into routine program operations, an approach we call “Controlled Access to Treatment” (CAT).
The controlled access to treatment concept is simple. Most NGO programs are oversubscribed; more people want to participate than organizational resources allow. Instead of selecting participants through ad hoc processes or randomly assigning access to programming, NGOs can systematically control access based on known, measurable, and rankable criteria. The allocation of scarce resources then generates treated and untreated groups. And because access is granted through a controlled and well-understood process, valid counterfactuals can be constructed by statistically accounting for the process that allocated treatment, while maintaining (and arguably increasing) congruence with organizational objectives.
Consider the following ways of controlling access to programming:
Poverty Targeting: Use clear and established poverty indices to serve the neediest families first, with those just above the poverty cutoff forming a natural comparison group (a minimal record-keeping sketch follows this list).
Geographic Proximity: Serve families closest to program centers first, then use the outcomes of those outside the catchment area as a counterfactual to program outcomes.
First Come, First Served: When your program reaches capacity, the early applicants become your treatment group, while those just missing the cutoff serve as controls. This rewards initiative and the desire to participate.
Health-Based Selection: Prioritize children with the most severe nutritional deficits, using standardized health metrics. Children on the healthier side of the capacity constraint serve as the control group.
Age Cutoffs: Many programs work best within specific age ranges. Those who are a little too old or a little too young then form the comparison group; study changes in outcomes for the treated and (just barely) untreated groups over time.
Staggered Rollout: Programs usually cannot start everywhere at once. Control access by rolling the program out in a planned geographic expansion, creating a control group from those living in places that will receive the program in the future.
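To make the record-keeping concrete, here is a minimal sketch in Python of the first mechanism, poverty targeting. The roster, the capacity of 50 slots, and the column names are all hypothetical; the point is that the score used to allocate access, the cutoff it implies, and each applicant’s treatment status must be recorded, because they are the raw material for the later impact analysis.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(seed=0)

    # Hypothetical applicant roster: one row per household, with a poverty
    # index in which higher scores indicate greater need.
    applicants = pd.DataFrame({
        "household_id": np.arange(1, 101),
        "poverty_score": rng.uniform(0, 100, size=100).round(1),
    })

    CAPACITY = 50  # available program slots (hypothetical)

    # Rank applicants by need and admit the neediest until capacity is reached.
    applicants = applicants.sort_values("poverty_score", ascending=False)
    applicants["treated"] = np.arange(len(applicants)) < CAPACITY

    # The score of the last admitted household defines the de facto cutoff.
    cutoff = applicants.loc[applicants["treated"], "poverty_score"].min()
    print(f"Poverty-score cutoff: {cutoff}")

    # Keep the full roster, treated and untreated alike, for follow-up surveys.
    applicants.to_csv("intake_roster.csv", index=False)

The same pattern applies to any of the other mechanisms: replace the poverty score with distance to a program center, application date, a health metric, age, or rollout phase.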
The Technical Foundation
The 2019 Nobel Prize in Economics was awarded for the introduction of experimental methods into development program evaluation. But the 2021 Nobel Prize in Economics was given for the development of quasi-experimental methods. The examples above build two of the most common quasi-experimental methods into program operations: regression discontinuity design (RDD) and difference-in-differences (DiD), robust methodologies that have gained tremendous credibility across development economics and the social sciences.³
Until recently these methods have been used principally to study “natural experiments”: idiosyncratic features of past program rollouts that academics have exploited in order to study their effects. Although tech firms such as eBay, Netflix, Google, and Amazon have started to use quasi-experimental approaches to lure people to their screens and products, non-profits are only beginning to consider building them into their operations to understand their impacts on people’s lives. The beauty in this for NGOs is that these methods can be sliced and diced and combined with other quasi-experimental methods, based on an NGO’s individual modus operandi, to construct program-wide measures of impact.
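As a concrete illustration of the second method, here is a minimal difference-in-differences sketch in Python, using simulated data from a staggered rollout of the kind described above; the regions, sample sizes, and true program effect are invented for the example. Comparing before-and-after changes in the early-rollout region against the same changes in the later-scheduled region strips out shared time trends, leaving the program effect.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(seed=1)
    n = 500  # households per region (hypothetical)

    # Two regions: the program rolls out in region A first; region B,
    # scheduled for a later phase, serves as the comparison group.
    rows = []
    for region, enrolled in [("A", 1), ("B", 0)]:
        base = rng.normal(100 if enrolled else 95, 10, n)  # levels may differ
        for post in (0, 1):
            # Both regions share a common time trend (+2); the program adds a
            # true effect of 6 in region A after rollout (assumed here).
            y = base + 2 * post + 6 * enrolled * post + rng.normal(0, 5, n)
            rows.append(pd.DataFrame({"y": y, "enrolled": enrolled, "post": post}))

    df = pd.concat(rows, ignore_index=True)

    # The coefficient on the interaction term is the DiD impact estimate.
    model = smf.ols("y ~ enrolled + post + enrolled:post", data=df).fit()
    print(model.params["enrolled:post"])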
Making It Work in Practice
Imagine your NGO operates a child sponsorship program that can accommodate 50 more children in a village, but 100 children have applied. You decide to use poverty scores as your selection criterion, serving the poorest families first until reaching your capacity limit. Just about everyone understands that a program can’t serve everybody, and controlling access to a poverty program by objective need is a generally accepted ethical criterion for allocating scarce resources. Allocating the program in this transparent, controlled manner also ensures that access is granted in a way that is congruent with an organization’s values, mitigating the problems of allocating access through special connections or other ad hoc means.
This type of process can utilize a hybrid of RDD and DiD, comparing children’s outcomes over time across families who scored just below and just above your poverty cutoff. Because families very close to the cutoff are similar except for variation in their poverty scores, and because there is good reason to think their outcomes would follow similar trends in the absence of the program, this design yields estimates of program impact as credible as those stemming from randomization.
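Here is a minimal sketch of that hybrid analysis in Python with simulated data (the scores, outcomes, bandwidth, and true effect are all invented for the illustration). Differencing each child’s follow-up and baseline outcomes removes fixed family characteristics, and the jump in those changes at the poverty-score cutoff estimates the program effect.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(seed=2)
    n = 100  # the hypothetical 100 applicant children

    # Poverty score with a cutoff at its median: the 50 neediest are enrolled
    # (higher score = greater need in this simulation).
    score = rng.uniform(0, 100, n)
    cutoff = np.median(score)
    treated = (score >= cutoff).astype(int)

    # Simulated baseline and follow-up outcomes; the true program effect on
    # the *change* in outcomes is 3 units (an assumption of the illustration).
    family_effect = rng.normal(0, 5, n)  # fixed family characteristics
    baseline = 50 + family_effect + rng.normal(0, 2, n)
    followup = baseline + 1.0 + 3 * treated + rng.normal(0, 2, n)

    df = pd.DataFrame({"change": followup - baseline,
                       "centered": score - cutoff,
                       "treated": treated})

    # Difference-in-discontinuities: regress the change in outcomes on
    # treatment near the cutoff, letting the slope differ on each side;
    # the coefficient on `treated` is the impact estimate.
    local = df[df["centered"].abs() <= 20]
    model = smf.ols("change ~ treated + centered + treated:centered",
                    data=local).fit()
    print(model.params["treated"])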
An important challenge is collecting the necessary data on children’s outcomes over time, not only among families in the sponsorship program but also among families whose children are not in it. How can we address this challenge? One way is to provide a cash transfer in exchange for two hours of survey time. Cash transfers have been shown to have positive effects on food security and other key outcomes in and of themselves, so they can be an excellent component of virtually any development program. Annually surveying program participants and non-participants on the other side of the cutoff creates a longitudinal data set that can reveal the effects of a program, even over the long term.
Confronting the Trade-offs
Quasi-experimental methods are not perfect. For example, RDDs typically have 20-40% lower statistical power than RCTs, meaning we need larger sample sizes to detect effects of similar magnitudes. And in simple RDD applications, we estimate program effects among those near the cutoff point, assuming these closely reflect impacts across the entire target population, which is likely, but of course not certain. Many versions also rely on oversubscription: more people desiring access to a program than there is space.
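To see what the power loss means for planning, here is a back-of-the-envelope sample-size calculation in Python using statsmodels. The effect size, the power target, and the design-effect multiplier that translates an RDD’s power disadvantage into extra sample are all illustrative assumptions, not estimates from any particular study.

    from statsmodels.stats.power import TTestIndPower

    power = TTestIndPower()

    # Sample size per arm for a simple randomized comparison: a small
    # effect (Cohen's d = 0.2), 80% power, 5% significance (all assumed).
    n_rct = power.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
    print(f"RCT: ~{n_rct:.0f} per arm")

    # A rough design-effect multiplier for an RDD relative to an RCT;
    # the value of 3 here is purely illustrative.
    design_effect = 3.0
    print(f"RDD (assumed design effect {design_effect}): "
          f"~{n_rct * design_effect:.0f} per side of the cutoff")

The practical upshot: where an RCT might need a few hundred participants per arm, a comparable RDD may need a multiple of that, which is one reason embedding the design in routine operations, with their much larger samples, matters.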
But consider the advantages: controlled access to treatment designs can be built directly into routine program budgets rather than requiring special evaluation funding. They are flexible, allowing you to combine multiple quasi-experimental methods as your data accumulate. Most importantly, they are easier to justify ethically and operationally because the selection criteria serve legitimate organizational objectives. The key is to control access to treatment by known and measurable criteria; once we know how treatment was controlled, we can statistically account for this known factor. Such designs are not randomized and controlled as in an RCT; they are simply controlled. But this “controlling” is carefully incorporated into the program’s design so that it harmonizes with an NGO’s operational priorities and ethical values.
Perhaps most significantly, because controlled access to treatment designs can be embedded in routine operations, they can accommodate much larger sample sizes than typical RCT studies, potentially compensating for their lower statistical power while simultaneously providing greater external validity across an NGO’s global programs. In other words, an NGO does not have to extrapolate the effects of its microcredit program in Ethiopia from an RCT run in Bangladesh.
The Path Forward
We are at an inflection point in development practice. The question is not whether NGOs should evaluate their work—that debate is settled. The question is how to build evaluation systems that are scientifically credible, operationally feasible, ethically sound, and financially sustainable.
Controlled access to treatment offers a promising path forward, allowing organizations to maintain rigorous evaluation standards while serving their missions more effectively. By building quasi-experimental methods into routine operations, NGOs can create continuous learning systems that improve program effectiveness over time.
CEIDS is prepared to work with NGOs that would like to build impact evaluation into their program operations. The first step is to build the necessary capacity among NGO monitoring and evaluation staff, which can be done by enrolling one or more employees in CEIDS’ global online Social Impact Analytics certificate program, offered through the University of San Francisco. In this program, student-employees learn the experimental and quasi-experimental tools needed to carry out rigorous program evaluation. This begins a collaboration that continues after the program, with CEIDS researchers working directly with NGOs. The end goal is to implement systems that allow NGOs to understand the impacts of programs on an array of human development outcomes, for which participants programs are most and least effective, and even why programs are effective.
The era of academic researchers admonishing development practitioners to “randomize like we randomize” is giving way to something more collaborative: practitioners and researchers working together to create evaluation systems that serve both scientific rigor and organizational mission.
Bruce Wydick is Professor of Economics and International Studies at the University of San Francisco and Visiting Professor at the University of California at Davis. Jeff Bloem is a Research Fellow at the International Food Policy Research Institute (IFPRI). Both Wydick and Bloem are on the leadership board of CEIDS, the Collaborative for Econometrics and Integrated Development Studies, a global community network of over 100 development economists and applied researchers based at the University of Notre Dame.
¹ The historical phases of researcher-practitioner collaboration and the emergence of “randomized religion” in development NGOs reflect broader trends documented in the development economics literature.
² The equipoise criterion in human subjects research requires genuine uncertainty about which treatment arm might be more beneficial. This standard becomes problematic for interventions with established evidence bases.
³ The regression discontinuity design and difference-in-differences methodologies and their applications in development contexts have been extensively utilized and validated in the economics literature, with notable applications in education, microfinance, and health programs.
