Sneak peek: generalizability in the social sciences

One of my current research papers looks at how social scientists think about the idea of generalizability.  It’s not quite ready for public consumption, but in the meantime I wanted to share some of the interesting papers that have influenced my thinking on the topic.

Mary Ann Bates & Rachel Glennerster.  2017.  “The Generalizability Puzzle.”  Stanford Social Innovation Review.

At J-PAL we adopt a generalizability framework for integrating different types of evidence, including results from the increasing number of randomized evaluations of social programs, to help make evidence-based policy decisions. We suggest the use of a four-step generalizability framework that seeks to answer a crucial question at each step:

Step 1: What is the disaggregated theory behind the program?
Step 2: Do the local conditions hold for that theory to apply?
Step 3: How strong is the evidence for the required general behavioral change?
Step 4: What is the evidence that the implementation process can be carried out well?

Mark Rosenzweig & Chris Udry.  2019.  “External Validity in a Stochastic World.”  Review of Economic Studies.

We examine empirically the generalizability of internally valid micro estimates of causal effects in a fixed population over time when that population is subject to aggregate shocks. Using panel data we show that the returns to investments in agriculture in India and Ghana, small and medium non-farm enterprises in Sri Lanka, and schooling in Indonesia fluctuate significantly across time periods. We show how the returns to these investments interact with specific, measurable and economically-relevant aggregate shocks, focusing on rainfall and price fluctuations. We also obtain lower-bound estimates of confidence intervals of the returns based on estimates of the parameters of the distributions of rainfall shocks in our two agricultural samples. We find that even these lower-bound confidence intervals are substantially wider than those based solely on sampling error that are commonly provided in studies, most of which are based on single-year samples. We also find that cross-sectional variation in rainfall cannot be confidently used to replicate within-population rainfall variability. Based on our findings, we discuss methods for incorporating information on external shocks into evaluations of the returns to policy.

Karen Levy & Varna Sri Raman.  2018.  “Why (and When) We Test at Scale: No Lean Season and the Quest for Impact.”  Evidence Action blog.

No Lean Season, a late-stage program in the Beta incubation portfolio, provides small loans to poor, rural households for seasonal labor migration. Based on multiple rounds of rigorous research showing positive effects on migration and household consumption and income, the program was delivered and tested at scale for the first time in 2017. Performance monitoring revealed mixed results: program operations expanded substantially, but we observed some implementation challenges and take-up rates were lower than expected. An RCT-at-scale found that the program did not have the desired impact on inducing migration, and consequently did not increase income or consumption. We believe that implementation-related issues – namely, delivery constraints and mistargeting – were the primary causes of these results. We have since adjusted the program design to reduce delivery constraints and improve targeting.

Tom Pepinsky.  2018.  “The Return of the Single Country Case Study.”  SSRN.

This essay reviews the changing status of single country research in comparative politics, a field defined by the concept of comparison. An analysis of articles published in top general and comparative politics field journals reveals that single country research has evolved from an emphasis on description and theory generation to an emphasis on hypothesis testing and research design. This change is a result of shifting preferences for internal versus external validity combined with the quantitative and causal inference revolutions in the social sciences. A consequence of this shift is a change in substantive focus from macropolitical phenomena to micro-level processes, with consequences for the ability of comparative politics to address many substantive political phenomena that have long been at the center of the field.

Evan Lieberman.  2016.  “Can the Biomedical Research Cycle be a Model for Political Science?”  Perspectives on Politics.

In sciences such as biomedicine, researchers and journal editors are well aware that progress in answering difficult questions generally requires movement through a research cycle: Research on a topic or problem progresses from pure description, through correlational analyses and natural experiments, to phased randomized controlled trials (RCTs). In biomedical research all of these research activities are valued and find publication outlets in major journals. In political science, however, a growing emphasis on valid causal inference has led to the suppression of work early in the research cycle. The result of a potentially myopic emphasis on just one aspect of the cycle reduces incentives for discovery of new types of political phenomena, and more careful, efficient, transparent, and ethical research practices. Political science should recognize the significance of the research cycle and develop distinct criteria to evaluate work at each of its stages.