Q&A Week 4: Natural Experiments and Observational Studies

Natural Experiments

In class you mentioned “Natural experiments based on geographical boundaries can be complicated by human factors”. Can you explain a bit more what this means?

Recall that the key assumption that ensures internal validity in a natural experiment design is that the treatment assignment is random or “as-if” random. In other words, we have to ask: is the treatment assignment correlated with any other factor that could cause the observed difference between the treatment and control groups? If yes, the assumption does not hold and the study’s internal validity is weakened. If no, the “as-if” randomization assumption holds.

In the study on whether money from the lottery increases happiness, the assumption is that the treatment (winning money from the lottery) is randomly assigned among lottery buyers. Hence whether someone is in the treatment group (lottery winners) or the control group (lottery losers) is not correlated with other factors that affect their happiness. In other words, treatment assignment (whether someone gets money) is independent of confounding factors that could have affected the outcome (happiness).

In studies that leverage geographical boundaries for natural experiment opportunities, the generic set-up is to compare Area A (treatment group), on the side of the boundary that has received the treatment, with Area B (control group), on the side that has not. This means that we have to ask: is the treatment assignment (being on one side of the boundary vs the other) correlated with any other factor that could explain the difference in outcomes between Area A and Area B?

So what I meant by “natural experiments based on geographical boundaries can be complicated by human factors” was that how geographical boundaries are drawn is sometimes not independent of the characteristics of the humans/political actors who draw them (i.e. the division introduced by the boundary is not random). If the reasons for how boundaries are drawn correlate with factors that could explain the outcome, then the “as-if” randomization assumption would not hold.

Think about Posner (2004), which we read for class. Posner found that the relative size of the two ethnic groups (treatment) within each country explained why the cultural differences between the Chewa and Tumbuka ethnic groups are politically salient in Malawi but not in Zambia (outcome). He argued that the treatment assignment (being in a country where the two ethnic groups are relatively large vs relatively small) is “as-if” random (assignment is uncorrelated with other factors that could explain the outcome), because “like many African borders, the one that separates Zambia and Malawi was drawn purely for [colonial] administrative purposes, with no attention to the distribution of groups on the ground” (Posner 2004: 530).

If, however, the boundary that separates Zambia and Malawi was drawn for reasons that potentially correlate with factors affecting inter-group interaction (say, natural resource availability), then the treatment assignment is no longer “as-if” random.

How would we know if the “as-if randomization” assumption is valid?

Since we have no control over the treatment assignment process in natural experiments, we cannot really “prove” whether this “as-if” randomization assumption is valid. All we can do is provide evidence to show that this assumption is plausible.

For example, we can rely on theory and background knowledge to make the case: assignment through a lottery is plausibly random because we know how the winners are chosen.

And for the Posner (2004) study, if there were some qualitative evidence (e.g. written records of how boundaries were decided) showing that the boundary was indeed “drawn purely for [colonial] administrative purposes, with no attention to the distribution of groups on the ground”, then that would be an important piece of evidence to support the “as-if” randomization claim.

We can also provide empirical evidence. Recall that randomly assigning treatment gives us comparable treatment and control groups, i.e. the groups would, on average, be similar to each other in terms of any potential confounding variables. So we should expect an “as-if” random assignment process to give us comparable groups as well.

Researchers can measure the potential confounding variables and empirically test if the treatment and control groups are similar in those aspects. If we do not find any significant difference between the two groups in terms of those potential confounders, then that would be a piece of evidence supporting the “as-if” randomization assumption.
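As a rough illustration of such a balance check, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration: the covariate (think of something like household income), the group sizes, and the numbers are invented, and it summarizes balance with a standardized mean difference rather than a formal significance test.

```python
import random
import statistics


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


rng = random.Random(0)  # fixed seed so the simulation is reproducible

# Hypothetical covariate measured for units on the control side of a boundary.
side_b = [rng.gauss(50, 10) for _ in range(2000)]

# Case 1: the boundary is unrelated to the covariate, so the treated side
# is drawn from the same distribution as the control side.
side_a = [rng.gauss(50, 10) for _ in range(2000)]
smd_balanced = standardized_mean_difference(side_a, side_b)

# Case 2: the boundary was drawn so that better-off areas ended up on the
# treated side, so the treated side has a higher average covariate value.
side_a_shifted = [rng.gauss(58, 10) for _ in range(2000)]
smd_imbalanced = standardized_mean_difference(side_a_shifted, side_b)

print(f"SMD when the boundary ignores the covariate:  {smd_balanced:.3f}")
print(f"SMD when the boundary tracks the covariate:   {smd_imbalanced:.3f}")
```

A value near zero in the first case is consistent with “as-if” randomization, while a large value in the second case signals that the two sides differ on a potential confounder. Note that balance on measured covariates supports the assumption but cannot prove it, since unmeasured confounders may still differ.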

Observational Studies

Is there any way to get rid of confounding variables in observational studies?

Confounding variables will always be present (we cannot “get rid of them” per se), but we can reduce the bias that they introduce into our inferences and conclusions.

Whenever we want to investigate if $X \rightarrow Y$, there will be confounding variables $Z$ lurking behind the scenes. That is just a feature of the world we live in. These confounding variables introduce bias into our inference if we mistakenly conclude that the change in $Y$ is caused by $X$ when in fact it was caused by $X$ and $Z$ together (or by $Z$ alone). In the jargon, this is known as omitted variable bias.
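To make omitted variable bias concrete, here is a sketch under a simplifying assumption that goes beyond the discussion above: suppose the relationships are linear, with $\varepsilon$ an error term uncorrelated with both $X$ and $Z$:

$$Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon$$

If we regress $Y$ on $X$ alone and leave $Z$ out, the slope we estimate (call it $\tilde{\beta}_1$) tends toward

$$\tilde{\beta}_1 = \beta_1 + \beta_2 \, \frac{\mathrm{Cov}(X, Z)}{\mathrm{Var}(X)}$$

The second term is the bias. It is zero only if $Z$ has no effect on $Y$ ($\beta_2 = 0$) or if $Z$ is uncorrelated with $X$ ($\mathrm{Cov}(X, Z) = 0$), which is exactly what random or “as-if” random assignment of $X$ is meant to deliver.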

When designing a study to investigate if $X \rightarrow Y$, one of our goals is to reduce any potential bias introduced by confounding variables, in order to isolate the effects of $X$ on $Y$ (how much of the change in $Y$ can be attributed to $X$, instead of $Z$).

Two common ways to reduce this bias in observational studies:

  1. Statistically adjusting/controlling for observable confounding variables (i.e. include the “omitted” confounding variables in the statistical model, at least for those we have the data for).
  2. If our data have multiple time points (i.e. panel data or time series data), statistically adjusting/controlling for observable and unobservable confounding variables by leveraging the temporal nature of the data.
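The two approaches in the list above can be sketched with a small simulation. This is only an illustration under assumptions chosen for convenience: the data are simulated, the relationships are linear, the true effect of $X$ on $Y$ is set to 2, and the second part uses one specific technique (demeaning each unit's data over time, often called unit fixed effects), which only removes confounders that do not change over time within a unit.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def residuals(x, y):
    """The part of y that is not linearly explained by x."""
    b = slope(x, y)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]


def demean_within_units(values_by_unit):
    """Subtract each unit's own average from its observations, then flatten."""
    flat = []
    for series in values_by_unit:
        unit_mean = sum(series) / len(series)
        flat.extend(v - unit_mean for v in series)
    return flat


rng = random.Random(1)
true_effect = 2.0

# --- Approach 1: adjust for an OBSERVED confounder z ---------------------
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]  # z affects x
y = [true_effect * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

naive = slope(x, y)  # omits z, so it is biased
# Controlling for z: regress the z-residualized y on the z-residualized x.
# This equals the coefficient on x in a regression of y on both x and z.
adjusted = slope(residuals(z, x), residuals(z, y))

# --- Approach 2: panel data with an UNOBSERVED, time-invariant confounder -
n_units, n_periods = 500, 6
u = [rng.gauss(0, 1) for _ in range(n_units)]  # never used in estimation
x_panel = [[ui + rng.gauss(0, 1) for _ in range(n_periods)] for ui in u]
y_panel = [
    [true_effect * xit + 3.0 * ui + rng.gauss(0, 1) for xit in xs]
    for xs, ui in zip(x_panel, u)
]

pooled = slope([v for s in x_panel for v in s], [v for s in y_panel for v in s])
within = slope(demean_within_units(x_panel), demean_within_units(y_panel))

print(f"true effect: {true_effect}")
print(f"naive (z omitted): {naive:.2f}   adjusted for z: {adjusted:.2f}")
print(f"pooled panel: {pooled:.2f}   within-unit (demeaned): {within:.2f}")
```

In both parts the estimate that ignores the confounder overstates the effect, while the adjusted estimate lands close to the true value. The first approach only works for confounders we have measured. The second works without measuring the confounder, but only because it is constant within each unit over time.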

The jargon for these different techniques to isolate the effects of $X$ on $Y$ is “identification strategy”: strategies that help us identify the effects of $X$ on $Y$. Randomized experiments, natural experiments, and statistical adjustment for confounders are different types of identification strategies we can use.

How are longitudinal studies and cross-sectional studies different?

We have a longitudinal study if we have data for each unit at multiple time points, i.e. every unit is measured more than once. For example, a study of whether emergency events boost presidential approval ratings (i.e. rally-round-the-flag effects) would be a longitudinal study (or more specifically, a time series study): the unit of analysis is presidential approval ratings, and we have measures for this unit at multiple time points, before and after the emergency events.

A cross-sectional study is one where we only have data for each unit at one time point. If we were to examine whether partisanship affects how individuals evaluate the president’s response to an emergency event, say a devastating hurricane, using a survey conducted after the hurricane, then that would be a cross-sectional study: the unit of analysis is individual survey respondents, and we only have measures for each person at one point in time (the time they responded to the survey).
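One way to see the difference is in the shape of the data. The following Python sketch classifies a dataset by counting how many time points each unit appears at. The function name, the labels, and the toy records are all hypothetical and only mirror the two examples above.

```python
from collections import Counter


def study_design(records):
    """Classify a dataset given its (unit, time_point) pairs."""
    # Count the distinct time points at which each unit was measured.
    time_points_per_unit = Counter(unit for unit, _ in set(records))
    counts = time_points_per_unit.values()
    if all(count == 1 for count in counts):
        return "cross-sectional"
    if all(count > 1 for count in counts):
        return "longitudinal"
    return "mixed"


# One unit (approval ratings) measured before and after an event.
approval_records = [
    ("presidential approval", "month_1"),
    ("presidential approval", "month_2"),
    ("presidential approval", "month_3"),
]

# Many units (respondents), each measured once in a single survey.
survey_records = [
    ("respondent_1", "post_hurricane_survey"),
    ("respondent_2", "post_hurricane_survey"),
    ("respondent_3", "post_hurricane_survey"),
]

print(study_design(approval_records))
print(study_design(survey_records))
```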

Research Methods in Political Science
Supplemental course materials for Spring 2019.
