Lecture 22: Randomized Experiments#

STATS 60 / STATS 160 / PSYCH 10

Concepts and Learning Goals:

  • Experimental Design

    • Correlation vs. causation

    • Treatment and Control groups

    • Confounding variables

    • Observational studies

    • Randomized controlled trials (RCTs)

Retrieval Practice#

  • In today’s lesson, we will examine a strategy that may help you learn, called retrieval practice.

  • Retrieval practice is the strategy of recalling facts or concepts from memory.

    • The act of retrieving something from your memory strengthens the connections in your brain, making it more likely that you’ll be able to recall it in the future.

    • This is why quizzes and tests help you learn.

  • How do we know that retrieval practice works?

One Possible Study#

  • Suppose we do lots of retrieval practice in STATS 60 and the average on the exam is 90%.

  • Is this convincing evidence that retrieval practice works?

NO!

  • Maybe the students are good and would have done just as well on the exam without the retrieval practice.

  • Maybe the exam was just easy.

We need to compare the treatment group that did retrieval practice to a control group that didn’t.

A Controlled Study#

  • Suppose we use the students in STATS 60 last year, who didn’t do retrieval practice, as a control group.

  • They took the same exam, and their average on the exam was 75%.

  • So the group that did retrieval practice scored 15 percentage points higher than the group that didn’t. Are you convinced now that retrieval practice causes more learning?

NO! “Correlation does not imply causation.”

  • Maybe the students this year are stronger than the students last year.

  • Maybe the instructors this year are better than the instructors last year.

The problem is that the two groups are not comparable (in ways that affect the outcome).

Comparable Groups#

In summary, to determine if retrieval practice causes students to learn more, we need:

  • two groups, one that does retrieval practice and another that doesn’t,

  • that are comparable with respect to all other variables that affect the outcome.

If two groups differ in another variable that affects the outcome, that variable is called a confounding variable.

We can conclude causality if all confounding variables have been eliminated from the comparison.

Designing Comparable Groups#

A study that compares groups that already exist is called an observational study.

In general, the treatment and control groups in an observational study are not comparable, so it is difficult to infer causality.

The simplest way to ensure that groups are comparable (and eliminate confounding variables) is to design them that way.

A study that assigns subjects to groups is called an experiment.

Experiments#

How should we assign subjects to groups so that the groups are comparable?

Idea 1

  • Record variables for each subject, like year, sex, major, etc.

  • Then, manually divide the subjects into two groups so that there are exactly the same proportion of students in each year, of each sex, in each major, etc. in the two groups.

Unfortunately, this does not guarantee that the two groups will be balanced with respect to variables we did not record.

Experiments#

How should we assign subjects to groups so that the groups are comparable?

Idea 2

Randomly assign subjects to the two groups.

Now, the two groups are expected to be balanced with respect to all variables, even the ones we did not record.

This is called a randomized (controlled) experiment and is the gold standard for causal inference.
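The balancing claim above can be checked with a quick simulation. The sketch below (all numbers hypothetical) randomly splits 200 subjects into two groups and verifies that an unrecorded variable, prior GPA here, ends up roughly balanced between them:

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical subjects with an unrecorded variable (prior GPA)
gpas = [round(rng.uniform(2.0, 4.0), 2) for _ in range(200)]

# Randomly split the subjects into two groups of equal size
indices = list(range(200))
rng.shuffle(indices)
group_a = [gpas[i] for i in indices[:100]]
group_b = [gpas[i] for i in indices[100:]]

# On average, random assignment balances even variables we never recorded
print(statistics.mean(group_a), statistics.mean(group_b))
```

Even though GPA was never used in the assignment, the two group means come out close, which is exactly why randomization also handles confounders we did not think to measure.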

A Randomized Experiment#

Let’s do a randomized experiment to determine whether retrieval practice benefits learning.

First, we have to randomize students to the control and treatment groups. How do we do that?

  • Count off 1, 2, 1, 2, 1, 2, … until all students are assigned to a group?

  • Feed in the names of the students in the room to ChatGPT and ask it to divide the names into two groups?

No, none of these options guarantees randomness.

  • Count off 1, 2, 3, …. Remember your number!

  • We will use a proper random number generator to choose half of these numbers at random. These people will be in the treatment group.
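The assignment step can be sketched in a few lines, assuming a hypothetical class of 30 students who have counted off 1 through 30:

```python
import random

# Hypothetical roster: students numbered 1..n after counting off
n = 30
numbers = list(range(1, n + 1))

# Use a seeded random number generator to choose half the numbers
# at random for the treatment group (seeding makes it reproducible)
rng = random.Random(60)
treatment = sorted(rng.sample(numbers, n // 2))
control = sorted(set(numbers) - set(treatment))

print("Treatment:", treatment)
print("Control:  ", control)
```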

Study Protocol#

  1. Take 5 minutes to read the text on the handout.

  2. Now, depending on which group you are in, take 8 minutes to do the following:

    • Control: Take notes on the text as you normally would.

    • Treatment: Turn the page over and write down as much as you can remember.

  3. Now, take another 5 minutes to read the text again.

  4. Depending on which group you are in, take 8 minutes to do the following:

    • Control: Add to your notes.

    • Treatment: Turn the page over and write down as much as you can remember. Try as hard as you can to recall even more information this time.

  5. On Wednesday, we’ll see how much of this text you learned!


A story from my own research#

  • Suppose we run a randomized experiment and want an estimate of the “treatment effect”: \( \tau = \text{expected outcome in treated group} - \text{expected outcome in control group} \)

  • A common choice is the difference-in-means estimator, which computes \( \hat{\tau} = \text{average outcome in treated group} - \text{average outcome in control group} \)

  • In fact, \(\hat\tau - \tau\) is approximately \(N(0, \sigma^2/n)\) when the sample size \(n\) is large

  • Thus, \(\hat\tau\pm 2\sigma/\sqrt{n}\) is a 95% confidence interval for \(\tau\)
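As a minimal illustration, the sketch below simulates outcomes with an assumed true effect of 5 points (all numbers hypothetical) and computes the difference-in-means estimate with its approximate 95% interval; the \(\sigma/\sqrt{n}\) term is estimated by the usual plug-in standard error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated exam scores with an assumed true treatment effect of 5 points
treated = rng.normal(loc=80, scale=10, size=500)
control = rng.normal(loc=75, scale=10, size=500)

# Difference-in-means estimator
tau_hat = treated.mean() - control.mean()

# Standard error of the difference (variances add for independent groups)
se = np.sqrt(treated.var(ddof=1) / len(treated)
             + control.var(ddof=1) / len(control))

# Approximate 95% confidence interval: tau_hat +/- 2 * SE
ci = (tau_hat - 2 * se, tau_hat + 2 * se)
print(f"tau_hat = {tau_hat:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```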


  • This is from a real-life randomized trial studying the electoral impact of Progresa, Mexico’s conditional cash transfer program (a government welfare program that provides residents of poor communities with funds as long as they meet certain conditions)

  • Outcome: support rates for the incumbent party in the 2000 Mexican presidential election, relative to the previous election

  • Units: villages. Treatment: ‘early’ cash transfer (21 months vs 6 months)

  • Causal question: Do these early cash transfers increase support for the incumbent party in the next election?


Back to our research#

  • When outcomes have heavy tails or contain extreme observations, the difference-in-means confidence interval can be quite wide.

  • If there are two confidence intervals, both valid, we prefer the shorter one.

  • We showed that using ranks instead of the original outcomes, we can make the confidence interval shorter.

| method | estimate | std. err | conf. interval | CI length |
|---|---|---|---|---|
| diff-in-means | 3.62 | 1.73 | [0.23, 7.01] | 6.77 |
| rank-based | 1.83 | 0.45 | [0.96, 2.71] | 1.75 |

  • Key idea: we pool all the samples, rank the outcomes, and look at the sum of the ranks for the treated units. If there is a treatment effect, this rank sum behaves differently than it would under no treatment effect.

  • This rank-based method was originally proposed by Paul Rosenbaum. We derived its large-sample properties for randomized trials.
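A sketch of the rank-sum computation on simulated heavy-tailed data (the t-distributed outcomes and the effect size of 2 are hypothetical, not from the Progresa trial):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated heavy-tailed outcomes (t distribution with 2 degrees of
# freedom), with an assumed treatment effect of 2
control = rng.standard_t(df=2, size=200)
treated = rng.standard_t(df=2, size=200) + 2

# Pool all outcomes and rank them from 1 (smallest) to N (largest)
pooled = np.concatenate([treated, control])
N = len(pooled)
ranks = np.empty(N)
ranks[np.argsort(pooled)] = np.arange(1, N + 1)

# Sum of the ranks for the treated units
rank_sum = ranks[:len(treated)].sum()

# Under no treatment effect, the expected rank sum is n_t * (N + 1) / 2
n_t = len(treated)
expected_null = n_t * (N + 1) / 2
print(f"rank sum = {rank_sum:.0f}, null expectation = {expected_null:.0f}")
```

Because ranks are bounded between 1 and \(N\), a few extreme outcomes cannot blow up the statistic, which is why the rank-based interval can be much shorter for heavy-tailed data.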

Another story: observational study#

Causal question: does a majority of the vote share in the last election (X-axis) give the Democratic party an advantage in the next election (Y-axis)?

Outcome: Democratic vote share in the current election. Treatment group: precincts where the Democratic vote share exceeded the Republican vote share in the last election


  • Key idea: units near the cutoff are similar, so comparing them makes sense.

  • We fit regression lines (you will learn more about this next week) on either side of the cutoff and take the difference between the two lines at the cutoff.

  • Existing methods were already doing the above. We provided a new method for constructing confidence intervals that are shorter than the existing ones.
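A minimal sketch of the regression-discontinuity idea on simulated data (the jump of 0.1, the noise level, and all other numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated running variable: last election's Democratic margin (cutoff at 0)
margin = rng.uniform(-0.5, 0.5, size=1000)

# Assumed true jump of 0.1 at the cutoff, plus a smooth trend and noise
outcome = 0.5 + 0.3 * margin + 0.1 * (margin > 0) + rng.normal(0, 0.05, 1000)

left = margin <= 0
right = margin > 0

# Fit a separate regression line on either side of the cutoff
b_left = np.polyfit(margin[left], outcome[left], 1)    # [slope, intercept]
b_right = np.polyfit(margin[right], outcome[right], 1)

# The estimated effect is the gap between the two fitted lines at the cutoff
effect = np.polyval(b_right, 0.0) - np.polyval(b_left, 0.0)
print(f"estimated jump at cutoff: {effect:.3f}")
```

With plenty of data and a correctly specified line on each side, the estimated gap lands close to the assumed jump of 0.1.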