Discussion 6: Estimating from a sample

STATS 60 / STATS 160 / PSYCH 10

The discussion assignment for the week was to find an example of a study or survey with a small sample size.

Why is the sample size too small?
Roughly how large is the standard deviation?
Does the normal approximation apply? How confident are you in the estimated value?

Random sampling can be used to estimate the population mean \(\mu\)
If \(x_1,\ldots,x_n\) are a random sample, then the sample mean \(\hat\mu_n = \frac{1}{n}\sum_{i=1}^n x_i\) is a good guess for \(\mu\)
The sample mean is a random quantity (because the sample is random)
The sample mean gets more accurate as the sample size increases (bigger is better)
The standard deviation of \(\hat\mu_n\) is \(\frac{1}{\sqrt{n}}\) times \(\sigma_x\), the population standard deviation of \(x_i\)
Most of the time, the error satisfies \(|\hat\mu_n - \mu| \le C \frac{\sigma_x}{\sqrt{n}}\) for some \(C\) that is not too large, say \(C \le 3\)
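
The \(\frac{\sigma_x}{\sqrt{n}}\) scaling can be checked with a small simulation. The sketch below is only an illustration, assuming a fair six-sided die (so \(\sigma_x = \sqrt{35/12} \approx 1.71\)); the function names, the number of trials, and the seed are choices made for this example and are not part of the class activity.

```python
import math
import random
import statistics

def sample_mean_of_rolls(n, rng):
    """Roll a fair six-sided die n times and return the sample mean."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

def spread_of_sample_mean(n, trials=2000, seed=0):
    """Empirical standard deviation of the sample mean over many repeated samples."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    means = [sample_mean_of_rolls(n, rng) for _ in range(trials)]
    return statistics.pstdev(means)

# Population standard deviation of a single fair die roll: sqrt(35/12)
SIGMA_X = math.sqrt(35 / 12)

# Columns: sample size, simulated spread of the sample mean, sigma_x / sqrt(n)
for n in (4, 16, 64, 256):
    print(n, round(spread_of_sample_mean(n), 3), round(SIGMA_X / math.sqrt(n), 3))
```

In each row, the simulated spread should land close to the predicted value, and quadrupling \(n\) should roughly halve it.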

For large \(n\), the distribution of \(\hat\mu_n\) becomes close to a normal distribution with mean \(\mu\) and standard deviation \(\frac{\sigma_x}{\sqrt{n}}\)
The 68-95-99.7 rule lets us construct confidence intervals for \(\mu\) centered at \(\hat\mu_n\)
- We can be \(68\%\) confident that \(|\hat\mu_n - \mu| \le \frac{\sigma_x}{\sqrt{n}}\).
- We can be \(95\%\) confident that \(|\hat\mu_n - \mu| \le 2\frac{\sigma_x}{\sqrt{n}}\).
- We can be \(99.7\%\) confident that \(|\hat\mu_n - \mu| \le 3\frac{\sigma_x}{\sqrt{n}}\).
If you want to target a certain error level, \(|\hat\mu_n - \mu| \le \epsilon\), at confidence level \(1-\alpha\), you can use the Normal approximation to decide how large a sample to take.
At a fixed confidence level, larger sample sizes lead to narrower confidence intervals; at a fixed interval width, they lead to higher confidence
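
One way to turn a target error \(\epsilon\) and confidence level \(1-\alpha\) into a sample size is to solve \(z \frac{\sigma_x}{\sqrt{n}} \le \epsilon\) for \(n\), where \(z\) is the normal quantile for the chosen confidence level. The sketch below assumes that \(\sigma_x\) is known (or has been estimated beforehand) and that \(n\) is large enough for the normal approximation to be reasonable; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, alpha):
    """Half-width of the normal-approximation interval at confidence 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * sigma / math.sqrt(n)

def required_sample_size(sigma, epsilon, alpha):
    """Smallest n with z * sigma / sqrt(n) <= epsilon, under the normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / epsilon) ** 2)

# Example (assumed numbers): fair die, target error 0.1, 95% confidence
sigma_die = math.sqrt(35 / 12)
n_needed = required_sample_size(sigma_die, epsilon=0.1, alpha=0.05)
print(n_needed, margin_of_error(sigma_die, n_needed, alpha=0.05))
```

Using the rounded value \(z = 2\) from the 68-95-99.7 rule instead of the exact quantile gives a slightly larger, more conservative sample size.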

Poll for dice roll results

Now we will see what happens if we use a biased measurement

Poll for biased dice rolls

The sample mean of the dice rolls got more accurate as the sample size grew
The number of people in the class is too small to see the normal approximation in the histogram
When using a biased measurement, the sample mean does not approximate the population mean, no matter how large the sample; it settles near the mean of the biased measurement instead
This website can roll lots and lots of dice for us
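
The effect of a biased measurement can also be simulated. The sketch below does not reproduce the biased procedure used in class; it assumes a hypothetical loaded die on which a six is three times as likely as any other face, so its mean is \(33/8 = 4.125\) rather than \(3.5\). The weights, names, and seed are all assumptions made for this example.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]
FAIR_WEIGHTS = [1, 1, 1, 1, 1, 1]
LOADED_WEIGHTS = [1, 1, 1, 1, 1, 3]  # hypothetical bias: six is three times as likely

def mean_of_rolls(n, weights, seed=0):
    """Sample mean of n rolls drawn with the given face weights."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    rolls = rng.choices(FACES, weights=weights, k=n)
    return sum(rolls) / n

# Columns: sample size, fair-die sample mean, loaded-die sample mean
for n in (10, 100, 10_000, 100_000):
    print(n, round(mean_of_rolls(n, FAIR_WEIGHTS), 3), round(mean_of_rolls(n, LOADED_WEIGHTS), 3))
```

As \(n\) grows, the fair-die column should settle near 3.5 while the loaded-die column settles near 4.125: a larger sample shrinks the random error but leaves the bias untouched.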