Practice Quizzes#
Week 1 - Ballpark estimates#
There are three practice quizzes below, each with three questions. The prompt is the same for all three of them:
Ballpark estimates: For each of the quantities below, come up with a ballpark estimate. Be sure to explain your reasoning and show your work—you will be graded on how you broke up the problem and whether your reasoning made sense, rather than the accuracy of your estimate.
Practice Quiz #1#
How many cups of coffee are consumed on Stanford campus on an average Monday?
How many hours of human labor are spent on hair styling in the US in a year?
How many dice would it take to fill the STATS 60 classroom (room 200-02)?
Practice Quiz #2#
How many Honda Civics could you park on the quad (assuming that none of them are touching)?
How many hours of human labor are spent each year on correcting typos?
How many people in the U.S. have the letter “a” in their first name?
Practice Quiz #3#
How many gallons of milk are consumed in the Bay Area each year?
How long would it take you to empty the fountain in White Plaza using only a teaspoon?
How many textbooks are bought by Stanford students each year?
Week 2 - Intro to probability#
Practice Quiz #1#
There is no need to express your answer in fully simplified form; make sure to show your work, so we can understand how you got to your conclusion.
Model the following as a probabilistic experiment (state your assumptions), describe the set of outcomes and state whether all outcomes are equally likely:
Alice and Bob play \(3\) rounds of rock-paper-scissors, keeping track of whether Alice wins, loses, or ties in every round.
I flip a coin with heads probability \(p\) a total of \(n\) times. What is the probability that I get \(k\) heads?
I put \(2m\) socks into \(n\) drawers by putting each one in a uniformly random drawer independently. What is the probability that none of the socks end up in the same drawer as their pair?
Practice Quiz 1 Solutions
The possible outcomes are ordered tuples \((R_1,R_2,R_3)\) where \(R_1,R_2,R_3 \in \{W,L,T\}\) are the results from each round. Here \(W\) corresponds to a win for Alice, \(L\) corresponds to a loss for Alice and \(T\) corresponds to a tie.
If we assume that, in every round, Alice and Bob each pick rock, paper or scissors uniformly at random and independently of each other and of earlier rounds, then each outcome is equally likely. There are \(3^3 = 27\) possible outcomes in total.
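The claim that all 27 outcomes are equally likely can be checked by brute force. The sketch below is only an illustration: it assumes both players pick uniformly and independently in every round, and the names `MOVES`, `BEATS`, `round_result` and `outcome_probabilities` are made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

MOVES = ("rock", "paper", "scissors")
# BEATS[x] is the move that x defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def round_result(alice, bob):
    """Return 'W', 'L' or 'T' from Alice's point of view."""
    if alice == bob:
        return "T"
    return "W" if BEATS[alice] == bob else "L"


def outcome_probabilities(rounds=3):
    """Exact probability of each result tuple (R_1, ..., R_rounds).

    Assumption: in each round both players pick uniformly at random,
    independently of each other and of earlier rounds.
    """
    counts = Counter()
    # One round is a pair (alice_move, bob_move); the 9 pairs are equally likely.
    pairs = list(product(MOVES, repeat=2))
    for game in product(pairs, repeat=rounds):
        counts[tuple(round_result(a, b) for a, b in game)] += 1
    total = len(pairs) ** rounds
    return {outcome: Fraction(c, total) for outcome, c in counts.items()}


probs = outcome_probabilities()
print(len(probs), set(probs.values()))
```

If either player favoured one move, the same enumeration with weighted pairs would show the outcomes are no longer equally likely.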
A specific string with \(k\) heads and \(n-k\) tails has probability \(p^k(1-p)^{n-k}\). There are \(\binom{n}{k}\) such strings, because we choose the location of the \(k\) heads in the sequence. Therefore,
\[\mathrm{Pr}[k \text{ heads}] = \binom{n}{k}p^k(1-p)^{n-k} \]
Let \(A\) be the event that none of the socks end up in the same drawer as their pair. For a single pair, the probability that the two socks end up in different drawers is \(\frac{n-1}{n}\). This is because the first sock can land in any of the \(n\) drawers, and whichever drawer that is, \(n-1\) of the \(n\) drawers are different from it for the second sock.
Since there are \(m\) pairs and the socks are placed independently, the probability that none of the pairs end up in the same drawer is
\[\mathrm{Pr}[A] = \left(\frac{n-1}{n}\right)^m \]
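Both formulas above can be sanity-checked by exhaustive enumeration for small inputs. This is a rough sketch rather than part of the official solution; the function names and the convention that socks `2j` and `2j+1` form pair `j` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(n, k, p):
    """Closed form: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def heads_prob_by_enumeration(n, k, p):
    """Add up the probability of every length-n flip string with exactly k heads."""
    total = Fraction(0)
    for flips in product("HT", repeat=n):
        if flips.count("H") == k:
            total += p**k * (1 - p) ** (n - k)
    return total


def no_pair_together_by_enumeration(m, n):
    """Enumerate all n^(2m) equally likely drawer assignments.

    Assumption for this sketch: socks 2j and 2j+1 form pair j.
    """
    good = 0
    for drawers in product(range(n), repeat=2 * m):
        if all(drawers[2 * j] != drawers[2 * j + 1] for j in range(m)):
            good += 1
    return Fraction(good, n ** (2 * m))


p = Fraction(1, 3)
print(binomial_pmf(5, 2, p), heads_prob_by_enumeration(5, 2, p))
print(no_pair_together_by_enumeration(2, 3), Fraction(2, 3) ** 2)
```

Using `Fraction` keeps the comparison exact, so the two printed values on each line should agree without rounding error.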
Practice Quiz #2#
There is no need to express your answer in fully simplified form; make sure to show your work, so we can understand how you got to your conclusion.
Model the following as a probabilistic experiment (state your assumptions), describe the set of outcomes and state whether all outcomes are equally likely:
A professor calls on two volunteers from a 30 person class.
I roll \(n\) fair 6-sided dice. What is the probability that at least one of them lands on \(1\)?
You have a hat containing the numbers \(1,2,\ldots,10\). You draw three cards out of the hat without replacement. What is the probability that you pick 3 consecutive numbers, \(i,i+1,i+2\)?
Practice Quiz 2 Solutions
We will assume that the professor calls on the two volunteers randomly. The possible outcomes are all ordered pairs \((V_1,V_2)\) where \(V_1\) and \(V_2\) are in \(\{1,\ldots,30\}\) and \(V_1 \neq V_2\). Each pair is equally likely.
This is similar to the ‘bag of marbles’ example. The ‘bag’ is the class which contains 30 marbles corresponding to the 30 students. We are assuming that to select the volunteers, the teacher pulls out two marbles without replacement.
Let \(A\) be the event that at least one of the dice lands on \(1\). By the complement rule we have
\[\mathrm{Pr}[A] = 1-\mathrm{Pr}[\bar{A}] = 1 -\mathrm{Pr}[\text{no die lands on } 1] \]
The probability that a single die does not land on \(1\) is \(5/6\). Since the dice are independent, the probability that none of the \(n\) dice land on \(1\) is therefore \((5/6)^n\). The final answer is \(\mathrm{Pr}[A] = 1-(5/6)^n\).
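As a quick check of the complement-rule answer, one can enumerate every roll of a few dice and count those containing a \(1\). This is a minimal sketch; the helper name `at_least_one_1` is invented for this example.

```python
from fractions import Fraction
from itertools import product


def at_least_one_1(n):
    """Exact probability that n fair six-sided dice show at least one 1."""
    rolls = list(product(range(1, 7), repeat=n))  # 6^n equally likely rolls
    hits = sum(1 for roll in rolls if 1 in roll)
    return Fraction(hits, len(rolls))


for n in range(1, 5):
    # Compare the direct count with the complement-rule formula 1 - (5/6)^n.
    print(n, at_least_one_1(n), 1 - Fraction(5, 6) ** n)
```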
The possible outcomes are ordered tuples \((X_1,X_2,X_3)\) where \(X_1,X_2,X_3\) are all distinct numbers between \(1\) and \(10\) and all outcomes are equally likely. The number of such outcomes is \(10\times 9\times 8\). Let \(A\) be the event that the numbers are consecutive. Since all outcomes are equally likely
\[\mathrm{Pr}[A] = \frac{\text{number of outcomes in } A}{10\times 9\times 8}\]
We will now count the number of outcomes in \(A\), reading the event as drawing \(i, i+1, i+2\) in that order. For \((X_1,X_2,X_3)\) to be in \(A\) we must have \(X_1\le 8\). If \(X_1\) was equal to \(9\) or \(10\), then \(X_3 = X_1+2\) would be larger than \(10\). This means that there are \(8\) choices for \(X_1\). Once \(X_1\) is chosen we must have \(X_2=X_1+1\) and \(X_3 =X_1+2\). This means that \(X_1\) determines \(X_2\) and \(X_3\). Therefore, there are \(8\) outcomes in \(A\) and so
\[\mathrm{Pr}[A] = \frac{8}{10\times 9\times 8} = \frac{1}{10\times 9}=\frac{1}{90}\]
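The count above can be reproduced by listing all \(10\times 9\times 8\) ordered draws. The sketch below also counts a second reading of the question, in which the three consecutive numbers may come out in any order. That second reading is an assumption added here for comparison and is not the one used in the solution above.

```python
from fractions import Fraction
from itertools import permutations

# All ordered draws of three distinct numbers from 1..10.
draws = list(permutations(range(1, 11), 3))

# Reading used in the solution above: the numbers come out as i, i+1, i+2.
in_order = sum(1 for x1, x2, x3 in draws if x2 == x1 + 1 and x3 == x1 + 2)

# Alternative reading (assumption): consecutive numbers, drawn in any order.
any_order = sum(
    1 for d in draws if sorted(d) == list(range(min(d), min(d) + 3))
)

print(len(draws), in_order, Fraction(in_order, len(draws)))
print(any_order, Fraction(any_order, len(draws)))
```

Under the any-order reading each of the 8 sets can be drawn in \(3! = 6\) orders, which is why the second count is six times the first.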
Practice Quiz #3#
There is no need to express your answer in fully simplified form; make sure to show your work, so we can understand how you got to your conclusion.
Model the following as a probabilistic experiment (state your assumptions), describe the set of outcomes and state whether all outcomes are equally likely:
You poll \(k\) Stanford students about whether they agree with the statement, “a hot dog is a type of sandwich.”
If you flip a fair coin \(5\) times, is the probability that you see the sequence \(H,T,H,T,H\) larger than the probability that you see the sequence \(H,H,H,H,H\)?
What is the probability that a random \(n\)-letter word is a palindrome (meaning it reads the same left-to-right as right-to-left) when \(n\) is odd?
Practice Quiz 3 Solutions
There are a few possible approaches to this question.
a) We can model the set of outcomes as unordered sets \(\{O_{i_1},O_{i_2},\ldots,O_{i_k}\}\) where \(O_i\) is the opinion of the \(i\)th student, \(i_1,\ldots,i_k\) is any subset of size \(k\) of \(\{1,\ldots,n\}\), and \(n\) is the total number of Stanford students. If the \(k\) students are chosen uniformly at random, all of the outcomes are equally likely.
b) We can model the set of outcomes as \(X \in \{0,1,\ldots,k\}\) where \(X\) is the number of students who said yes to the question. In this case, the outcomes are not all equally likely.
No. When flipping a fair coin, all sequences of heads and tails are equally likely. Both \(H,T,H,T,H\) and \(H,H,H,H,H\) have probability \(\frac{1}{2^5}\).
We will assume that the random word is generated by choosing each letter uniformly at random from the \(26\)-letter English alphabet. The number of \(n\)-letter words is \(26^n\). Since each word is equally likely, the probability that the word is a palindrome is
\[\mathrm{Pr}[\text{palindrome}] = \frac{\text{number of palindromes}}{26^n}\]
To count the number of palindromes, consider first the case when \(n=5\). For the word to be a palindrome, the first and fifth letters must be the same, the second and fourth letters must be the same, and the third letter can be anything. Since there are \(26\) letters to choose from, the number of palindromes must be \(26^3\).
Now consider general \(n\) with \(n\) odd. For the word to be a palindrome, we can again pair up the first and last letter, the second and second last letter and so on. The middle letter gets paired with itself. The total number of choices is \(26^{(n+1)/2}\).
The probability of a palindrome is
\[\mathrm{Pr}[\text{palindrome}] = \frac{26^{(n+1)/2}}{26^n}=\frac{1}{26^{(n-1)/2}}\]
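The formula can be checked by enumeration. Listing all \(26^n\) words gets expensive quickly, so the sketch below mostly uses a hypothetical three-letter alphabet, for which the same argument gives \(1/3^{(n-1)/2}\), and checks the 26-letter case only for \(n=3\). The function name `palindrome_probability` is an assumption made for this example.

```python
import string
from fractions import Fraction
from itertools import product


def palindrome_probability(n, alphabet="abc"):
    """Exact chance that a uniformly random n-letter word reads the same both ways."""
    words = list(product(alphabet, repeat=n))
    palindromes = sum(1 for w in words if w == w[::-1])
    return Fraction(palindromes, len(words))


for n in (1, 3, 5, 7):
    # Small alphabet of size 3: expect 1 / 3^((n - 1) / 2).
    print(n, palindrome_probability(n), Fraction(1, 3 ** ((n - 1) // 2)))

# Full 26-letter alphabet, n = 3: expect 1 / 26.
print(palindrome_probability(3, string.ascii_lowercase))
```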
Week 3 - Conditional probability#
Practice Quiz #1#
11% of the U.S. population lives in California.
7% of people incarcerated in the United States are Californian.
Let \(C\) be the event that an individual is Californian, and let \(I\) be the event that an individual is incarcerated in the U.S.
Phrase the above statistic in the language of conditional probabilities.
What would you expect to be higher: \(\Pr[C \mid I]\), or \(\Pr[I \mid C]\)? Why?
Identify the flaw in the following statement, and explain the flaw using the language of conditional probabilities:
“Since 7% of incarcerated individuals are Californians, and there are 50 states, Californians are more likely to be incarcerated than citizens of other states!”
Practice Quiz 1 Solutions
\(\Pr[C \mid I] = 0.07\).
The fraction of Californians who are incarcerated, \(\Pr[I \mid C]\), should be much smaller than the fraction of incarcerated people who are Californian, \(\Pr[C \mid I]\). We’d expect \(\Pr[C \mid I]\) to be on the same order as \(\Pr[C]\), and \(\Pr[I \mid C]\) to be on the same order as \(\Pr[I]\); the fraction of people who are incarcerated, \(\Pr[I]\), should be much smaller than the fraction of people who are Californian, \(\Pr[C]\).
Even though California is only one of 50 states, it actually contains 11% of the US population, so this argument ignores the base rate of being Californian.
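In fact, though we wouldn’t expect you to reproduce the following math on a quiz,
\[\frac{\Pr[C \cap I]}{\Pr[I]} = \Pr[C \mid I] = 0.07 < 0.11 = \Pr[C]\]
which, multiplying both sides by \(\Pr[I]\) and dividing by \(\Pr[C]\), gives
\[\Pr[I \mid C] = \frac{\Pr[C \cap I]}{\Pr[C]} < \Pr[I]\]
So actually, the probability of being incarcerated is lower, conditioned on being Californian.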
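The size of this effect follows from the two given percentages alone. By the definition of conditional probability, \(\Pr[I \mid C]/\Pr[I] = \Pr[C \mid I]/\Pr[C]\), so the ratio can be computed without knowing \(\Pr[I]\). The short sketch below just does that arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

pr_C = Fraction(11, 100)          # Pr[C]: share of the U.S. population in California
pr_C_given_I = Fraction(7, 100)   # Pr[C | I]: share of incarcerated people who are Californian

# Pr[I | C] / Pr[I] = Pr[C | I] / Pr[C], since both equal Pr[C and I] / (Pr[C] * Pr[I]).
relative_rate = pr_C_given_I / pr_C

print(relative_rate, float(relative_rate))
```

A ratio below 1 means that conditioning on being Californian lowers the probability of being incarcerated relative to the national rate.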
Practice Quiz #2#
17% of NBA players are at least 7 ft.
Phrase the statistic above in the language of conditional probabilities.
What do you think is larger, the number of NBA players or the number of people more than 7ft tall?
Identify the flaw in the following statement, and explain the flaw using the language of conditional probabilities:
“Wow, you’re more than 7ft tall! Are you a professional basketball player?”
Practice Quiz 2 Solutions
Let \(H\) be the event of being at least 7ft tall, and let \(B\) be the event of being in the NBA. The statistic above says that \(\Pr[H \mid B] = 0.17\).
The number of people over 7ft tall is small, but it is probably much larger than the number of NBA players (a Fermi estimate indicates that there are probably around \(20 \times 30 = 600\) NBA players).
This confuses \(\Pr[H \mid B]\) with \(\Pr[B \mid H]\). Even though \(\Pr[H \mid B]\) is large, \(\Pr[B \mid H]\) is still very small, as there are so few professional basketball players.
Practice Quiz #3#
A classroom of 28 students is evenly split between seniors, juniors, sophomores and first-years. There are four English majors in the class; two are juniors and two are first-years.
Choose a student uniformly at random from the class; let \(E\) be the event that the student is an English major, and let \(F\) be the event that the student is a first year.
Describe \(\Pr[E \mid F]\) in plain English.
What is larger, \(\Pr[E \mid F]\) or \(\Pr[E \mid \overline{F}]\)?
The class takes an “anonymized” survey. One of the questions on the survey is “what is your major?” and another question is “what is your class year?” Explain the flaw in the following statement by the course instructor using the language of conditional probability:
“The survey is anonymous because there are 7 of you in each year, so even if I know your class year, I only have a 1/7 chance of guessing who you are.”
Practice Quiz 3 Solutions
This is the chance that if you choose a first-year uniformly at random, they will be an English major.
\(\Pr[E \mid F] = \frac{\Pr[E \cap F]}{\Pr[F]} = \frac{2}{7}\), while \(\Pr[E \mid \overline{F}] = \frac{\Pr[E \cap \overline{F}]}{\Pr[\overline{F}]} = \frac{2}{21}\), so \(\Pr[E \mid F]\) is larger.
The flaw is that the instructor will know the major as well as the class year. There are at most two English majors in any year, so conditioned on all of the available information (both major and class year), the instructor may have a 1/2 chance of guessing who the student is.
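The numbers in these solutions can be reproduced by building the class roster and counting. This is only a sketch: the label `"other"` for non-English majors and the helper `pr` are assumptions introduced for the example.

```python
from fractions import Fraction

YEARS = ("senior", "junior", "sophomore", "first-year")

# 7 students per year; two English majors among juniors and two among first-years.
roster = []
for year in YEARS:
    english = 2 if year in ("junior", "first-year") else 0
    roster += [(year, "English")] * english + [(year, "other")] * (7 - english)


def pr(event, given=lambda s: True):
    """Pr[event | given] for a student chosen uniformly at random from the roster."""
    pool = [s for s in roster if given(s)]
    return Fraction(sum(1 for s in pool if event(s)), len(pool))


def is_english(s):
    return s[1] == "English"


def is_first_year(s):
    return s[0] == "first-year"


pr_E_given_F = pr(is_english, is_first_year)
pr_E_given_not_F = pr(is_english, lambda s: not is_first_year(s))

# Chance of identifying a first-year English major from year AND major together.
matches = [s for s in roster if is_first_year(s) and is_english(s)]
guess_chance = Fraction(1, len(matches))

print(pr_E_given_F, pr_E_given_not_F, guess_chance)
```

Conditioning on class year alone leaves 7 candidates, while conditioning on both answers leaves 2, which is the gap the instructor's statement overlooks.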
Week 4 - Data Visualization#
Practice Quiz #1#
What kind of graphic would you use to visually summarize the results of a Yes/No poll, and why?
A team of medical researchers has done a survey of 1000 patients and has collected the BMI and cholesterol of each. What kind of graphic would you use to visually summarize this data, and why?
Critique the following graphic visualization of data: could it have been better represented in a different format? Is there anything misleading about the visualization?
Practice Quiz 1 Solutions
A pie chart or a bar chart. There are only two responses and each person selects exactly one, so the answers are parts of a whole, and a pie chart gives a relatively clear sense of what proportion picked each answer. A bar chart could also work well, and humans are better at judging bar heights than angles.
A scatterplot; this allows us to visually see if there is an association between these two parameters.
A not-as-good (but reasonable in some circumstances) answer would be a pair of histograms (on different axes); this would give us a sense of how each individual variable (BMI and cholesterol) is distributed in the patient population.
When it is presented as a bar chart, it makes it seem like each bar is comparable to all of the other bars, but here they are in two categories, year + age. The data measures how a single measurement changes over time, and could be better presented as 4 separate time series, one for each age group. That would also make the data easier to compare.
Practice Quiz #2#
The registrar reports that the average Stanford GPA is 3.45 for first-years, 3.51 for second-years, 3.58 for third-years, and 3.69 for fourth-years (in the interest of full disclosure, I actually made this data up). What kind of graphic would you use to visually summarize this data, and why?
A city monitors wastewater for infectious diseases. Suppose city officials take weekly measurements of the concentration of bird flu virus in the wastewater. What kind of graphic would you use to visually summarize this data, and why?
Critique the following graphic visualization of data: could it have been better represented in a different format? Is there anything misleading about the visualization?
Practice Quiz 2 Solutions
A bar chart, because we have a numerical quantity and we are comparing its value across a number of different categories.
A time series. We are tracking how a single measurement evolves over time.
This visualization is misleading because it is a bar chart where (a) the not-colored-in part of the bar actually represents how large the quantity is, and (b) even if the bars were properly colored in, the value for bars does not start at 0, and the right-hand-side is set to make the most expensive option look best.
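To see why a bar chart whose axis does not start at 0 can mislead, it helps to compare how tall the bars look relative to one another under different baselines. The sketch below reuses the (made-up) GPA figures from the first question of this quiz purely as sample numbers. The baseline of 3.4 is a hypothetical choice, and `bar_height_ratio` is a helper invented for this illustration.

```python
# GPA figures quoted in the first question of this quiz (stated there to be made up).
gpa = {
    "first-year": 3.45,
    "second-year": 3.51,
    "third-year": 3.58,
    "fourth-year": 3.69,
}


def bar_height_ratio(values, baseline):
    """Ratio of the tallest drawn bar to the shortest when bars start at `baseline`."""
    heights = [v - baseline for v in values]
    return max(heights) / min(heights)


honest = bar_height_ratio(gpa.values(), baseline=0.0)     # axis starts at 0
truncated = bar_height_ratio(gpa.values(), baseline=3.4)  # hypothetical truncated axis

print(round(honest, 2), round(truncated, 2))
```

With the axis starting at 0 the tallest bar is only slightly taller than the shortest, while with the truncated axis the same data appears to differ several-fold.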
Practice Quiz #3#
You survey 1000 people and you ask them for (a) their annual salary and (b) how many days of vacation they take per year. What kind of graphic would you use to visually summarize this data, and why?
An ecologist has taken measurements of the beak length of 1000 randomly captured birds (all of the same species). What kind of graphic would you use to visually summarize this data, and why?
Critique the following graphic visualization of data: could it have been better represented in a different format? Is there anything misleading about the visualization?
Practice Quiz 3 Solutions
A scatterplot; this allows us to visually see if there is an association between these two parameters.
A not-as-good (but reasonable in some circumstances) answer would be a pair of histograms (on different axes); this would give us a sense of how each individual variable (salary and vacation days) is distributed among the people surveyed.
A histogram, because this is a result of a numerical sample and we will get a sense of how beak length is distributed in the population.
The quantities plotted in this time series are all of the same category: military expenditure by various countries. Yet the Y-axes differ on the left- and right-hand sides, and every country except the U.S. is plotted according to the left axis, whereas the U.S. is plotted according to the right axis. Also, the right axis does not start at zero. The U.S. line would actually start above the top of the chart if it were plotted according to the left axis.