# Monthly Archives: September 2014

## Birthdays, CAS, Probability, and Student Creativity

Many readers are familiar with the very counterintuitive Birthday Problem:

It is always fun to be in a group when two people suddenly discover that they share a birthday.  Should we be surprised when this happens?  Asked a different way, how large a group of randomly selected people is required to have at least a 50% probability of having a birthday match within the group?

I posed this question to both of my sections of AP Statistics in the first week of school this year.  In a quick poll, one section had a birthday match–two students who had taken classes together for a few years without even realizing what they had in common.  Was I lucky, or was this a commonplace occurrence?

Intrigue over this question motivated our early study of probability.  The remainder of this post follows what I believe is the traditional approach to the problem, supplemented by the computational power of a computer algebra system (CAS)–the TI Nspire CX CAS–available on each of my students’ laptops.

Initial Attempt:

Their first try at a solution was direct.  The difficulty was the number of ways a common birthday could occur.  After establishing that we wanted any common birthday to count as a match and not just an a priori specific birthday, we tried to find the number of ways birthday matches could happen for different sized groups.  Starting small, they reasoned that

• If there were 2 people in a room, there was only 1 possible birthday connection.
• If there were 3 people (A, B, and C), there were 4 possible birthday connections–three pairs (A-B, A-C, and B-C) and one triple (A-B-C).
• For four people (A, B, C, and D), they realized they had to look for pair, triple, and quad connections.  The latter two were easiest:  one quad (A-B-C-D) and four triples (A-B-C, A-B-D, A-C-D, and B-C-D).  For the pairs, we considered the problem as four points and looked for all the ways we could create segments.  That gave (A-B, A-C, A-D, B-C, B-D, and C-D).  These could also occur as double pairs in three ways (A-B & C-D, A-C & B-D, and A-D & B-C).  All together, this made 1+4+6+3=14 ways.

This required lots of support from me and was becoming VERY COMPLICATED VERY QUICKLY.  Two people had 1 connection, 3 people had 4 connections, and 4 people had 14 connections.  Tracking all of the possible connections as the group size expanded–and especially not losing track of any possibilities–was making this approach difficult.  This created a perfect opportunity to use complement probabilities.
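The hand counts above (1, 4, and 14) can be checked by brute force. The sketch below is not something we did in class; it assumes that a "connection pattern" means any way of splitting the group into clusters of people who share a birthday, with at least one cluster of two or more. The function names are made up for illustration.

```python
def set_partitions(items):
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give `first` a group of its own.
        yield [[first]] + partition


def count_match_patterns(n):
    """Count the splits of n people in which at least two share a birthday."""
    people = list(range(n))
    return sum(
        1
        for partition in set_partitions(people)
        if any(len(group) > 1 for group in partition)
    )


for n in range(2, 6):
    print(n, count_match_patterns(n))  # 2 -> 1, 3 -> 4, 4 -> 14, 5 -> 51
```

Under that assumption, five people already have 51 patterns to track, which is why counting matches directly gets out of hand so fast.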

While there were MANY ways to have a shared birthday, for a group of any size there was one and only one way to have no shared birthdays–they all had to be different.  And computing a probability for a single possibility was a much simpler task.

We imagined an empty room with random people entering one at a time.  The first person entering could have any birthday without matching anyone, so $P \left( \text{no match with 1 person} \right) = \frac{365}{365}$.  When the second person entered, there were 364 unchosen birthdays remaining, giving $P \left( \text{no match with 2 people} \right) = \frac{365}{365} \cdot \frac{364}{365}$.  A third person left 363 unchosen birthdays, so $P \left( \text{no match with 3 people} \right) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365}$.  The complements of these are the probabilities we sought:

$P \left( \text{birthday match with 1 person} \right) = 1- \frac{365}{365} = 0$
$P \left( \text{birthday match with 2 people} \right) = 1- \frac{365}{365} \cdot \frac{364}{365} \approx 0.002740$
$P \left( \text{birthday match with 3 people} \right) = 1- \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \approx 0.008204$.

The probabilities were small, but with persistent data entry from a few classmates, they found that the 50% threshold was reached with 23 people.
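The repeated data entry can be automated. Here is a minimal Python sketch of the same person-by-person multiplication, assuming 365 equally likely birthdays and ignoring leap days; the function names are illustrative only and this is not the TI-Nspire work the students did.

```python
def match_probability(n, days=365):
    """P(at least one shared birthday among n people)."""
    p_no_match = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1 - p_no_match


def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose match probability reaches the threshold."""
    n = 1
    while match_probability(n, days) < threshold:
        n += 1
    return n


print(smallest_group())  # 23
```

Calling `match_probability(2)` and `match_probability(3)` reproduces the two small values computed above.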

The hard work was finished, but some wanted to find an easier way to compute the solution.  A few students noticed that the numerator looked like the start of a factorial and revised the equation:

$\begin{matrix} \displaystyle P \left( \text{birthday match with n people} \right ) & = & 1- \frac{365}{365} \cdot \frac{364}{365} \dots \frac{(366-n)}{365} \\ \\ & = & 1- \frac{365 \cdot 364 \dots (366-n)}{365^n} \\ \\ & = & 1- \frac{365\cdot 364 \dots (366-n)\cdot (366-n-1)!}{365^n \cdot (366-n-1)!} \\ \\ & = & 1- \frac{365!}{365^n \cdot (365-n)!} \end{matrix}$

It was much simpler to plug in values to this simplified equation, confirming the earlier result.
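For readers without a CAS, the factorial form can be evaluated exactly with Python's arbitrary-precision integers. This is only a sketch under the same 365-day assumption, and `match_probability_factorial` is a made-up name.

```python
from fractions import Fraction
from math import factorial


def match_probability_factorial(n, days=365):
    """Exact value of 1 - days! / (days**n * (days - n)!)."""
    no_match = Fraction(factorial(days), days**n * factorial(days - n))
    return 1 - no_match


print(match_probability_factorial(2))          # 1/365
print(float(match_probability_factorial(23)))  # just over 0.5
```

Using `Fraction` keeps the huge factorials exact until the final conversion to a decimal, which sidesteps the overflow a floating-point `365!` would cause.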

Not everyone saw the “complete the factorial” manipulation, but one student noticed in the first solution the linear pattern in the numerators of the probability fractions.  While it was easy enough to write a formula for the fractions, he didn’t know an easy way to multiply all the fractions together.  He had experience with Sigma Notation for sums, so I introduced him to Pi Notation–it works exactly the same as Sigma Notation, except Pi multiplies the individual terms instead of adding them.  On the TI-Nspire, the Pi Notation command is available in the template menu or under the calculus menu.
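Written in Pi Notation, the linear pattern in the numerators becomes $P \left( \text{birthday match with n people} \right) = 1 - \prod_{k=1}^{n} \frac{366-k}{365}$.  The index name $k$ is an assumption here; the student's actual calculator entry is not reproduced in this post.  A rough Python equivalent (not TI-Nspire syntax) uses `math.prod`, which requires Python 3.8 or later:

```python
from math import prod


def match_probability_product(n, days=365):
    """1 minus the product of (days + 1 - k) / days for k = 1..n."""
    return 1 - prod((days + 1 - k) / days for k in range(1, n + 1))


print(round(match_probability_product(23), 4))  # 0.5073
```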

Conclusion:

I really like two things about this problem:  the extremely counterintuitive result (just 23 people give a 50% chance of a birthday match) and discovering the multiple ways you could determine the solution.  Between student pattern recognition and my support in formalizing computation suggestions, students learned that translating different recognized patterns into mathematical symbols, supported by technology, can provide different, equally valid ways to solve a problem.

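Now I can answer the question I posed about the likelihood of my finding a birthday match in at least one of my two statistics classes.  The two sections have 15 and 21 students, respectively.  The probability of having at least one match within a section is the complement of having no matches in either section.  Using the Pi Notation version of the solution gives

$P \left( \text{at least one match} \right) = 1 - \left( \prod_{k=1}^{15} \frac{366-k}{365} \right) \cdot \left( \prod_{k=1}^{21} \frac{366-k}{365} \right) \approx 0.584$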

I wasn’t guaranteed a match, but the 58.4% probability gave me a decent chance of having a nice punch line to start the class.  It worked pretty well this time!
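The 58.4% figure can be reproduced by treating the two sections as independent groups and multiplying their no-match probabilities. The sketch below assumes matches are only counted within a section, not across the two; the names are illustrative.

```python
from math import prod


def no_match_probability(n, days=365):
    """P(all n birthdays are different)."""
    return prod((days - k) / days for k in range(n))


sections = [15, 21]  # class sizes given in the post

# No match anywhere means no match in section 1 AND no match in section 2.
p_no_match_anywhere = prod(no_match_probability(n) for n in sections)
p_at_least_one = 1 - p_no_match_anywhere

print(round(p_at_least_one, 3))  # 0.584
```

Had all 36 students been polled as a single group, the probability would have been noticeably higher, so the two-section setup matters to the result.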

Extension:

My students are currently working on their first project, determining a way to simulate groups of people entering a room with randomly determined birthdays to see if the 23 person theoretical threshold bears out with experimental results.
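One possible shape for such a simulation, sketched in Python rather than on the TI-Nspire, is below. It is not the students' project; it assumes 365 equally likely birthdays, and the trial count and seed are arbitrary choices.

```python
import random


def has_match(group_size, rng, days=365):
    """Let people enter one at a time; stop at the first repeated birthday."""
    seen = set()
    for _ in range(group_size):
        birthday = rng.randrange(days)
        if birthday in seen:
            return True
        seen.add(birthday)
    return False


def estimate_match_probability(group_size, trials=20_000, seed=2014, days=365):
    """Fraction of simulated rooms that contain a shared birthday."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    hits = sum(has_match(group_size, rng, days) for _ in range(trials))
    return hits / trials


for size in (22, 23, 24):
    print(size, estimate_match_probability(size))
```

With enough trials, the estimates for 22, 23, and 24 people should straddle 50% in the same way the theoretical values do, though any single run will wobble a little.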

## Monty Hall Continued

In my recent post describing a Monty Hall activity in my AP Statistics class, I shared an amazingly crystal-clear explanation of how one of my new students conceived of the solution:

If your strategy is staying, what’s your chance of winning?  You’d have to miraculously pick the money on the first shot, which is a 1/3 chance.  But if your strategy is switching, you’d have to pick a goat on the first shot.  Then that’s a 2/3 chance of winning.

Then I got a good follow-up question from @SteveWyborney on Twitter, who proposed a version of the problem with 1,000,000 doors.

Returning to my student’s conclusion about the 3-door version of the problem, she said,

The fact that there are TWO goats actually can help you, which is counterintuitive on first glance.

Extending her insight and expanding the problem to any number of doors, including Steve’s proposed 1,000,000 doors, the more goats one adds to the problem statement, the more likely it becomes to win the treasure with a switching doors strategy.  This is very counterintuitive, I think.

For Steve’s formulation, only 1 initial guess from the 1,000,000 possible doors would have selected the treasure–the additional goats seem to diminish one’s hopes of ever finding the prize.  Each of the other 999,999 initial doors would have chosen a goat.  So if 999,998 goat-doors are then opened until all that remains is the original door and one other, the contestant would win by staying only if the initial random pick happened to be the prize door, giving P(win by staying) = 1/1000000.  The probability of winning with the switching strategy is the complement, 999999/1000000.
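The same reasoning works for any number of doors. The following Python sketch computes the exact stay and switch probabilities and checks the 3-door case by simulation; it assumes the host always opens every goat door except one, as described above, and the function names are invented for illustration.

```python
import random
from fractions import Fraction


def exact_probabilities(doors):
    """Return (P(win by staying), P(win by switching)) for `doors` doors."""
    stay = Fraction(1, doors)
    return stay, 1 - stay


def simulate_switch_wins(doors, trials=20_000, seed=1):
    """Estimate P(win by switching) when all but one other door is opened."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(doors)
        pick = rng.randrange(doors)
        if pick != prize:
            # The host must leave the prize door closed.
            other = prize
        else:
            # The host leaves one random goat door closed.
            other = rng.randrange(doors - 1)
            if other >= pick:
                other += 1
        wins += other == prize
    return wins / trials


print(exact_probabilities(3))          # (Fraction(1, 3), Fraction(2, 3))
print(exact_probabilities(1_000_000))  # (Fraction(1, 1000000), Fraction(999999, 1000000))
print(simulate_switch_wins(3))         # close to 2/3
```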

IN RETROSPECT:

My student’s solution statement reminds me how critically important it is for teachers to always listen to and celebrate their students’ clever new insights and questions, many possessing depth beyond what students realize.

The solution also reminds me of several variations on “Everything is obvious in retrospect.”  I once read an even better version but can’t track down the exact wording.  A crude paraphrasing is

The more profound a discovery or insight, the more obvious it appears after.

I’d love a lead from anyone with the original wording.

REALLY COOL FOOTNOTE:

Adding to the mystique of this problem, I read in the Wikipedia description that even the great problem poser and solver Paul Erdős didn’t believe the solution until he saw a computer simulation result detailing the solution.