Marilyn vos Savant Conditional Probability Follow Up

In the Marilyn vos Savant problem I posted yesterday, I focused on the subtle shift from simple to conditional probability the writer of the question appeared to miss.  Two of my students took a different approach.

The majority of my students, typical of AP Statistics students’ tendencies very early in the course, tried to use a “wall of words” to explain away the discrepancy rather than providing quantitative evidence.  But two fully embraced the probabilities and developed the following probability tree to incorporate all of the given probabilities.  Each branch shows the probability of a short or long straw given the present state of the system.  Notice that it includes both of the apparently confounding 1/3 and 1/2 probabilities.


The uncontested probability that the first person draws the short straw is 1/4.

The probability that the second person draws the short straw is then (3/4)(1/3) = 1/4, exactly as expected.  The probabilities for the 3rd and 4th people can be computed similarly, arriving at the same 1/4 result.
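My students’ tree can be checked exactly with Python’s Fraction class (a quick sketch of my own, not part of their work; the branch probabilities come straight from the tree above):

```python
from fractions import Fraction as F

# Multiply along each branch of the tree: the probability a person draws
# the short straw given that everyone before them drew a long one.
p1 = F(1, 4)                                # 1 short straw among 4
p2 = F(3, 4) * F(1, 3)                      # first long, then 1 short among 3
p3 = F(3, 4) * F(2, 3) * F(1, 2)            # two longs, then 1 short among 2
p4 = F(3, 4) * F(2, 3) * F(1, 2) * F(1, 1)  # three longs leave only the short straw

print(p1, p2, p3, p4)  # 1/4 1/4 1/4 1/4
```

Every product collapses to 1/4, which is the whole of my students’ argument in four lines.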

My students argued essentially that the writer was correct in saying the probability of the second person having the short straw was 1/3 in the instant after it was revealed that the first person didn’t have the straw, but that they had forgotten to incorporate the probability of arriving in that state.  When you use all of the information, the probability of each person receiving the short straw remains at 1/4, just as expected.

Marilyn vos Savant and Conditional Probability

The following question appeared in the “Ask Marilyn” column in the August 16, 2015 issue of Parade magazine.  The writer seems stuck between two probabilities.


(Click here for a cleaned-up online version if you don’t like the newspaper look.)

I just pitched this question to my statistics class (we start the year with a probability unit).  I thought some of you might like it for your classes, too.

I asked them to do two things.  1) Answer the writer’s question, AND 2) Use precise probability terminology to identify the source of the writer’s conundrum.  Can you answer both before reading further?


Very briefly, the writer is correct in both situations.  If each of the four people draws a random straw, there is absolutely a 1 in 4 chance of each drawing the short straw.  Think about shuffling the straws and “dealing” one to each person much like shuffling a deck of cards and dealing out all of the cards.  Any given straw or card is equally likely to land in any player’s hand.

Now let the first person look at his or her straw.  It is either short or not.  The writer is then correct in claiming the probability of each of the others holding the short straw is now 0 (if the first person found the short straw) or 1/3 (if the first person did not).  And this is precisely the source of the writer’s conundrum.  She’s actually asking two different questions but thinks she’s asking only one.

The 1/4 result is from a pure, simple probability scenario.  There are four possible equally-likely locations for the short straw.

The 0 and 1/3 results happen only after the first (or any other) person looks at his or her straw.  At that point, the problem shifts from simple probability to conditional probability.  After observing a straw, the question shifts to determining the probability that one of the remaining people has the short straw GIVEN that you know the result of one person’s draw.
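A quick simulation (my sketch in Python, not part of the column) shows both answers living side by side: unconditionally, the second person holds the short straw in about 1/4 of all trials, but restricted to the trials where the first person drew a long straw, that share rises to about 1/3.

```python
import random

random.seed(42)  # reproducible runs
TRIALS = 100_000

second_short = 0  # trials where the second person drew the short straw
first_long = 0    # trials where the first person drew a long straw
both = 0          # second short AND first long

for _ in range(TRIALS):
    straws = ["short", "long", "long", "long"]
    random.shuffle(straws)
    if straws[1] == "short":
        second_short += 1
    if straws[0] == "long":
        first_long += 1
        if straws[1] == "short":
            both += 1

print(second_short / TRIALS)  # simple probability: near 1/4
print(both / first_long)      # conditional probability: near 1/3
```

Both of the writer’s numbers come out of the very same shuffles; only the question being asked of the data changes.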

So, the writer was correct in all of her claims; she just didn’t realize she was asking two fundamentally different questions.  That’s a pretty excusable lapse, in my opinion.  Slips into conditional probability are often missed.

Perhaps the most famous of these misses is the solution to the Monty Hall scenario that vos Savant famously posited years ago.  What I particularly love about this is the number of very well-educated mathematicians who missed the conditional, wrote flaming retorts to vos Savant brandishing their PhDs, and ultimately found themselves publicly supporting errant conclusions.  You can read the original question, errant responses, and vos Savant’s very clear explanation here.
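For classes that want evidence rather than rhetoric, the Monty Hall conclusion is easy to test by simulation (a sketch of mine, with the host always opening a goat door the contestant didn’t pick):

```python
import random

random.seed(1)  # reproducible runs

def monty_trial(switch):
    """One game: 3 doors, one car; the host opens a goat door you didn't pick."""
    car = random.randrange(3)
    pick = random.randrange(3)
    # The host opens a door that is neither the pick nor the car.
    opened = next(d for d in range(3) if d != pick and d != car)
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

trials = 100_000
stay = sum(monty_trial(False) for _ in range(trials)) / trials
swap = sum(monty_trial(True) for _ in range(trials)) / trials
print(stay, swap)  # near 1/3 and 2/3
```

Switching wins exactly when the original pick was wrong, which happens 2/3 of the time, and the simulation makes that conditional reasoning concrete.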


Probability is subtle and catches all of us at some point.  Even so, the careful thinking required to dissect and answer subtle probability questions is arguably one of the best exercises of logical reasoning around.


As a completely different connection, I think this is very much like Heisenberg’s Uncertainty Principle.  Until the first straw is observed, the short straw really could (does?) exist in all hands simultaneously.  Observing the system (looking at one person’s straw) permanently changes its state, forever bifurcating the system into one of two potential future states:  the short straw is found in the first hand, or it is not.

CORRECTION (3 hours after posting):

I knew I was likely to overstate or misname something in my final connection.  Thanks to Mike Lawler (@mikeandallie) for a quick correction via Twitter.  I should have called this quantum superposition and not the uncertainty principle.  Thanks so much, Mike.

SBG and AP Statistics Update

I’ve continued to work on my Standards for AP Statistics.  After a few conversations with colleagues and finding this pdf of AP Statistics Standards, I’ve winnowed down and revised my Standards to the point that I’m comfortable using them this year.

Following is the much shorter document I’m using in my classes this year.  It addresses the AP Statistics core content as well as the additional ideas, connections, etc. I hope my students learn this year.  As always, I welcome all feedback, and I hope someone else finds these guides helpful.

SBG and Statistics

I’ve been following Standards-Based Grading (SBG) for several years now after first being introduced to the concept by colleague John Burk (@occam98).  Thanks, John!

I finally took the plunge into SBG with my summer school Algebra 2 class this past June & July, and I’ve fully committed to an SBG pilot for my AP Statistics classes this year.

I found writing standards for Algebra 2 this summer relatively straightforward.  I’ve taught that content for decades now and know precisely what I want my students to understand.  I needed some practice writing standards and got better as the summer class progressed.  Over time, I’ve read several teachers’ versions of standards for various courses.  But writing standards for my statistics class proved MUCH more challenging.  In the end, I found myself guided by three major philosophies.

  1. The elegance and challenge of well-designed Enduring Understandings from the Understanding by Design (UbD) work of Jay McTighe and the late Grant Wiggins helped me craft many of my standards as targets for student learning that didn’t necessarily reveal everything all at once.
  2. The power of writing student-centered “I can …” statements that I learned through my colleague Jill Gough (@jgough) has become very important in my classroom design.  I’ve become much more focused on what I want my students (“learners” in Jill’s parlance) to be able to accomplish and less about what I’m trying to deliver.  This recentering of my teaching awareness has been good for my continuing professional development and was a prime motivator in writing these Standards.
  3. I struggled throughout the creation of my first AP Statistics standards document to find a balance between too few very broad high-level conceptual claims and a far-too-granular long list of skill minutiae.  I wanted more than a narrow checklist of tiny skills and less than overloaded individual standards that are difficult for students to satisfy.  I want a challenging, but reachable bar.

So, following is my first attempt at Standards for my AP Statistics class, and I’ll be using them this year.  In sharing this, I have two hopes:

  • Maybe some teacher out there might find some use in my Standards.
  • More importantly, I’d LOVE some feedback from anyone on this work.  It feels much too long to me, but I wonder if it is really too much or too little.  Have I left something out?

At some point, all work needs a public airing to improve.  That time for me is now.  Thank you in advance on behalf of my students for any feedback.

Chemistry, CAS, and Balancing Equations

Here’s a cool application of linear equations I first encountered about 20 years ago working with chemistry colleague Penney Sconzo at my former school in Atlanta, GA.  Many students struggle early in their first chemistry classes with balancing equations.  Thinking about these as generalized systems of linear equations gives a universal approach to balancing chemical equations, including ionic equations.

This idea makes a brilliant connection if you teach algebra 2 students concurrently enrolled in chemistry, or vice versa.


Consider burning ethanol.  The chemical combination of ethanol and oxygen, creating carbon dioxide and water:

C_2H_6O+3O_2 \longrightarrow 2CO_2+3H_2O     (1)

But what if you didn’t know that 1 molecule of ethanol combined with 3 molecules of oxygen gas to create 2 molecules of carbon dioxide and 3 molecules of water?  This specific set of coefficients (or multiples of the set) exists for this reaction because of the Law of Conservation of Matter.  While elements may rearrange in a chemical reaction, they do not become something else.  So how do you determine the unknown coefficients of a generic chemical reaction?

Using the ethanol example, assume you started with

wC_2H_6O+xO_2 \longrightarrow yCO_2+zH_2O     (2)

for some unknown values of w, x, y, and z.  Conservation of Matter guarantees that the amount of carbon, hydrogen, and oxygen are the same before and after the reaction.  Tallying the amount of each element on each side of the equation gives three linear equations:

Carbon:  2w=y
Hydrogen:  6w=2z
Oxygen:  w+2x=2y+z

where the coefficients come from the subscripts within the compound notations.  As one example, the carbon subscript in ethanol ( C_2H_6O ) is 2, indicating two carbon atoms in each ethanol molecule.  There must have been 2w carbon atoms in the w ethanol molecules.

This system of 3 equations in 4 variables won’t have a unique solution, but let’s see what my Nspire CAS says.  (NOTE:  On the TI-Nspire, you can solve for any one of the four variables.  Because the presence of more variables than equations makes the solution non-unique, some results may appear cleaner than others.  For me, w was more complicated than z, so I chose to use the z solution.)
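If you’d rather not use a handheld, SymPy does the same CAS work (a sketch assuming the SymPy library is available); solving for w, x, and z leaves y free, exactly as the Nspire does:

```python
from sympy import symbols, solve

w, x, y, z = symbols('w x y z')
# Conservation of C, H, and O for  w C2H6O + x O2 -> y CO2 + z H2O
equations = [
    2*w - y,            # carbon
    6*w - 2*z,          # hydrogen
    w + 2*x - 2*y - z,  # oxygen
]
solution = solve(equations, [w, x, z])
print(solution)  # {w: y/2, x: 3*y/2, z: 3*y/2}
```

Substituting y=2 into these results returns the (1, 3, 2, 3) coefficients of the balanced equation.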


All three solutions have y in the numerator and a denominator of 2.  The presence of the y indicates the expected non-unique solution.  But it also gives me the freedom to select any convenient value of y I want to use.  I’ll pick y=2 to simplify the fractions.  Plugging in gives me values for the other coefficients.


Substituting these into (2) above gives the original equation (1).


Traditionally, chemists write these equations with the lowest possible natural number coefficients, but thinking of them as systems of linear equations makes another reality obvious.  If 1 molecule of ethanol combines with 3 molecules of oxygen gas to make 2 molecules of carbon dioxide and 3 molecules of water, surely 10 molecules of ethanol combine with 30 molecules of oxygen gas to make 20 molecules of carbon dioxide and 30 molecules of water (the result of substituting y=20 instead of the y=2 used above).

You could even let y=1 to get z=\frac{3}{2}, w=\frac{1}{2}, and x=\frac{3}{2}.  Shifting units, this could mean a half-mole of ethanol and 1.5 moles of oxygen make a mole of carbon dioxide and 1.5 moles of water.  The point is, the ratios are constant.  A good lesson.


Now let’s try a harder one to balance:  Reacting carbon monoxide and hydrogen gas to create octane and water.

wCO + xH_2 \longrightarrow y C_8 H_{18} + z H_2 O

Setting up equations for each element gives

Carbon:  w=8y
Oxygen:  w=z
Hydrogen:  2x=18y+2z

I could simplify the hydrogen equation, but that’s not required.  Solving this system of equations gives


Nice.  No fractions this time.  Using y=1 gives w=8, x=17, and z=8, or

8CO + 17H_2 \longrightarrow C_8 H_{18} + 8H_2 O



Now let’s balance an ionic equation with unknown coefficients a, b, c, d, e, and f:

a Ba^{2+} + b OH^- + c H^+ + d PO_4^{3-} \longrightarrow eH_2O + fBa_3(PO_4)_2

In addition to writing equations for barium, oxygen, hydrogen, and phosphorus, Conservation of Charge allows me to write one more equation to reflect the balancing of charge in the reaction.

Barium:  a = 3f
Oxygen:  b +4d = e+8f
Hydrogen:  b+c=2e
Phosphorus:  d=2f
CHARGE (+/-):  2a-b+c-3d=0

Solving the system gives


Now that’s a curious result.  I’ll deal with the zeros in a moment.  Letting d=2 gives f=1 and a=3, indicating that 3 barium ions combine with 2 phosphate ions to create a single uncharged molecule of barium phosphate precipitate.

The zeros here indicate the presence of “spectator ions”.  Basically, the hydroxide and hydrogen ions on the left are in equal measure to the liquid water molecules on the right.  Since they are in equal measure, one solution is

3Ba^{2+}+6OH^- +6H^++2PO_4^{3-} \longrightarrow 6H_2O + Ba_3(PO_4)_2
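Those coefficients are easy to verify programmatically (a sketch in Python; the element counts and charges are read directly from the ions above, with the hydrogen ion carrying charge +1):

```python
# species: ({element: count}, charge); coefficients from the balanced equation
species = {
    "Ba2+":      ({"Ba": 1}, +2),
    "OH-":       ({"O": 1, "H": 1}, -1),
    "H+":        ({"H": 1}, +1),
    "PO4_3-":    ({"P": 1, "O": 4}, -3),
    "H2O":       ({"H": 2, "O": 1}, 0),
    "Ba3(PO4)2": ({"Ba": 3, "P": 2, "O": 8}, 0),
}
coeffs = {"Ba2+": 3, "OH-": 6, "H+": 6, "PO4_3-": 2, "H2O": 6, "Ba3(PO4)2": 1}
reactants = {"Ba2+", "OH-", "H+", "PO4_3-"}

def net(quantity):
    """Reactant total minus product total for an element or for charge."""
    total = 0
    for name, (elements, charge) in species.items():
        amount = charge if quantity == "charge" else elements.get(quantity, 0)
        total += (1 if name in reactants else -1) * coeffs[name] * amount
    return total

for q in ["Ba", "O", "H", "P", "charge"]:
    print(q, net(q))  # every net is 0: matter and charge both balance
```

Each of the five conservation equations, charge included, nets to zero for this coefficient set.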


You still need to understand chemistry and algebra to interpret the results, but combining chemistry with algebra (and especially a CAS) makes it much easier to balance chemical and ionic equations, particularly those with non-trivial solutions not easily found by inspection.

The minor connection between science (chemistry) and math (algebra) is nice.

As many others have noted, CAS enables you to keep your mind on the problem while avoiding getting lost in the algebra.

Measuring Calculator Speed

Two weeks ago, my summer school Algebra 2 students were exploring sequences and series.  A problem I thought would be a routine check on students’ ability to compute the sum of a finite arithmetic series morphed into an experimental measure of the computational speed of the TI-Nspire CX handheld calculator.  This experiment can be replicated on any calculator that can compute sums of arithmetic series.


Teaching this topic in prior years, I’ve found that students sometimes compute series sums by actually adding all of the individual sequence terms.  Some former students have solved problems involving addition of more than 50 terms, in sequence order, to find their sums.  That’s a valid, but computationally painful, approach.  I wanted my students to practice series manipulations that were less brute-force.  Despite my intentions, we ended up measuring brute force anyway!

Readers of this ‘blog hopefully know that I’m not at all a fan of memorizing formulas.  One of my class mantras is

“Memorize as little as possible.  Use what you know as broadly as possible.”

Formulas can be mis-remembered and typically apply only in very particular scenarios.  Learning WHY a procedure works allows you to apply or adapt it to any situation.


Not wanting students to add terms, I allowed use of their Nspire handheld calculators and asked a question that couldn’t feasibly be solved without technological assistance.

The first two terms of a sequence are t_1=3 and t_2=6.  Another term farther down the sequence is t_k=25165824.

A)  If the sequence is arithmetic, what is k?

B)  Compute \sum_{n=1}^{k}t_n where t_n is the arithmetic sequence defined above, and k is the number you computed in part A.

Part A was easy.  They quickly recognized the terms were multiples of 3, so t_k=25165824=3\cdot k, or k=8388608.

For Part B, I expected students to use the Gaussian approach to summing long arithmetic series that we had explored/discovered the day before.   For arithmetic series, rearrange the terms in pairs:  the first with last, the second with next-to-last, the third with next-to-next-to-last, etc..  Each such pair will have a constant sum, so the sum of any arithmetic series can be computed by multiplying that constant sum by the number of pairs.
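Gauss’s pairing approach translates directly into code (a sketch for the series in this problem):

```python
k = 8_388_608           # number of terms, from part A
first, last = 3, 3 * k  # the arithmetic series 3 + 6 + ... + 3k

# Pair first with last, second with next-to-last, etc.: every pair has the
# same sum, and there are k/2 pairs.
gauss = (first + last) * k // 2

# Brute force, term by term -- what the calculators were actually asked to do.
brute = sum(3 * n for n in range(1, k + 1))

print(gauss)           # 105553128849408
print(gauss == brute)  # True
```

The pairing result agrees with the term-by-term sum, but only one of the two finishes before you can blink.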

Unfortunately, I think I led my students astray by phrasing part B in summation notation.  They were working in pairs and (unexpectedly for me) every partnership tried to answer part B by entering \sum_{n=1}^{8388608}(3n) into their calculators.  All became frustrated when their calculators appeared to freeze.  That’s when the fun began.

Multiple groups began reporting identical calculator “freezes”; it took me a few moments to realize what was happening.  That’s when I reminded students what I say at the start of every course:  Their graphing calculator will become their best, most loyal, hardworking, non-judgemental mathematical friend, but you should have some concept of what you are asking it to do.  Whatever you ask, the calculator will diligently attempt to answer until it finds a solution or runs out of energy, no matter how long it takes.  In this case, the students had asked their calculators to compute values of 8,388,608 terms and add them all up.  The machines hadn’t frozen; they were diligently computing and adding 8+ million terms, just as requested.  Nice calculator friends!

A few “Oh”s sounded around the room as they recognized the enormity of the task they had absentmindedly asked of their machines.  When I asked if there was another way to get the answer, most remembered what I had hoped they’d use in the first place.  Using a partner’s machine, they used Gauss’s approach to find \sum_{n=1}^{8388608}(3n)=(3+25165824)\cdot (8388608/2)=105553128849408 in an imperceptible fraction of a second.  Nice connections happened when, minutes later, the hard-working Nspires returned the same 15-digit result by the computationally painful approach.  My question phrasing hadn’t eliminated the term-by-term addition I’d hoped to avoid, but I did unintentionally create reinforcement of a concept.  Better yet, I got an idea for a data analysis lab.


They had some fundamental understanding that their calculators were “fast”, but couldn’t quantify what “fast” meant.  The question I posed them the next day was to compute \sum_{n=1}^k(3n) for various values of k, record the amount of time it took for the Nspire to return a solution, determine any pattern, and make predictions.
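The same lab works in any language; here is a sketch in Python that times its own brute-force summation for several values of k and fits a least-squares slope (the constants will differ wildly from an Nspire’s, but the linearity should not):

```python
import time

def brute_force_time(k):
    """Seconds to sum 3 + 6 + ... + 3k term by term."""
    start = time.perf_counter()
    sum(3 * n for n in range(1, k + 1))
    return time.perf_counter() - start

ks = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
times = [brute_force_time(k) for k in ks]

# Least-squares slope: extra seconds of work per extra term.
mean_k = sum(ks) / len(ks)
mean_t = sum(times) / len(times)
slope = (sum((k - mean_k) * (t - mean_t) for k, t in zip(ks, times))
         / sum((k - mean_k) ** 2 for k in ks))

print(slope)  # roughly constant cost per term, so total time grows linearly
```

Reciprocating the slope estimates how many terms per second the machine handles, just as my students did with their regression.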

Recognizing the machine’s speed, one group said “K needs to be a large number, otherwise the calculator would be done before you even started to time.”  Here’s their data.


They graphed the first 5 values on a second Nspire and used the results to estimate how long it would take their first machine to compute the even more monumental task of adding up the first 50 million terms of the series–a task they had set their “loyal mathematical friend” to computing while they calculated their estimate.


Some claimed to be initially surprised that the data was so linear.  With some additional thought, they realized that every time k increased by 1, the Nspire had to do 2 additional computations:  one multiplication and one addition–a perfectly linear pattern.  They used a regression to find a quick linear model and checked residuals to make sure nothing strange was lurking in the background.


The lack of pattern and maximum residual magnitude of about 0.30 seconds over times as long as 390 seconds completely dispelled any remaining doubts of underlying linearity.  Using the linear regression, they estimated their first Nspire would be working for 32 minutes 29 seconds.


They looked at the calculator at 32 minutes, noted that it was still running, and unfortunately were briefly distracted.  When they looked back at 32 minutes, 48 seconds, the calculator had stopped.  It wasn’t worth it to them to re-run the experiment.  They were VERY IMPRESSED that even with the error, their estimate was off just 19 seconds (arguably up to 29 seconds off if the machine had stopped running right after their 32 minute observation).


The units of the linear regression slope (approximately 0.000039) were seconds per value of k.  Reciprocating gave approximately 25,657 computed and summed values of k per second.  As every increase in k required the calculator to multiply the next term number by 3 and add that new term value to the existing sum, each k represented 2 Nspire calculations.  Doubling the last result meant their Nspire was performing about 51,314 calculations per second when calculating the sum of an arithmetic series.


My students were impressed by the speed, the lurking linear function, and their ability to predict computation times within seconds for very long arithmetic series calculations.

Not a bad diversion from unexpected student work, I thought.

Infinite Ways to an Infinite Geometric Sum

One of my students, K, and I were reviewing Taylor Series last Friday when she asked for a reminder why an infinite geometric series summed to \displaystyle \frac{g}{1-r} for first term g and common ratio r when \left| r \right| < 1.  I was glad she was dissatisfied with blind use of a formula and dove into a familiar (to me) derivation.  In the end, she shook me free from my routine just as she made sure she didn’t fall into her own.


My standard explanation starts with a generic infinite geometric series.

S = g+g\cdot r+g\cdot r^2+g\cdot r^3+...  (1)

We can reason this series converges iff \left| r \right| <1 (see Footnote 1 for an explanation).  Assume this is true for (1).  Notice the terms on the right keep multiplying by r.

The annoying part of summing any infinite series is the ellipsis (…).  Any finite number of terms always has a finite sum, but that simply written, vague ellipsis is logically difficult.  In the geometric series case, we might be able to handle the ellipsis by aligning terms in a similar series.  You can accomplish this by continuing the pattern on the right:  multiplying both sides by r.

r\cdot S = r\cdot \left( g+g\cdot r+g\cdot r^2+... \right)

r\cdot S = g\cdot r+g\cdot r^2+g\cdot r^3+...  (2)

This seems to make the right side of (2) identical to the right side of (1) except for the leading g term of (1), but the ellipsis requires some careful treatment.  Footnote 2 explains how the ellipses of (1) and (2) are identical.  After that is established, subtracting (2) from (1), factoring, and rearranging some terms leads to the infinite geometric sum formula.

(1)-(2) = S-S\cdot r = S\cdot (1-r)=g

\displaystyle S=\frac{g}{1-r}
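Exact rational arithmetic makes a nice sanity check of the formula (a sketch with g = 3 and r = 1/2, so the sum should be 6):

```python
from fractions import Fraction

g = Fraction(3)
r = Fraction(1, 2)

partial = sum(g * r**n for n in range(50))  # the first 50 terms of (1)
target = g / (1 - r)                        # the formula: 6

print(target)                                   # 6
print(target - partial == g * r**50 / (1 - r))  # the leftover tail, exactly
```

The partial sums creep toward 6, and with Fractions the remaining gap is exactly the tail of the series, g·r^50/(1-r).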


I despise giving any formula to any of my classes without at least exploring its genesis.  I also allow my students to use any legitimate mathematics to solve problems so long as reasoning is justified.

In my experiences, about half of my students opt for a formulaic approach to infinite geometric sums while an equal number prefer the quick “multiply-by-r-and-subtract” approach used to derive the summation formula.  For many, apparently, the dynamic manipulation is more meaningful than a static rule.  It’s very cool to watch student preferences at play.


K understood the proof, and then asked a question I hadn’t thought to ask.  Why did we have to multiply by r?  Could multiplication by r^2 also determine the summation formula?

I had three nearly simultaneous thoughts followed quickly by a fourth.  First, why hadn’t I ever thought to ask that?  Second, geometric series for \left| r \right|<1 are absolutely convergent, so K’s suggestion should work.  Third, while the formula would initially look different, absolute convergence guaranteed that whatever the “r^2 formula” looked like, it had to be algebraically equivalent to the standard form.  While I considered those conscious questions, my math subconscious quickly saw the easy resolution to K’s question and the equivalence from Thought #3.

Multiplying (1) by r^2 gives

r^2 \cdot S = g\cdot r^2 + g\cdot r^3 + ... (3)

and the ellipses of (1) and (3) partner perfectly (Footnote 2), so K subtracted, factored, and simplified to get the inevitable result.

(1)-(3) = S-S\cdot r^2 = g+g\cdot r

S\cdot \left( 1-r^2 \right) = g\cdot (1+r)

\displaystyle S=\frac{g\cdot (1+r)}{1-r^2} = \frac{g\cdot (1+r)}{(1+r)(1-r)} = \frac{g}{1-r}

That was cool, but this success meant that there were surely many more options.


Why stop at multiplying by r or r^2?  Why not multiply both sides of (1) by a generic r^N for any natural number N?   That would give

r^N \cdot S = g\cdot r^N + g\cdot r^{N+1} + ... (4)

where the ellipses of (1) and (4) are again identical by the method of Footnote 2.  Subtracting (4) from (1) gives

(1)-(4) = S-S\cdot r^N = g+g\cdot r + g\cdot r^2+...+ g\cdot r^{N-1}

S\cdot \left( 1-r^N \right) = g\cdot \left( 1+r+r^2+...+r^{N-1} \right)  (5)

There are two ways to proceed from (5).  You could recognize the right side as a finite geometric sum with first term 1 and ratio r.  Substituting that formula and dividing by \left( 1-r^N \right) would give the general result.

Alternatively, I could see students exploring \left( 1-r^N \right), and discovering by hand or by CAS that (1-r) is always a factor.  I got the following TI-Nspire CAS result in about 10-15 seconds, clearly suggesting that

1-r^N = (1-r)\left( 1+r+r^2+...+r^{N-1} \right).  (6)


Math induction or a careful polynomial expansion of (6) would prove the pattern suggested by the CAS.  From there, dividing both sides of (5) by \left( 1-r^N \right) gives the generic result.

\displaystyle S = \frac{g\cdot \left( 1+r+r^2+...+r^{N-1} \right)}{\left( 1-r^N \right)}

\displaystyle S = \frac{g\cdot \left( 1+r+r^2+...+r^{N-1} \right) }{(1-r) \cdot \left( 1+r+r^2+...+r^{N-1} \right)} = \frac{g}{1-r}
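Every one of these parallel derivations must agree, and exact arithmetic confirms it (a sketch checking the multiply-by-r^N formula against the standard one for g = 5, r = 2/3, and several values of N):

```python
from fractions import Fraction

g = Fraction(5)
r = Fraction(2, 3)
standard = g / (1 - r)  # the usual formula: 15

for N in range(1, 10):
    finite_geo = sum(r**i for i in range(N))  # 1 + r + ... + r^(N-1)
    general = g * finite_geo / (1 - r**N)     # the multiply-by-r^N version
    assert general == standard                # algebraically identical

print(standard)  # 15
```

No matter which power of r starts the derivation, the formula collapses to the same g/(1-r), exactly as the factoring in (6) predicts.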

In the end, K helped me see there wasn’t just my stock approach to an infinite geometric sum, but really an infinite number of parallel ways.  Nice.


1) RESTRICTING r:  Obviously an infinite geometric series with g \ne 0 diverges for \left| r \right| >1 because that would make \left| g\cdot r^n \right| \rightarrow \infty as n\rightarrow \infty, and adding terms of ever-growing magnitude (positive or negative) to any sum ruins any chance of the sum settling on a single value.

For r=1, the sum converges iff g=0 (a rather boring series).  If g \ne 0, you get a sum of infinitely many copies of some nonzero quantity, and that is always infinite, no matter how small or large the nonzero quantity.

The last case, r=-1, is more subtle.  For g \ne 0, the terms of this series alternate between g and -g, making the partial sums of the series alternate between g and 0, depending on whether you have summed an odd or an even number of terms.  Since the partial sums alternate, the overall sum is divergent.  Remember that series sums and limits are functions; without a single numeric output at a particular point, the function value at that point is considered to be non-existent.

2) NOT ALL INFINITIES ARE THE SAME:  There are two ways to show two groups are the same size.  The obvious way is to count the elements in each group and find out there is the same number of elements in each, but this works only if you have a finite group size.  Alternatively, you could a) match every element in group 1 with a unique element from group 2, and b) match every element in group 2 with a unique element from group 1.  It is important to do both steps here to show that there are no left-over, unpaired elements in either group.

So do the ellipses in (1) and (2) represent the same sets?  As the ellipses represent sets with an infinite number of elements, the first comparison technique is irrelevant.  For the second approach using pairing, we need to compare individual elements.  For every element in the ellipsis of (1), there is obviously a “partner” in (2), as the multiplication of (1) by r visually shifts all of the terms of the series right one position, creating the necessary matches.

Students often are troubled by the second matching as it appears the ellipsis in (2) contains an “extra term” from the right shift.  But, for every specific term you identify in (2), its identical twin exists in (1).  In the weirdness of infinity, that “extra term” appears to have been absorbed without changing the “size” of the infinity.

Since there is a 1:1 mapping of all elements in the ellipses of (1) and (2), you can conclude they are identical, and their difference is zero.