# CAS and Normal Probability Distributions

My presentation this past Saturday at the 2015 T^3 International Conference in Dallas, TX was on the underappreciated applicability of CAS to statistics.  This post shares some of what I shared there from my first year teaching AP Statistics.

MOVING PAST OUTDATED PEDAGOGY

It’s been decades since we’ve required students to use tables of values to compute by hand trigonometric and radical values.  It seems odd to me that we continue to do exactly that today for so many statistics classes, including the AP.  While the College Board permits statistics-capable calculators, it still provides probability tables with every exam.  That messaging is clear:  it is still “acceptable” to teach statistics using outdated probability tables.

In this, my first year teaching AP Statistics, I decided it was time for my students and me to completely break from this lingering past.  My statistics classes this year have been 100% software-enabled.  Not one of my students has been required to use or even see any tables of probability values.

My classes also have been fortunate to have complete CAS availability on their laptops.  My school’s math department deliberately adopted the TI-Nspire platform in part because that software looks and operates exactly the same on tablet, computer, and handheld platforms.  We primarily use the computer-based version for learning because of the speed and visualization of the large “real estate” there.  We are shifting to school-owned handhelds in our last month before the AP Exam to gain practice on the platform required on the AP.

The remainder of this post shares ways my students and I have learned to apply the TI-Nspire CAS to some statistical questions around normal distributions.

FINDING NORMAL AREAS AND PROBABILITIES

Assume a manufacturer makes golf balls whose distances traveled under identical testing conditions are approximately normally distributed with a mean of 295 yards and a standard deviation of 3 yards.  What is the probability that one such randomly selected ball travels more than 300 yards?

Traditional statistics courses teach students to transform the 300 yards into a z-score to look up in a probability table.  That approach obviously works, but with appropriate technology, I believe there will be far less need to use or even compute z-scores.  In much the same way, always converting logarithms to base-10 or base-e in order to use logarithmic tables is anachronistic when using many modern scientific calculators.

TI calculators and other technologies allow computations of non-standard normal curves.  Notice the Nspire CAS calculation below the curve uses both bounds of the area of interest along with the mean and standard deviation of the distribution to accomplish the computation in a single step.

So the probability of a randomly selected ball from the population described above going more than 300 yards is 4.779%.
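For readers without an Nspire at hand, the same single-step computation can be sketched in Python.  This is only an illustration under an assumption of mine: the standard library's `statistics.NormalDist` stands in for the Nspire's normCdf(), and the variable names are my own.

```python
from statistics import NormalDist

# Golf ball distances: mean 295 yards, standard deviation 3 yards (from the problem).
distance = NormalDist(mu=295, sigma=3)

# P(X > 300) is the complement of the cumulative probability up to 300 yards.
p_over_300 = 1 - distance.cdf(300)

print(f"P(X > 300) = {p_over_300:.5f}")
```

No z-score appears anywhere; the mean and standard deviation go straight into the distribution, and the result should agree with the 4.779% above.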

GOING BACKWARDS

Now assume the manufacturing process can control the mean distance traveled.  What mean should it use so that no more than 1% of the golf balls travel more than 300 yards?

Depending on the available normal probability tables, the traditional approach to this problem is again to work with z-scores.  A modified CAS version of this is shown below.

Therefore, the manufacturer should produce a ball that travels a mean 293.021 yards under the given conditions.
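The z-score route can be sketched in Python as follows.  This is a hedged illustration, not the Nspire commands themselves: `NormalDist().inv_cdf` is assumed here as a stand-in for invNorm(), and the variable names are mine.

```python
from statistics import NormalDist

sigma = 3          # standard deviation in yards (from the problem)
cutoff = 300       # distance that at most 1% of the balls may exceed
upper_tail = 0.01  # allowed proportion above the cutoff

# z-score that has 99% of the standard normal curve below it.
z = NormalDist().inv_cdf(1 - upper_tail)

# z = (cutoff - mean) / sigma, solved for the mean.
required_mean = cutoff - z * sigma

print(f"z = {z:.4f}, required mean = {required_mean:.3f} yards")
```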

The approach is legitimate, and I shared it with my students.  Several of them ultimately chose a more efficient single line command:

But remember that the invNorm() and normCdf() commands on the Nspire are themselves functions, and so their internal parameters are available to solve commands.  A pure CAS, “forward solution” still incorporating only the normCdf() command to which my students were first introduced makes use of this to determine the missing center.
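The idea of solving for a parameter inside the cumulative probability itself can be imitated without a CAS.  The sketch below is my own assumption about how to mimic it numerically: a simple bisection replaces the Nspire's solve command, and the helper names `upper_tail` and `solve_mean` are hypothetical.

```python
from statistics import NormalDist

def upper_tail(mean, cutoff=300.0, sigma=3.0):
    """P(X > cutoff) for a normal distribution with the given mean."""
    return 1 - NormalDist(mean, sigma).cdf(cutoff)

def solve_mean(target=0.01, lo=280.0, hi=300.0, tol=1e-9):
    """Bisection: the upper-tail probability grows as the mean grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(mid) > target:
            hi = mid   # too many long shots, so try a smaller mean
        else:
            lo = mid   # tail is within the limit, so the mean can be larger
    return (lo + hi) / 2

mean_needed = solve_mean()
print(f"required mean = {mean_needed:.3f} yards")
```

Only the forward probability calculation is used; the unknown mean is found by asking which value makes that calculation come out to 1%.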

DIFFERENTIATING INSTRUCTION

While calculus techniques definitely are NOT part of the AP Statistics curriculum, I do have several students jointly enrolled in various calculus classes.  Some of these astutely noted the similarity between the area-based arguments above and the area under a curve techniques they were learning in their calculus classes.  Never being one to pass on a teaching moment, I pulled a few of these to the side to show them that the previous solutions also could have been derived via integration.
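To make that connection concrete, here is a minimal Python sketch that recovers the first golf ball probability by integrating the normal density numerically.  The choice of Simpson's rule and the cutoff of ten standard deviations above the mean for the upper limit are my assumptions; a CAS would integrate to infinity directly.

```python
import math

def normal_pdf(x, mu=295.0, sigma=3.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Integrate from 300 yards out to 10 standard deviations above the mean;
# the area beyond that point is negligible.
area = simpson(normal_pdf, 300.0, 295.0 + 10 * 3.0)

print(f"area under the curve beyond 300 yards = {area:.5f}")
```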

I can’t recall any instances of my students actually employing integrals to solve statistics problems this year, but just having the connection verified completely solidified the mathematics they were learning in my class.

CONFIDENCE INTERVALS

The mean lead level of 35 crows in a random sample from a region was 4.90 ppm and the standard deviation was 1.12 ppm.  Construct a 95 percent confidence interval for the mean lead level of crows in the region.

Many students–mine included–have difficulty comprehending confidence intervals and resort to “black box” confidence interval tools available in most (all?) statistics-capable calculators, including the TI-Nspire.

As n is greater than 30, I can compute the requested z-interval by filling in just four entries in a pop-up window and pressing Enter.

Convenient, for sure, but this approach doesn’t help the confused students understand that the confidence interval is nothing more than the bounds of the middle 95% of the normal pdf described in the problem.  That fact is crystallized by applying the tools the students have been using for weeks by that point in the course.

Notice in the solve+normCdf() combination commands that the unknown this time was a bound and not the mean as was the case in the previous example.
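The same bound-finding idea can be sketched in Python.  Treat this as an illustration under my own assumptions: bisection stands in for the Nspire's solve command, `NormalDist.cdf` stands in for normCdf(), and the helper `solve_bound` and its search range of 3 to 7 ppm are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean: center 4.90 ppm, spread 1.12 / sqrt(35).
sample_means = NormalDist(mu=4.90, sigma=1.12 / sqrt(35))

def solve_bound(area_below, lo=3.0, hi=7.0, tol=1e-9):
    """Bisection for the bound x with P(sample mean < x) = area_below."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sample_means.cdf(mid) < area_below:
            lo = mid   # not enough area yet, so move the bound right
        else:
            hi = mid
    return (lo + hi) / 2

# The middle 95% leaves 2.5% of the area in each tail.
lower = solve_bound(0.025)
upper = solve_bound(0.975)

print(f"95% interval: ({lower:.3f}, {upper:.3f}) ppm")
```

Here the center and spread are known, and the unknown is where to cut the curve so that 95% of the area sits between the two bounds.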

EXTENDING THE RULE OF FOUR

I’ve used the “Rule of Four” in every math class I’ve taught for over two decades, explaining that every mathematical concept can be explained or expressed four different ways:  Numerically, Algebraically, Graphically (including graphs and geometric figures), and Verbally.  While not the contextual point of his quote, I often cite MIT’s Marvin Minsky here:

“You don’t understand anything until you learn it more than one way.”

Learning to translate between the four representations grants deeper understanding of concepts and often gives access to solutions in one form that may be difficult or impossible in other forms.

After my decades-long work with CAS, I now believe there is actually a 5th representation of mathematical ideas:  Tools.  Knowing how to translate a question into a form that your tool (in the case of CAS, a computer) can manage or compute creates a different representation of the problem and requires deeper insights to manage the translation.

I knew some of my students this year had deeply embraced this “5th Way” when one showed me his alternative approach to the confidence interval question:

I found this solution particularly lovely for several reasons.

• The student knew about lists and statistical commands and on a whim tried combining them in a novel way to produce the desired solution.
• He found the confidence interval directly using a normal distribution command rather than the arguably more convenient black box confidence interval tool.  He also showed explicitly his understanding of the distribution of sample means by adjusting the given standard deviation for the sample size.
• Finally, while using a CAS sometimes involves getting answers in forms you didn’t expect, in this case, I think the CAS command and list output actually provide a cleaner, interval-looking result than the black box confidence interval command, and one much more intuitively connected to the actual meaning of a confidence interval.
• While I haven’t tried it out, it seems to me that this approach also should work on non-CAS statistical calculators that can handle lists.
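A rough Python analogue of this list-based idea is sketched below.  It is my reconstruction from the description in the bullets, not the student's actual Nspire entry: a list of the two tail areas is passed through an inverse normal calculation (`NormalDist.inv_cdf`, assumed here as a stand-in for the Nspire command), with the standard deviation adjusted for the sample size.

```python
from math import sqrt
from statistics import NormalDist

# Distribution of sample means: the given standard deviation is divided by sqrt(n).
sample_means = NormalDist(mu=4.90, sigma=1.12 / sqrt(35))

# One list holds the cumulative areas at both ends of the middle 95%.
tail_areas = [0.025, 0.975]

# Applying the inverse normal to the whole list returns both bounds at once.
interval = [sample_means.inv_cdf(p) for p in tail_areas]

print([round(bound, 3) for bound in interval])
```

The output is itself a two-element list, which reads naturally as the lower and upper ends of the interval.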

(a very minor disappointment, quickly overcome)

Returning to my multiple approaches, I tried applying my student’s newfound approach to a normCdf() command.

Alas, my Nspire returned the very command I had entered, indicating that it didn’t understand the question I had posed.  While a bit disappointed that this approach didn’t work, I was actually excited to have discovered a boundary in the current programming of the Nspire.  Perhaps someday this approach will also work, but my students and I have many other directions we can exploit to find what we need.

Having left the probability tables behind in their appropriate historical dust and fully embraced the power of modern classroom technology to enhance my students’ statistical learning and understanding, I’m convinced I made the right decision at the start of this school year.  My students know more, understand the foundations of statistics better, and as a group feel much more confident and flexible.  Whether their scores on next month’s AP exam will reflect their growth, I can’t say, but they’ve definitely learned more statistics this year than any previous statistics class I’ve ever taught.

COMPLETE FILES FROM MY 2015 T3 PRESENTATION

If you are interested, you can download here the PowerPoint file for my entire Nspired Statistics and CAS presentation from last week’s 2015 T3 International Conference in Dallas, TX.  While not the point of this post, the presentation started with a non-calculus derivation/explanation of linear regressions.  Using some great feedback from Jeff McCalla, here is an Nspire CAS document creating the linear regression computation updated from what I presented in Dallas.  I hope you found this post and these files helpful, or at least thought-provoking.