Calling for More CAS in Statistics

When you allow your students to solve problems in ways that make the most sense to them, interesting and unexpected results sometimes happen.  On a test in my senior, non-AP statistics course earlier this semester, we posed this question.

A child is 40 inches tall, which places her in the top 10% of all children of similar age. The heights for children of this age form an approximately normal distribution with a mean of 38 inches. Based on this information, what is the standard deviation of the heights of all children of this age? 

From the question, you likely deduced that we had been exploring normal distributions and the connection between areas under such curves and their related probabilities and percentiles.  Hoping to get students to think just a little bit, we decided to reverse a cookbook question (take a given or derived z-score and compute a probability) and asked instead for the standard deviation.  My fellow teacher and I saw the question as a simple Algebra I-level manipulation, but our students found it a very challenging revision.  Only about 5% of the students actually solved it the way we expected.  The majority employed a valid (but not always justified) trial-and-error approach.  And then one of my students invoked what I thought to be a brilliant use of a CAS command that I should have imagined myself.  Unfortunately, it did not work, even though it should have.  I’m hoping future iterations of CAS software of all types will address this shortcoming.

What We Thought Would Happen

The problem information can be visually represented as shown below.  

Our students had practiced many problems that gave x-values, means, and standard deviations and asked for the resulting area.  They had also been given areas under normal curves and worked backwards to z-scores which could be re-scaled and re-centered to corresponding points on any normal curve.  We hoped they would be able to apply what they knew about normal curves to make this a different, but relatively straightforward question.

Given the TI-nSpire software and calculators each of our students has, we’ve completely abandoned statistics tables.  Knowing that the given score was at the 90th percentile, the inverse normal TI-nSpire command quickly shows that this point corresponds to an approximate z-score of 1.28155.  Substituting this and the other givens into the x-value to z-score scaling relationship, z=\frac{x-\overline{x}}{s} , leads to an equation with a single unknown which easily can be solved by hand or by CAS to find the unknown standard deviation, s \approx 1.56061 .  Just a scant handful of students actually employed this.
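For readers without an nSpire, the same two steps can be sketched in Python.  This is only an illustration under the assumption that the standard library's statistics.NormalDist stands in for the calculator's inverse normal command; the variable names are mine, not part of the original test.

```python
from statistics import NormalDist

# Givens from the test question
x = 40.0           # the child's height in inches
mean = 38.0        # mean height for this age group
upper_tail = 0.10  # she is in the top 10%

# Inverse normal step: z-score at the 90th percentile of the standard normal
z = NormalDist().inv_cdf(1.0 - upper_tail)

# Rearranging z = (x - mean) / s gives s = (x - mean) / z
s = (x - mean) / z

print(f"z = {z:.5f}, s = {s:.5f}")
```

The printed values should agree with the z-score and standard deviation quoted above to the digits shown.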

What the Majority Did

Recognizing the problem as a twist on their previous work, most invoked a trial-and-error approach.  From their work, I could see that most essentially established bounds around the potential standard deviation and employed an interval bisection approach (not that any actually formally named their technique).

If you know the bounds, mean, and standard deviation of a portion of a normal distribution, you can find the percentage area using the nSpire’s normal Cdf command.  Knowing that the percentage area was 0.1, most tried a standard deviation of 1, and saw that not enough area (0.02275) was captured.  Then they tried 2, and got too much area (0.158655).  A reproduction of one student’s refinements leading to a standard deviation of s \approx 1.56 follows.
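The bracketing idea can also be written out as a formal interval bisection.  The sketch below is my own reconstruction in Python, not any student's actual work; upper_tail_area is a hypothetical helper that plays the role of the normal Cdf command with an upper bound of infinity.

```python
from statistics import NormalDist

def upper_tail_area(sd, x=40.0, mean=38.0):
    """Area to the right of x under a normal curve with the given mean and sd."""
    return 1.0 - NormalDist(mean, sd).cdf(x)

target = 0.10
lo, hi = 1.0, 2.0  # sd = 1 captures too little area, sd = 2 captures too much

# Halve the bracket until it is narrower than the tolerance
while hi - lo > 1e-6:
    mid = (lo + hi) / 2
    if upper_tail_area(mid) < target:
        lo = mid   # too little area, so the sd must be larger
    else:
        hi = mid   # too much (or exactly enough) area, so shrink from above

sd_estimate = (lo + hi) / 2
print(f"sd is approximately {sd_estimate:.4f}")
```

Bisection is valid here only because the tail area moves in one direction as the standard deviation grows, which is exactly the point raised under THE PROBLEM below.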

THE COOL PART:  Students who attempted this approach got to deal directly with the upper 10% of the area; they weren’t required to adapt this to the “left-side area” input requirement of the inverse normal command.  While this adjustment is certainly minor, being able to focus on the problem parameters–as defined–helped some of my students.

THE PROBLEM:  As a colleague at my school told me decades ago when I started teaching, “Remember that a solution to every math problem must meet two requirements.  Show that the solution(s) you find is (are) correct, and show that there are no other solutions.”

Given a fixed mean and x-value, it seems intuitive to me that there is a one-to-one correspondence between the standard deviation of the normal curve and the area to the right of that point.  This isn’t a class for which I’d expect rigorous proof of such an assertion, but I still hoped that some might address the generic possibility of multiple answers and attempt some explanation for why that can’t happen here.  None showed anything like that, and I’m pretty certain that not a single one of my students in this class considered this possibility.  They had found an answer that worked and walked away satisfied.  I tried talking with them afterwards, but I’m not sure how many fully understood the subtle logic and why it was mathematically important.
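For anyone who wants the uniqueness argument spelled out, here is one short way to see it.  The notation is my assumption: \Phi is the standard normal cumulative distribution function and \varphi is its density, neither of which appears in the original test.

```latex
P(X > 40) = 1 - \Phi\!\left(\frac{40 - 38}{s}\right) = 1 - \Phi\!\left(\frac{2}{s}\right), \qquad s > 0

\frac{d}{ds}\left[\,1 - \Phi\!\left(\frac{2}{s}\right)\right] = \frac{2}{s^{2}}\,\varphi\!\left(\frac{2}{s}\right) > 0
```

Because the derivative is positive for every s > 0, the right-tail area is strictly increasing in s.  It rises from 0 (as s approaches 0) toward 1/2 (as s grows without bound), so any target area between 0 and 1/2, including 0.1, is reached by exactly one standard deviation.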

The Creative Solution

Exactly one student remembered that he had a CAS and that it could solve equations.  Tapping the normal Cdf command used by the majority of his peers, he set up and tried to solve an equation as shown below.

Sadly, this should have worked for my student, but it didn’t.  (He ultimately fell back on the trial-and-error approach.) The equation he wrote is the heart of the trial-and-error process, and there is a single solution.  I suspect the programmers at TI simply hadn’t thought about applying the CAS commands from one area of their software to the statistical functions in another part of their software.  Although I should have, I hadn’t thought about that either.

Following my student’s lead, I tried a couple other CAS approaches (solving systems, etc.) to no avail.  Then I shifted to a graphical approach.  Defining a function using the normal Cdf command, I was able to get a graph.

Graphing y=0.1 showed a single point of intersection which could then be computed numerically in the graph window to give the standard deviation from earlier.

What this says to me is that the CAS certainly has the ability to solve my student’s equation–it did so numerically in the graph screen–but for some reason this functionality is not currently available on the TI-nSpire CAS’s calculator screens.

Extensions

My statistics students just completed a unit on confidence intervals and inferential reasoning.  Rather than teaching them the embedded TI syntax for confidence intervals and hypothesis testing, I deliberately stayed with just the normal Cdf and inverse normal commands–nothing more.  A core belief of my teaching is

Memorize as little as possible and use it as broadly as possible.

By staying with just these two commands, I continued to reinforce what normal distributions are and do, concepts that some still find challenging.  What my student taught me was that perhaps I could limit these commands to just one, the normal Cdf.

For example, if you had a normal distribution with mean 38 and standard deviation 2, what x-value marks the 60th percentile?

Now that’s even more curious.  The solve command doesn’t work (even in an approximation mode), but now the numerical solve gives the solution confirmed by the inverse normal command.
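The "one command is enough" idea can be checked outside the nSpire as well.  In this Python sketch (an illustration only, with names of my choosing), the 60th percentile is found by numerically solving cdf(x) = 0.6 using nothing but the cumulative distribution, and the inverse normal is called afterwards purely as a check.

```python
from statistics import NormalDist

dist = NormalDist(mu=38.0, sigma=2.0)
target = 0.60

# Bracket the answer: half the area lies below the mean,
# and nearly all of it lies below mean + 4 standard deviations
lo, hi = 38.0, 46.0
while hi - lo > 1e-9:
    mid = (lo + hi) / 2
    if dist.cdf(mid) < target:
        lo = mid
    else:
        hi = mid
x_from_cdf = (lo + hi) / 2

# Confirmation with the inverse normal, as in the calculator session
x_from_inverse = dist.inv_cdf(target)

print(f"{x_from_cdf:.4f} {x_from_inverse:.4f}")
```

The two values should agree closely, which mirrors the numerical solve matching the inverse normal command.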

What if you wanted the bounds on the same normal distribution that defined the middle 80% of the area?  As shown below, I failed to solve this when I asked the CAS to compute directly for any of the equivalent unknowns: the distance of the bounds from the mean (line 1), the z-scores (line 2), or the location of the upper x-value (line 3).

But reversing the problem to define the 10% area of the right tail does give the desired result (line 2) (note that nsolve does work even though solve still does not) with the solution confirmed by the final two lines.
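The equivalence between the two set-ups (middle 80% versus a 10% right tail) can be sketched numerically.  This Python version is my own illustration, assuming a symmetric interval 38 - d to 38 + d around the mean; middle_area is a hypothetical helper, not a calculator command.

```python
from statistics import NormalDist

dist = NormalDist(mu=38.0, sigma=2.0)

def middle_area(d):
    """Area between 38 - d and 38 + d under the normal curve."""
    return dist.cdf(38.0 + d) - dist.cdf(38.0 - d)

# Solve middle_area(d) = 0.80 by bisection; the area grows as d grows
lo, hi = 0.0, 8.0
while hi - lo > 1e-9:
    mid = (lo + hi) / 2
    if middle_area(mid) < 0.80:
        lo = mid
    else:
        hi = mid
d = (lo + hi) / 2
lower, upper = 38.0 - d, 38.0 + d

# Check the reversed formulation: the right tail beyond `upper` should hold 10%
right_tail = 1.0 - dist.cdf(upper)

print(f"bounds: {lower:.4f} to {upper:.4f}, right tail: {right_tail:.4f}")
```

By symmetry, leaving 10% in each tail is the same condition as capturing the middle 80%, so either equation pins down the same upper bound.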

Conclusion

Admittedly, there’s not lots of algebra involved in most statistics classes–LOADS of arithmetic and computation, but not much algebra.  I’m convinced, though, that more attention to some algebraic thinking could benefit students.  The different statistics packages out there do lots of amazing things, especially the TI-nSpire, but it would be very nice if these packages could align themselves better with CAS features to permit students to ask their questions more “naturally”.  After all, such support and scaffolding are key features that make CAS and all technology so attractive for those of us using them in the classroom.


2 responses to “Calling for More CAS in Statistics”

  1. Great ideas here – I agree the algebraic thinking that would come out of it would be very beneficial. I suspect the normCdf function is too complicated for CAS to be able to work with it algebraically, but here’s something to try – and it adds to the thinking: make use of guesses. In your first one, solve(normCdf(40,∞,38,x)=0.1,x=1) gives the desired solution, as does nSolve(normCdf(40,∞,38,x)=0.1,x,1). Other guesses also work – but the guesses need to make sense for the problem, hence the added thinking involved.
    The interval problem can also work.  Here’s a set-up that gave the desired solution: solve(normCdf(38-x,38+x,38,2)=0.8,x=2). I tried higher guesses and the solution still came through (although for the nSolve syntax, the guess had to be less than the solution for it to work).

  2. The creative solution using CAS will indeed work if the normCdf() function is replaced with the actual normal distribution function. However, it will take more than 3 minutes to arrive at the answer of 1.56061 with solve(), along with a “Questionable accuracy” message.
