# Monthly Archives: November 2012

## Calling for More CAS in Statistics

When you allow your students to solve problems in ways that make the most sense to them, interesting and unexpected results sometimes happen.  On a test in my senior, non-AP statistics course earlier this semester, we posed this.

A child is 40 inches tall, which places her in the top 10% of all children of similar age. The heights for children of this age form an approximately normal distribution with a mean of 38 inches. Based on this information, what is the standard deviation of the heights of all children of this age?

From the question, you likely deduced that we had been exploring normal distributions and the connection between areas under such curves and their related probabilities and percentiles.  Hoping to get students to think just a little bit, we decided to reverse a cookbook question (given a z-score, or after deriving one, compute a probability) and asked instead for the standard deviation.  My fellow teacher and I saw the question as a simple Algebra I-level manipulation, but our students found it a very challenging revision.  Only about 5% of the students actually solved it the way we expected.  The majority employed a valid (but not always justified) trial-and-error approach.  And then one of my students invoked what I thought to be a brilliant use of a CAS command that I should have imagined myself.  Unfortunately, it did not work out, even though it should have.  I’m hoping future iterations of CAS software of all types will address this shortcoming.

### What We Thought Would Happen

The problem information can be visually represented as shown below.

Given x-values, means, and standard deviations, our students had practiced many problems asking for the resulting area.  They had also been given areas under normal curves and worked backwards to z-scores, which could be re-scaled and re-centered to corresponding points on any normal curve.  We hoped they would be able to apply what they knew about normal curves to make this a different, but relatively straightforward, question.

Given the TI-nSpire software and calculators each of our students has, we’ve completely abandoned statistics tables.  Knowing that the given score was at the 90th percentile, the inverse normal TI-nSpire command quickly shows that this point corresponds to an approximate z-score of 1.28155.  Substituting this and the other givens into the x-value to z-score scaling relationship, $z=\frac{x-\overline{x}}{s}$ , leads to an equation with a single unknown which easily can be solved by hand or by CAS to find the unknown standard deviation, $s \approx 1.56061$ .  Just a scant handful of students actually employed this.
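For readers without an nSpire, the same two steps can be sketched in Python.  This is only an illustration: it assumes the standard library’s `statistics.NormalDist` as a stand-in for the inverse normal command, and the variable names are mine.

```python
from statistics import NormalDist

x, mean = 40, 38                      # given height and given mean

# Step 1: z-score at the 90th percentile of the standard normal curve
z = NormalDist().inv_cdf(0.90)

# Step 2: rearrange z = (x - mean) / s to isolate s
s = (x - mean) / z

print(round(z, 5), round(s, 5))
```

The printed values should agree with the z-score and standard deviation quoted above.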

### What the Majority Did

Recognizing the problem as a twist on their previous work, most invoked a trial-and-error approach.  From their work, I could see that most essentially established bounds around the potential standard deviation and employed an interval bisection approach (not that any actually formally named their technique).

If you know the bounds, mean, and standard deviation of a portion of a normal distribution, you can find the percentage area using the nSpire’s normal Cdf command.  Knowing that the percentage area was 0.1, most tried a standard deviation of 1, and saw that not enough area (0.02275) was captured.  Then they tried 2, and got too much area (0.158655).  A reproduction of one student’s refinements leading to a standard deviation of $s \approx 1.56$ follows.
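The interval bisection idea behind this trial-and-error work can be sketched in a few lines of Python.  This is my reconstruction of the strategy, not any student’s actual keystrokes; `right_tail_area` is a hypothetical helper built on the standard library’s `statistics.NormalDist`, playing the role of the normal Cdf command.

```python
from statistics import NormalDist

def right_tail_area(sd, x=40, mean=38):
    """Area under Normal(mean, sd) to the right of x."""
    return 1 - NormalDist(mean, sd).cdf(x)

target = 0.10
low, high = 1.0, 2.0          # sd = 1 gives too little area, sd = 2 too much

# Halve the interval until the bounds agree to about six decimal places
while high - low > 1e-6:
    mid = (low + high) / 2
    if right_tail_area(mid) < target:
        low = mid             # too little area: the sd must be larger
    else:
        high = mid            # too much area: the sd must be smaller

sd_estimate = (low + high) / 2
print(round(sd_estimate, 2))
```

The starting bounds of 1 and 2 are exactly the first two guesses most students made.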

THE COOL PART:  Students who attempted this approach got to deal directly with the upper 10% of the area; they weren’t required to adapt this to the “left-side area” input requirement of the inverse normal command.  While this adjustment is certainly minor, being able to focus on the problem parameters–as defined–helped some of my students.

THE PROBLEM:  As a colleague at my school told me decades ago when I started teaching, “Remember that a solution to every math problem must meet two requirements.  Show that the solution(s) you find is (are) correct, and show that there are no other solutions.”

Given a fixed mean and x-value, it seems intuitive to me that there is a one-to-one correspondence between the standard deviation of the normal curve and the area to the right of that point.  This isn’t a class for which I’d expect rigorous proof of such an assertion, but I still hoped that some might address the generic possibility of multiple answers and attempt some explanation for why that can’t happen here.  None showed anything like that, and I’m pretty certain that not a single one of my students in this class considered this possibility.  They had found an answer that worked and walked away satisfied.  I tried talking with them afterwards, but I’m not sure how many fully understood the subtle logic and why it was mathematically important.
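A short argument for that uniqueness might run as follows.  The notation is my own assumption: $\Phi$ is the standard normal cumulative distribution function and $A(s)$ is the area to the right of 40 when the standard deviation is $s$.

$$A(s) = P(X > 40) = 1 - \Phi\left(\frac{40-38}{s}\right) = 1 - \Phi\left(\frac{2}{s}\right), \qquad s > 0$$

As $s$ increases, $\frac{2}{s}$ strictly decreases.  Because $\Phi$ is strictly increasing, $A(s)$ is therefore strictly increasing in $s$, rising from 0 (as $s \to 0^+$) toward $\frac{1}{2}$ (as $s \to \infty$).  A continuous, strictly increasing function takes each value in its range exactly once, so $A(s) = 0.1$ has exactly one solution.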

### The Creative Solution

Exactly one student remembered that he had a CAS and that it could solve equations.  Tapping the normal Cdf command used by the majority of his peers, he set up and tried to solve an equation as shown below.

Sadly, this should have worked for my student, but it didn’t.  (He ultimately fell back on the trial-and-error approach.) The equation he wrote is the heart of the trial-and-error process, and there is a single solution.  I suspect the programmers at TI simply hadn’t thought about applying the CAS commands from one area of their software to the statistical functions in another part of their software.  Although I should have, I hadn’t thought about that either.

Following my student’s lead, I tried a couple other CAS approaches (solving systems, etc.) to no avail.  Then I shifted to a graphical approach.  Defining a function using the normal Cdf command, I was able to get a graph.

Graphing $y=0.1$ showed a single point of intersection which could then be computed numerically in the graph window to give the standard deviation from earlier.

What this says to me is that the CAS certainly has the ability to solve my student’s equation–it did so numerically in the graph screen–but for some reason this functionality is not currently available on the TI-nSpire CAS’s calculator screens.

### Extensions

My statistics students just completed a unit on confidence intervals and inferential reasoning.  Rather than teaching them the embedded TI syntax for confidence intervals and hypothesis testing, I deliberately stayed with just the normal Cdf and inverse normal commands–nothing more.  A core belief of my teaching is

Memorize as little as possible and use it as broadly as possible.

By staying with just these two commands, I continued to reinforce what normal distributions are and do, concepts that some still find challenging.  What my student taught me was that perhaps I could limit these commands to just one, the normal Cdf.

For example, if you had a normal distribution with mean 38 and standard deviation 2, what x-value marks the 60th percentile?

Now that’s even more curious.  The solve command doesn’t work (even in an approximation mode), but now the numerical solve gives the solution confirmed by the inverse normal command.
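The one-command idea can also be checked outside the nSpire.  The sketch below is an assumption-laden stand-in, not the CAS’s own algorithm: it uses Python’s `statistics.NormalDist`, finds the 60th percentile using only the cumulative area function, and then confirms the result with the inverse normal.

```python
from statistics import NormalDist

dist = NormalDist(38, 2)      # mean 38, standard deviation 2
target = 0.60                 # left-side area at the 60th percentile

# cdf(38) = 0.5 is too small and cdf(40) is about 0.84, so the answer lies between
low, high = 38.0, 40.0
while high - low > 1e-9:
    mid = (low + high) / 2
    if dist.cdf(mid) < target:
        low = mid
    else:
        high = mid

x_60 = (low + high) / 2
check = dist.inv_cdf(target)  # the inverse normal, used only as confirmation
print(round(x_60, 4), round(check, 4))
```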

What if you wanted the bounds on the same normal distribution that defined the middle 80% of the area?  As shown below, the CAS failed to solve the problem when I asked it to compute directly any of the equivalent distance of the bounds from the mean (line 1), the z-scores (line 2), or the location of the upper x-value (line 3).

But reversing the problem to define the 10% area of the right tail does give the desired result (line 2) (note that nsolve does work even though solve still does not) with the solution confirmed by the final two lines.
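The reasoning behind the reversal is symmetry, and the numbers can be worked by hand from values already found in this post.  The symbols $x_u$ and $x_l$ for the upper and lower bounds are my own labels.  A middle area of 80% leaves 10% in each tail, so the upper bound sits at the 90th percentile, where $z \approx 1.28155$:

$$x_u = 38 + 1.28155 \cdot 2 \approx 40.5631, \qquad x_l = 38 - 1.28155 \cdot 2 \approx 35.4369$$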

### Conclusion

Admittedly, there’s not lots of algebra involved in most statistics classes–LOADS of arithmetic and computation, but not much algebra.  I’m convinced, though, that more attention to some algebraic thinking could benefit students.  The different statistics packages out there do lots of amazing things, especially the TI-nSpire, but it would be very nice if these packages could align themselves better with CAS features to permit students to ask their questions more “naturally”.  After all, such support and scaffolding are key features that make CAS and all technology so attractive for those of us using them in the classroom.

## A Bigger Education

On this Election Day 2012 in the United States, I’m going to divert briefly from my typical math education posts to share my experiences at my local polling place.

Especially today, I’m so unbelievably proud to be American.

I got to my local polling station pretty early to cast my ballot so I could get to school in time for the beginning of the day. By the time I left, the line was easily 5ish times longer than when I arrived, and it snaked through a few different rooms in the building before winding outside into the cold rain and wind with voters all along cheerfully waiting their turn to be part of the process. The poll workers strove to get everyone through quickly and efficiently. The voters were civil and seemed generally honored to have the opportunity to be part of the decision-making process in deciding the next iteration of our government.

Our country and political process certainly have their warts, but at least on this day, in my part of Atlanta, we set aside our differences and were civil, respectful, and hopeful as we made our collective voices heard.

I ran into two seniors from my school on my way out; they were thrilled to be taking part in their first election.

Whatever the election results after today, I hope we all can remember the respect and determination of every person in that polling line.

In one small place today, we were a better people.

## Binomial Probability and CAS

About a year ago, I posted an idea for using CAS with probability in a statistics course.  I’ve finally had an opportunity to use it with students in my senior one-semester statistics course over the last few weeks, so I thought I’d share some refinements.  To demonstrate the mathematics, I’ll use the following problem situation.

Assume in a given country that women represent 40% of the total work force.  A company in that country has 10 employees, only 2 of whom are women.
1) What is the probability that by pure chance a 10-employee company in that country might employ exactly 2 women?
2) What is the probability that by pure chance a 10-employee company in that country might employ 2 or fewer women?

Over a decade ago, I used binomial probability situations like this as an application of polynomial expansions, tapping Pascal’s Triangle and combinatorics to find the number of ways a group of exactly 2 women can appear in a total group size of 10.  Historically, I encouraged students to approach this problem by defining m=men and w=women and expanding $(m+w)^{10}$ where the exponent was the number of employees, or more generally, the number of trials.  Because question 1 asks about the probability of exactly 2 women, I was interested in the specific term in the binomial expansion that contained $w^2$.  Whether you use Pascal’s Triangle or combinations, that term is $45w^2m^8$.  Substituting in the given percentages of women and men in the workforce, $P(w)=0.4$ and $P(m)=0.6$, answers the first question.  I used a TI-nSpire to determine that there is a 12.1% chance of this.
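Written out, the substitution is a single line of arithmetic:

$$\binom{10}{2}\,P(w)^2\,P(m)^8 = 45 \cdot (0.4)^2 \cdot (0.6)^8 = 45 \cdot 0.16 \cdot 0.01679616 \approx 0.1209$$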

That was 10-20 years ago and I hadn’t taught a statistics course in a very long time.  I suspect most statistics classes using TI-nSpires (CAS or non-CAS) today use the binompdf command to get this probability.

The slight differences in the input parameters determine whether you get the probability of the single event or the probabilities for all of the events in the entire sample space.  The challenge for the latter is remembering that the order of the probabilities starts at 0 occurrences of the event whose probability is defined by the second parameter.  Counting over carefully from the correct end of the sequence gives the desired probability.
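The ordering issue is easy to see in a short Python sketch.  This is an illustrative stand-in for binompdf’s list output, built from the standard library’s `math.comb`; it is not the nSpire’s own implementation, and the names are mine.

```python
from math import comb

n, p = 10, 0.4                # 10 employees, P(woman) = 0.4

# Probability of exactly k women for k = 0, 1, ..., 10.
# The list starts at k = 0, so the entry for 2 women sits at index 2.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(round(pmf[2], 4))       # probability of exactly 2 women
```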

With my exploration of CAS in the classroom over the past decade, I saw this problem very differently when I posted last year.  The binompdf command works well, but you need to remember what the outputs mean.  The earlier algebra does this, but it is clearly more cumbersome.  Together, all of this screams (IMO) for a CAS.  A CAS could enable me to see the number of ways each event in the sample space could occur.  The TI-nSpire CAS’s output using an expand command follows.

The cool part is that all 11 terms in this expansion are generated simultaneously.  It would be nice if I could see all of the terms on screen at once, but a little scrolling leads to the highlighted term which could then be evaluated using a substitute command.

The insight from my previous post was that when expanding binomials, any coefficients of the individual terms “received” the same exponents as the individual variables in the expansion.  With that in mind, I repeated the expansion.

The resulting polynomial now shows all the possible combinations of men and women, but now each coefficient is the probability of its corresponding event.  In other words, in a single command this approach defines the entire probability distribution!  The highlighted portion above shows the answer to question 1 in a single step.
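In symbols, with $w$ and $m$ as before, placing the probabilities inside the binomial gives

$$(0.4w + 0.6m)^{10} = \sum_{k=0}^{10} \binom{10}{k} (0.4)^k (0.6)^{10-k} \, w^k m^{10-k}$$

so the coefficient of $w^k m^{10-k}$ is exactly the probability of $k$ women among the 10 employees.  The $k=2$ term is approximately $0.1209\,w^2m^8$, matching the answer to question 1.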

Last week one of my students reminded me that TI-nSpire CAS variables need not be restricted to a single character.  Some didn’t like the extra typing, but others really liked the fully descriptive output.

To answer question 2, TI-nSpire users could add up the individual binompdf outputs -OR- use a binomcdf command.

This gets the answer quickly, but suffers somewhat from the lack of descriptives noted earlier.  Some of my students this year preferred to copy the binomial expansion terms from the CAS expand command results above, delete the variable terms, and sum the results.  Then one suggested a cool way around the somewhat cumbersome algebra would be to substitute 1s for both variables.
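The substitute-1s trick can be sketched in Python as well.  This is an illustration of my student’s idea rather than the nSpire syntax; `term` is a hypothetical helper that evaluates one term of the expansion of $(0.4w+0.6m)^{10}$.

```python
from math import comb

n, p_w, p_m = 10, 0.4, 0.6

def term(k, w, m):
    """Term of (p_w*w + p_m*m)**n containing w**k, evaluated at the given w and m."""
    return comb(n, k) * (p_w * w) ** k * (p_m * m) ** (n - k)

# Setting w = m = 1 strips the variables and leaves only the probabilities,
# so adding the terms for 0, 1, and 2 women answers question 2.
at_most_two = sum(term(k, 1, 1) for k in range(3))
print(round(at_most_two, 4))
```

The result should match what the binomcdf command reports.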

CONCLUSION:  I’ve loved the way my students have developed a very organic understanding of binomial probabilities over this last unit.  They are using technology as a scaffold to support cumbersome, repetitive computations and have enhanced in a few directions my initial presentations of optional ways to incorporate CAS.  This is technology serving its appropriate role as a supporter of student learning.

OTHER CAS:  I focused on the TI-nSpire CAS for the examples above because that is the technology my students have.  Obviously any CAS system would do.  For a free, Web-based CAS system, I always investigate what Wolfram Alpha has to offer.  Surprisingly, it didn’t deal well with the expanded variable names in $(0.4women+0.6men)^{10}$.  Perhaps I could have used a syntax variation, but what to do wasn’t intuitive, so I simplified the variables here to get

Huge Pro:  The entire probability distribution with its descriptors is shown.
Very minor Con:  Variables aren’t as fully readable as with the fully expanded variables on the nSpire CAS.