Tag Archives: graphing

How One Data Point Destroyed a Study

Statistics are powerful tools.  Well-implemented, they tease out underlying patterns from the noise of raw data and improve our understanding.  But those who use statistics must take care to avoid misstatements.  Unfortunately, statistics can also be used to deliberately distort relationships, declaring patterns where none exist.  In my AP Statistics classes, I hope my students learn to extract meaning from well-designed studies, and to spot instances of Benjamin Disraeli’s “three kinds of lies:  lies, damned lies, and statistics.”

This post explores part of a study published August 12, 2015, exposing what I believe to be examples of four critical ways statistics are misunderstood and misused:

  • Not recognizing the distorting power of outliers on means, standard deviations, and, in the case of the study below, regressions;
  • Distorting graph scales to create the impression of patterns different from what actually exists;
  • Cherry-picking data to show only favorable results; and
  • Misunderstanding the p-value in inferential studies.

THE STUDY:

I was searching online for examples of research I could use with my AP Statistics classes when I found, on the page of a math teacher organization, a link to an article entitled, “Cardiorespiratory fitness linked to thinner gray matter and better math skills in kids.”  Following the URL trail, I found a description of the referenced article in an August 2015 summary by Science Daily and the actual research posted on August 12, 2015 in the journal PLOS ONE.

As a middle and high school teacher, I’ve read multiple studies connecting physical fitness to brain health.  I was sure I had hit paydirt with an article offering multiple, valuable lessons for my students!  I read the claims of the Science Daily research summary correlating the physical fitness of 9- and 10-year-old children to performance on a test of arithmetic.  It was careful not to declare cause-and-effect,  but did say

The team found differences in math skills and cortical brain structure between the higher-fit and lower-fit children. In particular, thinner gray matter corresponded to better math performance in the higher-fit kids. No significant fitness-associated differences in reading or spelling aptitude were detected. (source)

The researchers described plausible connections between the aerobic fitness of the participating children and the thickness of their cortical gray matter.  The study went astray when they attempted to connect their findings to the academic performance of the participants.

Independent t-tests were employed to compare WRAT-3 scores in higher fit and lower fit children. Pearson correlations were also conducted to determine associations between cortical thickness and academic achievement. The alpha level for all tests was set at p < .05. (source)

All of the remaining images, quotes, and data in this post are pulled directly from the primary article on PLOS ONE.  The URLs are provided above, and bibliographic references are at the end.

To address questions raised by the study, I had to access the original data and recreate the researchers’ analyses.  Thankfully, PLOS ONE is an open-access journal, and I was able to download the research data.  In case you want to review the data yourself or use it with your classes, here is the original SPSS file which I converted into Excel and TI-Nspire CAS formats.

BEWARE OUTLIERS and MISLEADING SCALES:

My suspicions were piqued when I saw the following two graphs–the only scatterplots offered in their research publication.

fitness1

Scatterplot 1:  Attempt to connect Anterior Frontal Gray Matter thickness with WRAT-3 Arithmetic performance

The right side of the top scatterplot looked like an uncorrelated cloud of data with one data point on the far left seeming to pull the left side of the linear regression upwards, creating a more negative slope.  Because the study reported only two statistically significant correlations between the WRAT tests and cortical thickness in two areas of the brain, I was now concerned that the single extreme data point may have distorted results.

My initial scatterplot (below) confirmed the published graph, but fit to the entire window, the data now looked even less correlated.

fitness3

In this scale, the farthest left data point (WRAT Arithmetic score = 66, Anterior Frontal thickness = 3.9) looked much more like an outlier.  I confirmed that the point lay more than 1.5 IQRs below the lower quartile, as indicated visually in a boxplot of the WRAT-Arithmetic scores.
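For reference, the fence I applied is the standard one:  a point x is flagged as a low outlier whenever

\displaystyle x < Q_1 - 1.5 \cdot IQR = Q_1 - 1.5 \left( Q_3 - Q_1 \right) ,

and the WRAT-Arithmetic score of 66 falls below that fence for these data.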

fitness7

Also note from my rescaled scatterplot that the Anterior Frontal measure (y-coordinate) was higher than any of the next five ordered pairs to its right.  Its horizontal outlier location, coupled with its notably higher vertical component, suggested that the single point could have significant influence on any regression on the data.  There was sufficient evidence for me to investigate the study results excluding the (66, 3.9) data point.

fitness4

The original linear regression on the 48 (WRAT Arithmetic, AF thickness) data points was AF=-0.007817(WRAT_A)+4.350.  Excluding (66, 3.9), the new scatterplot above shows the revised linear regression on the remaining 47 points:  AF=-0.007460(WRAT_A)+4.308.  The two equations are close, but the revised slope is 4.6% smaller in magnitude than the published result.  With the two published results reported significant at p=0.04, the influence of the outlier (66, 3.9) has a reasonable possibility of changing the study results.
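If you download the data yourself, here is a minimal Python (numpy) sketch of the comparison I ran.  Reading the (WRAT Arithmetic, thickness) pairs out of the file is left to you, and the numbers in the closing comment are simply the slopes quoted above, not new output.

```python
import numpy as np

def slope_with_and_without(points, flagged):
    """Compare least-squares slopes for the full data set and with one point removed.

    points  -- list of (wrat_arithmetic, thickness) pairs from the downloaded study data
    flagged -- the suspected outlier, e.g. (66, 3.9)
    """
    x, y = np.array(points).T
    full_slope = np.polyfit(x, y, 1)[0]

    reduced = [p for p in points if p != flagged]
    xr, yr = np.array(reduced).T
    reduced_slope = np.polyfit(xr, yr, 1)[0]

    return full_slope, reduced_slope

# With the 48 (WRAT Arithmetic, AF thickness) pairs, this should return values
# near (-0.007817, -0.007460), the slopes quoted above.
```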

Scatterplot 2:  Attempt to connect Superior Frontal Gray Matter thickness with WRAT-3 Arithmetic performance

The tightly compressed scale of the second published scatterplot made me deeply suspicious that the (WRAT Arithmetic, Superior Frontal thickness) data were being vertically compressed to create the illusion of a linear relationship where one possibly did not exist.

Rescaling the graphing window (below) made the data appear notably less linear than the publication implied.  Also, the data point corresponding to the WRAT-Arithmetic score of 66 appeared to suffer from the same outlier influence as the first data set.  It was still an outlier, but now its vertical component was higher than the next eight data points to its right, with some of them notably lower.  Again, there was sufficient evidence to investigate results excluding the outlier data point.

fitness2

The linear regression on the original 48 (WRAT Arithmetic, SF thickness) data points was SF=-0.002767(WRAT_A)+4.113 (above).  Excluding the outlier, the new scatterplot (below) gave the revised linear regression SF=-0.002391(WRAT_A)+4.069.  This time, the revised slope was 13.6% smaller in magnitude than the original slope.  With the published significance also at p=0.04, omitting the outlier was almost certain to change the published results.

fitness5

THE OUTLIER BROKE THE STUDY

The findings above strongly suggest the published study results are not as reliable as reported.  It is time to rerun the significance tests.

For the first data set, (WRAT Arithmetic, AF thickness), run the t-test on the regression slope with and without the outlier.

  • INCLUDING OUTLIER:  For all 48 samples, the researchers reported a slope of -0.007817, r=-0.292, and p=0.04.  This was reported as a significant result.
  • EXCLUDING OUTLIER:  For the remaining 47 samples, the slope is -0.007460, r=-0.252, and p=0.087.  The smaller r confirms the visual impression that the data are less linear and, most importantly, the correlation is no longer significant at \alpha <0.05.

For the second data set, (WRAT Arithmetic, SF thickness):

  • INCLUDING OUTLIER:  For all 48 samples, the researchers reported a slope of -0.002767, r=-0.291, and p=0.04.  This was reported as a significant result.
  • EXCLUDING OUTLIER:  For the remaining 47 samples, the slope is -0.002391, r=-0.229, and p=0.121.  The revised data are even less linear and, most importantly, the correlation is no longer significant at any standard significance level.  (The sketch below shows how these p-values follow from r and the sample size.)
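For readers who want to verify these p-values themselves, here is a minimal Python (SciPy) sketch of the standard two-sided t-test for a correlation coefficient.  This is my own check, not the researchers’ code; the r and n values are the ones quoted above.

```python
import math
from scipy import stats

def corr_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, given sample correlation r and sample size n."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    return 2 * stats.t.sf(abs(t), df=n - 2)

# (WRAT Arithmetic, AF thickness):
print(corr_p_value(-0.292, 48))   # ~0.044 -> significant at alpha = 0.05
print(corr_p_value(-0.252, 47))   # ~0.087 -> no longer significant

# (WRAT Arithmetic, SF thickness):
print(corr_p_value(-0.291, 48))   # ~0.045
print(corr_p_value(-0.229, 47))   # ~0.12
```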

In brief, the researchers’ arguable decision to include the single, clear outlier data point was the source of any significant results at all.  Whatever correlation exists between gray matter thickness and WRAT-Arithmetic as measured by this study is tenuous, at best, and almost certainly not significant.

THE DANGERS OF CHERRY-PICKING RESULTS:

So, let’s set aside the entire questionable decision to keep an outlier in the data set to achieve significant findings.  There is still a subtle, potential problem with this study’s result that actually impacts many published studies.

The researchers understandably were seeking connections between the thickness of a brain’s gray matter and the academic performance of that brain as measured by various WRAT instruments.  They computed independent t-tests of linear regression slopes between thickness measures at nine different locations in the brain against three WRAT test measures for a total of 27 separate t-tests.  The next table shows the correlation coefficient and p-value from each test.

fitness6

This approach is commonly used, with researchers reporting only the tests found to be significant.  But in doing so, the researchers may have overlooked a fundamental property of the confidence levels that underlie p-values.  A critical value of p=0.05 corresponds to a 95% confidence level, and one interpretation of that level is that, even when the null hypothesis is true, results falling in the most extreme 5% of outcomes will NOT be considered as resulting from the null hypothesis, even though they are.

In other words, even when the null hypothesis is true, 5% of results would be deemed different enough to be declared statistically significant–a Type I Error.  Within this study, this sets up a binomial probability situation with 27 trials in which the probability of any one trial producing a significant result, even though the null hypothesis is correct, is p=0.05.

The binomial probability of finding exactly 2 significant results at p=0.05 over 27 trials is 0.243, and the probability of producing 2 or more significant results when the null hypothesis is true is 39.4%.
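Here is a quick check of those two binomial values in Python (SciPy); any binomial command on a graphing calculator gives the same numbers.

```python
from scipy.stats import binom

# 27 independent tests, each with a 0.05 chance of a Type I error
# when the null hypothesis is actually true.
n, p = 27, 0.05

print(binom.pmf(2, n, p))   # P(exactly 2 "significant" results) ~ 0.243
print(binom.sf(1, n, p))    # P(2 or more) = 1 - P(0) - P(1) ~ 0.394
```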

fitness8

That means there is a 39.4% probability in any study testing 27 trials at a p<0.05 critical value that at least 2 of those trials would report a result that would INCORRECTLY be interpreted as contradicting the null hypothesis.  And if more conditions than 27 are tested, the probability of a Type I Error is even higher.

Whenever you have a large number of inference trials, there is an increasingly large probability that at least some of the “significant” trials are actually just random, undetected occurrences of the null hypothesis.

It just happens.

THE ELUSIVE MEANING OF A p-VALUE:

For more on the difficulty of understanding p-values, check out this nice recent article on FiveThirtyEight Science–Not Even Scientists Can Easily Explain P-Values. 

CONCLUSION:

Personally, I’m a little disappointed that this study didn’t find significant results.  There are many recent studies showing the connection between physical activity and brain health, but this study didn’t achieve its goal of finding a biological source to explain the correlation.

It is the responsibility of researchers to know their studies and their resulting data sets.  Not finding significant results is not a problem.  But I do expect research to disclose when its significant results hang entirely on a choice to retain an outlier in the data set.

REFERENCES:

Chaddock-Heyman L, Erickson KI, Kienzler C, King M, Pontifex MB, Raine LB, et al. (2015) The Role of Aerobic Fitness in Cortical Thickness and Mathematics Achievement in Preadolescent Children. PLoS ONE 10(8): e0134115. doi:10.1371/journal.pone.0134115

University of Illinois at Urbana-Champaign. “Cardiorespiratory fitness linked to thinner gray matter and better math skills in kids.” ScienceDaily. http://www.sciencedaily.com/releases/2015/08/150812151229.htm (accessed December 8, 2015).

 

 

Best Algebra 2 Lab Ever

This post shares what I think is one of the best, most inclusive, data-oriented labs for a second year algebra class.  This single experiment produces linear, quadratic, and exponential (and logarithmic) data; my Algebra 2 students completed the lab this past summer.  In that class, I assigned frequent labs where students gathered real data, determined models to fit that data, and analyzed the goodness of the models’ fit to the data.  I believe in the importance of doing so much more than just writing an equation and moving on.

For kicks, I’ll derive an approximation for the acceleration due to gravity at the end.

THE LAB:

On the way to school one morning last summer, I grabbed one of my daughters’ “almost fully inflated” kickballs, attached a TI CBR2 to my laptop, and gathered (time, height) data from bouncing the ball under the motion sensor.  NOTE:  TI’s CBR2 can connect directly to their Nspire and TI84 families of graphing calculators.  I typically use computer-based Nspire CAS software, so I connected the CBR2 via my laptop’s USB port.  It’s crazy easy to use.

One student held the CBR2 about 1.5-2 meters above the ground while another held the ball steady about 20 cm below the CBR2 sensor.  When the second student released the ball, a third clicked a button on my laptop to gather the data:  time every 0.05 seconds and height from the ground.  The graphed data is shown below.  In case you don’t have access to a CBR or other data gathering devices, I’ve uploaded my students’ data in this Excel file.

Bounce1

Remember, this data was collected under far-from-ideal conditions.  I picked up a kickball my kids had left outside on my way to class.  The sensor was handheld and likely wobbled some, and the ball was dropped on the well-worn carpet of our classroom floor.  It is also likely the ball did not remain perfectly under the sensor the entire time.  Even so, my students created a very pretty graph on their first try.

For further context, we did this lab in the middle of our quadratics unit that was preceded by a unit on linear functions and another on exponential and logarithmic functions.  So what can we learn from the bouncing ball data?

LINEAR 1:  

While it is very unlikely that any of the recorded data points were precisely at maximums, they are close enough to create a nice linear pattern.

As the height of a ball above the ground helps determine the height of its next bounce (height before –> energy on impact –> height after), the eight ordered pairs (max height #n, max height #(n+1)) from my students’ data are shown below.

bounce2

This looks very linear.  Fitting a linear regression and analyzing the residuals gives the following.

bounce3

The data seem to be close to the line, and the residuals are relatively small, about evenly distributed above and below the line, with no apparent pattern to their distribution.  This confirms that the regression equation, y=0.673x+0.000233, is a good fit for the x = height before bounce and y = height after bounce data.

NOTE:  You could reasonably easily gather this data sans any technology.  Have teams of students release a ball from different measured heights while others carefully identify the rebound heights.

The coefficients also have meaning.  The 0.673 suggests that after each bounce, the ball rebounded to 67.3%, or 2/3, of its previous height–not bad for a ball plucked from a driveway that morning.  Also, the y-intercept, 0.000233, is essentially zero, suggesting that a ball released 0 meters from the ground would rebound to basically 0 meters above the ground.  That this isn’t exactly zero is a small measure of error in the experiment.
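For anyone recreating this fit outside a graphing calculator, here is a minimal Python (numpy) sketch, assuming you have read the successive bounce maxima from the linked Excel file.  The values in the closing comment are just the regression results reported above, not rerun output.

```python
import numpy as np

def rebound_fit(max_heights):
    """Fit a line to consecutive-maximum pairs (height before bounce, height after bounce).

    max_heights -- successive bounce maxima (in meters) read from the CBR2 data.
    Returns the slope (estimated rebound ratio), intercept, and residuals.
    """
    before = np.array(max_heights[:-1])
    after = np.array(max_heights[1:])
    slope, intercept = np.polyfit(before, after, 1)
    residuals = after - (slope * before + intercept)
    return slope, intercept, residuals

# With the maxima from this lab, the fit should land near slope 0.673 and
# intercept 0.000233, matching the regression above.
```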

EXPONENTIAL:

Using the same idea, consider data of the form (x,y) = (bounce number, bounce height).  The graph of the nine points from my students’ data is:

bounce4

This could be power or exponential data–something you should confirm for yourself–but an exponential regression and its residuals show

bounce5

While something of a pattern seems to exist, the other residual criteria are met, making the exponential regression a reasonably good model: y = 0.972 \cdot (0.676)^x.  That means bounce number 0 (the initial release height, visible as the downward movement at the far left of the initial scatterplot) is 0.972 meters, and the constant multiplier is about 0.676.  This second number represents the percentage of height maintained from each previous bounce, and is therefore the percentage rebound.  Also note that this is essentially the same value as the slope from the previous linear example, confirming that the ball we used basically maintained slightly more than 2/3 of its height from one bounce to the next.

And you can get logarithms from these data if you use the equation to determine, for example, which bounces exceed 0.2 meters.

bounce12

So, bounces 1-4 satisfy the requirement for exceeding 0.20 meters, as confirmed by the data.
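Written out, the logarithm step from the exponential model is below; dividing by \ln (0.676), a negative number, is what flips the inequality.

\displaystyle 0.972 \cdot (0.676)^x > 0.2 \longrightarrow (0.676)^x > \frac{0.2}{0.972} \longrightarrow x < \frac{\ln \left( 0.2/0.972 \right)}{\ln (0.676)} \approx 4.04

So bounces 1 through 4 are the ones that clear 0.2 meters.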

A second way to invoke logarithms is to reverse the data.  Graphing x=height and y=bounce number will also produce the desired effect.

QUADRATIC:

Each individual bounce looks like an inverted parabola.  If you remember a little physics, the moment after the ball leaves the ground after each bounce, it is essentially in free-fall, a situation defined by quadratic movement if you ignore air resistance–something we can safely assume given the very short duration of each bounce.

I had eight complete bounces I could use, but chose the first to have as many data points as possible to model.  As it was impossible to know whether the lowest point on each end of any data set came from the ball moving up or down, I omitted the first and last point in each set.  Using (x,y) = (time, height of first bounce) data, my students got:

bounce6

What a pretty parabola.  Fitting a quadratic regression (or manually fitting one, if that’s more appropriate for your classes), I get:

bounce7

Again, there’s maybe a slight pattern, but all but two points are well within a tenth of a percent of the model, with about half above and half below.  The model, y=-4.84x^2+4.60x-4.24, could be interpreted in terms of the physics formula for an object in free fall, but I’ll postpone that for a moment.

LINEAR 2:

If your second year algebra class has explored common differences, your students could explore second common differences to confirm the quadratic nature of the data.  Other than the first two differences (far right column below), the second common difference of all data points is roughly 0.024.  This raises suspicions that my student’s hand holding the CBR2 may have wiggled during the data collection.
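That 0.024 value is consistent with the quadratic model itself:  for equally spaced inputs with step \Delta x, a quadratic y=ax^2+bx+c has constant second differences of 2a(\Delta x)^2, so in magnitude this model predicts

\displaystyle \left| 2a \right| (\Delta x)^2 = 2(4.84)(0.05)^2 \approx 0.024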

bounce8

Since the second common differences are roughly constant, the original data must have been quadratic, and the first common differences linear.  As a small variation, for each consecutive pair of (time, height) points, I had my students graph (x,y) = (x midpoint, slope between the two points):

bounce10

If you get the common difference discussion, the linearity of this graph is not surprising.  Despite those conversations, most of my students seemed completely surprised by this pattern emerging from the quadratic data.  I guess they didn’t really “get” what common differences–or the closely related slope–meant until this point.

bounce11

Other than the first three points, the model seems very strong.  The coefficients tell an even more interesting story.

GRAVITY:

The equation from the last linear regression is y=4.55-9.61x.  Since the y-values of these data are slopes, the y-intercept, 4.55, is measured in m/sec.  That makes it the velocity of the ball at the moment (t=0) the ball left the ground.  Nice.

The slope of this line is -9.61.  As this is a slope, its units are the y-units over the x-units, or (m/sec)/(sec).  That is, meters per squared second.  And those are the units for gravity!  That means my students measured, hidden within their data, an approximation for the acceleration due to gravity by bouncing an outdoor ball on a well-worn carpet with a mildly wobbly hand holding a CBR2.  The accepted value at sea level on Earth is about -9.807 m/sec^2.  That means my students’ measurement error was about \frac{9.807-9.610}{9.807} \approx 2.0\%.  And 2.0% error is not bad for a very unscientific setting!

CONCLUSION:

Whenever I teach second year algebra classes, I find it extremely valuable to have students gather real data whenever possible and with every new function, determine models to fit their data, and analyze the goodness of the model’s fit to the data.  In addition to these activities just being good mathematics explorations, I believe they do an excellent job exposing students to a few topics often underrepresented in many secondary math classes:  numerical representations and methods, experimentation, and introduction to statistics.  Hopefully some of the ideas shared here will inspire you to help your students experience more.

Confidence Intervals via graphs and CAS

Confidence intervals (CIs) are a challenging topic for many students, a task made more challenging, in my opinion, because many (most?) statistics texts approach CIs via z-scores.  While repeatedly calculating CI endpoints from standard deviations explains the underlying mathematical structure, it relies on an (admittedly simple) algebraic technique that predates classroom technology currently available for students on the AP Statistics exam.

Many (most?) statistics packages now include automatic CI commands.  Unfortunately for students just learning what a CI means, automatic commands can become computational “black boxes.”  Both CAS and graphing techniques offer a strong middle ground–enough foundation to reinforce what CIs mean with enough automation to avoid unnecessary symbol manipulation time.

In most cases, this is accomplished by understanding a normal cumulative distribution function (cdf) as a function, not just as an electronic substitute for normal probability tables of values.  In this post, I share two alternatives each for three approaches to determining CIs using a TI-Nspire CAS.

SAMPLE PROBLEM:

In 2010, the mean ACT mathematics score for all tests was 21.0 with standard deviation 5.3.  Determine a 90% confidence interval for the math ACT score of an individual chosen at random from all 2010 ACT test takers.

METHOD 1a — THE STANDARD APPROACH:

A 90% CI excludes the extreme 5% on each end of the normal distribution.  Using an inverse normal command gives the z-scores at the corresponding 5% and 95% locations on the normal cdf.

normCAS6

Of course, utilizing symmetry would have required only one command.  To find the actual boundary points of the CI, standardize the endpoints, x, and equate that to the two versions of the z-scores.

\displaystyle \frac{x-21.0}{5.3} = \pm 1.64485

Solving these rational equations for x gives x=12.28 and x=29.72, or CI = \left[ 12.28,29.72 \right] .

Most statistics software lets users avoid this computation with optional parameters for the mean and standard deviation of non-standard normal curves.  One of my students last year used this in the next variation.

METHOD 1b — INTRODUCING LISTS:

After using lists as shortcuts on our TI-Nspires last year for evaluating functions at several points simultaneously, one of my students creatively applied them to the inverse normal command, entering the separate 0.05 and 0.95 cdf probabilities as a single list.  I particularly like how the output for this approach looks exactly like a CI.
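The same list idea carries over to other statistics software.  As a rough equivalent outside the Nspire, here is a short Python (SciPy) sketch; SciPy is my stand-in here, not software referenced in the original activity.

```python
from scipy.stats import norm

# Pass both tail probabilities at once, plus the optional mean and standard
# deviation, and the output reads exactly like the confidence interval.
print(norm.ppf([0.05, 0.95], loc=21.0, scale=5.3))   # ~[12.28, 29.72]
```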

CI4

METHOD 2a — CAS:

The endpoints of a CI are just endpoints of an interval on a normal cdf, so why not avoid the algebra and the additional inverse normal command and determine the endpoints via CAS commands?  My students know the solve command from previous math classes, so after learning the normal cdf command, there are very few situations in which they even need the inverse.

CI1

This approach keeps my students connected to the normal cdf, and solving for the bounds quickly reproduces the previous CI endpoints.
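The same solve-the-cdf-equation idea can be mimicked numerically in Python (SciPy) with a root finder.  This is a sketch of the idea, not the Nspire’s internal method.

```python
from scipy.stats import norm
from scipy.optimize import brentq

# Treat the cdf as a function and solve normCdf(-inf, x, 21, 5.3) = 0.05 and 0.95
# directly, instead of reaching for an inverse normal command.
lower = brentq(lambda x: norm.cdf(x, loc=21.0, scale=5.3) - 0.05, 0, 45)
upper = brentq(lambda x: norm.cdf(x, loc=21.0, scale=5.3) - 0.95, 0, 45)
print(lower, upper)   # ~12.28 and ~29.72
```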

METHOD 2b (Alas, not yet) — CAS and LISTS:

Currently, the numerical techniques the TI-Nspire family uses to solve equations with statistics commands don’t work well with lists in all situations.  Curiously, the Nspire currently can’t handle the solve+lists equivalent of the inverse normal+lists approach in METHOD 1b.

CI5

But, I’ve also learned that problems not easily solved in an Nspire CAS calculator window typically crack pretty easily when translated to their graphical equivalents.

METHOD 3a — GRAPHING:

This approach should work for any CAS or non-CAS graphing calculator or software with statistics commands.

Remember the “f” in cdf.  A cumulative distribution function is a function, and graphing calculators/software treat it as such.  Replacing the normCdf upper bound with an x for standard graphing syntax lets one graph the normal cdf (below).

Also remember that any algebraic equation can be solved graphically by independently graphing each side of the equation and treating the resulting pair of equations as a system of equations.  In this case, graphing y=0.05 and y=0.95 and finding the points of intersection gives the values of the CI.

CI2

METHOD 3b — GRAPHING and LISTS:

SIDENOTE:  While lists didn’t work with the CAS in the case of METHOD 2b, the next screen shows the syntax to graph both ends of the CI using lists with a single endpoint equation.

CI3

The lists obviously weren’t necessary here, but the ability to use lists is a very convenient feature on the TI-Nspire that I’ve leveraged countless times to represent families of functions.  In my opinion, using them in METHOD 3b again leverages that same idea, that the endpoints you seek are different aspects of the same family–the CI.

CONCLUSION:

There are many ways for students in their first statistics courses to use what they already know to determine the endpoints of a confidence interval.  And keeping students’ attention focused on new ways to use old information solidifies both old and new content.  Eliminating unnecessary computations that aren’t the point of most of introductory statistics anyway is an added bonus.

Happy learning everyone…

Recentering a Normal Curve with CAS

Sometimes, knowing how to ask a question in a different way using appropriate tools can dramatically simplify a solution.  For context, I’ll use an AP Statistics question from the last decade about a fictitious railway.

THE QUESTION:

After two set-up questions, students were asked to compute how long to delay one train’s departure to create a very small chance of delay while waiting for a second train to arrive.  I’ll share an abbreviated version of the suggested solution before giving what I think is a much more elegant approach using the full power of CAS technology.

BACKGROUND:

Initially, students were told that X was the normally distributed time Train B took to travel to city C, and Y was the normally distributed time Train D took to travel to C.  The first question asked for the distribution of Y-X if the mean and standard deviation of X are 170 and 20, respectively, and the mean and standard deviation of Y are 200 and 10.  Knowing how to transform normally distributed variables quickly gives that Y-X is normally distributed with mean 30 and standard deviation \sqrt{500}.
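Written out, and assuming the two travel times are independent as the problem intends, the transformation is

\displaystyle \mu_{Y-X} = \mu_Y - \mu_X = 200 - 170 = 30 \quad \text{and} \quad \sigma_{Y-X} = \sqrt{\sigma_Y^2 + \sigma_X^2} = \sqrt{10^2 + 20^2} = \sqrt{500}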

Due to damage to a part of the railroad, if Train B arrived at C before Train D, B would have to wait for D to clear the tracks before proceeding.  In part 2, you had to find the probability that B would wait for D.  Translating from English to math, if B arrives before D, then X \le Y.  So the probability of Train B waiting on Train D is equivalent to P(0 \le Y-X).  Using the distribution information in part 1 and a statistics command on my Nspire, this probability is

normCAS1

FIX THE DELAY:

Under the given conditions, there’s about a 91.0% chance that Train B will have to wait at C for Train D to clear the tracks.  Clearly, that’s not a good railway management situation, setting up the final question.  Paraphrasing,

How long should Train B be delayed so that its probability of being delayed is only 0.01?

FORMAL PROPOSED SOLUTION:

A delay in Train B means the mean arrival time of Train D, Y, will remain unchanged at 200, while the mean arrival time of Train B, X, is increased by some unknown amount.  Call that new mean of X \hat{X}=170+delay.  That makes the new mean of the difference in arrival times

Y - \hat{X} = 200-(170+delay) = 30-delay

As this is just a translation, the distribution of Y - \hat{X} is congruent to the distribution of Y-X, but recentered.  The standard deviation of both curves is \sqrt{500}.  You want to find the value of delay so that P \left(0 \le Y - \hat{X} \right) = 0.01.  That’s equivalent to knowing the location on the standard normal distribution where the area to the right is 0.01, or equivalently, the area to the left is 0.99.  One way that can be determined is with an inverse normal command.

normCAS2

 The proposed solution used z-scores to suggest finding the value of delay by solving

\displaystyle \frac{0-(30-delay)}{\sqrt{500}} = 2.32635

A little algebra gives delay=82.0187, so the railway should delay Train B just a hair over 82 minutes.

CAS-ENHANCED ALTERNATIVE SOLUTION:

From part 2, the initial conditions suggest Train B has a 91.0% chance of delay, and part 3 asks for the amount of recentering required to change that probability to 0.01.  Rephrasing this as a CAS command (using TI-Nspire syntax), that’s equivalent to solving

normCAS3

Notice that this is precisely the command used in part 2, re-expressed as an equation with a variable adjustment to the mean.  And since I’m using a CAS, I recognize the left side of this equation as a function of delay, making it something that can easily be “solved”.

normCAS4

Notice that I got exactly the same solution without the algebraic manipulation of a rational expression.
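For anyone without an Nspire, the same rephrased question can be answered numerically in Python (SciPy); this is a sketch of the idea, with the CAS solve command swapped for a root finder.

```python
import math
from scipy.stats import norm
from scipy.optimize import brentq

sd = math.sqrt(500)   # standard deviation of Y - X from part 1

# P(0 <= Y - X) after Train B's departure is delayed by `delay` minutes
def p_wait(delay):
    return norm.sf(0, loc=30 - delay, scale=sd)

print(p_wait(0))                                    # ~0.910, the part 2 result
print(brentq(lambda d: p_wait(d) - 0.01, 0, 200))   # ~82.02 minutes
```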

My big point here is not that use of a CAS simplifies the algebra (that wasn’t that hard in the first place), but rather that DEEP KNOWLEDGE of the mathematical situation allows one to rephrase a question in a way that enables incredibly efficient use of the technology.  CAS aren’t replacements for basic algebra skills, they are enhancements for mathematical thinking.

I DON’T HAVE CAS IN MY CLASSROOM.  NOW WHAT????

The CAS solve command is certainly nice, but many teachers and students don’t yet have CAS access, even though it is 100% legal for the PSAT, SAT, and all AP math exams.   But that’s OK.  If you recognize the normCdf command as a function, you can essentially use a graph menu to accomplish the same end.

Too often, I think, teachers and students treat the normCdf and invNorm commands as nothing more than glorified “lookup commands”–electronic versions of the probability tables they replaced.  But when one of the parameters is missing, replacing it with X makes the command graphable.  In fact, whenever you have an equation that is difficult (or impossible) to solve, graph both sides and find the intersection, just like a solution to a system of equations.  Using this strategy, graphing y=normCdf(0,\infty,30-X,\sqrt{500}) and y=0.01 and finding the intersection gives the required solution.

normCAS5

CONCLUSION

Whether you can access a CAS or not, think more deeply about what questions ask and find creative alternatives to symbolic manipulations.

Graphing Ratios and Proportions

Last week, some colleagues and I were pondering the difficulties many middle school students have solving ratio and proportion problems.  Here are a few thoughts we developed to address this and what we think might be an uncommon graphical extension (for most) as a different way to solve.

For context, consider the equation \displaystyle \frac{x}{6} = \frac{3}{4}.

(UNFORTUNATE) STANDARD METHOD:

The default procedure most textbooks and students employ is cross-multiplication.   Using this, a student would get

\displaystyle 4x=18 \longrightarrow x = \frac{18}{4} = \frac{9}{2}

While this delivers a quick solution, we sadly noted that far too many students don’t really seem to know why the procedure works.  From my purist mathematical perspective, the cross-multiplication procedure may be an efficient algorithm, but cross-multiplication isn’t actually a mathematical function.  Cross-multiplication may be the result, but it isn’t what happens.

METHOD 2:

In every math class I teach at every grade level, my mantra is to memorize as little as possible and to use what you know as broadly as possible.  To avoid learning unnecessary, isolated procedures (like cross-multiplication), I propose “fraction-clearing”–multiplying both sides of an equation by a common denominator–as a universal technique in any equation involving fractions.  As students’ mathematical and symbolic sophistication grows, fraction-clearing may occasionally yield to other techniques, but it is a solid, widely-applicable approach for developing algebraic thinking.

From the original equation, multiply both sides by a common denominator, handle all of the divisions first, and clean up.  For our example, the common denominator 24 will do the trick.

\displaystyle 24 \cdot \frac{x}{6} = 24 \cdot \frac{3}{4}

4 \cdot x = 6 \cdot 3

\displaystyle x = \frac{9}{2}

Notice that the middle line is precisely the result of cross-multiplication.  Fraction-clearing is the procedure behind cross-multiplication and explains exactly why it works:  You have an equation and apply the same operation (in our case, multiplying by 24) to both sides.

As an aside, I’d help students see that multiplying by any common denominator would do the trick (for our example, 12, 24, 36, 48, … all work), but the least common denominator (12) produces the smallest products in line 2, potentially simplifying any remaining algebra.  Since many approaches work, I believe students should be free to use ANY common denominator they want.   Eventually, they’ll convince themselves that the LCD is just more efficient, but there’s absolutely no need to demand that of students from the outset.

METHOD 3:

Remember that every equation compares two expressions that have the same measure, size, value, whatever.  But fractions with differing denominators (like our given equation) are difficult to compare.  Rewrite the expressions with the same “units” (denominators) to simplify comparisons.

Fourths and sixths can both be rewritten in twelfths.  Then, since the two different expressions of twelfths are equivalent, their numerators must be equivalent, leading to our results from above.

\displaystyle \frac{2}{2} \cdot \frac{x}{6} = \frac{3}{3} \cdot \frac{3}{4}

\displaystyle \frac{2x}{12} = \frac {9}{12}

2x=9

\displaystyle x = \frac{9}{2}

I find this approach more appealing as the two fractions never actually interact.  Fewer moving pieces makes this approach feel much cleaner.

UNCOMMON(?) METHOD 4:  Graphing

A fundamental mathematics concept (for me) is the Rule of 4 from the calculus reform movement of the 1990s.  That is, mathematical ideas can be represented numerically, algebraically, graphically, and verbally.  [I’d extend this to a Rule of 5 to include computer/CAS representations, but that’s another post.]  If you have difficulty understanding an idea in one representation, try translating it into a different representation and you might gain additional insights, or even a solution.  At a minimum, the act of translating the idea deepens your understanding.

One problem many students have with ratios is that teachers almost exclusively teach them as an algebraic technique–just as I have done in the first three methods above.  In my conversation this week, I finally recognized this weakness and wondered how I could solve ratios using one of the missing Rules: graphically.  Since equivalent fractions could be seen as different representations of the slope of a line through the origin, I had my answer.

Students learning ratios and proportions may not have seen slope yet and may or may not have seen an xy-coordinate grid, so I’d avoid initial use of any formal terminology.  I labeled my vertical axis “Top,” and the horizontal “Bottom”.  More formal names are fine, but unnecessary.  While I suspect most students might think “top” makes more sense for a vertical axis and “bottom” for the horizontal, it really doesn’t matter which axis receives which label.

In the purely numeric fraction in our given problem, \displaystyle \frac{x}{6} = \frac{3}{4}, “3” is on top, and “4” is on the bottom.  Put a point at the place where these two values meet.  Finally draw a line connecting your point and the origin.

ratio1

The other fraction has a “6” in the denominator.  Locate 6 on the “bottom axis”, trace to the line, and from there over to the “top axis” to find the top value of 4.5.

ratio2

  Admittedly, the 4.5 solution would have been a rough guess without the earlier solutions, but the graphical method would have given me a spectacular estimate.  If the graph grid was scaled by 0.5s instead of by 1s and the line was drawn very carefully, this graph could have given an exact answer.  In general, solutions with integer-valued unknowns should solve exactly, but very solid approximations would always result.
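For teachers, the later algebraic language for why the graph works is below; I would hold this back from early learners, but it makes a nice preview of slope.

\displaystyle \frac{Top}{Bottom} = \frac{3}{4} \longrightarrow Top = \frac{3}{4} \cdot Bottom \longrightarrow Top = \frac{3}{4} \cdot 6 = \frac{18}{4} = 4.5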

CONCLUSION:

Even before algebraic representations of lines are introduced, students can leverage the essence of that concept to answer proportion problems.  Serendipitously, the graphical approach also sets the stage for later discussions of the coordinate plane, slope, and linear functions.  I could also see using this approach as the cornerstone of future class conversations and discoveries leading to those generalizations.

I suspect that students who struggle with mathematical notation might find greater understanding with the graphical/visual approach.  Eventually, symbolic manipulation skills will be required, but there is no need for any teacher to expect early algebra learners to be instant masters of abstract notation.

Controlling graphs and a free online calculator

When graphing functions with multiple local features, I often find myself wanting to explain a portion of the graph’s behavior independent of the rest of the graph.  When I started teaching a couple decades ago, the processor on my TI-81 was slow enough that I could actually watch the pixels light up sequentially.  I could see HOW the graph was formed.  Today, processors obviously are much faster.  I love the problem-solving power that has given my students and me, but I’ve sometimes missed being able to see function graphs as they develop.

Below, I describe the origins of the graph control idea, how the control works, and then provide examples of polynomials with multiple roots, rational functions with multiple intercepts and/or vertical asymptotes, polar functions, parametric collision modeling, and graphing derivatives of given curves.

BACKGROUND:  A colleague and I were planning a rational function unit after school last week, wanting to be able to create graphs in pieces so that we could discuss the effect of each local feature.  In the past, we “rigged” calculator images by graphing the functions parametrically and controlling the input values of t.  Clunky and static, but it gave us useful still shots.  Nice enough, but we really wanted something dynamic.  Because we had the use of sliders on our TI-Nspire software, on Geogebra, and on the Desmos calculator, the solution we sought was closer than we suspected.

REALIZATION & WHY IT WORKS: Last week, we discovered that we could use g(x)=\sqrt \frac{\left | x \right |}{x} to create what we wanted.  The argument of the root is 1 for x>0, making g(x)=1.  For x<0, the root’s argument is -1, making g(x)=i, a non-real number.  Our insight was that multiplying any function y=f(x) by an appropriate version of g wouldn’t change the output of f if the input to g is positive, but would make the product ungraphable due to complex values if the input to g is negative.

If I make a slider for parameter a, then g_2(x)=\sqrt \frac{\left | a-x \right |}{a-x} will have output 1 for all x<a.  That means for any function y=f(x) with real outputs only, y=f(x)\cdot g_2(x) will have real outputs (and a real graph) for x<a only.  Aha!  Using a slider and g_2 would allow me to control the appearance of my graph from left to right.
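Outside Desmos, the same masking trick is easy to see numerically.  Here is a small Python (numpy) sketch of the idea; numpy returns NaN wherever the root’s argument is negative, and plotting software simply skips NaN values.  The sample function is just an illustration, not one from the examples below.

```python
import numpy as np

def partial_graph(f, x, a):
    """Reveal y = f(x) only for x < a by multiplying by sqrt(|a-x|/(a-x))."""
    with np.errstate(divide="ignore", invalid="ignore"):
        mask = np.sqrt(np.abs(a - x) / (a - x))   # 1 where x < a, NaN where x >= a
    return f(x) * mask                            # NaN values simply don't plot

x = np.linspace(-4, 3, 15)
f = lambda t: t**3 - 4*t                          # any function with real outputs will do
print(partial_graph(f, x, a=0.5))                 # real values left of 0.5, NaN to the right
```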

NOTE:  While it’s still developing, I’ve become a big fan of the free online Desmos calculator after a recent presentation at the Global Math Department (join our 45-60 minute online meetings every Tuesday at 9PM ET!).  I use Desmos for all of the following graphs in this post, but obviously any graphing software with slider capabilities would do.

EXAMPLE 1:  Graph y=(x+2)^3x^2(x-1), a 6th degree polynomial whose end behavior is up for \pm \infty, “wiggles” through the x-axis at -2, then bounces off the origin, and finally passes through the x-axis at 1.

Click here to access the Desmos graph that created the image above.  You can then manipulate the slider to watch the graph wiggle through, then bounce off, and finally pass through the x-axis.

EXAMPLE 2:  Graph y=\frac{(x+1)^2}{(x+2)(x-1)^2}, a rational function with a vertical asymptote at x=-2, a “bounce” off the x-axis at x=-1 from its double zero, a double vertical asymptote at x=1, and a horizontal asymptote of y=0.

Click here to access the Desmos graph above and control the creation of the rational function’s graph using a slider.

EXAMPLE 3:  I believe students understand polar graphing better when they see curves like the  limacon r=2+3cos(\theta ) moving between its maximum and minimum circles.  Controlling the slider also allows users to see the values of \theta at which the limacon crosses the pole. Here is the Desmos graph for the graph below.

EXAMPLE 4:  Object A leaves (2,3) and travels south at 0.29 units/second.  Object B leaves (-2,1) traveling east at 0.45 units/second.  The intersection of their paths is (2,1), but which object arrives there first?  Here is the live version.

OK, I know this is an overly simplistic example, but you’ll get the idea of how the controlling slider works on a parametrically-defined function.  The \sqrt{\frac{\left | a-x \right |}{a-x}} term only needs to be on one of the parametric equations.  Another benefit of the slider approach is the ease with which users can identify the value of t (or time) when each particle reaches the point of intersection or their axes intercepts.  Obviously those values could be algebraically determined in this problem, but that isn’t always true, and this graphical-numeric approach always gives an alternative to algebraic techniques when investigating parametric functions.

ASIDE 1–Notice the ease of the Desmos notation for parametric graphs.  Enter [r,s] where r is the x-component of the parametric function and s is the y-component.  To graph a point, leave r and s as constants.  Easy.

EXAMPLE 5:  When teaching calculus, I always ask my students to sketch graphs of the derivatives of functions given in graphical forms.  I always create these graphs one part at a time.  As an example, this graph shows y=x^3+2x^2 and allows you to get its derivative gradually using a slider.

ASIDE 2–It is also very easy to enter derivatives of functions in the Desmos calculator.  Type “d/dx” before the function name or definition, and the derivative is accomplished.  Desmos is not a CAS, so I’m sure the software is computing derivatives numerically.  No matter.  Derivatives are easy to define and use here.

I’m hoping you find this technology tip as useful as I do.