# Monthly Archives: December 2015

## Mistakes are Good

Confession #1:  My answers on my last post were WRONG.

I briefly thought about taking that post down, but discarded that idea when I thought about the reality that almost all published mathematics is polished, cleaned, and optimized.  Many students struggle with mathematics under the misconception that their first attempts at any topic should be as polished as what they read in published sources.

While not precisely from the same perspective, Dan Teague recently wrote an excellent, short piece of advice to new teachers on NCTM’s ‘blog entitled Demonstrating Competence by Making Mistakes.  I argue Dan’s advice actually applies to all teachers, so in the spirit of showing how to stick with a problem and not just walking away saying “I was wrong”, I’m going to keep my original post up, add an advisory note at the start about the error, and show below how I corrected my error.

Confession #2:  My approach was a much longer and far less elegant solution than the identical approaches offered by a comment by “P” on my last post and the solution offered on FiveThirtyEight.  Rather than just accepting the alternative solution, as too many students are wont to do, I acknowledged the more efficient approach of others before proceeding to find a way to get the answer through my initial idea.

I’ll also admit that I didn’t immediately see the simple approach to the answer and rushed my post in the time I had available to get it up before the answer went live on FiveThirtyEight.

GENERAL STRATEGY and GOALS:

1-Use a PDF:  The original FiveThirtyEight post asked for the expected time before the siblings simultaneously finished their tasks.  I interpreted this as expected value, and I knew how to compute the expected value of a pdf of a random variable.  All I needed was the potential wait times, t, and their corresponding probabilities.  My approach was solid, but a few of my computations were off.

2-Use Self-Similarity:  I don’t see many people employing the self-similarity tactic I used in my initial solution.  Resolving my initial solution would allow me to continue using what I consider a pretty elegant strategy for handling cumbersome infinite sums.

A CORRECTED SOLUTION:

Stage 1:  My table for the distribution of initial choices was correct, as were my conclusions about the probability and expected time if they chose the same initial app.

My first mistake was in my calculation of the expected time if they did not choose the same initial app.  The 20 numbers in blue above represent that sample space.  Notice that there are 8 times where one sibling chose a 5-minute app, leaving 6 other times where one sibling chose a 4-minute app while the other chose something shorter.  Similarly, there are 4 choices of an at most 3-minute app, and 2 choices of an at most 2-minute app.  So the expected length of time spent by the longer app if the same was not chosen for both is

$E(Round1) = \frac{1}{20}*(8*5+6*4+4*3+2*2)=4$ minutes,

a notably longer time than I initially reported.

For the initial app choice, there is a $\frac{1}{5}$ chance they choose the same app for an average time of 3 minutes, and a $\frac{4}{5}$ chance they choose different apps for an average time of 4 minutes.

Stage 2:  My biggest error was a rushed assumption that all of the entries I gave in the Round 2 table were equally likely.  That is clearly false as you can see from Table 1 above.  There are only two instances of a time difference of 4, while there are eight instances of a time difference of 1.  A correct solution using my approach needs to account for these varied probabilities.  Here is a revised version of Table 2 with these probabilities included.

Conveniently–as I had noted without full realization in my last post–the revised Table 2 still shows the distribution for the 2nd and all future potential rounds until the siblings finally align, including the probabilities.  This proved to be a critical feature of the problem.

Another oversight was not fully recognizing which events would contribute to increasing the time before parity.  The yellow highlighted cells in Table 2 are those for which the next app choice was longer than the current time difference, and any of these would increase the length of a trial.

I was initially correct in concluding there was a $\frac{1}{5}$ probability of the second app choice achieving a simultaneous finish and that this would not result in any additional total time.  I missed the fact that the six non-highlighted values also did not result in additional time and that there was a $\frac{1}{5}$ chance of this happening.

That leaves a $\frac{3}{5}$ chance of the trial time extending by selecting one of the highlighted events.  If that happens, the expected time the trial would continue is

$\displaystyle \frac{4*4+(4+3)*3+(4+3+2)*2+(4+3+2+1)*1}{4+(4+3)+(4+3+2)+(4+3+2+1)}=\frac{13}{6}$ minutes.

Iterating:  So now I recognized there were 3 potential outcomes at Stage 2–a $\frac{1}{5}$ chance of matching and ending, a $\frac{1}{5}$ chance of not matching but not adding time, and a $\frac{3}{5}$ chance of not matching and adding an average $\frac{13}{6}$ minutes.  Conveniently, the last two possibilities still combined to recreate perfectly the outcomes and probabilities of the original Stage 2, creating a self-similar, pseudo-fractal situation.  Here’s the revised flowchart for time.

Invoking the similarity, if there were T minutes remaining after arriving at Stage 2, then there was a $\frac{1}{5}$ chance of adding 0 minutes, a $\frac{1}{5}$ chance of remaining at T minutes, and a $\frac{3}{5}$ chance of adding $\frac{13}{6}$ minutes–that is being at $T+\frac{13}{6}$ minutes.  Equating all of this allows me to solve for T.

$T=\frac{1}{5}*0+\frac{1}{5}*T+\frac{3}{5}*\left( T+\frac{13}{6} \right) \longrightarrow T=6.5$ minutes

Time Solution:  As noted above, at the start, there was a $\frac{1}{5}$ chance of immediately matching with an average 3 minutes, and there was a $\frac{4}{5}$ chance of not matching while using an average 4 minutes.  I just showed that from this latter stage, one would expect to need to use an additional mean 6.5 minutes for the siblings to end simultaneously, for a mean total of 10.5 minutes.  That means the overall expected time spent is

Total Expected Time $=\frac{1}{5}*3 + \frac{4}{5}*10.5 = 9$ minutes.

Number of Rounds Solution:  My initial computation of the number of rounds was actually correct–despite the comment from “P” in my last post–but I think the explanation could have been clearer.  I’ll try again.

One round is obviously required for the first choice, and in the $\frac{4}{5}$ chance the siblings don’t match, let N be the average number of rounds remaining.  In Stage 2, there’s a $\frac{1}{5}$ chance the trial will end with the next choice, and a $\frac{4}{5}$ chance there will still be N rounds remaining.  This second situation is correct because both the no time added and time added possibilities combine to reset Table 2 with a combined probability of $\frac{4}{5}$.  As before, I invoke self-similarity to find N.

$N = \frac{1}{5}*1 + \frac{4}{5}*N \longrightarrow N=5$

Therefore, the expected number of rounds is $\frac{1}{5}*1 + \frac{4}{5}*5 = 4.2$ rounds.

It would be cool if someone could confirm this prediction by simulation.

CONCLUSION:

I corrected my work and found the exact solution proposed by others and simulated by Steve!   Even better, I have shown my approach works and, while notably less elegant, one could solve this expected value problem by invoking the definition of expected value.

Best of all, I learned from a mistake and didn’t give up on a problem.  Now that’s the real lesson I hope all of my students get.

Happy New Year, everyone!

## Great Probability Problems

UPDATE:  Unfortunately, there are a couple errors in my computations below that I found after this post went live.  In my next post, Mistakes are Good, I fix those errors and reflect on the process of learning from them.

ORIGINAL POST:

A post last week to the AP Statistics Teacher Community by David Bock alerted me to the new weekly Puzzler by Nate Silver’s new Web site, http://fivethirtyeight.com/.  As David noted, with their focus on probability, this new feature offers some great possibilities for AP Statistics probability and simulation.

I describe below FiveThirtyEight’s first three Puzzlers along with a potential solution to the last one.  If you’re searching for some great problems for your classes or challenges for some, try these out!

THE FIRST THREE PUZZLERS:

The first Puzzler asked a variation on a great engineering question:

You work for a tech firm developing the newest smartphone that supposedly can survive falls from great heights. Your firm wants to advertise the maximum height from which the phone can be dropped without breaking.

You are given two of the smartphones and access to a 100-story tower from which you can drop either phone from whatever story you want. If it doesn’t break when it falls, you can retrieve it and use it for future drops. But if it breaks, you don’t get a replacement phone.

Using the two phones, what is the minimum number of drops you need to ensure that you can determine exactly the highest story from which a dropped phone does not break? (Assume you know that it breaks when dropped from the very top.) What if, instead, the tower were 1,000 stories high?

The second Puzzler investigated random geyser eruptions:

You arrive at the beautiful Three Geysers National Park. You read a placard explaining that the three eponymous geysers — creatively named A, B and C — erupt at intervals of precisely two hours, four hours and six hours, respectively. However, you just got there, so you have no idea how the three eruptions are staggered. Assuming they each started erupting at some independently random point in history, what are the probabilities that A, B and C, respectively, will be the first to erupt after your arrival?

Both very cool problems with solutions on the FiveThirtyEight site.  The current Puzzler talked about siblings playing with new phone apps.

SOLVING THE CURRENT PUZZLER:

Before I started, I saw Nick Brown‘s interesting Tweet of his simulation.

If Nick’s correct, it looks like a mode of 5 minutes and an understandable right skew.  I approached the solution by first considering the distribution of initial random app choices.

There is a $\displaystyle \frac{5}{25}$ chance the siblings choose the same app and head to dinner after the first round.  The expected length of that round is $\frac{1}{5} \cdot \left( 1+2=3=4+5 \right) = 3$ minutes.

That means there is a $\displaystyle \frac{4}{5}$ chance different length apps are chosen with time differences between 1 and 4 minutes.  In the case of unequal apps, the average time spent before the shorter app finishes is $\frac{1}{25} \cdot \left( 8*1+6*2+4*3+2*4 \right) = 1.6$ minutes.

It doesn’t matter which sibling chose the shorter app.  That sibling chooses next with distribution as follows.

While the distributions are different, conveniently, there is still a time difference between 1 and 4 minutes when the total times aren’t equal.  That means the second table shows the distribution for the 2nd and all future potential rounds until the siblings finally align.  While this problem has the potential to extend for quite some time, this adds a nice pseudo-fractal self-similarity to the scenario.

As noted, there is a $\displaystyle \frac{4}{20}=\frac{1}{5}$ chance they complete their apps on any round after the first, and this would not add any additional time to the total as the sibling making the choice at this time would have initially chosen the shorter total app time(s).  Each round after the first will take an expected time of $\frac{1}{20} \cdot \left( 7*1+5*2+3*3+1*4 \right) = 1.5$ minutes.

The only remaining question is the expected number of rounds of app choices the siblings will take if they don’t align on their first choice.  This is where I invoked self-similarity.

In the initial choice there was a $\frac{4}{5}$ chance one sibling would take an average 1.6 minutes using a shorter app than the other.  From there, some unknown average N choices remain.  There is a $\frac{1}{5}$ chance the choosing sibling ends the experiment with no additional time, and a $\frac{4}{5}$ chance s/he takes an average 1.5 minutes to end up back at the Table 2 distribution, still needing an average N choices to finish the experiment (the pseudo-fractal self-similarity connection).  All of this is simulated in the flowchart below.

Recognizing the self-similarity allows me to solve for N.

$\displaystyle N = \frac{1}{5} \cdot 1 + \frac{4}{5} \cdot N \longrightarrow N=5$

Number of Rounds – Starting from the beginning, there is a $\frac{1}{5}$ chance of ending in 1 round and a $\frac{4}{5}$ chance of ending in an average 5 rounds, so the expected number of rounds of app choices before the siblings simultaneously end is

$\frac{1}{5} *1 + \frac{4}{5}*5=4.2$ rounds

Time until Eating – In the first choice, there is a $\frac{1}{5}$ chance of ending in 3 minutes.  If that doesn’t happen, there is a subsequent $\frac{1}{5}$ chance of ending with the second choice with no additional time.  If neither of those events happen, there will be 1.6 minutes on the first choice plus an average 5 more rounds, each taking an average 1.5 minutes, for a total average $1.6+5*1.5=9.1$ minutes.  So the total average time until both siblings finish simultaneously will be

$\frac{1}{5}*3+\frac{4}{5}*9.1 = 7.88$ minutes

CONCLUSION:

My 7.88 minute mean is reasonably to the right of Nick’s 5 minute mode shown above.  We’ll see tomorrow if I match the FiveThirtyEight solution.

Anyone else want to give it a go?  I’d love to hear other approaches.

Most of my thinking about teaching lately has been about the priceless, timeless value of process in problem solving over the ephemeral worth of answers.  While an answer to a problem puts a period at the end of a sentence, the beauty and worth of the sentence was the construction, word choice, and elegance employed in sharing the idea at the heart of the sentence.

Just as there are many ways to craft a sentence–from cumbersome plodding to poetic imagery–there are equally many ways to solve problems in mathematics.  Just as great writing reaches, explores, and surprises, great problem solving often starts with the solver not really knowing where the story will lead, taking different paths depending on the experience of the solver, and ending with even more questions.

I experienced that yesterday reading through tweets from one of my favorite middle and upper school problem sources, Five Triangles.  The valuable part of what follows is, in my opinion, the multiple paths I tried before settling on something productive.  My hope is that students learn the value in exploration, even when initially unproductive.

At the end of this post, I offer a few variations on the problem.

The Problem

Try this for yourself before reading further.  I’d LOVE to hear others’ approaches.

First Thoughts and Inherent Variability

My teaching career has been steeped in transformations, and I’ve been playing with origami lately, so my brain immediately translated the setup:

Fold vertex A of equilateral triangle ABC onto side BC.  Let segment DE be the resulting crease with endpoints on sides AB and AC with measurements as given above.

So DF is the folding image of AD and EF is the folding image of AE.  That is, ADFE is a kite and segment DE is a perpendicular bisector of (undrawn) segment AF.  That gave $\Delta ADE \cong \Delta FDE$ .

I also knew that there were lots of possible locations for point F, even though this set-up chose the specific orientation defined by BF=3.

Lovely, but what could I do with all of that?

Trigonometry Solution Eventually Leads to Simpler Insights

Because FD=7, I knew AD=7.  Combining this with the given DB=8 gave AB=15, so now I knew the side of the original equilateral triangle and could quickly compute its perimeter or area if needed.  Because BF=3, I got FC=12.

At this point, I had thoughts of employing Heron’s Formula to connect the side lengths of a triangle with its area.  I let AE=x, making EF=x and $EC=15-x$.  With all of the sides of $\Delta EFC$ defined, its perimeter was 27, and I could use Heron’s Formula to define its area:

$Area(\Delta EFC) = \sqrt{13.5(1.5)(13.5-x)(x-1.5)}$

But I didn’t know the exact area, so that was a dead end.

Since $\Delta ABC$ is equilateral, $m \angle C=60^{\circ}$ , I then thought about expressing the area using trigonometry.  With trig, the area of a triangle is half the product of any two sides multiplied by the sine of the contained angle.  That meant $Area(\Delta EFC) = \frac{1}{2} \cdot 12 \cdot (15-x) \cdot sin(60^{\circ}) = 3(15-x) \sqrt3$.

Now I had two expressions for the same area, so I could solve for x.

$3\sqrt{3}(15-x) = \sqrt{13.5(1.5)(13.5-x)(x-1.5)}$

Squaring both sides revealed a quadratic in x.  I could do this algebra, if necessary, but this was clearly a CAS moment.

I had two solutions, but this felt WAY too complicated.  Also, Five Triangles problems are generally accessible to middle school students.  The trigonometric form of a triangle’s area is not standard middle school fare.  There had to be an easier way.

A Quicker Ending

Thinking trig opened me up to angle measures.  If I let $m \angle CEF = \theta$, then $m \angle EFC = 120^{\circ}-\theta$, making $m \angle DFB = \theta$, and I suddenly had my simple breakthrough!  Because their angles were congruent, I knew $\Delta CEF \sim \Delta BFD$.

Because the triangles were similar, I could employ similarity ratios.

$\frac{7}{8}=\frac{x}{12}$
$x=10.5$

And that is one of the CAS solutions by a MUCH SIMPLER approach.

Extensions and Variations

Following are five variations on the original Five Triangles problem.  What other possible variations can you find?

1)  Why did the CAS give two solutions?  Because $\Delta BDF$ had all three sides explicitly given, by SSS there should be only one solution.  So is the 13.0714 solution real or extraneous?  Can you prove your claim?  If that solution is extraneous, identify the moment when the solution became “real”.

2)  Eliminating the initial condition that BF=3 gives another possibility.  Using only the remaining information, how long is $\overline{BF}$ ?

$\Delta BDF$ now has SSA information, making it an ambiguous case situation.  Let BF=x and invoke the Law of Cosines.

$7^2=x^2+8^2-2 \cdot x \cdot 8 cos(60^{\circ})$
$49=x^2-8x+64$
$0=(x-3)(x-5)$

Giving the original BF=3 solution and a second possible answer:  BF=5.

3)  You could also stay with the original problem asking for AE.

From above, the solution for BF=3 is AE=10.5.  But if BF=5 from the ambiguous case, then FC=10 and the similarity ratio above becomes

$\frac{7}{8}=\frac{x}{10}$
$x=\frac{35}{4}=8.75$

4)  Under what conditions is $\overline{DE} \parallel \overline{BC}$ ?

5)  Consider all possible locations of folding point A onto $\overline{BC}$.  What are all possible lengths of $\overline{DE}$?

## How One Data Point Destroyed a Study

Statistics are powerful tools.  Well-implemented, they tease out underlying patterns from the noise of raw data and improve our understanding.  But statistics must take care to avoid misstatements.   Unfortunately, statistics can also deliberately distort relationships, declaring patterns where none exist.  In my AP Statistics classes, I hope my students learn to extract meaning from well-designed studies, and to spot instances of Benjamin Disraeli’s “three kinds of lies:  lies, damned lies, and statistics.”

This post explores part of a study published August 12, 2015, exposing what I believe to be examples of four critical ways statistics are misunderstood and misused:

• Not recognizing the distortion power of outliers in means, standard deviations, and in the case of the study below, regressions.
• Distorting graphs to create the impression of patterns different from what actually exists,
• Cherry-picking data to show only favorable results, and
• Misunderstanding the p-value in inferential studies.

THE STUDY:

I was searching online for examples of research I could use with my AP Statistics classes when I found on the page of a math teacher organization a link to an article entitled, “Cardiorespiratory fitness linked to thinner gray matter and better math skills in kids.”  Following the URL trail, I found a description of the referenced article in an August, 2015 summary article by Science Daily and the actual research posted on August 12, 2015 by the journal, PLOS ONE.

As a middle and high school teacher, I’ve read multiple studies connecting physical fitness to brain health.  I was sure I had hit paydirt with an article offering multiple, valuable lessons for my students!  I read the claims of the Science Daily research summary correlating the physical fitness of 9- and 10-year-old children to performance on a test of arithmetic.  It was careful not to declare cause-and-effect,  but did say

The team found differences in math skills and cortical brain structure between the higher-fit and lower-fit children. In particular, thinner gray matter corresponded to better math performance in the higher-fit kids. No significant fitness-associated differences in reading or spelling aptitude were detected. (source)

The researchers described plausible connections for the aerobic fitness of children and the thickness of cortical gray matter for each participating child.  The study went astray when they attempted to connect their findings to the academic performance of the participants.

Independent t-tests were employed to compare WRAT-3 scores in higher fit and lower fit children. Pearson correlations were also conducted to determine associations between cortical thickness and academic achievement. The alpha level for all tests was set at p < .05. (source)

All of the remaining images, quotes, and data in this post are pulled directly from the primary article on PLOS ONE.  The URLs are provided above with bibliographic references are at the end.

To address questions raised by the study, I had to access the original data and recreate the researchers’ analyses.  Thankfully, PLOS ONE is an open-access journal, and I was able to download the research data.  In case you want to review the data yourself or use it with your classes, here is the original SPSS file which I converted into Excel and TI-Nspire CAS formats.

My suspicions were piqued when I saw the following two graphs–the only scatterplots offered in their research publication.

Scatterplot 1:  Attempt to connect Anterior Frontal Gray Matter thickness with WRAT-3 Arithmetic performance

The right side of the top scatterplot looked like an uncorrelated cloud of data with one data point on the far left seeming to pull the left side of the linear regression upwards, creating a more negative slope.  Because the study reported only two statistically significant correlations between the WRAT tests and cortical thickness in two areas of the brain, I was now concerned that the single extreme data point may have distorted results.

My initial scatterplot (below) confirmed the published graph, but fit to the the entire window, the data now looked even less correlated.

In this scale, the farthest left data point (WRAT Arithmetic score = 66, Anterior Frontal thickness = 3.9) looked much more like an outlier.  I confirmed that the point exceeded 1.5IQRs below the lower quartile, as indicated visually in a boxplot of the WRAT-Arithmetic scores.

Also note from my rescaled scatterplot that the Anterior Frontal measure (y-coordinate) was higher than any of the next five ordered pairs to its right.  Its horizontal outlier location, coupled with its notably higher vertical component, suggested that the single point could have significant influence on any regression on the data.  There was sufficient evidence for me to investigate the study results excluding the (66, 3.9) data point.

The original linear regression on the 48 (WRAT Arithmetic, AF thickness) data was $AF=-0.007817(WRAT_A)+4.350$.  Excluding (66, 3.9), the new scatterplot above shows the revised linear regression on the remaining 47 points:  $AF=-0.007460(WRAT_A)+4.308$.  This and the original equation are close, but the revised slope is 4.6% smaller in magnitude relative to the published result. With the two published results reported significant at p=0.04, the influence of the outlier (66, 3.9) has a reasonable possibility of changing the study results.

Scatterplot 2:  Attempt to connect Superior Frontal Gray Matter thickness with WRAT-3 Arithmetic performance

The tightly compressed scale of the second published scatterplot made me deeply suspicious the (WRAT Arithmetic, Superior Frontal thickness) data was being vertically compressed to create the illusion of a linear relationship where one possibly did not exist.

Rescaling the the graphing window (below) made those appear notably less linear than the publication implied.  Also, the data point corresponding to the WRAT-Arithmetic score of 66 appeared to suffer from the same outlier-influences as the first data set.  It was still an outlier, but now its vertical component was higher than the next eight data points to its right, with some of them notably lower.  Again, there was sufficient evidence to investigate results excluding the outlier data point.

The linear regression on the original 48 (WRAT Arithmetic, SF thickness) data points was $SF=-0.002767(WRAT_A)+4.113$ (above).  Excluding the outlier , the new scatterplot (below) had revised linear regression, $SF=-0.002391(WRAT_A)+4.069$.  This time, the revised slope was 13.6% smaller in magnitude relative to the original slope.  With the published significance also at p=0.04, omitting the outlier was almost certain to change the published results.

THE OUTLIER BROKE THE STUDY

The findings above strongly suggest the published study results are not as reliable as reported.  It is time to rerun the significance tests.

For the first data set–(WRAT Arithmetic, AF thickness) —run an independent t-test on the regression slope with and without the outlier.

• INCLUDING OUTLIER:  For all 48 samples, the researchers reported a slope of -0.007817, $r=-0.292$, and $p=0.04$.  This was reported as a significant result.
• EXCLUDING OUTLIER:  For the remaining 47 samples, the slope is -0.007460, r=-0.252, and p=0.087.  The r confirms the visual impression that the data was less linear and, most importantly, the correlation is no longer significant at $\alpha <0.05$.

For the second data set–(WRAT Arithmetic, SF thickness):

• INCLUDING OUTLIER:  For all 48 samples, the researchers reported a slope of -0.002767, r=-0.291, and p=0.04.  This was reported as a significant result.
• EXCLUDING OUTLIER:  For the remaining 47 samples, the slope is -0.002391, r=-0.229, and p=0.121.  This revision is even less linear and, most importantly, the correlation is no longer significant for any standard significance level.

In brief, the researchers’ arguable decision to include the single, clear outlier data point was the source of any significant results at all.  Whatever correlation exists between gray matter thickness and WRAT-Arithmetic as measured by this study is tenuous, at best, and almost certainly not significant.

THE DANGERS OF CHERRY-PICKING RESULTS:

So, let’s set aside the entire questionable decision to keep an outlier in the data set to achieve significant findings.  There is still a subtle, potential problem with this study’s result that actually impacts many published studies.

The researchers understandably were seeking connections between the thickness of a brain’s gray matter and the academic performance of that brain as measured by various WRAT instruments.  They computed independent t-tests of linear regression slopes between thickness measures at nine different locations in the brain against three WRAT test measures for a total of 27 separate t-tests.  The next table shows the correlation coefficient and p-value from each test.

This approach is commonly used with researchers reporting out only the tests found to be significant.  But in doing so, the researchers may have overlooked a fundamental property of the confidence intervals that underlie p-values.  Using the typical critical value of p=0.05 uses a 95% confidence interval, and one interpretation of a 95% confidence interval is that under the conditions of the assumed null hypothesis, results that occur in most extreme 5% of outcomes will NOT be considered as resulting from the null hypothesis, even though they are.

In other words, even under they typical conditions for which the null hypothesis is true, 5% of correct results would be deemed different enough to be statistically significant–a Type I Error.  Within this study, this defines a binomial probability situation with 27 trials for which the probability of any one trial producing a significant result even though the null hypothesis is correct, is p=0.05.

The binomial probability of finding exactly 2 significant results at p=0.05 over 27 trials is 0.243, and the probability of producing 2 or more significant results when the null hypothesis is true is 39.4%.

That means there is a 39.4% probability in any study testing 27 trials at a p<0.05 critical value that at least 2 of those trials would report a result that would INCORRECTLY be interpreted as contradicting the null hypothesis.  And if more conditions than 27 are tested, the probability of a Type I Error is even higher.

Whenever you have a large number of inference trials, there is an increasingly large probability that at least some of the “significant” trials are actually just random, undetected occurrences of the null hypothesis.

It just happens.

THE ELUSIVE MEANING OF A p-VALUE:

For more on the difficulty of understanding p-values, check out this nice recent article on FiveThirtyEight Science–Not Even Scientists Can Easily Explain P-Values.

CONCLUSION:

Personally, I’m a little disappointed that this study didn’t find significant results.  There are many recent studies showing the connection between physical activity and brain health, but this study didn’t achieve its goal of finding a biological source to explain the correlation.

It is the responsibility of researchers to know their studies and their resulting data sets.  Not finding significant results is not a problem.  But I do expect research to disclaim when its significant results hang entirely on a choice to retain an outlier in its data set.

REFERENCES:

Chaddock-Heyman L, Erickson KI, Kienzler C, King M, Pontifex MB, Raine LB, et al. (2015) The Role of Aerobic Fitness in Cortical Thickness and Mathematics Achievement in Preadolescent Children. PLoS ONE 10(8): e0134115. doi:10.1371/journal.pone.0134115

University of Illinois at Urbana-Champaign. “Cardiorespiratory fitness linked to thinner gray matter and better math skills in kids.” ScienceDaily. http://www.sciencedaily.com/releases/2015/08/150812151229.htm (accessed December 8, 2015).

## Best Algebra 2 Lab Ever

This post shares what I think is one of the best, inclusive, data-oriented labs for a second year algebra class.  This single experiment produces linear, quadratic, and exponential (and logarithmic) data from a lab my Algebra 2 students completed this past summer.  In that class, I assigned frequent labs where students gathered real data, determined models to fit that data, and analyzed goodness of the models’ fit to the data.   I believe in the importance of doing so much more than just writing an equation and moving on.

For kicks, I’ll derive an approximation for the coefficient of gravity at the end.

THE LAB:

On the way to school one morning last summer, I grabbed one of my daughters’ “almost fully inflated” kickballs and attached a TI CBR2 to my laptop and gathered (distance, time) data from bouncing the ball under the Motion Sensor.  NOTE:  TI’s CBR2 can connect directly to their Nspire and TI84 families of graphing calculators.  I typically use computer-based Nspire CAS software, so I connected the CBR via my laptop’s USB port.  It’s crazy easy to use.

One student held the CBR2 about 1.5-2 meters above the ground while another held the ball steady about 20 cm below the CBR2 sensor.  When the second student released the ball, a third clicked a button on my laptop to gather the data:  time every 0.05 seconds and height from the ground.  The graphed data is shown below.  In case you don’t have access to a CBR or other data gathering devices, I’ve uploaded my students’ data in this Excel file.

Remember, this is data was collected under far-from-ideal conditions.  I picked up a kickball my kids left outside on my way to class.  The sensor was handheld and likely wobbled some, and the ball was dropped on the well-worn carpet of our classroom floor.  It is also likely the ball did not remain perfectly under the sensor the entire time.  Even so, my students created a very pretty graph on their first try.

For further context, we did this lab in the middle of our quadratics unit that was preceded by a unit on linear functions and another on exponential and logarithmic functions.  So what can we learn from the bouncing ball data?

LINEAR 1:

While it is very unlikely that any of the recorded data points were precisely at maximums, they are close enough to create a nice linear pattern.

As the height of a ball above the ground helps determine the height of its next bounce (height before –> energy on impact –> height after), the eight ordered pairs (max height #n, max height #(n+1) ) from my students’ data are shown below

This looks very linear.  Fitting a linear regression and analyzing the residuals gives the following.

The data seems to be close to the line, and the residuals are relatively small, about evenly distributed above and below the line, and there is no apparent pattern to their distribution.  This confirms that the regression equation, $y=0.673x+0.000233$, is a good fit for the = height before bounce and = height after bounce data.

NOTE:  You could reasonably easily gather this data sans any technology.  Have teams of students release a ball from different measured heights while others carefully identify the rebound heights.

The coefficients also have meaning.  The 0.673 suggests that after each bounce, the ball rebounded to 67.3%, or 2/3, of its previous height–not bad for a ball plucked from a driveway that morning.  Also, the y-intercept, 0.000233, is essentially zero, suggesting that a ball released 0 meters from the ground would rebound to basically 0 meters above the ground.  That this isn’t exactly zero is a small measure of error in the experiment.

EXPONENTIAL:

Using the same idea, consider data of the form (x,y) = (bounce number, bounce height). the graph of the nine points from my students’ data is:

This could be power or exponential data–something you should confirm for yourself–but an exponential regression and its residuals show

While something of a pattern seems to exist, the other residual criteria are met, making the exponential regression a reasonably good model: $y = 0.972 \cdot (0.676)^x$.  That means bounce number 0, the initial release height from which the downward movement on the far left of the initial scatterplot can be seen, is 0.972 meters, and the constant multiplier is about 0.676.  This second number represents the percentage of height maintained from each previous bounce, and is therefore the percentage rebound.  Also note that this is essentially the same value as the slope from the previous linear example, confirming that the ball we used basically maintained slightly more than 2/3 of its height from one bounce to the next.

And you can get logarithms from these data if you use the equation to determine, for example, which bounces exceed 0.2 meters.

So, bounces 1-4 satisfy the requirement for exceeding 0.20 meters, as confirmed by the data.

A second way to invoke logarithms is to reverse the data.  Graphing x=height and y=bounce number will also produce the desired effect.

Each individual bounce looks like an inverted parabola.  If you remember a little physics, the moment after the ball leaves the ground after each bounce, it is essentially in free-fall, a situation defined by quadratic movement if you ignore air resistance–something we can safely assume given the very short duration of each bounce.

I had eight complete bounces I could use, but chose the first to have as many data points as possible to model.  As it was impossible to know whether the lowest point on each end of any data set came from the ball moving up or down, I omitted the first and last point in each set.  Using (x,y) = (time, height of first bounce) data, my students got:

What a pretty parabola.  Fitting a quadratic regression (or manually fitting one, if that’s more appropriate for your classes), I get:

Again, there’s maybe a slight pattern, but all but two points are will withing  0.1 of 1% of the model and are 1/2 above and 1/2 below.  The model, $y=-4.84x^2+4.60x-4.24$, could be interpreted in terms of the physics formula for an object in free fall, but I’ll postpone that for a moment.

LINEAR 2:

If your second year algebra class has explored common differences, your students could explore second common differences to confirm the quadratic nature of the data.  Other than the first two differences (far right column below), the second common difference of all data points is roughly 0.024.  This raises suspicions that my student’s hand holding the CBR2 may have wiggled during the data collection.

Since the second common differences are roughly constant, the original data must have been quadratic, and the first common differences linear. As a small variation for each consecutive pair of (time, height) points, I had my students graph (x,y) = (x midpoint, slope between two points):

If you get the common difference discussion, the linearity of this graph is not surprising.  Despite those conversations, most of my students seem completely surprised by this pattern emerging from the quadratic data.  I guess they didn’t really “get” what common differences–or the closely related slope–meant until this point.

Other than the first three points, the model seems very strong.  The coefficients tell an even more interesting story.

GRAVITY:

The equation from the last linear regression is $y=4.55-9.61x$.  Since the data came from slope, the y-intercept, 4.55, is measured in m/sec.  That makes it the velocity of the ball at the moment (t=0) the ball left the ground.  Nice.

The slope of this line is -9.61.  As this is a slope, its units are the y-units over the x-units, or (m/sec)/(sec).  That is, meters per squared second.  And those are the units for gravity!  That means my students measured, hidden within their data, an approximation for coefficient of gravity by bouncing an outdoor ball on a well-worn carpet with a mildly wobbly hand holding a CBR2.  The gravitational constant at sea-level on Earth is about -9.807 m/sec^2.  That means, my students measurement error was about $\frac{9.807-9.610}{9.807}=2.801%$.  And 2.8% is not a bad measurement for a very unscientific setting!

CONCLUSION:

Whenever I teach second year algebra classes, I find it extremely valuable to have students gather real data whenever possible and with every new function, determine models to fit their data, and analyze the goodness of the model’s fit to the data.  In addition to these activities just being good mathematics explorations, I believe they do an excellent job exposing students to a few topics often underrepresented in many secondary math classes:  numerical representations and methods, experimentation, and introduction to statistics.  Hopefully some of the ideas shared here will inspire you to help your students experience more.

## Recentering Normal Curves, revisited

I wrote here about using a CAS to determine a the new mean of a recentered normal curve from an AP Statistics exam question from the last decade.  My initial post shared my ideas on using CAS technology to determine the new center.  After hearing some of my students’ attempts to solve the problem, I believe they took a simpler, more intuitive approach than I had proposed.

REVISITING:

In the first part of the problem, solvers found the mean and standard deviation of the wait time of one train: $\mu = 30$ and $\sigma = \sqrt{500}$, respectively.  Then, students computed the probability of waiting to be 0.910144.

The final part of the question asked how long that train would have to be delayed to make that wait time 0.01.  Here’s where my solution diverged from my students’ approach.  Being comfortable with transformations, I thought of the solution as the original time less some unknown delay which was easily solved on our CAS.

STUDENT VARIATION:

Instead of thinking of the delay–the explicit goal of the AP question–my students  sought the new starting time.  Now that I’ve thought more about it, knowing the new time when the train will leave does seem like a more natural question and avoids the more awkward expression I used for the center.

The setup is the same, but now the new unknown variable, the center of the translated normal curve, is newtime.  Using their CAS solve command, they found

It was a little different to think about negative time, but they found the difference between the new time difference (-52.0187 minutes) and the original (30 minutes) to be 82.0187 minutes, the same solution I discovered using transformations.

CONCLUSION:

This is nothing revolutionary, but my students’ thought processes were cleaner than mine.  And fresh thinking is always worth celebrating.