This week, while my statistics classes were reviewing for their term final, I had a small idea I wish I’d had years ago.

Early in the course, we talked about means and medians as centers of data and how these values are nearly identical in roughly symmetric data sets. However, when there are extreme or outlier values on only one side of the center, the mean, which is more sensitive to extremes than the median, is “pulled” in the direction of those outliers. In simple cases like this, the pull of the extreme values is said to “skew” the data in the direction of the extremes. Admittedly, in more complicated data sets with one long tail and one heavy tail, skew can be difficult to visualize, but in early statistics classes with appropriate warnings, I’ve found it sufficient to discuss basic skew in terms of the pull of extreme values on the mean.
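For anyone who wants to see that pull numerically, here is a minimal sketch. The numbers are invented for illustration (they are not my students’ actual guesses), and `center_summary` is just a helper name I made up.

```python
from statistics import mean, median

# Hypothetical guesses, invented for illustration (NOT the real Skittles data).
symmetric = [380, 400, 420, 440, 460]
with_high_outliers = symmetric + [900, 1200]  # add two extreme values on the right


def center_summary(values):
    """Return (mean, median, mean - median) for a list of numbers."""
    m = mean(values)
    med = median(values)
    return m, med, m - med


# A positive gap (mean > median) suggests the mean is being pulled right;
# a negative gap suggests a pull to the left.
print(center_summary(symmetric))
print(center_summary(with_high_outliers))
```

In the symmetric list the mean and median agree at 420. Adding the two large values moves the median only to 440 but drags the mean up to 600, which is the rightward “pull” described above. As with the classroom version, this mean-versus-median comparison is a rule of thumb for simple cases, not a guarantee for every data set.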

Despite my efforts to help students understand the direction of skew relative to the tails of a data set, I’ve noticed that many still first describe the side where the “data piles up” before declaring the skew to be the opposite. For example, using a histogram (below) of data from my school last week, where students were given a week to guess the number of Skittles in a glass jar, several of my students continued to note that the data “piled up” on the left, so the skew was right, or positive.

SIDE NOTE: There were actually 838 Skittles in the jar. Clearly most of the students seriously underestimated the total with a few extremely hopeful outliers to the far right.

While my students can properly identify the right skew of this data set, I remained bothered by the mildly convoluted approach they persistently used to determine the skew. I absolutely understand why their eyes are drawn to the left-side pile-up, but I wondered if there was another way I could visualize skew that might get them to look first in the skewed direction. That’s when I wondered about the possibility of describing the “stretchiness” of skew via boxplots. Nearly symmetric data have nearly symmetric boxplots, but extreme or outlier values would notably pull the whiskers of the boxplot or appear as outlier dots on the ends.
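Here is a rough sketch of that “stretchiness” in numbers. It assumes the common 1.5 × IQR convention for flagging outliers and uses Python’s `statistics.quantiles` for the quartiles; calculators and software packages compute quartiles in slightly different ways, so the exact values may not match yours. The data are again invented, and `boxplot_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Whisker lengths and outliers under the 1.5 * IQR rule (an assumed convention)."""
    data = sorted(values)
    q1, _median, q3 = quantiles(data, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach to the most extreme data values still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "left_whisker": q1 - inside[0],
        "right_whisker": inside[-1] - q3,
        "low_outliers": [x for x in data if x < low_fence],
        "high_outliers": [x for x in data if x > high_fence],
    }


# Hypothetical guesses, invented for illustration (NOT the real Skittles data).
guesses = [300, 320, 340, 350, 380, 400, 420, 450, 480, 520, 700, 1500, 2200]
summary = boxplot_summary(guesses)
print(summary)
```

For this made-up list, the right whisker comes out twice as long as the left, and both flagged outliers sit on the high side, so everything that “stretches” points right.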

If my first visualization of the Skittles data had been a boxplot (below), would that have made it any easier for students to see the extreme right-side pull on the data, or would they have seen the box on the left and then declared right skew?

Is it possible the longer right whisker and several right-side outliers in this boxplot make it easier to see right skewness directly rather than as the opposite of the data’s left-side pile-up? It’ll be the start of class in Fall 2016 before I can try out this idea.

In the meantime, I wonder how others approach helping students see and understand skewness directly rather than as a consequence of something else. Ideas, anyone?

It’s been my experience that there aren’t really any conceptual misunderstandings among the students (they see that a few outlying large [small] values can pull a measure of central tendency deceptively upward [downward]); rather, it’s just a matter of mixing up the nomenclature. When I first introduce the notion of skewness, and for the next several weeks, I never simply say “this distribution is skewed right”; instead I say “the distribution is being skewed to the right [left] by these few large [small] values” (as I point to the tail of the PDF/histogram or the end of the long whisker in a boxplot). Of course, adding the phrase “by these large/small values” conveys no more information than the skew direction alone (and by the end of the next unit I’ve abandoned the redundancy), but I think it initially helps emphasize what is causing the skew and helps them keep the direction reference straight.

Also, here’s a tip I picked up at the AP reading (from Daren Starnes himself, I think) which I have found useful: he jokes with the students that “skewing” sounds a bit like “skiing”, and crouches down as if holding ski poles when discussing skewing/skiing. You can draw a little stick-figure skier at the peak of the distribution and show that the downhill direction of the skiing is the direction of the skewing. Whenever a student in class states the wrong direction for a skew, I crouch down and say “which way am I skiing/skewing again?”