This week, while my statistics classes were reviewing for their term final, I had a small idea I wish I’d had years ago.
Early in the course, we talked about means and medians as centers of data and how these values were nearly identical in roughly symmetric data sets. However, when there are extreme or outlier values on only one side of the center, the more extreme-sensitive mean would be “pulled” in the direction of those outliers. In simple cases like this, the pull of the extreme values is said to “skew” the data in the direction of the extremes. Admittedly, in more complicated data sets with one long tail and one heavy tail, skew can be difficult to visualize, but in early statistics classes with appropriate warnings, I’ve found it sufficient to discuss basic skew in terms of the pull of extreme values on the mean.
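To make the “pull” concrete, here is a minimal Python sketch comparing the mean and median of a roughly symmetric data set before and after adding a few extreme values on one side. The numbers are made up for illustration (they are not the Skittles guesses discussed below), and `center_summary` is just a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical values, invented for illustration only.
symmetric = [380, 400, 420, 440, 460, 480, 500]

# The same data plus two extreme values on the right side only.
right_skewed = symmetric + [1500, 2500]


def center_summary(data):
    """Return (mean, median) for a list of numbers."""
    return mean(data), median(data)


sym_mean, sym_median = center_summary(symmetric)
skew_mean, skew_median = center_summary(right_skewed)

print(f"symmetric:    mean={sym_mean:.1f}, median={sym_median:.1f}")
print(f"right-skewed: mean={skew_mean:.1f}, median={skew_median:.1f}")
```

In the symmetric list the two centers agree. Once the two high values are added, the median barely moves while the mean jumps well to the right of it, which is the pull described above.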
Despite my efforts to help students understand the direction of skew relative to the tails of a data set, I’ve noticed that many still first describe the side where the “data piles up” before declaring the skew to be the opposite. For example, using a histogram (below) of data from my school last week, where students were given a week to guess the number of Skittles in a glass jar, several of my students continued to note that the data “piled up” on the left, so the skew was right, or positive.
SIDE NOTE: There were actually 838 Skittles in the jar. Clearly most of the students seriously underestimated the total with a few extremely hopeful outliers to the far right.
While my students can properly identify the right skew of this data set, I remained bothered by the mildly convoluted approach they persistently used to determine the skew. I absolutely understand why their eyes are drawn to the left-side pile-up, but I wondered if there was another way I could visualize skew that might get them to look first in the skewed direction. That’s when I wondered about the possibility of describing the “stretchiness” of skew via boxplots. Nearly symmetric data have nearly symmetric boxplots, but extreme or outlier values would notably lengthen the whisker on their side of the boxplot or appear as outlier dots beyond it.
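The “stretchiness” idea can also be checked numerically. The sketch below computes the pieces a boxplot draws, assuming the common 1.5 × IQR rule for flagging outliers and Python’s “inclusive” quartile method (other software may use slightly different quartile conventions). The guesses are invented for illustration and are not the real Skittles data, and `boxplot_summary` is a hypothetical helper.

```python
from statistics import quantiles


def boxplot_summary(data):
    """Quartiles, whisker lengths, and outliers under the 1.5 * IQR rule."""
    values = sorted(data)
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    # Whiskers reach to the most extreme values still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "left_whisker": q1 - inside[0],
        "right_whisker": inside[-1] - q3,
        "low_outliers": [v for v in values if v < low_fence],
        "high_outliers": [v for v in values if v > high_fence],
    }


# Hypothetical guesses, invented for illustration only.
guesses = [300, 350, 380, 400, 420, 450, 480, 520, 600, 900, 1500, 2500]
summary = boxplot_summary(guesses)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this made-up data the right whisker comes out longer than the left one and both flagged outliers sit on the right, so every piece of the boxplot outside the box points in the direction of the skew.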
If my first visualization of the Skittles data had been a boxplot (below), would that have made it any easier for students to see the extreme right-side pull on the data, or would they still see the box on the left and then declare right skew?
Is it possible that the longer right whisker and several right-side outliers in this boxplot make it easier to see right skewness directly rather than as the opposite of the data’s left-side pile-up? It’ll be the start of class in Fall 2016 before I can try out this idea.
In the meantime, I wonder how others approach helping students see and understand skewness directly rather than as a consequence of something else. Ideas, anyone?