Well begun is half done

[Image: two groups of colourful cut-out people divided by a dotted line]

Recently, as part of my research for the Cambridge Mathematics Framework, I have been reading a lot about the median, and it has got me thinking about some of the things I took for granted when teaching this statistical measure.

You may be familiar with this basic approach to teaching median: start with a list of data, or a stem and leaf diagram, containing an odd number of data points. Cross off the largest and smallest values in pairs until you reach the middle value. BOOM! This is the median. Many teachers do this with concrete objects, or by lining students up in order (of height, for example), but the basic premise is the same: you are hunting for a single number which is THE MEDIAN. Once students have a handle on locating this specific value, a curveball is introduced: “What happens if there is an even number of data points?”
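The crossing-off routine is easy to express as a short procedure. This is a minimal Python sketch with a hypothetical function name; like the classroom method, it only handles an odd number of values, and it refuses an even count rather than guessing.

```python
def median_by_crossing_off(values):
    """Find the median of an odd-sized list by repeatedly crossing off
    the smallest and the largest remaining values."""
    remaining = sorted(values)  # put the data in order first
    if len(remaining) % 2 == 0:
        raise ValueError("crossing off only lands on a single value when the count is odd")
    while len(remaining) > 1:
        remaining = remaining[1:-1]  # cross off the smallest and the largest
    return remaining[0]

print(median_by_crossing_off([7, 3, 9, 4, 12]))  # 7
```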

At this point a conceptually tricky idea is thrown in: the number now being sought is the average (mean?) of the two central values, a number that is often no longer in the data set, and one that relies on a different sort of average to find. We tend to gloss over the fact that the median is now in some sense a “virtual” thing, a value that may not exist in the data set at all, and may even be of the wrong data type: for example, a discrete data set (such as numbers of siblings) whose median falls between two consecutive values, giving 2.5 siblings.
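Python's standard `statistics.median` follows the convention described here (the mean of the two central values for an even count), so it shows the “virtual” median directly. The sibling counts below are hypothetical.

```python
from statistics import median

# Hypothetical example: number of siblings for six students (discrete data)
siblings = [0, 1, 2, 3, 3, 4]

m = median(siblings)  # mean of the two central values, (2 + 3) / 2
print(m)              # 2.5
print(m in siblings)  # False: no student has 2.5 siblings
```

If the two central values happen to be equal (for example `[1, 2, 2, 5]`), the median is `2.0` and is a member of the set after all.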

Part of the issue here lies in our conception of the median, in which we tend to focus on the precise value rather than on what it tells us about the overall distribution, i.e. that it is the point in the data set with exactly half of the data on either side. We focus on the position of the measure rather than the pattern in the data, using its “halfway-ness” as a (or the) method of identifying it, rather than as a fundamental feature.

Why not approach median in a different way, from the point of view of distribution rather than centrality? Why not start with a data set with an even number of values and try to divide it into two halves? Can we describe the boundary using a single number? The fact that this number is no longer in the data set is less problematic, as it is simply a boundary point: the location of the line at which the data set can be cut exactly in half. It makes sense for this value to be equidistant from the two central values, but conceptually this is not important in the context of the distribution itself.
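The boundary view can be checked by counting: any cut placed strictly between the two central values leaves the same number of values on each side, and the midpoint is simply the conventional choice within that gap. A minimal sketch, with a hypothetical helper and hypothetical heights:

```python
def split_counts(values, cut):
    """Count how many values lie below and above a proposed cut line."""
    below = sum(1 for v in values if v < cut)
    above = sum(1 for v in values if v > cut)
    return below, above

heights = [128, 131, 135, 140, 142, 150]  # hypothetical heights in cm, even count

# Any cut strictly between the two central values (135 and 140) halves the data
for cut in (135.5, 137.5, 139):
    print(cut, split_counts(heights, cut))  # (3, 3) each time

# The conventional median is the midpoint of that gap
print((135 + 140) / 2)  # 137.5
```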

Of course, the median is useful both as a proportional indicator of distribution and as a central value that can be used to identify signals in a noisy process, but to have a full conception of median, neither aspect ought to be privileged over the other. Perhaps by introducing median through a proportional lens from the start, we could begin to link early conceptions of sharing and dividing with the median as a boundary for dividing a data set.

SOMETHING TO TRY:

KS1: Ask students how to divide 4 identical objects into 2 equal piles.

What might they do if a fifth object was added? How can they make it fair? (If a student suggests leaving an object out, ask whether adding another object would also be fair.) Can they suggest other numbers of objects that they know would be easy to divide fairly in this way? How do they know?

KS2: Investigate strategies for organising a class photo in 2 rows so that the tallest half of the students is in the back row. Consider the decision of where the “extra” student stands in a class with an odd number of students: is it arbitrary?

Mark each student's height with a horizontal line on a wall and ask students to draw a line which divides this data exactly in two. Consider the pattern of the lines: are they close together?
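One strategy for the class photo can be sketched as: sort by height and give the back row the tallest half. The function and the sample heights are hypothetical, and placing the “extra” student in the front row is an arbitrary choice, which is exactly the decision the activity asks students to question.

```python
def photo_rows(heights):
    """Split heights into (front row, back row), with the tallest half at the back.

    With an odd number of students the middle student could stand in either
    row; this sketch arbitrarily puts them in the front row.
    """
    ordered = sorted(heights)
    back_size = len(ordered) // 2       # the tallest half, rounded down
    cut = len(ordered) - back_size      # index where the back row starts
    return ordered[:cut], ordered[cut:]

# Hypothetical heights in cm for a class of 7
front, back = photo_rows([130, 142, 125, 138, 135, 147, 133])
print("front:", front)  # front: [125, 130, 133, 135]
print("back:", back)    # back: [138, 142, 147]
```

The “extra” student (135 cm here) is the student at the median height, so whichever row they join, that row holds one more than half the class.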

KS3: Display the age in months of each student in both a dot plot and a case-value plot (a plot of individual values such as the one shown below) and then ask students to draw a line through the data in each graph dividing it in half. What do they notice?

KS4: Ask students to suggest a type of data set in which they would expect the median to be a member of the set, and one in which they would expect the median not to be a member of the set.

Can they extend this to include the lower and upper quartile values? Why?
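For the median, the KS4 question has a tidy answer that a quick check makes visible: with an odd number of values the median is always a data value, and with an even number it is one only when the two central values are equal, so data with many repeated values (such as shoe sizes) often contain their median. A minimal sketch with hypothetical names and data:

```python
from statistics import median

def median_is_member(values):
    """True if the median is one of the data values themselves."""
    return median(values) in values

def predicted_member(values):
    """The rule behind the check: an odd count always lands on a data value;
    an even count does so only when the two central values are equal."""
    ordered = sorted(values)
    n = len(ordered)
    return n % 2 == 1 or ordered[n // 2 - 1] == ordered[n // 2]

examples = [
    [2, 5, 7, 8, 10],          # odd count
    [2, 5, 7, 8],              # even count, central values differ
    [36, 37, 37, 37, 38, 39],  # even count, many repeats (e.g. shoe sizes)
]
for data in examples:
    print(median_is_member(data), predicted_member(data))
# True True
# False False
# True True
```

Quartiles can be explored the same way, but whether they are members of the set also depends on which quartile convention is used, so it is worth fixing one convention before asking students to generalise.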

KS5: Collect a data set of the number of letters in the name of each A-level subject taken by the students in their class (e.g. a student might study Mathematics = 11, Theatre Studies = 14 and Art = 3).

Ask one student to imagine changing one of their subjects: how does this affect the 5-number summary for the whole class data set (the 5 key values shown on a box plot)?

Is it possible for all the students to change all of their subjects but leave the 5-number summary unchanged?

What is the minimum number of data values that must be changed to change all 5 numbers in the summary? Is the answer the same for all data sets?
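To explore these questions quickly with a larger class, a short script can compute the letter counts and the 5-number summary. This is a minimal sketch: the function names and the three-student class are hypothetical, and quartiles are computed with one common convention (the median of each half, leaving out the middle value when the count is odd). Other conventions, including the methods offered by Python's `statistics.quantiles`, can give slightly different quartiles, which is worth bearing in mind when comparing answers.

```python
from statistics import median

def letter_count(subject):
    """Count letters only, so the space in 'Theatre Studies' is ignored (14)."""
    return sum(1 for ch in subject if ch.isalpha())

def five_number_summary(values):
    """(minimum, lower quartile, median, upper quartile, maximum).

    Quartiles follow one assumed convention: the median of the lower and
    upper halves, leaving out the middle value when the count is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Hypothetical class of three students, three subjects each
class_subjects = [
    ["Mathematics", "Theatre Studies", "Art"],
    ["Physics", "Chemistry", "Biology"],
    ["History", "English Literature", "French"],
]
letters = [letter_count(s) for student in class_subjects for s in student]
print(five_number_summary(letters))  # (3, 6.5, 7, 12.5, 17)

# The first student swaps Art for Music: only the minimum changes
class_subjects[0][2] = "Music"
letters = [letter_count(s) for student in class_subjects for s in student]
print(five_number_summary(letters))  # (5, 6.5, 7, 12.5, 17)
```

Changing one short subject name moves only the minimum here, which is one way into the question of how many values must change before all 5 numbers do.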