Statistical ways of seeing


Have you ever struggled with teaching statistics? Do you and your students share a sense of apprehension when data lessons appear on the scheme of work? You’re not alone. Anecdotally, many teachers tell me that statistics is one of the topics they least like teaching, and I am no exception. After a very poor diet of data at school, I took the minimum number of statistics courses allowed in my mathematics degree, and I carried this negative association into my teaching. Looking back on my career in the classroom, I did not do a good job of teaching statistics, but having had the luxury of spending many years at Cambridge Mathematics immersed in research from excellent statistics teachers and education academics, I now understand why!

So the obvious question is: why is statistics hard to teach well? In part, I believe it stems from viewing statistics through a mathematical lens. This is understandable, given that we teach it alongside quadratic equations, Pythagoras’ theorem, fractions, decimals and percentages. But while statistical analysis would not exist without the mathematical concepts and techniques underpinning it, curricula tend to make the mathematical techniques the whole point and reduce the statistical analysis to an afterthought or an optional extra. Students find the more subjective analysis hard, so it is tempting to make sure everyone can manage the techniques and then treat interpretation as something only the most able have time for (although there is always the further temptation to move on to other, more properly ‘maths-y’ topics as soon as possible).

This approach is at odds with how education researchers suggest students should encounter statistical ideas. In the early 1990s, George Cobbⁱ and other researchers recommended that statistics teaching should

emphasise statistical thinking,
include more real data,
encourage the exploration of genuine statistical problems, and
reduce emphasis on calculations and techniques.

Since then, research has refined these recommendations to account for new technological tools and new ideas, but the core principles have remained the same. In my reading of education research, three ways of seeing or interacting with data keep appearing:

Data modelling – the idea that data can be used to create models of the world in order to pose and answer questions
Informal inference – the idea that data can be used to make predictions about something outside the data itself, together with some attempt to describe how likely the prediction is to be true
Exploratory data analysis – the idea that data can be explored, manipulated and represented to identify and make visible patterns and associations that can be interpreted

These three ways of seeing are distinct but overlap, and all students may benefit from repeated experience of all three approaches to data work, from their very earliest encounters with data through to advanced-level study.

Imagine the following classroom activity, which could be given to very young students (e.g., in primary school). A class is given a list of snacks and treats, and each student is asked to rate every item on a scale of one to five according to how much they like it. How could students work with this data using each of the three approaches?

Firstly, we will consider data modelling. Students could be asked to plan a class party on a limited budget. They can buy some but not all of the items listed, and must decide what to buy so that as many students as possible get something they like. In this activity, students must create a model from the data that identifies which items to buy more of and which to buy fewer of, along with how many of each to get, perhaps by considering these quantities proportionally. The activity uses the data as a model, but it inevitably requires some assumptions and some guiding principles. Is the goal to ensure everyone gets the thing they like most? Or is it to minimise the inclusion of the things students like least? What if everyone gets their favourite thing except one student, who gets nothing they like?
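For readers who want to see a modelling decision made explicit, here is a minimal Python sketch of one possible model for the party problem. Everything in it is an assumption for illustration: the names, ratings, prices and budget are invented, a rating of 4 or 5 is taken to mean ‘liked’, and only one of each item is bought, so the question of quantities is set aside. The principle chosen here is ‘as many students as possible get at least one thing they like’; the alternative goals in the questions above would each need a different scoring function.

```python
from itertools import combinations

# Hypothetical ratings (1 = dislike, 5 = love); names and numbers are invented.
ratings = {
    "Ana":   {"crisps": 5, "chocolate": 2, "fruit": 3, "popcorn": 4},
    "Ben":   {"crisps": 1, "chocolate": 5, "fruit": 2, "popcorn": 3},
    "Chloe": {"crisps": 2, "chocolate": 4, "fruit": 5, "popcorn": 1},
    "Dev":   {"crisps": 4, "chocolate": 1, "fruit": 2, "popcorn": 5},
}
prices = {"crisps": 3, "chocolate": 4, "fruit": 2, "popcorn": 3}  # invented costs
LIKE_THRESHOLD = 4  # assumption: a rating of 4 or 5 counts as "liked"


def students_satisfied(basket):
    """Count students who like at least one item in the basket."""
    return sum(
        any(scores[item] >= LIKE_THRESHOLD for item in basket)
        for scores in ratings.values()
    )


def best_basket(budget):
    """Try every affordable combination of snacks (one of each) and keep the
    one that satisfies the most students; ties go to the cheaper basket,
    then to whichever combination was found first."""
    best, best_key = (), (-1, 0)
    snacks = list(prices)
    for size in range(1, len(snacks) + 1):
        for basket in combinations(snacks, size):
            cost = sum(prices[s] for s in basket)
            if cost > budget:
                continue
            key = (students_satisfied(basket), -cost)
            if key > best_key:
                best, best_key = basket, key
    return best, best_key[0]


print(best_basket(budget=7))
```

With a budget of 7, this sketch buys crisps and chocolate, which gives all four invented students something they like. Chocolate and popcorn would have done equally well at the same cost, so the ‘answer’ depends on a tie-breaking rule that nobody chose deliberately, which is exactly the kind of hidden assumption the activity is meant to surface. Checking every combination is only practical for a short list; the point is to make the model’s principle visible, not to be efficient.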

Secondly, we will think about this as an activity in informal inference. Imagine a new student is joining the class, and the class wants to make a welcome pack of a few treats for them, but nobody knows which treats the new student likes. Can the class use the data to decide which five items an unknown student is most likely to choose? What if they know some small details about the student? Would that additional information allow them to decide based on ‘similar’ students in the class? While the second part of this activity must be handled with a degree of sensitivity, it is an excellent primer on how the purchasing algorithms common in online shops work.
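The welcome-pack question can be sketched in the same spirit. In this hypothetical Python version, the prediction for an unknown student is simply the snacks with the highest mean rating, and passing in a smaller group of names stands in for using ‘similar’ students, with the choice of who counts as similar left entirely to the class. The data and the `welcome_pack` helper are invented for illustration, and a pack of two items is used only because the invented list is short.

```python
from statistics import mean

# Hypothetical class ratings (1 = dislike, 5 = love); all values are invented.
ratings = {
    "Ana":   {"crisps": 5, "chocolate": 2, "fruit": 3, "popcorn": 4},
    "Ben":   {"crisps": 3, "chocolate": 5, "fruit": 2, "popcorn": 4},
    "Chloe": {"crisps": 2, "chocolate": 4, "fruit": 5, "popcorn": 2},
    "Dev":   {"crisps": 4, "chocolate": 1, "fruit": 1, "popcorn": 5},
}


def welcome_pack(k, group=None):
    """Return the k snacks with the highest mean rating.

    If `group` is given, only those students' ratings are used, as a
    stand-in for 'students similar to the newcomer'."""
    group = group or list(ratings)
    snacks = ratings[group[0]]  # iterating a dict yields its keys (snack names)
    averages = {s: mean(ratings[name][s] for name in group) for s in snacks}
    return sorted(averages, key=averages.get, reverse=True)[:k]


print(welcome_pack(2))                    # based on the whole class
print(welcome_pack(2, ["Ben", "Chloe"]))  # based only on the 'similar' students
```

With this invented data, the whole class points to popcorn and crisps, while the two students chosen as ‘similar’ point to chocolate and fruit. That shift is what the class would need to discuss: how much should a few shared details change the prediction?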

Finally, we turn to exploratory data analysis. In this approach, students are encouraged to look for patterns in the data, perhaps by creating representations. The analysis may start from questions: for example, do students who like one chocolate snack rate the other chocolate snacks highly too? Is a certain brand of snack popular with everyone in the class? What is the least popular snack? Alternatively, the analysis may generate questions from patterns that are spotted: for example, why do students seem to rate a certain snack highly? What do the three most popular snacks have in common?

Each of these approaches could be used as a separate, isolated activity, but there is also scope to combine them and use the results of one approach to inform another. For example, exploratory data analysis may usefully contribute to both model building and inference making, and support students’ justifications for their decisions in those activities. Similarly, data modelling activities can very easily be extended into inferential tasks, simply by shifting the use of the model from the population the data describes (e.g., the students in the class it was collected from) to some secondary population (e.g., another class in the school or, as in the example, a new student joining the class).

Looking back on my time in the classroom, I wish I had better understood these approaches and their importance for developing my students’ statistical reasoning. Although many curricula do not single them out as important, there are ample opportunities to embed these approaches and make them a fundamental part of the statistics teacher’s pedagogy.

Do you currently use any of these approaches in your lessons? Can you see where you might use them in the future? And how might you adapt activities to allow your students opportunities to engage in data modelling, informal inference and exploratory data analysis?

Reference:

Cobb, George W. (1992). Teaching statistics. In L. A. Steen (Ed.), Heeding the call for change: Suggestions for curricular action (pp. 3–46). Mathematical Association of America.
