It should come as no surprise, given how much I love both mathematics and technology (and count among my heroes similarly inclined women such as Professor Dame Celia Hoyles), that I use a sleep tracking app that records my sleeping hours, wakefulness and movement. It does this via the phone’s movement sensors, with the phone kept under the pillow. I originally used it to set a morning alarm that sounds during my most wakeful period (it works within a time window rather than at one specific time), and found the resulting wake-up call much more pleasant. But now, after nearly four years’ worth of data, I have some delicious analysis just waiting for me to take a bite. Will you join me?
My first question was a deceptively simple one: what is my mean number of hours’ sleep per night? When looking at the data to answer this question, something interesting happened: the weekly data shows as a bar chart, but becomes a line graph when much more data is added, like this:

Note that the data is incomplete. This is easily explained: I only record my sleep when I set an alarm, so lie-in days leave no data points at all (and there are also some inevitable glitches where data seems to be missing anyway). This is likely to skew the overall pattern away from the really long nights and towards the lower end. There are also a few outliers visible on the first graph: nights of around 13 hours and of around 3 hours (!). The app helpfully provides some context in which to place one’s own sleep stats: you can see that my average over several years is pretty healthy (and that people in Saudi Arabia appear to sleep very little...).
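To see why unrecorded lie-ins pull the average down, here is a minimal Python sketch using invented numbers (they are not my real data). It assumes, hypothetically, that every night with an alarm is recorded and every lie-in is not, and compares the mean over all nights with the mean over the recorded nights only.

```python
from statistics import mean

# Hypothetical nightly sleep in hours, invented for illustration only.
# Each entry is (hours, alarm_was_set); only alarm nights get recorded.
nights = [
    (7.2, True), (6.8, True), (7.5, True), (6.9, True), (7.1, True),
    (9.0, False), (9.5, False),  # lie-ins: no alarm, so never recorded
]

all_hours = [hours for hours, _ in nights]
recorded_hours = [hours for hours, alarm in nights if alarm]

true_mean = mean(all_hours)           # what we would like to know
recorded_mean = mean(recorded_hours)  # what the app can actually see

print(f"mean over every night:     {true_mean:.2f} h")
print(f"mean over recorded nights: {recorded_mean:.2f} h")
```

Because the missing nights are systematically the long ones, the recorded mean (7.10 hours here) undershoots the mean over every night (about 7.71 hours). Glitches that drop nights at random would not, on average, push the mean in one direction like this.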
So: is the second graph a valid representation of the data?
If you’re a mathematics teacher, you might be tempted to answer with an instinctive ‘no’. I confess that, in my teaching days, I was an occasional sufferer of ‘data terror’: a condition caused by worrying whether you have represented data in a way that might cause a seasoned statistician to sneer at you. For example, the conventional wisdom (or golden rule) of representing the two broad categories of discrete data, quantitative and qualitative, in two different ways on bar charts, and of reserving histograms for continuous data, has often kept me awake at night. If it’s so straightforward to define these categories and logically deduce how we might represent them in graphs, why do we (pupils and teachers alike) get it ‘wrong’ so often?
What if the scale of the data we are considering determines whether the data can be perceived as discrete or continuous? Essentially, ‘large’ discrete sets can be mathematically manipulated as continuous, and choosing a minimal (resolution) unit converts a continuum into a discrete set. Hugh Burkhardt puts it thus: ‘The urge to relate the numbers that represent continuous variables back to natural counting numbers via rationals reflects mathematical aesthetics, a type of elegance – getting as much as you can from minimum assumptions’. We could ask ourselves what would happen if we plotted every data point on a zigzagging line instead of smoothing out the curve: would this be ‘better’? Would the extra information be helpful, or simply too noisy to focus on?
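The idea that a choice of resolution unit turns a continuum into a discrete set can be sketched in a few lines of Python. The durations below are invented for illustration, and `discretise` is a hypothetical helper: it snaps each duration to the nearest multiple of a chosen unit, so a coarse unit (one hour) leaves only a couple of values to draw as bars, while a fine unit (one minute) leaves nearly every night distinct, much like a continuum.

```python
from collections import Counter


def discretise(durations, unit):
    """Snap each duration (hours) to the nearest multiple of `unit` (hours).

    Python's round() uses round-half-to-even on exact ties.
    """
    return [round(d / unit) * unit for d in durations]


# Hypothetical sleep durations in hours, invented for illustration only
durations = [6.83, 7.05, 7.12, 7.31, 7.48, 6.97, 7.66, 8.02]

# Same data, three resolution units: one hour, a quarter hour, one minute
for unit in (1.0, 0.25, 1 / 60):
    tally = Counter(discretise(durations, unit))
    print(f"unit {unit:.4f} h: {len(tally)} distinct values")

# With a one-hour unit the tally is small enough to draw as a bar chart
print(sorted(Counter(discretise(durations, 1.0)).items()))
# prints [(7.0, 6), (8.0, 2)]
```

Nothing about the sleep itself changes between the three runs. Only the chosen unit decides whether the data looks discrete (2 values) or close to continuous (8 distinct values for 8 nights).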
The question of whether humans perceive the world around them (time and space) as discrete (can only take certain values), continuous (can take a range of values), or a hybrid of both, has not yet been resolved (Herzog, Kammer & Scharnowski, 2016). Developments in how science handles big data and complexity mean that the pursuit of empirical fact can take us through ‘alternating levels of discrete and continuous’. This can be unnerving if your previous idea of science involved ‘relentless simplification’, especially as new research methods are focusing on quantifying in a different way (Trefethen, 2012). This is a topic in which overconfidence and the illusion of understanding are common, and so it needs careful attention (Merenluoto & Lehtinen, 2004).
Maybe when looking at graphs we need to start asking what their purpose and focus are, rather than whether they are ‘right’ or ‘wrong’, an argument often made by proponents of statistical literacy. I can imagine these two graphs providing the stimulus for some excellent classroom practice and maths teacher CPD alike. The second graph in particular invites forensic examination: how has the curve been smoothed (moving averages?), and what effect does that smoothing have on the pattern that emerges from the underlying variability?
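The app does not say how it smooths its curve, so the following is only a guess at one common technique: a trailing simple moving average, sketched in Python. The nightly values are invented (with two outliers echoing the roughly 3-hour and 13-hour nights), and `moving_average` is a hypothetical helper, not the app’s method.

```python
def moving_average(values, window):
    """Trailing simple moving average: each output is the mean of the
    current value and the window-1 values before it, so the first
    window-1 points have no output."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical nightly hours, invented for illustration, with two outliers
nightly = [7.0, 6.5, 7.5, 3.0, 7.0, 6.8, 13.0, 7.2, 6.9, 7.1]

smoothed = moving_average(nightly, 3)  # hypothetical 3-night window
print("raw range:     ", max(nightly) - min(nightly))
print("smoothed range:", round(max(smoothed) - min(smoothed), 2))
```

With a three-night window the range shrinks from 10 hours to about 3.43 hours, but the single 13-hour night does not disappear: it is spread across three consecutive smoothed values (roughly 8.93, 9.00 and 9.03 hours), turning one spike into a short plateau. A wider window, a centred window or a different method such as exponential smoothing would each reshape the pattern differently, which is exactly what makes the second graph worth examining.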
Tweet us your thoughts @CambridgeMaths