Marking our words


One of the big challenges of statistics education is embedding statistical literacy alongside the development of skills and techniques. Whilst it is relatively straightforward to create an environment in which students acquire and practise specific skills, studies have shown that developing statistical literacy is a slow process that requires time and repeated exposure to situations that involve contextualising data.

It can be time-consuming to find sources of data and visualisations that can be accessed quickly, easily and often, so that students get repeated opportunities to ask questions of data and make inferences.

One resource I have discovered recently is the “Google Books Ngram Viewer” (an n-gram is a contiguous sequence of n items from a text, speech or corpus – when the items are words, it is also called a shingle). This website produces time-series visualisations of word combinations taken from Google Books, allowing users to create charts almost instantly of how word usage has changed over time.
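For anyone who wants to make the definition concrete, here is a minimal sketch of how word n-grams can be pulled out of a sentence and counted. It is only an illustration under simple assumptions: the function name `word_ngrams` is made up for this example, the tokenisation is a naive lowercase-and-split, and none of this is a description of how Google's tool works internally.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return every contiguous run of n words in text, in order of appearance."""
    # Naive tokenisation (an assumption): lowercase, then split on whitespace.
    words = text.lower().split()
    # Slide a window of width n along the word list; empty if the text is too short.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "to be or not to be"
bigrams = word_ngrams(sentence, 2)   # 2-grams, i.e. pairs of adjacent words
counts = Counter(bigrams)            # how often each pair occurs

for pair, count in counts.most_common():
    print(" ".join(pair), count)
```

In this sentence the pair “to be” occurs twice and every other pair once. Counting occurrences like these year by year across a large collection of books is, in spirit, the kind of data that sits behind a chart of word usage over time.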

Of course, moving through the different levels of data interpretation is extremely important to statistical literacy: most students can “read the data” by identifying specific values and obvious patterns, but fewer are able to “read within the data” by contextualising the information shown on the graph, or even “read beyond the data” by using the information in the graph to infer additional information not displayed.

After the initial flurry of excitement at throwing ever more esoteric words and phrases at the Ngram Viewer had worn off, I began to consider how this tool could be used in the classroom to help students develop their statistical literacy. There could be a lot of mileage in simply getting students to shout out words that appeal to them (be careful!) and instantly creating the graph before asking them to describe the graphical narrative: how the word usage has changed over time.

As a secure starting point, this allows students to become familiar with the elements of the graph: an x-axis that covers around 200 years by default, and a variable y-axis that scales to give a clear picture of the variation. At this stage the location of key milestones such as the Second World War could be established, so that students can start to spot the effect of significant historical events on language.

Once students are familiar with the tool, a rich vein of exploration could be to suggest words and ask students to predict what the graph will look like before displaying it; for instance:

• Will usage of “thou” decline over the time period?

• Will “bare” become more popular after the year 2000? (A former colleague was accused by a student of “talking bare waffle cakes” – I still have no idea what this means!)

Another interesting set of graphs concerns technology use. My immediate go-to was music formats and the distinctive shapes they produce. Spare a thought for the MiniDisc, whose popularity waxed and waned in an exceptionally short period of time…

This is an excellent opportunity for cross-curricular work: discussions about how language changes could be explored further by collaborating with colleagues in other departments, particularly English and History.

This tool is a satisfying rabbit hole that, once students are familiar with it, could be returned to regularly to give them the space to develop their statistical literacy without the need for time-consuming activities that require lots of contextual exposition or data production. Let us know if you use the Ngram Viewer in class, and show us the most interesting graphs you produce!