Since 6 April 2017, under the Equality Act 2010 (Gender Pay Gap Information) Regulations, UK employers, both private and public, with 250 or more employees have had to publish their gender pay gap information. Some 10,000 large firms based in the UK have now published their gender pay gaps. This provides a rich resource for examining what this sort of data can tell us and how its visual representations might support both societal awareness and change for the better. It is also an excellent basis for a lesson or series of lessons exploring informal ideas of statistical inference.
So what do these graphs and averages actually tell us about pay equality?

This is one of my favourite graphs to consider because it raises an immediate question about the validity of the data if you make one crucial, yet simple assumption: distributions for large quantities of data tend to be smooth.
In this case, while we may be initially thankful (or saddened) that fully 8% of companies (claim to) have no gender pay gap at all, the graph casts immediate doubt on this value. The shape seems suspect. It doesn’t seem to fit. The grey bar that represents a 0% pay gap towers over the rest of the data. We could suggest reasons for this: is it an inglorious edifice built from the filings of companies who have most likely failed to report accurately – or who have perhaps invented creative methodologies that return a predefined value? Of course it’s impossible to tell. But informally inferring from this graph (by curve-smoothing by eye), perhaps only around 400-500 of more than 1000 companies reporting a 0% gap did so legitimately, assuming of course that the rest of the data is itself valid.
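The curve-smoothing by eye described above can be mimicked with a few lines of Python. This is only a rough sketch: the bin counts below are invented for illustration (they are not the published figures), and linear interpolation between the neighbouring bars is just one simple assumption about what "smooth" means.

```python
# Hypothetical number of companies in the bars either side of 0%.
# Keys are the pay gap in percentage points; all counts are made up.
neighbour_counts = {-2: 310, -1: 390, 1: 520, 2: 610}
reported_zero = 1050  # hypothetical height of the 0% bar


def smoothed_estimate(counts, target=0):
    """Estimate the bar height at `target` by linear interpolation
    between the nearest bars on either side."""
    left = max(k for k in counts if k < target)
    right = min(k for k in counts if k > target)
    weight = (target - left) / (right - left)
    return counts[left] + weight * (counts[right] - counts[left])


expected_zero = smoothed_estimate(neighbour_counts)
excess = reported_zero - expected_zero

print(f"Expected at 0% if the distribution were smooth: {expected_zero:.0f}")
print(f"Reported at 0%: {reported_zero}, unexplained excess: {excess:.0f}")
```

With real data, students could swap in the actual bar heights and discuss whether a straight line between neighbours is a reasonable stand-in for the curve they sketched by eye.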
While the imperative to publish data on the gender pay gap is a recent phenomenon, the principle of equal pay for equal work has been enshrined in law since the Equal Pay Act 1970. If for the sake of argument we make the major (and perhaps wrong) assumption that most companies do stick to the spirit of the law, there must be something else going on to create a pay gap distribution heavily skewed in favour of male employees. One theory is that women are over-represented in lower-paid jobs while men make up a higher proportion of higher-paid jobs, and it’s easy to find an example of this in the data for individual companies.

In this large retail company (Company A) for example, the representation of women falls significantly by pay quartile, with a huge 35% difference in the proportion of men and women between the lowest and highest pay bands. Does this suggest a cultural problem in this company? It is possible to construct an argument that “more women enjoy the flexibility offered by low paid, part time work” or similar. It is impossible to tell from the graph above, but one possibility that might help us to dig deeper is to compare with data from a company in the same industry like Company B:

This company is a direct competitor of Company A, and is also large, but as you can see this time there is almost a 50:50 split between men and women at every pay quartile except at the very top. So is this good news? Our comparison seems to suggest that there is no structural reason within this industry for proportionally more women to do lower-paid jobs, at least across the bottom three pay quartiles, which supports the argument that Company A would do well to question its culture. But what about the top end, at boardroom level?
The data is fascinating, but is not enough on its own to explain the patterns above. Upon further investigation it turns out that Company B has recently expanded rapidly. If most of the roles created in this expansion have been in the lower three pay quartiles, we can make a conjecture as to the source of this pattern. Potentially, Company A relies on a steady pipeline of internal promotion that favours male members of staff, resulting in the decreasing representation of women at each level. Company B, however, may have hired large numbers of external candidates in a short space of time, using a more centrally managed process that is expected to mitigate gender bias in line with modern employment law. There is no guarantee, however, that over time Company B would not begin to look like Company A, extending the boardroom-level bias evident in the current data. There are of course multiple possible explanations for this data, of which this is only one; a lovely question might be to consider how we might go about testing this conjecture.
Finally, a look at Company C, in the same sector:

This one is an instant conundrum. In a company where women are over-represented in every pay quartile there is still a significant gender pay gap. Why could this be? Fortunately, this company has published its own analysis beyond the officially reported figures and the results are illuminating:
Source: Marks and Spencer
By looking at the actual pay bands and comparing them to the quartiles, this new representation of the data exposes some startling and previously hidden information. The actual number of people in the upper pay bands is tiny compared to the number in the lower pay bands, which raises the question of whether it is at all appropriate to compare quartiles in this way here. While the upper quartile is 66% female, this likely covers pay bands D-H, with a significantly higher weighting of employees in band D; the next four bands show a compelling trend: more money, more males. Of the board members, 30% are female. The lower three quartiles largely cover the bottom three pay bands, where women are more prevalent.
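To see how women can be the majority in every pay quartile while a sizeable mean pay gap remains, it can help to build a toy workforce and do the arithmetic. The sketch below uses entirely made-up pay bands, hourly rates and headcounts (not this company's figures), chosen so that the upper quartile is mostly one large band plus a handful of much higher-paid, mostly male bands. The gap is calculated as the difference between mean male and mean female hourly pay, expressed as a percentage of mean male pay.

```python
from statistics import mean

# Hypothetical workforce, invented for illustration only.
# Each entry: (pay band, hourly pay in pounds, number of women, number of men)
bands = [
    ("A", 9, 350, 150),
    ("B", 10, 350, 150),
    ("C", 12, 330, 170),
    ("D", 15, 300, 100),
    ("E", 25, 20, 40),
    ("F", 40, 8, 22),
    ("G", 80, 2, 8),
]

# Expand the bands into one (pay, sex) record per employee, ordered by pay.
employees = []
for _, pay, women, men in bands:
    employees += [(pay, "F")] * women + [(pay, "M")] * men
employees.sort(key=lambda e: e[0])

# Split into four equal-sized pay quartiles and find the share of women in each.
quartile_size = len(employees) // 4
female_share = []
for q in range(4):
    chunk = employees[q * quartile_size:(q + 1) * quartile_size]
    female_share.append(sum(1 for _, sex in chunk if sex == "F") / len(chunk))

# Mean gender pay gap as a fraction of mean male hourly pay.
mean_f = mean(pay for pay, sex in employees if sex == "F")
mean_m = mean(pay for pay, sex in employees if sex == "M")
mean_gap = (mean_m - mean_f) / mean_m

print("Share of women by quartile:", [f"{s:.0%}" for s in female_share])
print(f"Mean pay gap: {mean_gap:.1%}")
```

In this toy example women make up around two thirds of every quartile, yet the small number of highly paid, mostly male employees in the top bands is enough to pull mean male pay well above mean female pay. Students could change the headcounts in the top bands and watch how sensitive the mean gap is to a few dozen people.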
So what have we learnt? No single summary value or graphical representation will provide a complete picture. The job of budding statisticians is to critically peruse multiple sources, statistics and graphs in order to create a plausible interpretation of the data available, one whose validity can then be tested by looking at the changing picture over time. Whether the gender pay gap is generally present is not in question here; more important is the question of how the data changes over time, and the inferred causal mechanisms that may be ticking away underneath the numbers.
How might we explore this sensitively in a maths lesson?