Directing attention to long division

A large division symbol in red, over a white background featuring a range of mathematic numbers

One trend in artificial intelligence has been the use of imitation learning – learning how to do something just by “watching,” without being explicitly told what to do. Recently, someone taught a computer to do long division using a similar approach, by demonstrating many examples of solving long division problems using a standard long division algorithm,¹ and asking the computer to do the same. Now, to me this sounded a bit like reinventing the wheel, or maybe the Babbage engine. But then a mischievous part of my brain piped up: “Well, was it any better at it than us?” And my journey down the rabbit hole began.

In studies of people learning to solve long division problems using standard algorithms, researchers often describe categories of errors. They may look at how these errors arise or are resolved in order to propose what students need to do and understand to carry out long division successfully, in a way that benefits their conceptual development as well as their skill in executing the algorithm.

For example, one study² divides these errors into two types. Systematic errors are those where students consistently and accurately follow a procedure which can’t yield a good answer, either because they chose the wrong procedure for the context or because the procedure was learned incorrectly. Slips are those errors where students are attempting to follow a procedure which could be successful but have made a calculation or positioning error by accident. That is, students can either apply the wrong rule (systematic error) or make a mistake and end up not applying the rule as they intended to (slip).

Systematic errors in long division are more than worthy of their own blog post/article/book. However, systematic errors aside, the sheer amount of complexity students must contend with when learning to do long division leaves a lot of room for slips.³ In long division, the longer the numbers involved, the longer the series of steps and positions on the page to keep track of, and the more opportunities there are to go astray. Boucheny and Guerinet (cited in Camos and Baumer³) propose four categories of division, numbered in order from least to most complex in terms of the number of steps required.

Decreasing % correct responses to long division problems with increasing complexity³

Category	Single digit	Multiple digits	% Correct responses
1.	Both divisor and quotient	N/A	80%, SD=8
2.	Divisor	Quotient	67%, SD=5
3.	Quotient	Divisor	54%, SD=4
4.	N/A	Both divisor and quotient	52%, SD=15

Camos and Baumer³ created long division problems in each of these categories and analysed the responses of 56 students in France who were aged 10–11. The researchers characterised each of the long division problems according to the number of processing steps (multiplication, addition, or a mix) involved for each strategy. They also calculated what they called the spatial load of each problem; that is, “the number of digits that must be spatially arranged during its solving.”³ (p. 4) They then recorded the percentage of students who answered each problem correctly. Separately, they found that attention capacity and knowledge of multiplication facts were significant factors in performance; even so, when students were grouped by these abilities, they still answered more problems correctly the fewer processing steps there were.

Here’s an example for each category:

% correct responses to the most complex long division problem in each category

Category	Problem	Steps by repeated addition/subtraction	Spatial load	% correct responses
1	59 ÷ 8	29	3	70
2	7869 ÷ 4	65	14	68
3	699 ÷ 72	44	5	50
4	7993 ÷ 27	82	15	21
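
To get a feel for where all those steps come from, here’s a rough Python sketch of the standard algorithm, working through the dividend one digit at a time and finding each quotient digit by repeated subtraction. The function name and the way it counts are my own assumptions for illustration, not the researchers’ method; it only tallies subtractions, so its counts are much cruder than (and won’t match) the step counts in the table.

```python
def long_division(dividend, divisor):
    """Digit-by-digit long division using repeated subtraction.

    Returns (quotient, remainder, subtractions). Hypothetical helper,
    written for illustration only.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch needs a non-negative dividend and a positive divisor")

    current = 0        # the part of the dividend currently being worked on
    quotient = 0
    subtractions = 0   # a crude count of steps, not the study's measure

    for digit in str(dividend):
        current = current * 10 + int(digit)  # "bring down" the next digit
        quotient_digit = 0
        while current >= divisor:            # subtract until the divisor no longer fits
            current -= divisor
            quotient_digit += 1
            subtractions += 1
        quotient = quotient * 10 + quotient_digit

    return quotient, current, subtractions


# The four example problems from the table above.
for dividend, divisor in [(59, 8), (7869, 4), (699, 72), (7993, 27)]:
    q, r, steps = long_division(dividend, divisor)
    print(f"{dividend} / {divisor} = {q} remainder {r} ({steps} subtractions)")
```

Even this stripped-down version has to keep track of a running value, a growing quotient and a position in the dividend, which hints at how much a student is juggling on paper.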

Here’s what they found³ (p. 5):

A graph showing % correct responses per problem

A graph showing Processing steps (mixed strategy) per problem

A graph showing Spatial load for each problem

The likelihood of slips appears to scale with aspects of the complexity of the long division problem – of course, you might say! But it’s amazing how many steps and locations there really are to deal with.

82 processing steps and 15 digit placements seemed to be enough to give most students in this study a good chance of slipping up, even when many of them could apply the algorithm correctly to simpler problems.
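
As a back-of-the-envelope illustration (this is my own simplification, not a model from the study), suppose every step carried the same small, independent chance of a slip. The chance of getting through a whole problem cleanly then shrinks quickly as the steps pile up. The function name and the 2% slip rate below are made up for the sake of the example.

```python
def chance_of_any_slip(per_step_slip_rate, steps):
    """Probability of at least one slip in `steps` steps.

    Assumes (unrealistically) that every step has the same, independent
    chance of going wrong: 1 - (1 - p) ** n.
    """
    if not 0 <= per_step_slip_rate <= 1:
        raise ValueError("per_step_slip_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return 1 - (1 - per_step_slip_rate) ** steps


# Step counts taken from the table of example problems; the 2% rate is hypothetical.
for steps in (29, 44, 65, 82):
    print(steps, round(chance_of_any_slip(0.02, steps), 2))
```

Real slips are unlikely to be independent or equally likely at every step, so this is only meant to show the direction of the effect, not to explain the percentages above.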

To err is human, right? So what about a computer? We expect computers to possess unwavering attention and be reliable at avoiding slips in their calculations. However, this is because we have traditionally told them to calculate in ways which take advantage of their capacity for accuracy and consistency. Machine learning, however, can be considered a modelling process; the computer is expected to learn a relationship between inputs and outputs based on patterns in its training data. This allows computers to achieve new capabilities. For example, although it’s very difficult to write an explicit set of instructions that will help a computer to tell the difference between a picture of a cat and a picture of a dog, an appropriate machine learning algorithm trained on many pictures of cats labelled “cat” and dogs labelled “dog” can figure out patterns well enough to label future images accurately.

Machine learning techniques also take advantage of traditional computing power in order to process the large number of examples which many models are trained on. However, the models derived from such training can be inaccurate. Sometimes a little…

A lit pumpkin with a red box surrounding it, indicating that it has been recognised as a vase

…like this “vase.”

Or sometimes a lot…!

Four photos, two are dogs and the other two are muffins

Used under Creative Commons licences. Clockwise from top left: CC0 1.0; © Famartin, CC BY-SA 4.0; © Steven Shigeo Yamada, CC BY 2.0; © Roozitaa, CC BY-SA 4.0. (All images cropped.)

Back in 2017, “chihuahua or blueberry muffin” was a popular, if apocryphal, example that was widely used to illustrate the idea that machine learning models can have difficulty distinguishing between images with similar features.⁴ This sometimes happens because the features they might be paying the most attention to are not necessarily the best ones for the job, but they don’t have enough experience, or the right kind of experience, to know that.

So what does this have to do with long division? In 2019, Saxton et al. trained a model on 2 million examples. It was able to answer all place value questions correctly, along with nearly 100% of addition and subtraction problems, around 85% of simple division problems, and around 75% of multiplication problems.⁵ However, it didn’t do as well in tasks which mixed those forms of arithmetic; for long division with remainder, it dropped to less than 40% accuracy. The authors attributed this type of discrepancy to the need for intermediate calculations; it seemed the model wasn’t learning to do those steps.

A couple of years later, Recchia used a different approach to train a similar model to solve long division problems with > 80% accuracy using just 200 training examples.¹ He did this by training the model on examples which were accompanied by demonstrations which directed its attention to a series of steps being carried out to produce the right answer, with the result that it was able to learn the standard algorithm and carry out the intermediate calculations successfully (most of the time).
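
To make the idea of a demonstration a little more concrete, here’s a sketch of how a worked example, intermediate steps and all, might be written out as text. The layout and the function name are hypothetical and entirely mine; this is not the format of Recchia’s actual training examples, just one way of spelling out every step rather than only the final answer.

```python
def demonstration(dividend, divisor):
    """Write out a long division as a list of text lines, one per step.

    Hypothetical format for illustration only.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch needs a non-negative dividend and a positive divisor")

    lines = [f"{dividend} / {divisor}"]

    # Intermediate calculation: build the multiples of the divisor by repeated addition.
    multiples = [0]
    for k in range(1, 10):
        multiples.append(multiples[-1] + divisor)
        lines.append(f"  {divisor} x {k} = {multiples[-1]}")

    # Main algorithm: bring down a digit, subtract the largest multiple that fits.
    current = 0
    quotient_digits = []
    for digit in str(dividend):
        current = current * 10 + int(digit)
        q = max(k for k in range(10) if multiples[k] <= current)
        lines.append(
            f"bring down {digit}: {current} - {multiples[q]} = "
            f"{current - multiples[q]} (quotient digit {q})"
        )
        current -= multiples[q]
        quotient_digits.append(str(q))

    quotient = int("".join(quotient_digits))
    lines.append(f"answer: {quotient} remainder {current}")
    return lines


print("\n".join(demonstration(699, 72)))
```

A model shown only the first and last lines has to guess at everything in between; a model shown the whole trace has something to imitate at each step.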

In this example (used with permission),¹ the model has solved the problem correctly and carried out repeated addition as an intermediate calculation on the right, with subtraction occurring in the proper places and with the proper results in the main algorithm on the left:

However, even the most successful version of this model still makes mistakes almost 20% of the time. Success boils down to (a) having been exposed to enough examples of all necessary calculation steps (addition and subtraction) to carry them out, (b) being able to look back at previous steps to keep track of where it is in the process, and (c) not making a mistake somewhere along the way, which can suddenly make the problem become more and more unlike the examples it is used to.

Failure can be surprising and amusing because there is no hesitation or realisation. In one example, the model gets everything right up until the last step, when it confidently concludes that the remainder must be 0:

In the next example, some subset of correct things happens at first but it then goes completely off the rails (not just a slip; these are what Recchia calls abject errors):⁶

Long division is a very complex process both to learn and to do. Teaching others – even machines – to do it exposes this. Students can have some of the same problems as computer models (though they’re not likely to go as far off track as the second example). However, they have conceptual tools, including additive and multiplicative reasoning,⁶ at their disposal to help them select appropriate methods to use and detect or avoid slips, which these machine learning models do not. They also have an explicit reservoir of calculation “facts” helping them to correctly do intermediate addition, subtraction, multiplication and division steps depending on their approach.³

Mason reminds us that students’ attention, unlike that of a computer model, is shaped by what they have learned to see as important based on past experience, including their conceptual development so far.⁷ Teachers can support students’ attention to different parts of the process by helping them to build calculation fluency, develop their additive and multiplicative reasoning, and learn and flexibly apply appropriate strategies,⁷ and by remembering that even the most straightforward application of standard algorithms naturally requires a lot of support in order for students to learn to do it successfully.

Humans don’t need to learn to manipulate digits from scratch in a void. We can bring prior knowledge to the party. However, some mistakes are inherent to the process. Fortunately, we don’t need to look at 200 examples 10,000 times, incrementally adjusting a matrix of 125 million parameters, to figure it out from scratch! The human equivalent is complex enough.

References:
