Exam grades MUST be reliable and trustworthy
If, on reading that headline, you’re thinking, “Don’t be silly, of course grades must be reliable and trustworthy – and they already are!”, then this blog is for you. For they are not. On average, over the last ten years or so – when exams took place – about 1 grade in every 4 has been wrong. That’s about 1.5 million wrong grades out of a total of some 6 million GCSE, AS and A-level grades ‘awarded’ each summer.
And if the headline is not a surprise, but you have taken no action to exert pressure on Ofqual to fix this problem, then this article is for you too. For – as we have all experienced so vividly this summer – Ofqual only reacts in response to intense pressure.
Ofqual, of course, has never admitted that 1 grade in 4 is wrong – at least not in so many words. But the evidence is from their own research, and, if you listen carefully, you can detect some hints. So, for example, at the hearing of the Education Select Committee on 2nd September, Dame Glenys Stacey, Ofqual’s Interim Chief Regulator, said that exam grades were “reliable to one grade either way”, and Dr Michelle Meadows, Ofqual’s Executive Director for Strategy, Risk and Research, stated that she “took solace” that “98% of A-level grades and 96% of GCSE grades are accurate plus or minus one grade”.
“Reliable to one grade either way.”
What does that actually mean?
What use is “one grade either way” to the candidate who misses a potentially life-changing opportunity because the certificate shows grade B, when “one grade either way” might have been an A – an A that would have made all the difference?
And “98% of A-level grades are accurate plus or minus one grade” sounds so reassuring…
Ofqual’s statements are in fact true. But they very deliberately deflect attention away from the deeper and more fundamental truth that, on average across GCSE, AS and A-level, and across all subjects, about 1 grade in every 4 is wrong, and has been wrong for years.
The evidence for this startling allegation is Figure 12 on page 21 of Ofqual’s November 2018 report Marking Consistency Metrics – An update, as reproduced here:
In this diagram, for each subject, the heavy line in the darker blue box answers this question: “If the scripts submitted by an entire subject’s cohort were to be fairly and independently marked twice, firstly by an ‘ordinary examiner’, and secondly by a ‘senior examiner’ (whose mark determines what Ofqual calls the “definitive grade”), for what percentage of scripts would the originally-awarded grade be confirmed?”
The re-mark process is a ‘second opinion’: a second opinion from an individual whose mark (and hence grade) is regarded by Ofqual as “definitive”.
If grades were fully reliable, then, for all subjects, the answer to the question would be “100%”.
Ofqual’s chart, however, shows that this is not the case.
Rather, the average reliability – as measured by matching the senior examiner’s ‘second opinion’ to the originally-awarded grade – varies by subject from about 96% for (all varieties of) Maths to about 52% for Combined English Language and Literature.
My calculations (available on request) show that if each of those subjects is weighted by the corresponding subject cohort, then the average reliability across the 14 subjects shown is about 75%.
I know of no data for other subjects, but I have made some estimates (also available on request), and I am confident that the average reliability of all GCSE, AS and A-level grades, across all subjects, is about 75% – meaning a number unlikely to be less than 70% or greater than 80%.
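The weighting described above is a straightforward cohort-weighted mean. As a sketch only – the reliabilities and cohort sizes below are illustrative placeholders, not the actual Figure 12 values or entry statistics – the calculation looks like this:

```python
# Cohort-weighted average reliability, as described in the text.
# The reliabilities and cohort sizes here are ILLUSTRATIVE placeholders,
# chosen only to show the arithmetic; they are not Ofqual's actual figures.
subjects = {
    # subject: (reliability, cohort size)
    "Maths (all varieties)": (0.96, 700_000),
    "Combined English Language and Literature": (0.52, 300_000),
    "Illustrative subject A": (0.70, 500_000),
    "Illustrative subject B": (0.65, 500_000),
}

total_cohort = sum(n for _, n in subjects.values())
weighted_reliability = (
    sum(r * n for r, n in subjects.values()) / total_cohort
)
print(f"Cohort-weighted average reliability: {weighted_reliability:.0%}")
```

The point of weighting by cohort is that a large-entry subject such as Maths pulls the average towards its own reliability far more strongly than a small-entry subject does.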
In reality, that implies that if all 6 million scripts, as typically submitted for each summer’s exams, were to be fairly re-marked by a senior examiner, some 4.5 million (about 75%) of the originally-awarded grades would be confirmed, and around 1.5 million (about 25%) grades would be changed, approximately half upwards and half downwards.
Or, looking at this another way, about 75% of grades, as originally awarded, are “definitive”, whilst the other 25% aren’t – and if any of those 25% were to be fairly re-marked by a senior examiner, the originally-awarded grade would be changed accordingly. The BIG PROBLEM, of course, is that no one knows which grades are “definitive” and which aren’t – a problem aggravated by Ofqual’s 2016 change in the rules for appeals to make it much harder to discover, and correct, grading errors. And even if a grading error is discovered and corrected, I’d be surprised if a candidate whose grade were changed were to say, “Ah! My originally-awarded grade must have been non-definitive!”. I think the student would say, “That original grade was wrong!” – hence my claim “About 1 exam grade in 4 is wrong”.
In fact, that’s the good news. If you look more deeply at the reliability of grades corresponding to different marks within any given subject, things are even worse – for example, in every subject (including Maths), any script marked at or very close to any grade boundary has a probability of being ‘awarded’ the right grade of about 50% at best. If the exam board were to toss a coin, the result would be just as fair, if not more so.
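The coin-toss claim can be illustrated with a simple Monte Carlo sketch. Assume (my assumption, for illustration only) that marker-to-marker variation behaves like symmetric noise around the ‘definitive’ mark; then a script whose definitive mark sits exactly on a grade boundary lands on the higher side about half the time:

```python
# Monte Carlo sketch: a script whose 'definitive' mark lies exactly on a
# grade boundary, re-marked many times with symmetric marker noise.
# Boundary, mark and spread are hypothetical illustrative values.
import random

random.seed(0)

BOUNDARY = 60      # hypothetical grade-boundary mark
TRUE_MARK = 60     # definitive mark sitting exactly on the boundary
NOISE_SD = 3       # assumed marker-to-marker spread, in marks

trials = 100_000
at_or_above = sum(
    random.gauss(TRUE_MARK, NOISE_SD) >= BOUNDARY for _ in range(trials)
)
print(f"Awarded the higher grade in {at_or_above / trials:.0%} of re-marks")
```

Whatever the size of the assumed spread, symmetric noise centred on the boundary splits the outcomes roughly 50:50 – which is the coin toss.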
The statement that “on average, about 1 exam grade in 4 is wrong”, however, still needs to be reconciled with Ofqual’s acknowledgement that “98% of A-level grades and 96% of GCSE grades are accurate plus or minus one grade”.
To do this, I draw on my simulation (details available on request) of the exam results of 2019 A-level Sociology.
If the grades were 100% reliable, then on a fair re-mark by a senior examiner, all 38,015 candidates would be given the same grade as the original grade.
But according to Figure 12, only 63% of the entries – that’s about 23,940 candidates – would have the original grade confirmed, whatever that grade might be.
My simulation shows that a further 13,790 candidates would be re-graded either one grade higher (6,809) or one grade lower (6,981) than the original grade. The total given either the original grade, or one grade higher, or one grade lower – that is, “accurate plus or minus one grade”, or “reliable to one grade either way” – is therefore 23,940 + 13,790 = 37,730, which is 99% of the total cohort of 38,015, comfortably consistent with Ofqual’s statement that “98% of A-level grades are accurate plus or minus one grade”.
Finally, 112 candidates would be re-graded two grades higher, and 173 two grades lower, giving a total of 285 students two grades adrift.
The totals therefore reconcile as follows:

| Outcome | Candidates | Share of cohort |
| --- | --- | --- |
| Grade confirmed | 23,940 | 63% |
| One grade adrift | 13,790 | 36% |
| Sub-total | 37,730 | 99% |
| Two grades adrift | 285 | 1% |
| Grand total | 38,015 | 100% |
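The reconciliation above is simple arithmetic, and can be checked directly from the simulated figures quoted in the text:

```python
# Check that the simulated 2019 A-level Sociology numbers reconcile,
# using the figures quoted in the text above.
cohort = 38_015

confirmed = 23_940     # original grade confirmed (63%)
one_higher = 6_809
one_lower = 6_981
two_higher = 112
two_lower = 173

one_adrift = one_higher + one_lower      # 13,790
within_one = confirmed + one_adrift      # 37,730: "plus or minus one grade"
two_adrift = two_higher + two_lower      # 285

print(f"Within one grade:  {within_one:,} ({within_one / cohort:.0%})")
print(f"Two grades adrift: {two_adrift:,} ({two_adrift / cohort:.1%})")
assert within_one + two_adrift == cohort
```

Note that the “within one grade” figure rounds to 99% – comfortably above Ofqual’s quoted 98%.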
Let me note that the results of my simulation, as presented, imply a spurious degree of precision and accuracy: so, for example, the number of re-grades one grade higher, shown as 6,809, is more sensibly stated as “around 7,000”. That said, the simulated numbers have the benefit of adding to the total cohort of 38,015, which is a known, precise number.
My simulation of 2019 A-level Sociology is just one example; I have many more. And all confirm that Ofqual’s statements that “exam grades are reliable to one grade either way” and “98% of A-level exam grades are accurate plus or minus one grade” are true.
But they mask a deeper truth.
That, on average, about 1 exam grade in every 4 is wrong.
A fact that Ofqual has known for many years.
But has done nothing to resolve. Nor does it seem to have any intention to.
Sooner or later, exams will be back. And if you are happy that 1 grade in 4 will be wrong, fine.
But if not, how can pressure be put on Ofqual to fix it?
Dennis Sherwood runs Silver Bullet Machine, a boutique consultancy firm, and is an education campaigner.