Ofsted inspections are (still) unreliable by design
This piece was originally published on Becky's blog here last month.
Another term, another crisis (a desperately sad one this time) and another set of solutions mooted by opposition parties and policy commentators to the perennial school inspection problem.
As usual, it’s easy to jump to quick solutions by starting at “the end”. “The end” is the question of what policy triggers should be attached to the Ofsted judgement. “The end” is whether the information that Ofsted reports should be converted into four numerical grades, a pass-fail judgement, six sub-categories, or something else entirely.
To understand why Ofsted judgements are so pernicious, we need to go back to “the beginning” – the school awaiting inspection. According to Teacher Tapp, half of the profession are on tenterhooks this academic year, expecting the call at any moment (an artefact of Ofsted being behind schedule). Achieving great test scores and exam results is no insurance against the risk of failure. And those who claim that stress is unjustified because inadequate judgements are rare have clearly never walked across this particular minefield.
The stress is largely a result of the impossibility of knowing how the inspection will go. Six years ago, I wrote that inspectors are human beings, and as such, they will be unreliable and biased. They must employ well-established psychological heuristics to form judgements during short inspections. The insight of that post was that we may be worse off with inspections than if we used exam data alone to judge schools. While inspection frameworks and Chief Inspectors come and go, the reality of short, human impressions does not change.
Judging school quality is a perpetual challenge. We all agree that school quality is not the same as exam results, but how can we measure it without monitoring millions of hours of instructional time each year? The inspection framework appears to swing between the three legs of the stool of classroom instruction, namely pedagogy, curriculum, and assessment, distorting each leg as it intensely focuses on it. Older teachers lived through the pedagogy era, where individual lesson activities were judged during long inspection weeks. As the length of inspections shrank, we moved to the assessment era, where inspectors reviewed the internal assessment data of a school that inevitably ‘proved’ that progress was being made. We are now living through the curriculum era, where conversations about curriculum intent and implementation stand in as a proxy for the quality of the learning experience. Each is a partial and distortionary perspective on what makes good teaching and learning, doomed to be abandoned as we search for the elixir of measurement.
Why do headteachers and teachers undertake seemingly irrational actions in preparation for Ofsted, actions that Ofsted itself claims not to want to see? It is because Ofsted demands the impossible, namely, to visit a school and evaluate the quality of teaching and learning that has occurred during a time when they were not present. Importantly, Ofsted does not provide specific guidelines on how these evaluations will be made, forcing educators to make educated guesses.
For instance, primary school heads sometimes require teachers to take photographs of students holding their craftwork or maths manipulatives, as seen in this Twitter thread. The reason behind this seemingly bizarre request is that the school knows that there could be a “deep dive” inspection into any subject, even Design and Technology. However, if there are no Design and Technology lessons on the day of the inspection, how could the inspector evaluate the school’s curriculum implementation and impact if the craftwork is not logged or stored, beyond assuring the inspector it is fine in a conversation? The last thing they want is for the inspectors to rely on conversations with the children: as any parent who has asked their own child about their day in school can attest, getting a reliable recollection of even recent activities can be a challenge.
Taking pictures of work to place in books may seem absurd, but it becomes necessary under a stressful inspection regime that requires evidence of past learning, even if inspectors never specifically ask for it. Ultimately, it is a matter of being caught between the rock of ‘evidence’ and the hard place of ‘teacher workload’.
When we inspect schools to evaluate what teaching and learning occurred during a time when we weren’t present, it is inevitable that “evidencing” will be important, and therefore will create workload. Student work in books might be an excellent indicator of what learning has taken place in some subjects, with some ages of children and for some pedagogical approaches. For others, work in books is a poor proxy for past learning experiences and so schools at risk of inspection understandably create special evidence.
Many suggested solutions to the flaws in the Ofsted inspection system are missing the point because they focus on changing the judgement or the consequences of the judgement. But they’re overlooking the elephant in the room: the entire inspection process is probably quite unreliable. (How unreliable? We don’t know, but we do know that judgements are correlated with HMI characteristics, which can’t be a good sign.) We might marginally improve reliability by enhancing the training and experience of inspectors or by tightening the framework, but these are minor improvements that will never meet the demands we place on the judgement itself.
Improving inspection will not be a popular policy. It may necessitate longer inspections or frequent short inspections for particular elements of a school’s provision, resulting in more time and resources being needed. We’ll need to revisit how data is used to inform both the judgement and the decisions about who to visit. We must conduct independent research to create a realistic picture of what an inspection can and cannot reliably tell us during brief visits to schools. We must decide whether inspection can create any meaningful information for either parental choice or school improvement. We must do all of this in addition to changing the nature and consequences of the judgement.
Becky Allen is an academic and education commentator. She is currently a professor at the University of Brighton, and also works at Teacher Tapp, which she co-founded.