How an AI agent built me an Ofsted results dashboard
My son’s school recently had an Ofsted inspection. As of this year, Ofsted reports no longer give a single overall rating but instead show colour-coded grades for seven different areas (here’s a video explaining the change).
I was interested in how the grades my son’s school received compared with those of other primary schools that had been assessed under the new methodology, but couldn’t find an easy way to do so.
A fellow parent mentioned they’d clicked through a bunch of reports for other schools to try to get a sense, but I couldn’t be arsed to do that and was after something more scientific.
I came across Watchsted, “a free factual tool that shows the most recent inspection grades and reports from Ofsted in a way that’s quick and easy to view” but it isn’t indexing the new format reports yet.
I decided to throw the challenge at Claude Cowork “a research preview which brings Claude Code’s agentic capabilities to Claude Desktop for knowledge work beyond coding”1.
Here’s what I asked Claude to do:
Analyse the Ofsted reports published for primary schools since 12/1/26 with the new look inspection report [relevant URL] and create an interactive dashboard contextualising the new grades
It hit a challenge straight away as it was blocked from fetching the data from the Ofsted website. Rather than giving up, it decided to find results published on other websites (e.g. Schools Week) and, without any further input from me, created a dashboard presenting the results from ~100 schools.
Claude mentioned that its Chrome browser extension would have enabled it to directly access the Ofsted website so I installed it and gave it permission to browse on my behalf (NB. There are security risks associated with doing this).
This enabled it to update the dashboard with data from all 193 primary schools with the new format reports. Here’s the resulting dashboard.
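For a sense of what the extraction step involves, here’s a minimal Python sketch that pulls area/grade pairs out of a report page. I haven’t seen Claude’s actual code, and the markup below is invented — the real Ofsted pages will differ — so treat it as purely illustrative.

```python
import re

# Hypothetical snippet of a new-format inspection report page.
# Real Ofsted markup will differ; this is purely illustrative.
SAMPLE_HTML = """
<dl class="grades">
  <dt>Achievement</dt><dd>Needs Attention</dd>
  <dt>Behaviour and Attitudes</dt><dd>Expected</dd>
  <dt>Personal Development</dt><dd>Expected</dd>
</dl>
"""

def extract_grades(html: str) -> dict[str, str]:
    """Pair each <dt> area name with the <dd> grade that follows it."""
    pairs = re.findall(r"<dt>(.*?)</dt>\s*<dd>(.*?)</dd>", html, re.DOTALL)
    return {area.strip(): grade.strip() for area, grade in pairs}

print(extract_grades(SAMPLE_HTML))
```

In practice the agent repeated something like this across all 193 report pages before aggregating the results.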
It’s fair to say the dashboard exceeded my expectations.
From a single, simple natural language prompt, Claude went and gathered data spread across dozens of webpages, analysed the data across multiple dimensions and then - most impressively - made a series of smart decisions about how to present that data in a way that would meet my brief of ‘contextualising the new grades’.
The first few panels immediately convey how ‘Expected’ is the most common grade (accounting for 55.6% of all grades) and how few ‘Exceptional’ grades Ofsted is dishing out (0.8% of grades). The Insights panel, lower down the dashboard, reveals that Achievement is the lowest scoring area, with nearly a quarter of the primary schools assessed under the new framework graded ‘Needs Attention’ or ‘Urgent Improvement’ in that area.
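The headline percentages are simple frequency counts over all 193 × 7 grades. A quick sketch of that calculation in Python, using made-up numbers rather than the real dataset:

```python
from collections import Counter

# Toy stand-in for the real list of 193 schools x 7 area grades;
# the mix below is invented and won't reproduce the dashboard's figures.
grades = (
    ["Expected"] * 10
    + ["Exceptional"] * 1
    + ["Needs Attention"] * 4
    + ["Urgent Improvement"] * 1
)

counts = Counter(grades)
total = len(grades)
for grade, n in counts.most_common():
    print(f"{grade}: {n / total:.1%}")
```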
The dashboard also includes an at-a-glance table of all 193 results, which can be filtered by grade, with the option to search by school or location.
I spotted one stacked column (Personal Development) that didn't add up to the expected total. When I pointed this out, Claude investigated the discrepancy and added the missing data - a reminder of the importance of human oversight and validation.
I haven’t checked every data point in the dashboard and whilst I didn’t find any mistakes in the dozen or so reports I did cross check, I would want to put in some more systematic checking processes before declaring the data error-free.
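The sort of systematic check I have in mind is mechanical rather than clever: confirm every school has a grade for every area, and that every grade is a value the framework actually uses. A hedged sketch, with only two of the seven areas shown and a grade set assumed from the labels above (the full Ofsted scale may include others):

```python
# Sketch of a consistency check over scraped results. The grade set is
# assumed for illustration, not Ofsted's definitive list.
KNOWN_GRADES = {"Exceptional", "Expected", "Needs Attention", "Urgent Improvement"}

def validate(schools: dict[str, dict[str, str]], areas: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the data reconciles."""
    problems = []
    for name, grades in schools.items():
        for area in areas:
            if area not in grades:
                problems.append(f"{name}: no grade for {area}")
        for area, grade in grades.items():
            if grade not in KNOWN_GRADES:
                problems.append(f"{name}: unrecognised grade {grade!r} for {area}")
    return problems

areas = ["Achievement", "Personal Development"]  # two of the seven, for brevity
schools = {
    "School A": {"Achievement": "Expected", "Personal Development": "Expected"},
    "School B": {"Achievement": "Needs Attention"},  # missing an area
}
print(validate(schools, areas))
```

A check like this would have flagged the Personal Development discrepancy automatically, rather than relying on me eyeballing a chart.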
So what?
I often use Claude to build me simple web tools. What struck me about this latest experiment was Claude Cowork’s ability to independently identify, extract, analyse and present data with almost no input from me.
All I did was provide the initial brief, give it access to my browser and point out one data anomaly, which it quickly fixed.
At my behest, Claude analysed its log files and concluded it had spent 45 minutes working on the project (spread out over an afternoon and evening, due to a combination of childcare and token limits). I spent less time than that, mostly on reviewing the dashboard and cross-checking results.
There remains a lot of hype and hyperbole around AI agents, but the capability of AI assistants to take on more of the ‘doing’ (not just the ‘thinking’) of well-bounded desktop/browser-based tasks is increasing.
The key challenges when using these capabilities are:

1. clearly describing the desired outcome and any constraints
2. providing access to tools and data whilst managing token use2 and mitigating potential security concerns
3. validating that the AI’s solution works as intended.
For complex software projects those three challenges can be sizeable. For smaller, more bounded tasks, tools like Claude Cowork are making those challenges increasingly surmountable for non-coders.
1. Cowork appears as a separate tab in the Claude desktop application for users with a paid Claude subscription.
2. I used Claude’s latest model, Opus 4.6, for this project, which uses more tokens than less capable models. I hit the token limit on my Claude Pro plan and had to wait a couple of hours for it to reset.


