I Built a Public Service in My Spare Time

I built a production-ish public service in my spare time over about a month. Not a prototype. A live service at checkmybenefits.uk that helps people discover what they’re entitled to, tells them what to claim first and why, and links them to applications covering 52 UK entitlements across benefits, NHS costs, childcare, housing, transport and legal support.

It’s an experiment, a proof of what’s possible, not a finished service. But it works, and it’s free.

I built it working with Claude Code as a pair programmer. I’m a product manager and I can code to, like, a prototype level. I’ve been building stuff for the web for around 30 years now, so I know enough to understand architecture and enough to hold my own in technical conversations. But I couldn’t have built this alone at all, no matter how long I had.

This isn’t a post about AI replacing developers. It’s about what happens when someone who understands users, policy, code, the importance of unit testing/evals and service design can suddenly build at the speed of their own thinking.

The thing nobody’s built

As much as £23 billion in UK benefits goes unclaimed every year. There are tools that help, such as Entitledto and Policy in Practice. They’re better than Check My Benefits at precise means-tested calculations, and they’re built by specialists, not some random PM working in Gov. But every one of them works the same way: fill in a lot of fields, get a flat list of what you might be entitled to, and then figure out for yourself what to do next.

[Screenshot: Check My Benefits results screen]

Nobody models the gateway cascade.

Here’s what I mean by ‘gateway cascade’. Say you’re 74, your mobility’s getting worse, and you don’t think of yourself as someone who claims benefits. An adviser at Citizens Advice would tell you: apply for Attendance Allowance first. It’s not means-tested, so you might qualify even with savings and a pension. But here’s the thing. Once you’re getting Attendance Allowance, that changes your Pension Credit calculation. Suddenly you qualify for Pension Credit too. And Pension Credit is a gateway benefit: it automatically qualifies you for Council Tax Support, a Warm Home Discount, free NHS dental and optical care, and help with funeral costs. One claim unlocks a chain. Miss the first one, miss them all.

That sequencing lives in the heads of experienced welfare rights advisers, knowledge built over decades of casework that no tool has attempted to model, however crudely. It was that product insight which made me decide to build checkmybenefits.uk.

Check My Benefits models 45 of these dependency edges across the UK benefits system. It will get things wrong. It’s guidance, not advice, and it says so clearly. The 48 rules and 45 edges are a crude approximation of knowledge that advisers hold in far greater depth. But it catches people who don’t yet know what they’re looking for, which is something no existing tool does.
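To make the idea of dependency edges concrete, here is a minimal sketch of the one chain described above as a graph, with a breadth-first walk that turns a first claim into a claiming order. It is not the service's actual code: the `UNLOCKS` table and the `claiming_order` function are hypothetical names, and only the edges mentioned in this post are included.

```python
from collections import deque

# Hypothetical edge table: claiming the key may unlock (or strengthen) the values.
# Only the chain described in this post is modelled here.
UNLOCKS = {
    "Attendance Allowance": ["Pension Credit"],
    "Pension Credit": [
        "Council Tax Support",
        "Warm Home Discount",
        "Free NHS dental and optical care",
        "Help with funeral costs",
    ],
}


def claiming_order(first_claim: str) -> list[str]:
    """Walk the graph breadth-first so gateway benefits come before what they unlock."""
    order = []
    seen = {first_claim}
    queue = deque([first_claim])
    while queue:
        benefit = queue.popleft()
        order.append(benefit)
        for unlocked in UNLOCKS.get(benefit, []):
            if unlocked not in seen:
                seen.add(unlocked)
                queue.append(unlocked)
    return order


if __name__ == "__main__":
    for step, benefit in enumerate(claiming_order("Attendance Allowance"), start=1):
        print(step, benefit)
```

Starting from Attendance Allowance lists Pension Credit second and the four passported entitlements after it; starting from Pension Credit skips Attendance Allowance entirely, which is the "miss the first one" problem in miniature.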

How it works

Check My Benefits starts from “what’s going on in your life?” in plain English. Four to six questions. Two minutes. No accounts, no data stored — you don’t even need to give your full postcode. Conversations pass through AWS Bedrock and are not stored or used for model training. You get a prioritised action plan with claiming order, estimated values, and GOV.UK application links.

The conversational layer uses a small, cheap AI model (Amazon Nova Lite) but the eligibility logic doesn’t. That bit matters. Benefits rules need to produce consistent results every time, so I wrote 48 deterministic rules in code, not prompts. The AI handles the conversation. The rules handle the decisions. The cascade is a dependency graph.
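A deterministic rule in this sense can be as small as a pure function over structured answers. The sketch below is illustrative only: the field names, the `RULES` registry and the simplified Attendance Allowance condition are assumptions, and the real eligibility criteria have more conditions than this.

```python
from dataclasses import dataclass

# Assumption for illustration: real State Pension age depends on date of birth,
# and real Attendance Allowance eligibility involves more than these two checks.
STATE_PENSION_AGE = 66


@dataclass(frozen=True)
class Answers:
    """Structured fields handed over by the conversational layer (hypothetical names)."""
    age: int
    needs_help_with_personal_care: bool


def may_qualify_for_attendance_allowance(answers: Answers) -> bool:
    """Pure function: the same answers always give the same decision, with no model call."""
    return answers.age >= STATE_PENSION_AGE and answers.needs_help_with_personal_care


# Hypothetical registry mapping an entitlement name to its rule.
RULES = {"Attendance Allowance": may_qualify_for_attendance_allowance}


def evaluate(answers: Answers) -> list[str]:
    """Return the entitlements whose rule passes for these answers."""
    return [name for name, rule in RULES.items() if rule(answers)]
```

Because the model never appears inside `evaluate`, the decision can be unit tested like any other code.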

But I learned that the boundary between “AI handles conversation” and “code handles decisions” needs to be thicker than I first thought (partly this is a byproduct of having to use such a tiny and cheap model for the conversational aspects). The AI extracts structured data from natural language: your age, income, housing situation. But it’s non-deterministic. In bereavement scenarios, the AI would get so focused on being empathetic that, in most eval runs, it failed to extract “I was a housewife” as an employment status. In one eval scenario, it consistently failed to collect all four required fields before trying to show results.

So I built two safety nets in code. First, a regex-based extraction layer that catches common phrases the AI misses - “housewife” → unemployed, “eight thousand a year” → £8,000, “we own our home” → homeowner. Second, a hard gate: the system won’t show results until four critical fields are present, regardless of what the AI says. The AI is non-deterministic, so you build deterministic guardrails around it. The architecture is deliberately boring where it needs to be reliable.
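A rough sketch of those two safety nets, assuming the model's output arrives as a dictionary of fields. The patterns are modelled on the three phrases above; the field names and the choice of which four fields are required are hypothetical, since the post does not list them.

```python
import re

# Hypothetical: the post says four critical fields but does not name them all.
REQUIRED_FIELDS = ("age", "employment_status", "income", "postcode")

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
INCOME_PATTERN = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\s+thousand\b", re.IGNORECASE)
HOUSEWIFE_PATTERN = re.compile(r"\bhousewife\b", re.IGNORECASE)
HOMEOWNER_PATTERN = re.compile(r"\bown (?:our|my) (?:home|house)\b", re.IGNORECASE)


def fallback_extract(message: str, fields: dict) -> dict:
    """Fill gaps the model left; never overwrite a value it did extract."""
    merged = dict(fields)
    if merged.get("employment_status") is None and HOUSEWIFE_PATTERN.search(message):
        merged["employment_status"] = "unemployed"
    if merged.get("housing") is None and HOMEOWNER_PATTERN.search(message):
        merged["housing"] = "homeowner"
    if merged.get("income") is None:
        match = INCOME_PATTERN.search(message)
        if match:
            merged["income"] = NUMBER_WORDS[match.group(1).lower()] * 1000
    return merged


def ready_for_results(fields: dict) -> bool:
    """Hard gate: results stay hidden until every required field is present."""
    return all(fields.get(name) is not None for name in REQUIRED_FIELDS)


if __name__ == "__main__":
    fields = fallback_extract("I was a housewife, about eight thousand a year", {"age": 68})
    print(fields, ready_for_results(fields))
```

In this example the fallback recovers employment status and income, but the gate still returns False because no postcode has been collected, whatever the model decides to say next.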

Tom Loosemore has been arguing that AI agents will strip away the friction that currently suppresses demand for public services, and that governments aren’t remotely ready. He built MissingBenefit.com over a weekend to prove the point. I built Check My Benefits in my spare time over a month or so. Neither of us did anything we wouldn’t normally do in a government role: understand the policy, design the service, think about users. What’s different is that we both pair-programmed with AI and shipped production-ish services in days or weeks instead of months.

That’s the change.

What “building with AI” actually looks like for me

I made hundreds of specific architectural decisions. Which AI model to use for the conversational layer, for example: I tested and rejected Nova Micro because it couldn’t follow structured XML output. Where to put eligibility rules. How to structure the gateway cascade. How to handle conflicts between mutually exclusive benefits.

I wrote Terraform for the AWS infrastructure. I debugged Lambda cold starts. I set up SES email forwarding across AWS regions. I configured CloudFront, DKIM records, CI/CD pipelines with Lighthouse accessibility audits. But the work that matters most looks like this:

I wrote 8 multi-turn evaluation scenarios: full simulated conversations that run against the live AI model weekly. ‘MT07’ is a bereaved 68-year-old widow who says “I was a housewife” and “about eight thousand a year.” The acceptance criteria: the system must collect her postcode before showing results, even though the emotional context makes the AI want to rush to help. That scenario failed five times in a row. I diagnosed the root cause (the AI wasn’t extracting employment status from “housewife”), added a single regex pattern to the code fallback layer, and the scenario went from 53% to 96%.
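A multi-turn scenario like MT07 might be expressed roughly as below. This is a sketch under assumptions, not the real harness: `Scenario`, `pass_rate` and the shape of the conversation state are hypothetical, and it treats the percentage as the share of repeated runs in which every check passes, which is only one way to read the 53% and 96% figures. The caller supplies `run_conversation`, which would talk to the live model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical state shape: {"fields": {...}, "results_shown": bool}
Check = Callable[[dict], bool]


@dataclass
class Scenario:
    name: str
    user_turns: list[str]
    checks: list[Check]


def extracted_employment(state: dict) -> bool:
    """'housewife' must end up as an employment status."""
    return state["fields"].get("employment_status") == "unemployed"


def no_results_without_postcode(state: dict) -> bool:
    """Results must not appear before a postcode has been collected."""
    return not state["results_shown"] or state["fields"].get("postcode") is not None


MT07 = Scenario(
    name="MT07",
    # Only the two phrases quoted in the post; the full scenario has more turns.
    user_turns=["I was a housewife", "about eight thousand a year"],
    checks=[extracted_employment, no_results_without_postcode],
)


def pass_rate(scenario: Scenario, run_conversation: Callable[[list[str]], dict], runs: int = 20) -> float:
    """Run the scenario repeatedly, because the model is non-deterministic."""
    passed = 0
    for _ in range(runs):
        state = run_conversation(scenario.user_turns)
        if all(check(state) for check in scenario.checks):
            passed += 1
    return passed / runs
```

Running each scenario many times matters because a single pass or fail says little about a non-deterministic model.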

That loop — eval fails, diagnose, fix in code or prompt, re-eval — is the actual workflow. It’s closer to what product managers already do (define acceptance criteria, test, iterate) than to traditional software development. The 230 automated tests and 69 AI evaluation scenarios are acceptance criteria that run themselves. A separate pipeline scrapes GOV.UK for updated benefit rates and commits changes automatically. CloudWatch alerts email me if errors spike. The system mostly runs itself.

None of this is “vibe coding.” But none of it would have been possible at this pace without an AI that could hold the full context of the codebase, suggest implementations, catch mistakes, and work through problems with me. This is a ‘colleague’ who’s deep in everything and experienced in nothing. The fastest and most confidently wrong person you’ve ever worked with. But you give them direction and they get there - quickly.

What it costs to run

Can one person with relatively modest disposable income sustain a public service out of pocket?

Each conversation costs roughly a third of a penny. I ended up using Nova Lite on AWS Bedrock as it costs $0.06 per million input tokens and $0.24 per million output tokens, and is just smart enough to do structured data extraction. A typical conversation (six turns, a ~6,500-token system prompt sent each time, growing message history) uses about 47,000 input tokens and 1,900 output tokens. That’s £0.003 per person helped.

At £100 a year, the service handles around 38,000 conversations. At £500, around 190,000. Lambda compute and CloudFront hosting sit inside AWS free tiers. The domain costs £10 a year.

If even 1% of those potential 38,000 people successfully claim Attendance Allowance (worth £4,000–£5,500 a year), that’s at least £1.5 million unlocked annually, from a service that costs less than a gym membership to run.
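The arithmetic behind these figures can be checked in a few lines. The prices and token counts come from the text above; the exchange rate is an assumption (the post does not state one), chosen at 1.25 USD per GBP because it roughly reproduces the stated conversation counts.

```python
# Figures from the post.
INPUT_PRICE_USD_PER_MILLION = 0.06
OUTPUT_PRICE_USD_PER_MILLION = 0.24
INPUT_TOKENS_PER_CONVERSATION = 47_000
OUTPUT_TOKENS_PER_CONVERSATION = 1_900

# Assumption: not stated in the post.
USD_PER_GBP = 1.25


def cost_per_conversation_usd() -> float:
    input_cost = INPUT_TOKENS_PER_CONVERSATION / 1_000_000 * INPUT_PRICE_USD_PER_MILLION
    output_cost = OUTPUT_TOKENS_PER_CONVERSATION / 1_000_000 * OUTPUT_PRICE_USD_PER_MILLION
    return input_cost + output_cost


def cost_per_conversation_gbp() -> float:
    return cost_per_conversation_usd() / USD_PER_GBP


def conversations_for_budget(budget_gbp: float) -> int:
    return int(budget_gbp / cost_per_conversation_gbp())


def unlocked_range_gbp(people: int, claim_share: float, low: int, high: int) -> tuple[float, float]:
    """Annual value unlocked if a share of people claim a benefit worth low to high."""
    claimants = people * claim_share
    return claimants * low, claimants * high


if __name__ == "__main__":
    print(f"Per conversation: ${cost_per_conversation_usd():.4f}, about £{cost_per_conversation_gbp():.4f}")
    print("Conversations at £100:", conversations_for_budget(100))
    print("Conversations at £500:", conversations_for_budget(500))
    print("1% of 38,000 claiming:", unlocked_range_gbp(38_000, 0.01, 4_000, 5_500))
```

Under that exchange rate, £100 buys a little over 38,000 conversations and £500 a little over 190,000. The 1% scenario works out at 380 claimants, or about £1.5 million a year at the lower rate and about £2.1 million at the higher one.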

What this changes

I’ve spent nearly ten years building digital products in government. I know the service manual. I’ve been through service assessments and have trained as an assessor. I understand why we do discovery, alpha, beta, live.

The traditional approach to building this would involve a product manager, delivery manager, user researcher, designers, developers, data scientists, and cross-cutting roles like a BA and a tech lead. Six to twelve months from discovery to public beta. That structure exists for good reasons, and for complex, high-stakes government services, you simply need it. You still need user researchers talking to all kinds of people. You still need governance and deep cyber/DP process.

But there’s a category of Useful Thing that sits between “someone should build this” and “we’ve secured funding for a discovery phase.” Things that could help people now, if someone could just build them. The AI doesn’t understand your users. It doesn’t know that people claiming Attendance Allowance are often in their eighties and may be using a phone with large text. It doesn’t know that benefits language carries stigma. Those are insights which lead to product decisions that require human judgement and real user research (mea culpa here, I made a lot of assumptions with this build).

So, the gap between “someone who understands the problem” and “someone who can build the solution” has narrowed to almost nothing. And in a situation where £23 billion goes unclaimed because people don’t know what they’re entitled to, if the ability to build something useful quickly (even imperfectly) helps someone, then maybe that’s a form of public service.

checkmybenefits.uk is live, free, and stores nothing. It’s not a government service, it’s not benefits advice, and it will make mistakes.

If you work in benefits, welfare rights, or public service delivery, I’d genuinely value your feedback: feedback@checkmybenefits.uk