In the last post I explained how Atmos Football began as a private stress test: me throwing years of messy football results at different AI models to see if they could actually handle real data.
What I didn't cover was how messy that data actually was — and why it kept breaking every model until early 2026.
Why the App Says 2018 When the Group Started in 2015
If you check the About page or the leaderboard, everything starts in 2018. But the truth is a bit more complicated.
The "pre-history" actually stretches back to 2015. In the very beginning I wasn't trying to be a statistician — I was just trying to make sure everyone paid their £5 subs. The records from those first three years (2015–2017) were purely financial: who was "In" for the week and whether they'd handed over the cash. No team lists. No final scores. No way to calculate who actually won.
The data exists, but it lacks the "DNA" required for any meaningful analysis. That's why 2018 is our real Year Zero: the moment we started recording the actual football.
The OneNote Nightmare
Even once we started writing down scores, the storage method was… unconventional.
I didn't use a spreadsheet or a database. I used a single chaotic OneNote file, broken down into years.
Imagine a digital canvas where individual text boxes are squished together, stacked on top of each other, and shifted around to save screen space. To a human it was a readable (if messy) archive. To an AI it was a labyrinth.
That chaos is exactly why the early tests with Copilot in 2025 failed so badly. The models might have coped with a tidy CSV, but they had zero chance of navigating my handwritten-style OneNote mess without hours of manual hand-holding I wasn't willing to provide.
January 2026: The Claude Breakthrough
Where I left off last time, Gemini 3 could finally read the data… but the daily usage limits killed any serious progress. So the project went back on the shelf.
Then January 2026 arrived.
Work had started rolling out professional Claude accounts to the dev team. Even though I'm not a programmer, I work closely with them and got my own account.
I went in with healthy scepticism — I'd been burned by hallucinations before. I gave Claude a sample of the 2025 results and asked the usual benchmark:
"How many games has James played?"
It didn't guess. It didn't apologise. It just gave me the correct number in a single shot.
"Wow."
Auditing the Human Element
Encouraged, I fed it the full 2025 dataset and asked for a complete list of players and their results.
This time the real power showed up. Instead of inventing numbers, Claude started highlighting my mistakes:
- Typos like "Jmes" instead of "James"
- Conflicting stats (games where the total goals didn't add up or the stated winner had a lower score)
- Outliers (scores that looked suspiciously high or low compared to our usual games)
For years those manual-entry errors had sat quietly in OneNote. The AI acted like a rigorous QA tester, forcing me to fix the data before we could move forward. There's something strange about being corrected by a tool you expected to impress you: instead of completing the task, it handed it back to me half-done and told me why.
A few careful prompts and some cross-checking later, I had a clean, unified table for the entire year.
The Anomaly Report
Cleaning the data wasn't just about fixing typos. Some records were genuinely contradictory — and two examples stand out as the strangest.
The impossible result. One game from the back catalogue had Team A winning 10-5. Clear enough. Except the winner field said Team B. That's not an edge case the Elo engine can quietly paper over: it would have deducted rating points from the team that actually won and awarded them to the team that lost, corrupting the whole leaderboard history with one bad row.
Claude flagged it immediately as part of its contradiction check: "Score indicates Team A won by 5 goals, but winner is recorded as Team B. Which is correct?" In this case the score was right and the winner field was a data-entry slip — probably a mis-tap when entering results on a phone after the match. One field corrected, crisis averted.
The phantom cricket score. The second one was weirder. A game showed a score of 5-20. In 5-a-side. With a 30-minute half. Team A was listed as the winner, which the scoreline flatly contradicted, and in any case a 20-goal haul in a casual Thursday kickabout is basically impossible. We're not Barcelona.
This one took a minute to unpick. The real score was 5-2 to Team A. The "20" was a fat-finger typo: someone entering the result on a phone keyboard hit both the 2 and 0 at the same time and didn't notice. The winner field was actually correct; it was only the score that was broken.
That second example is why I eventually built a basic validation layer into the app itself: if the winning team's score is lower than the losing team's score, the record is flagged before it's saved. A ten-second check that would have caught both of these instantly.
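For the curious, here's roughly the shape of that validation layer. This is a minimal Python sketch, not the app's actual code; the field names, the "A"/"B" winner convention, and the 15-goal plausibility threshold are all illustrative assumptions.

```python
def validate_result(score_a: int, score_b: int, winner: str) -> list[str]:
    """Flag contradictions in a match record before it is saved.

    The field names and the "A"/"B" winner convention are illustrative,
    not the app's actual schema.
    """
    issues = []

    # The winner field must agree with the scoreline.
    if score_a > score_b and winner != "A":
        issues.append("Score says Team A won, but the winner field disagrees.")
    elif score_b > score_a and winner != "B":
        issues.append("Score says Team B won, but the winner field disagrees.")

    # Plausibility: a casual 5-a-side game rarely tops ~15 goals a side.
    if max(score_a, score_b) > 15:
        issues.append(f"Suspiciously high score ({score_a}-{score_b}): "
                      "possible fat-finger typo.")

    return issues


# Both anomalies from this section trip the check:
print(validate_result(10, 5, "B"))   # the impossible result
print(validate_result(5, 20, "A"))   # the phantom cricket score
```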
The whole episode also made me realise how many years of quiet errors were probably sitting in the old OneNote. The data I thought was clean turned out to need a full audit before it could be trusted.
The Evolution of Elo
Now I could finally ask the question I actually cared about:
"Provide a ranking of everyone in the game data."
Claude thought for a moment… then a second window opened in the chat (one of its canvas features) and out came a live-updating ranked table.
It wasn't perfect at first. The initial version was a basic team-wide Elo system with a major flaw: it didn't account for individual performance. If you were a great player on a bad team, the system punished you too harshly.
Step by step we refined it. We added individual contribution weights, adjusted the K-factor based on how close the game was (a 10-9 nail-biter should shift ratings less than a 10-0 thrashing), and tuned until it started to feel roughly right.
(It's still not perfect today — that's the limitation of the data we actually have — but it's light years ahead of the old spreadsheet.)
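To make the mechanics concrete, here's a hedged sketch of that style of update in Python. The expectation formula is standard Elo; the logarithmic margin multiplier and the base_k value are illustrative stand-ins, not the tuning the app actually uses.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: the probability-like score for team A."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_rating(rating: float, opponent: float, won: bool,
                  goal_margin: int, base_k: float = 32.0) -> float:
    """One Elo update with a margin-of-victory multiplier.

    base_k and the log scaling are illustrative choices, not the app's
    real tuning. Draws (goal_margin == 0) are left out for brevity.
    """
    k = base_k * math.log(goal_margin + 1)  # wider margin -> bigger swing
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent))


# A 10-9 nail-biter moves ratings far less than a 10-0 thrashing:
print(update_rating(1500, 1500, won=True, goal_margin=1))   # ~1511
print(update_rating(1500, 1500, won=True, goal_margin=10))  # ~1538
```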
The 2021 Pivot: COVID and the Banking Crisis
Once 2025 was clean, the next step was importing the historic data from 2018–2024.
I naively thought it would be quick: just copy-paste year by year — 2024, 2023, 2022, no major issues. Then 2021 broke that plan.
That was the year COVID changed everything. To reduce transmission risk we moved from cash subs to bank transfers. Before 2021, first names were all we needed. From 2021 onwards, the bank statements often showed full names or just surnames.
Suddenly I had multiple "Daves," people who used nicknames on the pitch but formal names on the banking records, and no easy way to match them.
Claude became my digital detective. I fed it the messy OneNote pages before and after the switch and asked it to map first names to full names and de-duplicate the list. It caught most of them. The rest I manually verified against old emails and messages.
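For anyone facing the same mess, the matching step is easy to prototype. Here's a hedged sketch using difflib from the Python standard library; the names are invented, the cutoff is a deliberately loose guess, and in practice every suggested match still went past a human (me):

```python
from difflib import get_close_matches

# Pitch-side names from the OneNote era (names are made up).
onenote_names = ["Dave", "Jmes", "Tommo", "Big Dave"]

# Full names from the post-2021 bank transfers (also made up).
bank_names = ["David Smith", "James Brown", "Thomas Jones", "David Jones"]

# Index full names by their first name, since the old records
# only ever used first names.
first_to_full: dict[str, list[str]] = {}
for full in bank_names:
    first_to_full.setdefault(full.split()[0], []).append(full)

for name in onenote_names:
    # Loose cutoff on first names only; every hit still needs a human eye.
    hits = get_close_matches(name, first_to_full, n=2, cutoff=0.5)
    candidates = [full for h in hits for full in first_to_full[h]]
    print(f"{name!r} -> {candidates or 'no match: check manually'}")
```

Running this shows both halves of the problem: "Dave" comes back with two candidate Davids (the ambiguity), while "Big Dave" comes back with nothing and lands on the manual pile.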
The day I finally exported the complete "Master Data Set" — every game since 2018, clean, consistent, with full names — was genuinely one of the best days of the entire project.
In the next post I'll get into what happened after the data was clean: the moment a simple stats prompt turned into the idea for a full app, and how I ended up building it with AI as my co-pilot.
— James