The Self-Improving Tax Agent: How OpenAI and Thrive Built Tax AI with Codex

In a recently announced landmark collaboration, OpenAI and Thrive Holdings partnered with Crete Professionals Alliance, a network of over 30 accounting firms, to create Tax AI, a groundbreaking self-improving system for preparing complex U.S. tax returns (primarily Forms 1040 and 1041).

Over just six months, forward-deployed OpenAI engineers and researchers worked hand-in-hand with Thrive’s team inside the real production environment of Crete’s firms. The result? A system that doesn’t just automate data entry — it learns and gets smarter with every correction, turning messy real-world tax work into a continuously improving loop.

Impressive Results in One Tax Season

Tax AI processed 7,000 tax returns across the participating firms. Final drafts reached up to 97% accuracy without corrections. Practitioners saved about one-third of their preparation time per return, and overall throughput increased by 50%. This freed accountants to spend more time on high-value client advisory work instead of tedious data entry.
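The two efficiency figures above are consistent with each other, which a quick calculation shows. The sketch below assumes, purely for illustration, that preparation time is the only bottleneck on throughput; the function name is hypothetical.

```python
from fractions import Fraction


def throughput_gain(time_saved: Fraction) -> Fraction:
    """Relative throughput increase if each return takes (1 - time_saved) of the old time.

    Assumption: preparation time is the only limit on how many returns get done.
    """
    remaining_time = 1 - time_saved      # e.g. 2/3 of the original time per return
    return 1 / remaining_time - 1        # returns per hour rise by this fraction


print(throughput_gain(Fraction(1, 3)))   # 1/2
```

Saving one-third of the time per return means each return takes two-thirds as long, so the same hours cover 1.5 times as many returns.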

The Real Magic: The Self-Improving Loop

What makes Tax AI truly special isn’t the initial automation — it’s the closed-loop self-improvement system powered by Codex.

Here’s how it works in practice:

Full Production Traces: Every step is logged in detail — from the original source document, to the extracted field with citation, to the mapping into the tax engine, the accountant’s edit, and the final filed value.
Practitioner Corrections Become Gold: When an accountant fixes something, the system doesn’t just accept the change; it analyzes it. Repeated edits on the same field automatically become targeted eval datasets.
Codex Gets a Narrow, Scoped Task: Codex is given a precise package: the failing trace, the new eval set, the full codebase, relevant skills, production data samples, expected tax-engine outputs, code examples, and even eval-runner commands.
Patterns Turn into Product Changes: Codex diagnoses the root cause (bad extraction, weak mapping, an unsupported workflow, or a carryover from the prior year), proposes concrete fixes (schema updates, better source selection, mapper improvements), runs targeted and regression evals, and generates a pull request.
Ambiguous Cases Go to Humans: Only clear, bounded improvements are auto-applied. Tricky edge cases are routed to engineers for review.
The Loop Closes: Each deployed fix creates fresh production data, which fuels the next cycle. It’s continuous, autonomous improvement.
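The first two steps of the loop, logging a full trace and turning repeated corrections into an eval set, can be sketched roughly as follows. This is an illustration only: the record layout, the field names, and the threshold of three repeats are assumptions, not details of the actual system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldTrace:
    """One field's path through the pipeline (all names are hypothetical)."""
    return_id: str
    field: str                       # illustrative field key
    source_citation: str             # where in the source document the value was found
    extracted_value: str             # what extraction produced
    mapped_value: str                # what was sent to the tax engine
    accountant_edit: Optional[str]   # None if the practitioner left the value alone
    filed_value: str                 # what was finally filed


def build_eval_sets(traces, min_repeats=3):
    """Group corrected traces by field; keep fields corrected at least min_repeats times."""
    corrections = defaultdict(list)
    for t in traces:
        if t.accountant_edit is not None and t.accountant_edit != t.mapped_value:
            corrections[t.field].append(
                {"source": t.source_citation, "got": t.mapped_value, "expected": t.filed_value}
            )
    return {f: cases for f, cases in corrections.items() if len(cases) >= min_repeats}


# Made-up traces: three corrections on one field, one untouched field.
traces = [
    FieldTrace(f"r{i}", "rental_income", f"doc{i}.pdf p.2", "1200", "1200", "1400", "1400")
    for i in range(3)
] + [FieldTrace("r9", "wages", "w2.pdf p.1", "50000", "50000", None, "50000")]

evals = build_eval_sets(traces)
print(sorted(evals))                 # ['rental_income']
print(len(evals["rental_income"]))   # 3
```

Each eval case keeps the source citation alongside the expected value, so a proposed fix can be checked against the same documents the accountants actually corrected.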

This isn’t generic fine-tuning — it’s a living agent that gets better at the hardest parts of tax prep with every real return filed.

Why This Is Much Harder Than It Seems

Reading a clean W-2 or 1099? Easy for today’s AI.
But real tax season is full of “dirty” data: handwritten K-1 partnership forms, complex rental real-estate schedules, client notes scribbled in emails or spreadsheets, carryovers from last year’s returns, and values that must reconcile perfectly across five different documents.

A single wrong field can come from any (or all) of these five places:

Poor document extraction;
Weak mapping logic;
Unsupported workflow edge case;
Incorrect prior-year carryover;
Genuine human judgment call.
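One way to picture how these five causes feed the decision between an automatic fix and human review is the sketch below. Which causes count as clear and bounded is a policy choice; the split used here, along with every name in the code, is an assumption made for illustration.

```python
from enum import Enum


class RootCause(Enum):
    EXTRACTION = "poor document extraction"
    MAPPING = "weak mapping logic"
    UNSUPPORTED_WORKFLOW = "unsupported workflow edge case"
    CARRYOVER = "incorrect prior-year carryover"
    JUDGMENT = "genuine human judgment call"


# Assumption: only these causes are treated as bounded enough for an automatic fix.
AUTO_FIXABLE = {RootCause.EXTRACTION, RootCause.MAPPING, RootCause.CARRYOVER}


def route(causes, targeted_pass, regression_pass):
    """Return 'propose_fix' or 'human_review' for one wrong field.

    A field may have several causes at once; every one of them must be
    auto-fixable, and both eval suites must pass, before a fix is proposed.
    """
    causes = set(causes)
    if not causes or not causes <= AUTO_FIXABLE:
        return "human_review"
    if targeted_pass and regression_pass:
        return "propose_fix"
    return "human_review"


print(route({RootCause.MAPPING}, True, True))                       # propose_fix
print(route({RootCause.MAPPING, RootCause.JUDGMENT}, True, True))   # human_review
print(route({RootCause.EXTRACTION}, True, False))                   # human_review
```

Defaulting to human review whenever anything is uncertain keeps the automatic path narrow.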

Early in the project, only 25% of returns reached 75% correct field completion. Within six weeks of running the self-improving loop, that share rose to 86%, and the shares of returns reaching 90% and 100% field completion improved even faster. Over the same period, the rental-property workflow went from painful manual work to 90% precision and recall.
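These metrics are easy to misread, so here is a minimal sketch of how they could be computed. The sample numbers are invented for illustration and are not the project's data; the function names are hypothetical.

```python
def share_reaching(completions, threshold):
    """Fraction of returns whose share of correct fields is at least `threshold`."""
    return sum(c >= threshold for c in completions) / len(completions)


def precision_recall(predicted, actual):
    """Precision and recall of predicted items against the items actually filed."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented per-return field-completion rates for four returns.
completions = [0.5, 0.8, 0.95, 1.0]
print(share_reaching(completions, 0.75))   # 0.75
print(share_reaching(completions, 0.90))   # 0.5
print(share_reaching(completions, 1.00))   # 0.25

# Invented rental-schedule entries: two of three predictions are right.
p, r = precision_recall({"rent", "taxes", "repairs"}, {"rent", "taxes", "insurance"})
print(round(p, 2), round(r, 2))            # 0.67 0.67
```

The headline figure is a share of returns clearing a per-return threshold, not an average field accuracy, which is why the 75%, 90%, and 100% thresholds are tracked separately.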

How the Teams Made It Happen

OpenAI’s forward-deployed engineers didn’t just ship a model. They embedded themselves with Thrive’s engineers and Crete’s practicing accountants. They designed the trace infrastructure, built the eval pipeline, and carefully scoped Codex’s tasks so the agent could safely edit only the right parts of the code (schemas, mappers, etc.) while leaving architecture decisions to humans.
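Scoping an agent’s edits like this can be enforced with a simple path check on each proposed change. The sketch below is an assumption about how such a guard might look; the directory names and the function are hypothetical, not taken from the actual codebase.

```python
from pathlib import PurePosixPath

# Hypothetical top-level directories the agent is allowed to edit.
ALLOWED_TOP_LEVEL = {"schemas", "mappers"}


def out_of_scope(changed_files):
    """Return the changed paths that fall outside the allowed directories."""
    rejected = []
    for name in changed_files:
        parts = PurePosixPath(name).parts
        in_scope = (
            len(parts) > 1                    # must be a file inside a directory
            and parts[0] in ALLOWED_TOP_LEVEL
            and ".." not in parts             # no escaping via parent references
        )
        if not in_scope:
            rejected.append(name)
    return rejected


changes = [
    "schemas/k1.json",
    "mappers/rental.py",
    "core/pipeline.py",
    "schemas/../core/engine.py",
]
print(out_of_scope(changes))   # ['core/pipeline.py', 'schemas/../core/engine.py']
```

A pull request with any rejected path would go to an engineer rather than being applied automatically, which keeps architecture decisions with humans.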

The result is a blueprint for any knowledge-work domain: combine deep practitioner feedback, rich production traces, and a powerful agentic engine like Codex — and you get an AI that doesn’t plateau; it keeps climbing.

Tax AI isn’t just another tax-prep tool — it’s proof that truly self-improving agents are here today, already delivering massive value in one of the most complex, regulated professional fields. As the loop continues to turn, the next tax season will be even more efficient, accurate, and human-focused than this one.

The future of professional services isn’t AI replacing accountants — it’s AI that learns alongside them and makes them unstoppable.
