← Back to blog

2026-06-17 · 9 min read

How to write and debug SQL with AI (without it hallucinating joins)

AI writes SQL that looks right, runs clean, and returns the wrong number. Here is the practitioner pattern that makes AI-generated SQL trustworthy, plus how to debug it.

South Asian woman data analyst reviewing SQL code on a laptop in a dark home office lit with purple light

The first time I asked an AI to write SQL, it gave me a query in four seconds that would have taken me fifteen minutes. It ran. It returned a number. The number was wrong by 30%.

Nothing errored. No red text. The query was clean, indented, commented. It just quietly fanned out a join and double-counted half my orders. I only caught it because the revenue looked too good and I have trust issues with numbers that look too good.

That is the real risk with AI and SQL. It is not that the AI fails loudly. It is that it succeeds confidently and wrongly. A junior analyst who is unsure will tell you they are unsure. The model never does. So the skill you need is not "prompt it better." It is a pattern that makes the wrongness visible before it reaches a stakeholder.

Here is the pattern I use and teach. It is five moves. None of them are clever. All of them are the difference between AI-SQL being a time bomb and being the best pair programmer you have ever had.

Why AI hallucinates joins in the first place

The model does not know your schema. When you type "total revenue by city last month," it has to guess which table has revenue, which has city, and how they connect. It guesses from the millions of schemas it saw in training. Your warehouse is not one of them.

So it invents a plausible join. Usually it picks the obvious key. Sometimes that key is one-to-many in a way you forgot, and now every order row is multiplied by the number of sessions that user had. The SQL is syntactically perfect. The grain is silently broken. This is the single most common failure, and all five moves below exist to kill it.

Move 1: Paste the schema, not just the question

The biggest upgrade you can make costs ten seconds. Before you ask for anything, paste the schema of every table in play. The CREATE TABLE statements are ideal. If you do not have those handy, a list of columns with types works.

The moment the model can see that orders.user_id is a foreign key to users.user_id, and that sessions also has user_id, it stops guessing and starts reasoning. Most "the AI is bad at SQL" complaints are really "I asked the AI to write SQL for a database it could not see."

Move 2: Name the grain out loud

Grain is the level of detail of one row in your answer. "One row per user." "One row per user per week." "One row per city." Almost every wrong AI query is a grain error, and almost every grain error is invisible until you check.

So state it. "I want one row per city with total GMV for last month." Now the model has a target shape, and when it joins in a way that breaks that shape, it has a reason to catch itself. You will be surprised how often naming the grain makes the model say "to keep one row per city, I will aggregate sessions before joining" entirely on its own.

Move 3: Ask for the plan in English before the SQL

This is the move nobody does and everybody should. Before you let it write a single line of SQL, ask: "Describe the joins and the grain you will use, in plain English. Do not write SQL yet."

You will get something like: "I will join orders to users on user_id, then join sessions on user_id, then sum gmv grouped by city." And right there, in English, you can see the bug: joining sessions before aggregating will fan out the orders. You caught it in a sentence instead of in a number that already went to your VP. Reading a wrong join in English takes five seconds. Reverse-engineering a wrong number takes an afternoon.

Move 4: Run it small before you trust it big

When the SQL comes, do not run it on the full table and stare at one aggregate number. Run it with LIMIT 20 and look at the actual rows first.

Duplicated keys jump out immediately. If you asked for one row per city and you see Mumbai four times, the grain is broken and you know before you have aggregated the evidence away. Aggregation hides fan-out. Raw rows expose it. Look at the rows.

Move 5: Verify against one number you already know

The last move is the cheapest insurance in analytics. Before you ship anything an AI wrote, check it against one number you can confirm independently. Last week's total orders from a dashboard. A monthly revenue figure finance already signed off. Any single ground-truth number.

If the AI query reproduces that number, your joins and grain are probably sound and you can trust the rest. If it does not match, stop. The query is wrong no matter how elegant it reads. This one habit has caught more of my mistakes, AI-written and hand-written, than any other.

What I would not do

  • Paste the AI query straight into a dashboard or a scheduled job without running it small first. Clean SQL is not correct SQL.
  • Ask for the SQL and the explanation in the same breath. Get the plan in English first, approve it, then ask for SQL. Separating them is where the bug-catching happens.
  • Trust a number because the query was complex. Complexity is not correctness. The 30% error that started this post came from a beautifully written query.
  • Give up after one wrong answer. The model was guessing your schema. Feed it the schema and the same prompt usually gets it right on the next try.

The shift this actually creates

Once these five moves are muscle memory, the job changes. You stop being the person who hand-writes every CTE and starts being the person who reviews the plan, checks the grain, and verifies the number. The AI drafts; you adjudicate. You get through ten ad-hoc questions in the time one used to take.

That is the real unlock. Not that AI writes SQL for you. That AI writes the first draft and you spend your senior judgement on the part that actually needs it: is this number true. That is a more valuable use of an analyst than typing JOIN forty times a day.

Want to build this against a real, messy database?

I teach this exact pattern in the 3-Hour Champion Sprint, live, against a real Postgres warehouse with 65K rows across 4 tables, nulls and all. You write AI-assisted SQL against data that fights back, then wire it into an automated report and a Slack bot that answers your team's data questions. You leave with the workflows, the prompts, and the habit. ₹1,499 Early-Bird. One Saturday.

Free Download

Steal 6 AI prompts senior analysts use to skip busywork

The prompts a practicing Director of BI uses to skip the busywork.

6prompts₹990Free10+ yrsof BI
  • Debug broken SQL fast
  • Dashboard to 1-para story
  • Explain a metric drop
  • Monday-morning TL;DR
  • Root cause to hypothesis
  • Metric definitions that stick

No spam. One email with your free guide.

Preview 2 prompts first →