When a bank analyst asks "top-5 branches by term-deposit revenue last month", the right answer already lives inside SQL. Getting it out safely takes more than a model call: is the user authorised, which metric definition applies, can a national ID or IBAN leak into the result, and do the numbers actually match the dataframe shown? The CentraQL AI BI Copilot pipeline manages that surface in 11 stages.
Pipeline goal
Turn a natural-language question into a traceable, auditable, provable answer. *Traceable* — every stage logs which LLM ran, with what context. *Provable* — the SQL and the semantic definitions behind the answer are visible to the user.
1. Guard
Before the query touches anything else, the policy layer runs: user role, data classification, and the ComplianceProfile (e.g. RegulatedFinance). Prompt-injection patterns are checked with a lightweight regex plus a similarity filter; inputs that score high for injection are rejected and logged.
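The regex half of that filter can be sketched as follows. The pattern list and the pass/fail rule are illustrative assumptions, not the CentraQL policy set; the similarity filter described in the text is omitted here.

```python
import re

# Hypothetical injection patterns -- a real deployment would maintain a
# much larger, regularly updated set, paired with a similarity filter.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]

def guard(question: str) -> bool:
    """Return True if the question passes the injection filter."""
    return not any(p.search(question) for p in INJECTION_PATTERNS)
```

A rejected input would also be written to the audit log, per stage 11.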
2. Intent
A short LLM call (a 7B planner) classifies the question: aggregation, trend, drill-down, anomaly explanation. Each type routes to a different template and few-shot set.
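The routing behind the intent label can be as simple as a lookup table. The four intent labels come from the text; the template and few-shot file names are hypothetical.

```python
# Each intent maps to (prompt template, few-shot set) -- paths are illustrative.
ROUTES = {
    "aggregation": ("templates/agg.txt", "fewshots/agg.jsonl"),
    "trend": ("templates/trend.txt", "fewshots/trend.jsonl"),
    "drill-down": ("templates/drill.txt", "fewshots/drill.jsonl"),
    "anomaly": ("templates/anomaly.txt", "fewshots/anomaly.jsonl"),
}

def route(intent: str) -> tuple[str, str]:
    if intent not in ROUTES:
        raise ValueError(f"unknown intent: {intent}")
    return ROUTES[intent]
```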
3. Retrieve (Qdrant)
Certified few-shots and KPI definitions in the domain pack are embedded with nomic-embed-text. Semantic neighbours of the question are pulled; the top k (~5) are fed into the planner as context.
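The retrieval step reduces to a nearest-neighbour lookup over embeddings. A minimal self-contained sketch with cosine similarity, using toy 3-d vectors in place of the Qdrant collection and nomic-embed-text vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, corpus, k=5):
    """corpus: list of (doc_id, vector); returns the k nearest doc_ids."""
    scored = sorted(corpus, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

In production the sort happens inside Qdrant's index rather than in Python; the logic is the same.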
4. Plan — QuerySpec JSON
The planner LLM receives the few-shot + question prompt. Its output is not free SQL but a JSON QuerySpec: which metric, which dimensions, which filters, which sort, which limit. A schema validator enforces the shape; a non-conforming output is rejected and one retry is allowed.
5. Validate
The QuerySpec is checked against the semantic layer. Is the metric defined, are the dimensions valid, is the filter field within the user's access? Under RegulatedFinance, additional checks run: rangeOf, joinDepthMax, allowFreeText=false. A rejection returns to the user with a reason.
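A sketch of the validation pass, returning a reason the user can see on rejection. The semantic-layer and access-control representations here are simplified assumptions; rangeOf and joinDepthMax are omitted.

```python
def validate(spec, semantic_layer, user_fields, profile):
    """Return (ok, reason). semantic_layer: metric -> allowed dimensions."""
    if spec["metric"] not in semantic_layer:
        return False, f"unknown metric: {spec['metric']}"
    allowed = semantic_layer[spec["metric"]]
    for dim in spec["dimensions"]:
        if dim not in allowed:
            return False, f"invalid dimension: {dim}"
    for f in spec["filters"]:
        if f["field"] not in user_fields:
            return False, f"access denied on filter field: {f['field']}"
    # allowFreeText=false under RegulatedFinance, per the text
    if profile == "RegulatedFinance" and spec.get("free_text"):
        return False, "free-text filters not allowed under RegulatedFinance"
    return True, ""
```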
6. Synth — SQL
The QuerySpec is converted to T-SQL by a deterministic builder. No LLM is involved at this stage, so SQL-level hallucination is impossible. The SQL is formatted, hashed, and written to the audit log.
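A minimal sketch of such a builder. The table name and the treatment of the metric as a ready column are illustrative; the real builder would resolve both from the semantic layer, and values would be bound as parameters, never interpolated.

```python
import hashlib

def synth_sql(spec, table="fact_deposits"):
    """Deterministic QuerySpec -> T-SQL; returns (sql, audit_hash).

    Assumes the metric resolves to a selectable expression -- in practice
    the semantic layer supplies the aggregate and join path.
    """
    select = ", ".join(spec["dimensions"] + [spec["metric"]])
    sql = f"SELECT TOP {spec['limit']} {select} FROM {table}"
    if spec["dimensions"]:
        sql += " GROUP BY " + ", ".join(spec["dimensions"])
    if spec["sort"]:
        sql += f" ORDER BY {spec['sort']} DESC"
    return sql, hashlib.sha256(sql.encode()).hexdigest()
```

The hash is what later ties the audit record back to the exact SQL that ran.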
7. Execute (read-only)
The SQL runs over a read-only connection. Timeout is 30 s by default, 10 s under RegulatedFinance. Row limits are profile-bound (typically 1000 rows).
8. Mask (PII)
The result dataframe is filtered against ColumnPolicy: national IDs, IBANs, emails, phone numbers are masked or dropped. Masking happens after execution and before narration — sensitive fields never reach the LLM.
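Per-row application of a ColumnPolicy might look like this. The column names and the mask rule (keep a short prefix, star the rest) are illustrative assumptions.

```python
# Hypothetical ColumnPolicy: action per sensitive column.
POLICY = {"national_id": "drop", "iban": "mask", "email": "mask",
          "phone": "mask"}

def mask_row(row: dict) -> dict:
    """Drop or mask sensitive columns so they never reach the narrator LLM."""
    out = {}
    for col, val in row.items():
        action = POLICY.get(col)
        if action == "drop":
            continue  # column removed entirely
        if action == "mask" and isinstance(val, str):
            out[col] = val[:4] + "*" * max(len(val) - 4, 0)
        else:
            out[col] = val
    return out
```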
9. Chart
A chart type (line, bar, pie, table) and axes are derived from the result shape. The QuerySpec may have predetermined this; otherwise a heuristic decides.
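One plausible form of the fallback heuristic, keyed on result shape. The specific rules (date-like first column means line, small categorical result means bar) are assumptions for illustration.

```python
def pick_chart(columns, rows):
    """Heuristic chart selection from the result shape."""
    if len(columns) < 2 or not rows:
        return "table"
    x = columns[0].lower()
    if any(tok in x for tok in ("date", "month", "week", "day")):
        return "line"          # time on the x-axis -> trend
    if len(rows) <= 10:
        return "bar"           # small categorical result -> ranking
    return "table"             # too many rows to chart cleanly
```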
10. Narrate (LLM)
The masked numeric result and chart spec are given to the narrator LLM. A sanitization stage compares the numbers, percentages, and currency strings in the narrative against the underlying table; mismatches are corrected.
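The number-matching half of that sanitization can be sketched as: extract every numeric token from the narrative and flag any that does not appear in the result table. Percentage/currency handling and the correction step are omitted in this simplified version.

```python
import re

def check_numbers(narrative: str, table_values) -> list[str]:
    """Return numeric strings in the narrative absent from the table."""
    # Normalise both sides through %g so 900.0 and "900" compare equal.
    allowed = {f"{float(v):g}" for v in table_values}
    found = re.findall(r"\d+(?:\.\d+)?", narrative.replace(",", ""))
    return [n for n in found if f"{float(n):g}" not in allowed]
```

An empty return means the narrative is numerically consistent; a non-empty one triggers the correction pass.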
11. Audit
Metadata for every stage is written to PromptAuditLog: query hash, user, timestamp, planner / narrator model and token counts, returned rows, masked columns, p50/p95 latency. Retention defaults to 365 days for BDDK.
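The record shape implied by that list might look as follows. Field names follow the text; the exact PromptAuditLog schema is an assumption.

```python
import hashlib
import time

def audit_record(sql, user, planner_model, narrator_model,
                 rows_returned, masked_columns, latency_ms):
    """Build one PromptAuditLog entry (illustrative schema)."""
    return {
        "query_hash": hashlib.sha256(sql.encode()).hexdigest(),
        "user": user,
        "timestamp": int(time.time()),
        "planner_model": planner_model,
        "narrator_model": narrator_model,
        "rows_returned": rows_returned,
        "masked_columns": masked_columns,
        "latency_ms": latency_ms,
        "retention_days": 365,  # BDDK default, per the text
    }
```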
Latency profile
On 1×RTX 4090 + Qdrant + SQL Server, p95 ≈ 6-9 s:
- Guard + Intent: 200 ms
- Retrieve: 80 ms
- Plan (7B q4): 2-3 s
- Validate + Synth: 60 ms
- Execute: 150-800 ms
- Mask: 30 ms
- Narrate (14B q4): 1.5-3 s
- Audit (async): 20 ms
Conclusion
The CentraQL Copilot pipeline inverts the "let the LLM write SQL" pattern: the LLM only handles language and narration, while SQL flows from a deterministic builder, through the semantic layer and catalog boundary, with masking before any model sees the data. The result is a regulator-grade architecture with zero hallucination risk on numbers.
