When a bank analyst asks the CentraQL Copilot for "the underwriting cycle of the last 30 days", the planner LLM must map the Turkish word *tahsis* to the cluster *underwriting + approval + disbursement* before producing the QuerySpec. A generic 7B Qwen or Llama gets the mapping right roughly half the time; the rest of the time it reads *tahsis* as generic *allocation* and drifts off-domain.
The fix is a small LoRA fine-tune over a banking domain pack: not a full fine-tune, but a parameter-efficient adapter that keeps cost and training time under control.
Why LoRA?
A full 7B fine-tune has to hold weights, gradients and optimizer states in GPU memory at once, far beyond what a single workstation card offers, and takes days of training. LoRA (Low-Rank Adaptation) instead freezes the base model and attaches small trainable low-rank matrix pairs to selected weight matrices. The result:
- Adapter file size: ~50-200 MB (vs. 4-14 GB base model).
- Training time: ~2-4 hours on a single RTX 4090 against Qwen2.5-7B.
- Reversible: removing the adapter restores the original model.
- Multi-adapter: banking-tr and banking-en can sit side by side over the same base.
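In code, attaching an adapter is a few lines. A minimal sketch with Hugging Face peft, using the r, alpha and target modules from the week-3 plan below; the dropout value is an assumption:

```python
# Minimal sketch: attach a LoRA adapter to a frozen base model with
# Hugging Face peft. Rank, alpha and targets follow the week-3 plan.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto"
)

lora = LoraConfig(
    r=16,                     # rank of the low-rank update matrices
    lora_alpha=32,            # scaling factor applied as alpha / r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,        # assumption, not specified in the plan
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)  # base weights stay frozen
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Because the base stays frozen, a second adapter (banking-en) is just another small file loaded over the same weights.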
Dataset structure (banking-tr example)
Three layers:
- Lexicon (1500 rows): domain terms and synonyms. *tahsis* → underwriting/approval/disbursement, *mevduat* → deposit, *KKB* → Credit Bureau Turkey.
- Question → QuerySpec (800 examples): real bank scenarios prepared as certified few-shots. Each example pairs a question with the expected QuerySpec JSON and the expected narrative.
- Bad examples to reject (200): prompt-injection, off-domain, unauthorised-field requests. The model is trained to respond "reject + reason".
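For concreteness, here is what a single row in the second layer could look like. Field names and the QuerySpec shape are illustrative assumptions for this sketch, not CentraQL's shipped schema:

```python
# Illustrative row from the question -> QuerySpec layer.
import json

record = {
    "layer": "queryspec",          # "lexicon" | "queryspec" | "reject"
    "question": "Son 30 günün tahsis döngüsünü göster",  # "show the underwriting cycle of the last 30 days"
    "queryspec": {
        "metric": "underwriting_cycle",
        "stages": ["underwriting", "approval", "disbursement"],
        "window_days": 30,
    },
    "narrative": "Underwriting cycle over the last 30 days, by stage.",
    "certified_by": "analyst_042",  # the senior analyst who signed the few-shot
}

print(json.dumps(record, ensure_ascii=False, indent=2))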
4-week plan
Week 1 — data collection:
- Sweep the last 6 months of IT tickets; extract 200-300 recurring analyst questions.
- 8-10 senior analysts convert questions to clean QuerySpec.
- The dataset version is committed and signed by its owner.
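One lightweight way to make "committed and signed" concrete is to pin the snapshot to a content hash in a manifest. The file names and manifest layout here are assumptions, and a real deployment would use an actual cryptographic signature rather than a bare hash:

```python
# Sketch: pin a dataset snapshot to a content hash plus an owner record.
import hashlib
import json
import pathlib

data = pathlib.Path("banking_tr.jsonl").read_bytes()
manifest = {
    "dataset": "banking-tr",
    "version": "1.0.0",
    "sha256": hashlib.sha256(data).hexdigest(),
    "signed_by": "dataset-owner@bank.example",
}
pathlib.Path("banking_tr.manifest.json").write_text(
    json.dumps(manifest, indent=2), encoding="utf-8"
)
```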
Week 2 — domain pack:
- Build the 1500-row lexicon and the 200 rejection examples.
- Stamp every few-shot as "certified" in PromptAuditLog format.
- Hold out 20% of the examples as an eval set.
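A sketch of the stamping and holdout step, assuming the JSONL layout from the earlier example; the certification stamp shown is illustrative, not the actual PromptAuditLog format:

```python
# Sketch: stamp each few-shot, then split off a deterministic 20% eval set.
import json
import pathlib
import random

rows = [
    json.loads(line)
    for line in pathlib.Path("banking_tr.jsonl").read_text(encoding="utf-8").splitlines()
]
for row in rows:
    row["certification"] = {"status": "certified", "reviewer": row.get("certified_by")}

random.Random(42).shuffle(rows)  # fixed seed keeps the split reproducible
cut = int(len(rows) * 0.8)
for name, subset in (("train", rows[:cut]), ("eval", rows[cut:])):
    with open(f"banking_tr.{name}.jsonl", "w", encoding="utf-8") as f:
        for row in subset:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```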
Week 3 — training:
- Qwen2.5-7B-Instruct + LoRA r=16, alpha=32, targets q_proj, v_proj, k_proj, o_proj.
- 3 epochs, lr 2e-4, cosine scheduler.
- 2-3 h on a single RTX 4090. (q4 quantization for vLLM serving adds ~30 min.)
- Eval target: QuerySpec exact-match > 85%, narrative semantic-match > 90%.
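These settings map almost one-to-one onto a TRL SFTTrainer run. A sketch, assuming the domain pack has been flattened to JSONL with one formatted "text" field per example; file and output paths are illustrative:

```python
# Week-3 training sketch with TRL. Assumes each JSONL row carries a "text"
# field containing the fully formatted prompt + target.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

train_set = load_dataset("json", data_files="banking_tr.train.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    train_dataset=train_set,
    peft_config=LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
    args=SFTConfig(
        num_train_epochs=3,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        dataset_text_field="text",
        output_dir="adapters/banking-tr",
    ),
)
trainer.train()
trainer.save_model("adapters/banking-tr")  # writes only the small adapter files
```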
Week 4 — pilot and calibration:
- 2-week live pilot with 5-10 analysts.
- Wrong QuerySpecs are folded back into the dataset; a second LoRA pass runs.
- Production decision: end-of-pilot accuracy + analyst NPS.
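Both the week-3 gate and the end-of-pilot decision hinge on a well-defined exact-match metric. A minimal sketch, assuming QuerySpecs are plain JSON objects:

```python
# QuerySpec exact-match over the held-out eval set. Canonicalising key order
# first avoids false mismatches between semantically identical specs.
import json

def _canon(spec: dict) -> str:
    return json.dumps(spec, sort_keys=True, ensure_ascii=False)

def exact_match(predicted: dict, expected: dict) -> bool:
    return _canon(predicted) == _canon(expected)

def queryspec_accuracy(pairs) -> float:
    # pairs: (model_output, certified_queryspec) tuples from the eval set
    return sum(exact_match(p, e) for p, e in pairs) / len(pairs)
```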
Typical outcome
In a real CentraQL deployment, before/after LoRA:
- QuerySpec exact-match: 62% → 89%.
- Narrative semantic-match: 78% → 94%.
- Out-of-domain rejection: 43% → 88%.
- p95 latency unchanged (LoRA inference overhead < 5%).
Important detail: model version in PromptAuditLog
When the LoRA adapter is updated, model_version in the audit record must change. This is mandatory for re-evaluating older answers. CentraQL writes both base and adapter into every audit row, so a regulator asking "which model produced this query in October?" gets a provable answer.
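Illustratively, an audit row that pins both identifiers could look like this; the field names are assumptions for the sketch, not the actual PromptAuditLog schema:

```python
# Hypothetical audit row: base model and adapter version are both recorded,
# so any past answer can be replayed against the exact model that produced it.
audit_row = {
    "query_id": "q-20241017-0031",
    "timestamp": "2024-10-17T09:42:11+03:00",
    "base_model": "Qwen/Qwen2.5-7B-Instruct",
    "adapter": "banking-tr",
    "adapter_version": "1.2.0",  # must be bumped on every new LoRA pass
}
```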
On-prem and leak control
LoRA training stays inside the organisation. No training data goes to cloud services, because analyst questions tend to contain real customer segments. The CentraQL training scripts never reach Anthropic Cloud, and EgressGuard stays in hard-enforcement mode.
Conclusion
LoRA is the practical form of "train your own LLM" for a bank. Four weeks, one RTX 4090 and a curated 1000-example dataset bring a generic 7B model up to bank-specific accuracy. CentraQL packages the process with dataset templates, training scripts and audit-aware adapter management.
