# Pattern: AI research citation verification

An AI legal research assistant answers case-law and statutory questions, drafts memo passages, and supports brief writing. The output cites cases, statutes, and regulations. A single fabricated citation in a filing is a sanctionable event — see *Mata v. Avianca, Inc.*, No. 22-cv-1461 (S.D.N.Y. June 22, 2023).

This pattern shows how to evaluate every research-tool output so a fabricated citation can never reach a court filing.

## What's at stake

| Risk dimension                                         | Magnitude                                          | Source                                   |
| ------------------------------------------------------ | -------------------------------------------------- | ---------------------------------------- |
| Court sanctions for a *Mata*-style filing              | $5,000 sanction in *Mata*, plus reputational harm  | Public court orders                      |
| Bar discipline                                          | Public reprimand to suspension, decided case by case | State bar rules of professional conduct |
| Malpractice exposure for filed hallucinated citations  | Multi-million-dollar settlements                   | Public legal-malpractice filings         |
| Firm reputation in client and ABA Journal coverage     | Long-tail damage cycle                             | Industry trade press                     |

## The evaluation pattern

Four controls run on every research-tool output. The first two form the core two-stage evaluation; the last two keep it enforced as the system evolves:

1. **Custom code grader (citation existence):** every cited case, statute, or regulation is looked up against an authoritative legal database. Any citation that doesn't resolve is treated as fabricated, and the hallucination rate must be **0%** for the evaluation to pass. A minimal sketch of the existence check follows this list.
2. **Holding-accuracy judge:** for cases that exist, a GEPA-tuned judge (optimized against ≥50 attorney-labeled, scored examples) checks whether the cited case actually stands for the proposition the AI claims, and whether quoted passages match the source verbatim.

   > Don't have labels yet? See [Bootstrap a judge before GEPA](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/guides/bootstrap-judges.md) for the week-1 setup.
3. **Continuous trace evaluation:** every production research session is scored. Sessions with any unverified citation are flagged in-app for attorney review **before any output can be acted on**. The flag is a hard gate; attorneys can override it only after explicit acknowledgment.
4. **Compare-models:** when prompt or model changes are proposed, run the existing regression set across both variants and pick the one with the lower hallucination rate at equal or better holding accuracy.

The same pattern works for statutes, regulations, and internal precedent citations.
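
The grader string in the configuration below references `extract_citations` and `legal_db` as free names; they are assumed to be provided by the grader's execution environment. Here is a minimal sketch of what that existence check can look like, with an illustrative regex and an abstract database interface standing in for a real citation parser (such as eyecite) and an authoritative database client:

```python
# Illustrative only: the regex and LegalDB interface are assumptions,
# not part of the Stratix SDK. A production grader would use a dedicated
# citation parser and an authoritative legal-database client.
import re
from typing import Protocol

# Naive pattern for reporter citations like "598 F. Supp. 3d 123".
CITATION_RE = re.compile(r"\b\d{1,4}\s+[A-Z][A-Za-z.\s]*?\d*[a-z]*\.?\s+\d{1,4}\b")

class LegalDB(Protocol):
    def lookup(self, citation: str) -> bool:
        """Return True only if the citation resolves to a real document."""
        ...

def extract_citations(output: str) -> list[str]:
    return [m.group(0) for m in CITATION_RE.finditer(output)]

def verify_citations(output: str, legal_db: LegalDB) -> dict:
    unresolved = [c for c in extract_citations(output) if not legal_db.lookup(c)]
    # Mirrors the grader contract: pass only when nothing is unresolved.
    return {"passed": len(unresolved) == 0, "fabricated": unresolved}
```

Regex extraction is shown only for compactness; malformed and uncommon citation formats are exactly where hallucinations hide, so a dedicated parser is worth the dependency.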

## Configuration in code

```python
# Python (SDK)
from layerlens import Stratix

client = Stratix()

# Stage 1: hard existence gate. The grader code runs with `output` bound
# to the research-tool response; extract_citations and legal_db are
# assumed to be provided by the grader environment (see the sketch above).
citation_scorer = client.scorers.create_code(
    name="citation-existence",
    code="""
unresolved = [c for c in extract_citations(output) if not legal_db.lookup(c)]
result = {'passed': len(unresolved) == 0, 'fabricated': unresolved}
""",
)

# Stage 2: GEPA-tuned judge for holding and quote accuracy.
holding_judge = client.judges.create(
    name="holding-accuracy",
    evaluation_goal=(
        "For each cited case, does the case actually stand for the "
        "proposition the OUTPUT claims? Quoted passages must match the "
        "source verbatim."
    ),
)

# Score every production legal-research session with both checks.
trace_eval = client.trace_evaluations.create(
    trace_set={"tags": {"feature": "legal-research"}},
    scorers=[citation_scorer.id],
    judges=[holding_judge.id],
    schedule="per-session",
)
```
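
The snippet above covers controls 1 through 3. Control 4 (compare-models) has no documented call on this page, so the sketch below is hypothetical: it assumes a `comparisons` resource shaped like the resources above, and every parameter name is an assumption to verify against the SDK reference.

```python
# Hypothetical sketch: `client.comparisons.create` and its parameters are
# assumed, not confirmed by this page.
comparison = client.comparisons.create(
    name="legal-research-prompt-v2-vs-v1",
    variants=["prompt-v1", "prompt-v2"],      # assumed variant identifiers
    dataset="citation-regression-set",        # assumed regression-set name
    scorers=[citation_scorer.id],
    judges=[holding_judge.id],
)
# Decision rule from the pattern: promote the variant with the lower
# hallucination rate at equal or better holding accuracy.
```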

```typescript
// TypeScript (REST) — same trace eval from a Node service
const r = await fetch("https://stratix.layerlens.ai/api/v1/trace-evaluations", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.LAYERLENS_STRATIX_API_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    trace_set: { tags: { feature: "legal-research" } },
    scorers: [citationScorerId],
    judges: [holdingJudgeId],
    schedule: "per-session",
  }),
});
```
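
For a Python service, the same REST call with `requests` looks like the sketch below. The endpoint, header, and payload mirror the TypeScript snippet; the two ID variables are placeholders for the scorer and judge IDs created earlier, and `raise_for_status()` makes a rejected request fail loudly instead of being silently ignored.

```python
# Same REST call from Python; endpoint, header, and payload mirror the
# TypeScript snippet above.
import os
import requests

resp = requests.post(
    "https://stratix.layerlens.ai/api/v1/trace-evaluations",
    headers={
        "X-API-Key": os.environ["LAYERLENS_STRATIX_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "trace_set": {"tags": {"feature": "legal-research"}},
        "scorers": [citation_scorer_id],   # placeholder, from scorer creation
        "judges": [holding_judge_id],      # placeholder, from judge creation
        "schedule": "per-session",
    },
    timeout=30,
)
resp.raise_for_status()  # surface 4xx/5xx errors instead of continuing
```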

## What you get

* Hallucination rate held at 0% on filed-output paths — the gate is a hard block, not a warning.
* Holding-accuracy judge agreement with senior associates above 90% after GEPA optimization.
* Documented evaluation evidence — the evaluation IDs and per-citation verification artifacts — to show partners, clients, and (if needed) the court that AI-assisted research operates under verification controls.
* Attorneys treat the AI as a research-acceleration tool, not a citation source, which is exactly what it should be.

## Stratix capabilities used

* [Custom code grader](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/custom-code-scorer.md) — citation-existence verification against legal databases
* [Judges with GEPA optimization](/8.-evaluate-score-the-outputs/judges-1.md) — holding accuracy and quote accuracy
* [Trace evaluations](/8.-evaluate-score-the-outputs/trace-evaluations.md) — every production session
* [Notifications](https://github.com/LayerLens/gitbook-full/blob/main/13-reference/sdk-python/notifications.md) — flag unverified citations for attorney review

## Replicate this

* [Industry → Legal](https://github.com/LayerLens/gitbook-full/blob/main/04-use-cases/industry/legal/README.md)
* [Cookbook: catch hallucinations](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/catch-hallucinations.md)
* [Cookbook: legal contract-review faithfulness](https://github.com/LayerLens/gitbook-full/blob/main/04-use-cases/industry/legal/cookbook/industry-legal-contract.md) (analogous pattern for transactional work)
* [Use case: RAG evaluation](/4.1-general-use-cases/rag-evaluation.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/4.2-industry-use-cases/pattern-2.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
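
A quick example of the mechanism from Python; the question text is only an illustration:

```python
# Query this page's docs endpoint with a natural-language question.
import requests

resp = requests.get(
    "https://docs.layerlens.ai/4.2-industry-use-cases/pattern-2.md",
    params={"ask": "Which legal databases can the citation-existence grader query?"},
    timeout=30,
)
print(resp.text)  # direct answer plus relevant excerpts and sources
```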
