# Continuous evaluation

Pre- and post-deployment evaluations are point-in-time gates: they catch regressions before a merge or a release. **Continuous evaluation** catches everything else — when something slips through a gate, when production traffic shifts, when the model provider quietly updates their backend.

## The shape of the work

1. **Ingest live traces.** Your application emits trace JSON; Stratix ingests it via the API or SDK. (Steps 1–3 are sketched in code after this list.)
2. **Define a trace evaluation.** Pick the scorers and judges you want applied. The same scorers and judges from your pre- and post-deployment evaluations work here.
3. **Schedule it.** Run the trace evaluation on a recurring cadence — daily, hourly, per-batch.
4. **Watch the trend.** Stratix shows a score-over-time chart per scorer, per dimension.
5. **Get notified on drift.** When a score crosses a threshold, surface a notification (Premium) or a Slack alert.
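
A minimal end-to-end sketch of steps 1–3, assuming a Python SDK. Every name below — `StratixClient`, `ingest_trace`, `create_trace_evaluation`, the scorer IDs, and the schedule/alert fields — is an illustrative assumption, not the documented SDK surface; see the trace-ingestion SDK page under Tools for the real API.

```python
# Hypothetical sketch only: client, method, and field names are assumptions.
from stratix import StratixClient  # assumed package and client name

client = StratixClient(api_key="sk-...")

# 1. Ingest a live trace your application emitted.
client.ingest_trace({
    "trace_id": "tr_0192",
    "input": "Summarize this support ticket ...",
    "output": "Customer reports login failures after the last release ...",
    "metadata": {"route": "summarize", "model": "gpt-4.1"},
})

# 2-3. Reuse the scorers and judges from your pre-deploy gates,
# and run the evaluation on a recurring hourly cadence.
client.create_trace_evaluation(
    name="prod-summaries-hourly",
    scorers=["faithfulness", "tone"],   # same scorers as the CI gates
    judge="support-judge",              # a GEPA-optimized judge
    schedule="hourly",
    sample_rate=0.1,                    # score 10% of ingested traces
    alert={"scorer": "faithfulness", "below": 0.8, "channel": "slack"},
)
```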

## Why it works on Stratix

* **Same engine across pre-deploy and continuous.** The scorers and judges you built for CI gates are reusable.
* **Trace-first design.** Stratix's trace pipeline is shaped for continuous workflows; ingestion is rate-limited and persistent (a client-side backoff sketch follows this list).
* **GEPA-optimized judges.** Judges tuned with GEPA keep their agreement with human raters tight even as live traffic shifts.
* **Org-scoped and tenant-isolated.** One tenant's burst doesn't affect another.
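
The rate-limiting point has a client-side counterpart: your emitter should back off rather than drop traces. A sketch under stated assumptions — the endpoint URL and payload shape are invented, and only the 429-with-backoff pattern is the point:

```python
# Illustrative only: the endpoint and payload are assumptions, not the real API.
import time

import requests  # third-party: pip install requests

INGEST_URL = "https://api.stratix.example/v1/traces"  # hypothetical endpoint

def ingest_with_backoff(trace: dict, api_key: str, max_retries: int = 5) -> None:
    """POST a trace, retrying on HTTP 429 with exponential backoff."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(
            INGEST_URL,
            json=trace,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,
        )
        if resp.status_code != 429:
            resp.raise_for_status()  # surface non-rate-limit errors
            return
        # Rate-limited: honor Retry-After if the server sends one.
        delay = float(resp.headers.get("Retry-After", delay))
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("trace ingestion kept hitting the rate limit")
```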

## Tools you'll use

* [Stratix Premium — Trace evaluations](/8.-evaluate-score-the-outputs/trace-evaluations.md)
* [Stratix Premium — Traces](/7.-observe-see-whats-happening/traces.md)
* [SDK: trace ingestion](/4.1-general-use-cases/general.md)
* [Notifications](https://github.com/LayerLens/gitbook-full/blob/main/13-reference/sdk-python/notifications.md)

## Outcomes you should see

You'll know this is working when:

* **Mean time to detect a quality regression drops to under an hour** (versus the days or weeks of customer-driven discovery).
* **Score-over-time charts are reviewed weekly** by the team that owns the AI feature.
* **Threshold alerts fire and resolve** — silence is suspicious; chronic firing means the threshold needs tuning.
* **Sample bias is measured**, not assumed — you can show that the sample distribution matches production (see the PSI sketch after this list).
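
One concrete way to measure that last point: compare the distribution of a categorical trace field (route, intent, model) between everything production emitted and the subset you actually scored. The field, data, and 0.1 threshold below are illustrative assumptions; the population stability index (PSI) itself is a standard drift measure.

```python
import math
from collections import Counter

def psi(production: list[str], sample: list[str], eps: float = 1e-6) -> float:
    """Population stability index between two categorical distributions."""
    p_counts, s_counts = Counter(production), Counter(sample)
    score = 0.0
    for category in set(production) | set(sample):
        p = p_counts[category] / len(production) + eps
        s = s_counts[category] / len(sample) + eps
        score += (s - p) * math.log(s / p)
    return score

# Illustrative data: all production routes vs. the sampled traces you scored.
prod_routes = ["chat"] * 700 + ["search"] * 200 + ["summarize"] * 100
sampled_routes = ["chat"] * 40 + ["search"] * 50 + ["summarize"] * 10

print(f"PSI = {psi(prod_routes, sampled_routes):.3f}")
# ~0.44: well above the common 0.1 rule of thumb, so this sample
# over-represents "search" and its scores will not reflect production.
```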

## Anti-patterns

* **Sampling without checking bias.** Scoring only a sample of traces is fine, but the production distribution drifts — verify that your sample drifts with it rather than assuming it does.
* **Mixing pre-deploy and continuous in one config.** They have different cadences and different cost profiles. Keep them separate.
* **No alerting.** A continuously evaluated trace pipeline that nobody reads is decoration.

## Where to next

* [Concept: Continuous evaluation](/4.1-general-use-cases/continuous-evaluation.md)
* [Tutorial: Score live traces](/8.-evaluate-score-the-outputs/04-score-traces.md)
* [Workflow: Observe](/9.-improve-tune-the-system/workflow.md)
* [Cookbook: continuous-eval recipes](/2.-get-started/all-cookbook-recipes.md)

