# AI quality gates in CI/CD

Code regressions are caught by tests. AI regressions slip through unless you treat the eval as a test. The CI/CD quality gates use case wires Stratix evaluations into your pipeline so that a prompt change that drops accuracy can't merge.

## The shape of the work

1. **Pick your gating evaluation.** A small, fast subset of your benchmarks and judges. Goal: meaningful signal in under 5 minutes.
2. **Add the eval as a CI step.** GitHub Actions, GitLab CI, Buildkite — call the SDK or CLI with the eval config.
3. **Compare against baseline.** Stratix maintains an evaluation history. The CI step compares the new run to the most recent main-branch run.
4. **Fail the build on regression.** If any tracked dimension drops by more than your tolerance, fail.
5. **Annotate the PR.** Post the score table back to the PR for reviewer context.
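Steps 3 and 4 boil down to a per-dimension comparison against the baseline. The sketch below is illustrative, not Stratix SDK code: the `gate` helper, the dimension names, the scores, and the 0.02 tolerance are all assumptions.

```python
# Illustrative sketch of steps 3-4: compare the new run to the baseline
# per tracked dimension and collect any drop beyond the tolerance.
# `gate`, the dimension names, and the tolerance are hypothetical.

def gate(baseline: dict[str, float], candidate: dict[str, float],
         tolerance: float = 0.02) -> list[str]:
    """Return the dimensions that regressed by more than `tolerance`."""
    return [
        dim for dim, base in baseline.items()
        if candidate.get(dim, 0.0) < base - tolerance
    ]

regressions = gate(
    baseline={"accuracy": 0.91, "faithfulness": 0.88},
    candidate={"accuracy": 0.86, "faithfulness": 0.89},
)
# In CI, exit non-zero on any regression so the build fails, e.g.:
#     if regressions: sys.exit(f"Regressed: {regressions}")
```

Note that only drops beyond the tolerance fail the gate; a dimension that improves, or dips within noise, passes.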

## Why it works on Stratix

* **Evaluation history is built-in.** No need to roll your own diff infrastructure.
* **SDK and CLI both support CI.** Pick whichever feels native.
* **Compare-models lets you stage upgrades.** Switch the model in CI, verify scores, then promote.
* **GEPA-optimized judges keep gates honest.** Out-of-the-box judges flake; tuned ones don't.

## Tools you'll use

* [SDK: `client.evaluations.create()`](/4.1-general-use-cases/general.md)
* [CLI: `layerlens evaluate`](/4.1-general-use-cases/general.md)
* [Stratix Premium — Evaluations](/8.-evaluate-score-the-outputs/evaluations.md)
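As a concrete shape for the CI step, here is a hedged GitHub Actions sketch. The workflow syntax is standard GitHub Actions, but the `layerlens evaluate` flags (`--config`, `--baseline`) and the `LAYERLENS_API_KEY` secret name are assumptions; check the CLI reference above for the real invocation.

```yaml
# Hypothetical CI step; the layerlens flags shown are illustrative.
name: ai-quality-gate
on: pull_request

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run gating evaluation
        run: layerlens evaluate --config evals/gate.yaml --baseline main
        env:
          LAYERLENS_API_KEY: ${{ secrets.LAYERLENS_API_KEY }}
```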

## Outcomes you should see

You'll know this is working when:

* **Zero AI regressions reach production** because the CI gate catches them first.
* **CI gate run time stays under 5 minutes** even as the eval grows.
* **Tolerance is calibrated** — false-positive rate <5%, false-negative rate <2%.
* **Engineers cite the eval result in the PR description** as a matter of habit.
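Tolerance calibration can be done empirically: replay historical score deltas that a human has labeled as real regressions or not, and measure the misclassification rates a candidate tolerance would produce. A minimal sketch, with made-up data and a hypothetical `gate_rates` helper:

```python
def gate_rates(history: list[tuple[float, bool]], tolerance: float):
    """history holds (score_delta, was_real_regression) pairs, where
    delta = new score - baseline score. Returns (FP rate, FN rate)."""
    flagged = [(delta < -tolerance, real) for delta, real in history]
    fp = sum(1 for hit, real in flagged if hit and not real)
    fn = sum(1 for hit, real in flagged if not hit and real)
    benign = sum(1 for _, real in history if not real)
    real_count = sum(1 for _, real in history if real)
    return (fp / benign if benign else 0.0,
            fn / real_count if real_count else 0.0)

# Made-up labeled history: noise-level dips vs. genuine regressions.
history = [(-0.01, False), (-0.06, True), (0.02, False), (-0.04, True)]
fp_rate, fn_rate = gate_rates(history, tolerance=0.02)
```

Sweeping `tolerance` over this labeled history shows the trade-off directly: tighten it and the false-positive rate climbs, loosen it and real regressions start slipping through.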

## Anti-patterns

* **5-hour CI evals.** If your team waits 5 hours for a merge gate, they'll ignore it. Pick a fast subset.
* **Whole-org tolerance.** Different repos have different quality bars. Set per-repo or per-feature tolerances.
* **No alerting on baseline drift.** If the baseline silently drifts down over 4 PRs, you'll never notice, because each individual PR stayed within tolerance. Watch the trend.
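The drift anti-pattern is easy to demonstrate with numbers (made up here): four consecutive PRs, each within a 0.02 tolerance, still compound into a 0.035 drop against the original anchor score.

```python
# Made-up scores illustrating silent baseline drift: each PR passes the
# per-PR gate, yet the cumulative drop against the anchor exceeds it.
anchor = 0.90                               # score when the gate was set up
main_scores = [0.89, 0.88, 0.87, 0.865]     # main after each of 4 merged PRs
tolerance = 0.02

previous = anchor
each_pr_passed = True
for score in main_scores:
    if previous - score > tolerance:        # the per-PR gate
        each_pr_passed = False
    previous = score

total_drift = anchor - main_scores[-1]      # cumulative drop vs. the anchor
drift_alert = total_drift > tolerance       # what trend monitoring catches
```

A per-PR gate plus a periodic check against a pinned anchor (or a rolling-window trend alert) closes this gap.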

## Where to next

* [Tutorial: Wire CI/CD quality gates](/6.-build-wire-your-code/03-cicd-gates.md)
* [Workflow: Govern](/9.-improve-tune-the-system/workflow.md)
* [Cookbook: CI/CD recipes](/2.-get-started/all-cookbook-recipes.md)
* [Integrations: GitHub Actions](/4.1-general-use-cases/general.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/4.1-general-use-cases/ai-quality-gates-cicd.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
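For example, a request URL can be built like this (the question is illustrative; `urllib.parse.quote` from the Python standard library handles the URL-encoding):

```python
from urllib.parse import quote

page = "https://docs.layerlens.ai/4.1-general-use-cases/ai-quality-gates-cicd.md"
question = "What tolerance should a new repo start with?"  # illustrative
url = f"{page}?ask={quote(question)}"
# An HTTP GET on `url` returns the answer plus relevant excerpts.
```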
