# Why evaluate?

Most teams ship AI by intuition. They pick a model because it's popular. They iterate prompts because the last few examples felt better. They watch CSAT in production because that's the only signal left after launch. By the time they know something's wrong, customers have noticed.

## The cost of unmeasured AI

* **Wrong model from the start.** A 4-week feature build gets thrown out when the foundation model can't do the task.
* **Silent regressions.** A prompt change five PRs ago broke an edge case nobody noticed; customers find it first.
* **Vendor migrations driven by hunches.** Switching from one provider to another without quantitative justification.
* **Eval-by-eyeballing.** A handful of cherry-picked examples drive ship/no-ship decisions.

Evaluation discipline gets you out of all four traps.

## What evaluation discipline looks like

1. **Quantitative model selection.** You pick a model because it scored higher on the benchmarks closest to your task. Not because it was on the front page of Hacker News.
2. **Versioned regression suites.** Every prompt change runs against a known set with a known baseline (see the sketch after this list).
3. **Continuous scoring on production traffic.** When live behavior drifts, you know before customers do.
4. **Auditable evidence.** Releases cite the evaluation that gated them.
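
To make point 2 concrete, here is a minimal sketch. It is not the Stratix SDK: `call_model`, the file names, and the 2% tolerance are hypothetical stand-ins for whatever your stack provides. Imagine it saved as `regression_suite.py`.

```python
"""Minimal regression-suite sketch (hypothetical names throughout)."""
import json


def call_model(prompt: str) -> str:
    """Placeholder for the model call under test (SDK, HTTP client, etc.)."""
    raise NotImplementedError


def run_suite(cases: list[dict]) -> float:
    """Score each case pass/fail on exact match and return the pass rate."""
    passed = sum(1 for c in cases if call_model(c["prompt"]) == c["expected"])
    return passed / len(cases)


if __name__ == "__main__":
    with open("eval_set.json") as f:   # the known set, versioned with the prompt
        cases = json.load(f)
    with open("baseline.json") as f:   # score recorded at the last good release
        baseline = json.load(f)["score"]
    score = run_suite(cases)
    print(f"suite: {score:.3f}  baseline: {baseline:.3f}")
    # Fail loudly on any drop beyond a small tolerance.
    assert score >= baseline - 0.02, "regression against known baseline"
```

The point is not the scoring function (exact match is the crudest possible choice); it's that the eval set, the baseline, and the gate are all versioned artifacts, so "did this prompt change regress anything?" has a mechanical answer.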

## What it takes

* A shared **mental model** for evaluation (this documentation)
* A **catalog** to anchor your model choices ([Stratix Public](/1.-introduction/01-introduction.md))
* A **workspace** to run private evaluations ([Stratix Premium](https://github.com/LayerLens/gitbook-full/blob/main/05-select/catalog/premium-workspace-overview.md))
* **Automation** to wire it into CI/CD ([SDK](/6.-build-wire-your-code/sdk-python.md), [CLI](/13.1-sdk-and-apis/cli.md)), sketched below
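
As a sketch of the automation bullet: the lowest-tech wiring is a plain pytest test, so any CI system that runs your test suite also runs the eval gate. This assumes the hypothetical `regression_suite.py` from the sketch above, not the actual Stratix SDK or CLI.

```python
"""Hypothetical CI gate: fails the build when the eval suite regresses."""
import json

from regression_suite import run_suite  # hypothetical module from the sketch above


def test_prompt_change_does_not_regress():
    with open("eval_set.json") as f:
        cases = json.load(f)
    with open("baseline.json") as f:
        baseline = json.load(f)["score"]
    # CI fails the job when this assertion fails, blocking the merge until
    # the regression is fixed or the baseline is deliberately re-approved.
    assert run_suite(cases) >= baseline - 0.02
```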

## Three failure modes Stratix is shaped to prevent

1. **The "hopeful" launch.** Shipping without quantitative evidence the AI works.
2. **The slow regression.** Quality erosion over a quarter that nobody catches because nobody's watching the trend.
3. **The vendor lock-in mistake.** Picking a model based on superficial features and discovering, six months later, that switching is now expensive.

## Where to next

* [What is LayerLens Stratix?](/1.-introduction/what-is-layerlens-stratix.md)
* [Workflow](/1.-introduction/the-stratix-workflow.md)
* [Use cases](/4.1-general-use-cases/general.md)
* [Industry patterns](/4.2-industry-use-cases/travel-hospitality.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Send an HTTP GET request to the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/1.-introduction/why-evaluate.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present on the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
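
For example, a minimal sketch of that request using only the Python standard library (the question string is illustrative):

```python
import urllib.parse
import urllib.request

PAGE = "https://docs.layerlens.ai/1.-introduction/why-evaluate.md"
question = "How do I run a private evaluation in Stratix Premium?"

# URL-encode the question and append it as the `ask` query parameter.
url = f"{PAGE}?ask={urllib.parse.quote(question)}"
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode())  # direct answer plus excerpts and sources
```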
