# Glossary

## Agentic evaluation

Pre- and post-deployment evaluation of multi-step AI agents. Combines natural-language assertions, deterministic rules, and LLM judges to verify that an agent reaches a correct final state, takes valid actions along the way, and satisfies regression criteria. Available in Stratix Premium under **Agent Evaluation** at `stratix.layerlens.ai/dashboard/agent-evaluation`. See [Agentic evaluations](/4.1-general-use-cases/agentic-evals-overview.md).

## Benchmark

A standardized dataset and grading procedure used to measure model performance. Stratix hosts 52+ benchmarks (MMLU, HumanEval, GSM8K, etc.) in its public catalog. See [Benchmarks catalog](/5.-select-pick-the-model/benchmarks-catalog.md).

## BYOK (Bring Your Own Key)

The ability to use your own LLM provider API keys with Stratix. Two scopes:

* **BYOK custom models** — register an OpenAI-compatible endpoint as a custom model the Premium UI can target.
* **BYOK provider keys** — supply your own OpenAI/Anthropic/Google keys for evaluations that consume tokens.

See [BYOK custom models](/5.-select-pick-the-model/byok-custom-models.md) and [BYOK models concept](/5.-select-pick-the-model/byok-models.md).
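
In practice, "OpenAI-compatible" means the endpoint speaks the OpenAI chat-completions protocol. A minimal sketch of calling such an endpoint with the official `openai` Python client; the base URL, key, and model id below are placeholders, not Stratix values:

```python
from openai import OpenAI

# Any server that speaks the OpenAI chat-completions protocol can be
# registered as a BYOK custom model. URL, key, and model id below are
# hypothetical placeholders.
client = OpenAI(
    base_url="https://models.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_PROVIDER_KEY",
)

resp = client.chat.completions.create(
    model="my-custom-model",  # hypothetical model id
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```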

## Compare models

A Stratix Public and Premium feature that puts two models head-to-head across a benchmark or your own dataset. See [Compare models](/5.-select-pick-the-model/compare-models.md).

## Continuous evaluation

The practice of running evaluations on a recurring schedule against live traffic — not only at release time. Stratix supports continuous evaluation via trace evaluations on real production traces. See [Continuous evaluation](/7.-observe-see-whats-happening/continuous-evaluation.md).

## ECU (Evaluation Compute Unit)

The Stratix billing unit for compute-intensive operations: model inference, judge runs, GEPA optimization, etc. Free tier includes a starter ECU balance; Premium tier is pay-as-you-go. See [ECU credits](/11.-admin/concepts-ecu-credits.md).

## Evaluation

A run that scores model output against a benchmark or dataset. Results are stored, browsable, and shareable. Two flavors:

* **Public evaluations** — runs visible to everyone on `stratix.layerlens.ai`.
* **Private evaluations** — runs scoped to your organization on `stratix.layerlens.ai`.

See [Evaluations](/8.-evaluate-score-the-outputs/evaluations-1.md).

## Evaluation space

A workspace bundling a model selection, dataset/benchmark selection, and scoring config. Spaces let you re-run the same evaluation as conditions change. Public spaces are shareable; Premium spaces are org-scoped. See [Evaluation spaces](/8.-evaluate-score-the-outputs/evaluation-spaces.md).

## GEPA judge optimization

Automatic tuning of an LLM judge's prompt against a labeled ground-truth set. GEPA searches for the prompt whose verdicts best match your labels, raising the judge's agreement with human graders. See [Judge optimization (GEPA)](/8.-evaluate-score-the-outputs/judges-1.md#judge-optimization-gepa).
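
GEPA's search itself is more involved, but the objective it optimizes is easy to sketch: the fraction of labeled examples on which the judge's verdict matches the human label. A minimal sketch, with illustrative names rather than the GEPA API:

```python
# Hypothetical sketch of the agreement metric judge optimization targets:
# the share of labeled examples where the judge's verdict matches the
# human ground-truth label.
def judge_agreement(judge, labeled_set):
    matches = sum(
        1 for example, human_label in labeled_set
        if judge(example) == human_label
    )
    return matches / len(labeled_set)

# Conceptually, GEPA proposes candidate judge prompts and keeps the one
# that maximizes this score on your ground-truth set.
```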

## Judge

An LLM-based grader that scores model output along subjective dimensions (helpfulness, accuracy, faithfulness, etc.). Stratix Premium supports building custom judges, optimizing them with GEPA, and applying them to evaluations and traces. See [Judges](/8.-evaluate-score-the-outputs/judges-1.md).
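
As an illustration of the pattern (not Stratix's judge API), a minimal LLM judge is one model call with a grading rubric; the model name and rubric below are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Rate the ASSISTANT answer for helpfulness on a 1-5 scale. "
    "Reply with the number only."
)

def judge_helpfulness(question: str, answer: str) -> int:
    # One LLM call grades one output along a subjective dimension.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice of judge model
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nASSISTANT: {answer}"},
        ],
    )
    return int(resp.choices[0].message.content.strip())
```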

## Model

An LLM (e.g., GPT-5.3, Claude Opus 4.6, Gemini 3.1 Pro). Stratix catalogs 175+ models with metadata, benchmark scores, and context-window information. See [Models and benchmarks](/5.-select-pick-the-model/models-and-benchmarks.md).

## Multi-org / Organization

Stratix supports users belonging to multiple organizations. After sign-in, you pick an active organization; switching organizations updates the session and reloads org-scoped data. See [Organizations](https://github.com/LayerLens/gitbook-full/blob/main/13-reference/sdk-python/organizations.md).

## Public catalog

The set of models, benchmarks, evaluations, and spaces visible without authentication on `stratix.layerlens.ai`. See [Stratix Public](https://github.com/LayerLens/gitbook-full/blob/main/assets/screenshots/README.md).

## Quarterly reports

Long-form research reports (Q1-Q4) summarizing model performance, benchmark winners, and AI evaluation trends. See [Quarterly reports](/5.-select-pick-the-model/quarterly-reports.md).

## Scorer

A code grader (regex match, JSON-schema validation, exact match, semantic similarity, etc.). Scorers are fast, cheap, and suited to objective dimensions. See [Scorers](/8.-evaluate-score-the-outputs/scorers-1.md).
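
For illustration, three minimal scorers of this kind in Python; the signatures are hypothetical, not the Stratix scorer interface:

```python
import json
import re

# Three minimal deterministic scorers. Each returns 1.0 (pass) or
# 0.0 (fail); signatures are illustrative.

def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0

def regex_match(output: str, pattern: str) -> float:
    return 1.0 if re.search(pattern, output) else 0.0

def valid_json(output: str) -> float:
    # Cheap structural check: does the output parse as JSON at all?
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0
```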

## Span

A child unit of work within a trace. A span represents one tool call, one LLM call, or one logical step. See [Traces and spans](/6.-build-wire-your-code/traces-and-spans.md).

## Stratix Premium

The logged-in workspace at `stratix.layerlens.ai`. Use it to run private evaluations, build judges, score traces, manage scorers, run agentic evals, and govern AI quality.

## Stratix Public

The anonymous browsing experience at `stratix.layerlens.ai`. Browse models, benchmarks, public evaluations, and public spaces, or compare models head-to-head.

## Trace

A record of an AI call (or a chain of calls) — inputs, outputs, latencies, costs, errors, and any spans. Traces are uploaded to Stratix Premium for inspection and trace evaluation. See [Traces and spans](/6.-build-wire-your-code/traces-and-spans.md).
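
As a rough sketch, a trace record might look like the following; the field names are hypothetical, not the Stratix schema (see the linked page for the actual shape):

```python
# Hypothetical trace record. One trace wraps the whole interaction;
# each span is one LLM or tool call inside it.
trace = {
    "trace_id": "trace-123",
    "input": "What's the weather in Paris?",
    "output": "It's 18°C and sunny.",
    "latency_ms": 1340,
    "cost_usd": 0.0021,
    "error": None,
    "spans": [
        {"span_id": "span-1", "type": "tool_call", "name": "get_weather", "latency_ms": 420},
        {"span_id": "span-2", "type": "llm_call", "name": "compose_answer", "latency_ms": 910},
    ],
}
```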

## Trace evaluation

A scoring run applied to ingested traces using one or more scorers and judges. Lets you grade real production traffic. See [Trace evaluations](/8.-evaluate-score-the-outputs/trace-evaluations.md).

***

Seeded from the [LayerLens "25 AI evaluation terms" blog post](https://layerlens.ai/blog) and the live product surface. If a term is missing, open an issue or PR.


***

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/glossary.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present on the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
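
For example, in Python with the `requests` library (the question is a placeholder; passing it via `params` handles URL-encoding):

```python
import requests

# Ask a specific, self-contained question against this page's URL.
question = "Which scorers support semantic similarity?"
resp = requests.get(
    "https://docs.layerlens.ai/glossary.md",
    params={"ask": question},  # becomes ?ask=<url-encoded question>
)
print(resp.text)  # direct answer plus relevant excerpts and sources
```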
