# Models and benchmarks

The two primitives that anchor every Stratix evaluation.

## Model

A specific LLM identified by `provider/model-name` (e.g., `openai/gpt-4o`, `anthropic/claude-opus-4-7`). In Stratix:

* Public models live in the global catalog (200+)
* Private models are BYOK custom models in your org

Each model has metadata: provider, capabilities, context window, modalities, licensing, pricing, score history.
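
A minimal sketch of what one catalog entry might look like as a record. This is illustrative only; the field names and values below are assumptions, not Stratix's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    """Illustrative model record; field names are assumptions, not the Stratix schema."""
    id: str                     # "provider/model-name", e.g. "openai/gpt-4o"
    provider: str
    capabilities: list[str]     # capability tags (see below), e.g. ["chat", "code"]
    context_window: int         # maximum tokens per request
    modalities: list[str]       # e.g. ["text", "image"]
    license: str
    price_per_1m_tokens: float  # pricing, in an illustrative unit
    score_history: list[dict] = field(default_factory=list)

gpt4o = Model(
    id="openai/gpt-4o",
    provider="openai",
    capabilities=["chat", "code", "vision"],
    context_window=128_000,
    modalities=["text", "image"],
    license="proprietary",
    price_per_1m_tokens=2.50,
)
```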

### Why it matters

The model is the substrate. Picking the wrong substrate is the most expensive AI mistake.

## Benchmark

A standardized dataset and grading procedure. In Stratix:

* Public benchmarks live in the catalog (52+, e.g., MMLU, HumanEval, GSM8K)
* Private benchmarks are datasets you upload to your org

Each benchmark has metadata: capability tag, sample size, methodology, license, score history.
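
The benchmark record can be sketched the same way; again, the field names and values are illustrative assumptions rather than Stratix's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Benchmark:
    """Illustrative benchmark record; field names are assumptions, not the Stratix schema."""
    id: str              # e.g. "mmlu"
    capability_tag: str  # e.g. "reasoning"
    sample_size: int     # number of graded samples
    methodology: str     # grading procedure
    license: str
    score_history: list[dict] = field(default_factory=list)

mmlu = Benchmark(
    id="mmlu",
    capability_tag="reasoning",
    sample_size=14_042,
    methodology="multiple-choice, exact match",
    license="MIT",
)
```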

### Why it matters

Benchmarks are how you make "is the model good?" measurable. Pick benchmarks whose distribution matches your task; ignore the ones that don't.

## How they relate

Every public evaluation is `(model, benchmark, scoring config)`. The leaderboard is "all public evaluations of `benchmark X`, sorted by score."
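
A hypothetical sketch of that relationship, with evaluations as `(model, benchmark, scoring config)` records and the leaderboard as a sorted filter over them (the model names and scores are made up):

```python
# Each public evaluation pairs a model and a benchmark under one scoring config.
evaluations = [
    {"model": "openai/gpt-4o",    "benchmark": "mmlu",      "scoring_config": "v2", "score": 0.887},
    {"model": "provider/model-b", "benchmark": "mmlu",      "scoring_config": "v2", "score": 0.861},
    {"model": "openai/gpt-4o",    "benchmark": "humaneval", "scoring_config": "v1", "score": 0.902},
]

def leaderboard(benchmark_id: str) -> list[dict]:
    """All public evaluations of one benchmark, sorted by score, descending."""
    rows = [e for e in evaluations if e["benchmark"] == benchmark_id]
    return sorted(rows, key=lambda e: e["score"], reverse=True)

for row in leaderboard("mmlu"):
    print(row["model"], row["score"])
```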

## Capability tags

Both models and benchmarks carry capability tags:

* `chat` — general dialog
* `code` — code generation
* `reasoning` — chain-of-thought, math, logic
* `multilingual` — non-English and cross-lingual tasks
* `vision` — image understanding
* `multi-turn` — extended, stateful conversations
* `tool-use` — function and tool calling

When picking, match the model's strongest tag against the benchmark's tag.
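
A toy version of that matching rule, assuming each model carries a per-tag strength score (the strengths are invented for illustration; Stratix exposes the tags themselves):

```python
# Hypothetical per-tag strengths; only the tag names come from the catalog.
model_tags = {
    "openai/gpt-4o":    {"chat": 0.95, "code": 0.90, "vision": 0.88},
    "provider/model-b": {"code": 0.93, "reasoning": 0.85},
}

def candidates_for(benchmark_tag: str) -> list[str]:
    """Models whose strongest capability tag matches the benchmark's tag."""
    return [
        model for model, tags in model_tags.items()
        if max(tags, key=tags.get) == benchmark_tag
    ]

print(candidates_for("code"))  # -> ['provider/model-b']
```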

## Score history

The catalog preserves history. A model evaluated against MMLU last quarter gets a score; if it's re-evaluated against an updated MMLU, both scores remain visible, each tagged with its methodology version.
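
Conceptually, the history is append-only: a re-evaluation adds a new entry instead of overwriting the old one. A sketch with invented field names and values:

```python
# Both entries stay visible, each tagged with its methodology version.
score_history = [
    {"benchmark": "mmlu", "methodology_version": "v1", "score": 0.861, "run": "2024-Q4"},
    {"benchmark": "mmlu", "methodology_version": "v2", "score": 0.874, "run": "2025-Q1"},
]

for entry in score_history:
    print(entry["run"], entry["methodology_version"], entry["score"])
```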

## Common confusions

* **A benchmark is not a "test set" in the ML sense.** It's a public, curated, citable evaluation harness. Your private dataset is your "test set."
* **A model's score on benchmark X doesn't predict its score on your data.** Public scores are a leading indicator; your private evaluation is the verdict.

## Where to next

* [Evaluations](/8.-evaluate-score-the-outputs/evaluations-1.md)
* [Stratix Public — Models catalog](/5.-select-pick-the-model/models-catalog.md)
* [Stratix Public — Benchmarks catalog](/5.-select-pick-the-model/benchmarks-catalog.md)
* [BYOK models](/5.-select-pick-the-model/byok-models.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/5.-select-pick-the-model/models-and-benchmarks.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
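
For example, a minimal query using Python's `requests` library; passing the question via `params` URL-encodes it automatically (the question string here is just an example):

```python
import requests

# Ask a question against this page; `params` URL-encodes the `ask` value.
resp = requests.get(
    "https://docs.layerlens.ai/5.-select-pick-the-model/models-and-benchmarks.md",
    params={"ask": "Which capability tags can a private benchmark carry?"},
)
resp.raise_for_status()
print(resp.text)  # direct answer plus relevant excerpts and sources
```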
