# Your first evaluation

A private evaluation runs a model against a dataset and scores the outputs. Here's the fastest way to run one.

## Prerequisites

* [ ] A Stratix Premium account ([sign up](/2.-get-started/sign-up.md))
* [ ] Either a small dataset (a CSV or JSONL file with input and expected-output columns), or willingness to use a starter dataset
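
If you want to bring your own data, a minimal JSONL dataset can be produced with a few lines of Python. The column names `input` and `expected` here are assumptions for illustration; confirm the actual column mapping during the upload step.

```python
import json

# Hypothetical column names ("input", "expected"); you confirm the
# actual mapping when Stratix asks during upload.
rows = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]

# JSONL = one JSON object per line.
with open("my_eval_set.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```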

## Steps

### 1. Open Evaluations

In the left rail, click **Evaluations**. Click **New evaluation**.

### 2. Pick a model

Search for a model and select it. The full public catalog is searchable, along with any BYOK custom models you've registered.

### 3. Pick a dataset

Two options:

* **Upload a dataset** — drag a CSV/JSONL onto the upload zone. Stratix infers the schema; you confirm input and expected-output columns.
* **Use a starter dataset** — pick a benchmark from the public catalog (e.g., MMLU subset).
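
Before uploading a CSV, it can help to check which columns Stratix will see when it infers the schema. A sketch using the standard library (the header names are assumptions):

```python
import csv
import io

# Stand-in for your dataset file; in practice use open("dataset.csv").
sample = io.StringIO("input,expected\nWhat is 2 + 2?,4\n")

# The header row is what the schema inference works from.
reader = csv.DictReader(sample)
columns = reader.fieldnames
print(columns)  # ['input', 'expected']
```

If the printed columns aren't the ones you expect, fix the header before uploading rather than after.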

### 4. Pick scoring

For your first eval, pick a code grader that fits your data:

* **Exact match** — the output equals the expected output
* **Semantic similarity** — the output fuzzily matches the expected output, using an embedding model
* **Regex match** — the output matches a pattern
* **JSON schema** — the output validates against a schema

Skip judges for now — you'll build one later.
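
To make the grader options concrete, here is a sketch of roughly what each one checks. These are illustrative stand-ins, not Stratix's implementations; in particular, real semantic similarity uses an embedding model, and `difflib` below is only a crude lexical substitute.

```python
import difflib
import json
import re

def exact_match(output: str, expected: str) -> bool:
    # Pass iff the output equals the expected value (whitespace-trimmed).
    return output.strip() == expected.strip()

def semantic_similarity(output: str, expected: str, threshold: float = 0.8) -> bool:
    # Crude lexical stand-in; the real grader compares embeddings.
    return difflib.SequenceMatcher(None, output, expected).ratio() >= threshold

def regex_match(output: str, pattern: str) -> bool:
    # Pass iff the pattern occurs anywhere in the output.
    return re.search(pattern, output) is not None

def json_schema_match(output: str, required_keys: list[str]) -> bool:
    # Simplified: real JSON Schema validation also checks types and nesting.
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return all(key in data for key in required_keys)
```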

### 5. Preview cost

Stratix shows the worst-case ECU consumption before you run. Approve the estimate to continue.
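
The preview is worst-case because every row is assumed to consume the model's full output budget. A back-of-the-envelope sketch; the formula and the ECU rate are illustrative assumptions, not Stratix's actual pricing:

```python
def worst_case_ecu(rows: int, max_tokens_per_row: int, ecu_per_1k_tokens: float) -> float:
    # Assumes every row hits the maximum token budget.
    return rows * max_tokens_per_row / 1000 * ecu_per_1k_tokens

# 200 rows, 512-token cap, hypothetical rate of 0.5 ECU per 1k tokens:
print(worst_case_ecu(200, 512, 0.5))  # 51.2
```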

### 6. Run

Click **Run**. The evaluation queues; results stream in row-by-row.

### 7. Read the results

The results page shows:

* Overall score
* Per-row scores (input → output → score)
* Score distribution
* Latency and cost per row
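
These aggregates can all be derived from the per-row records. A sketch with hypothetical row data, using only the standard library:

```python
from collections import Counter
from statistics import mean

# Hypothetical per-row results; the field names are assumptions.
rows = [
    {"score": 1.0, "latency_ms": 420, "cost_ecu": 0.02},
    {"score": 0.0, "latency_ms": 380, "cost_ecu": 0.02},
    {"score": 1.0, "latency_ms": 510, "cost_ecu": 0.03},
]

overall = mean(r["score"] for r in rows)          # overall score
distribution = Counter(r["score"] for r in rows)  # score distribution
avg_latency = mean(r["latency_ms"] for r in rows) # latency per row (mean)
total_cost = sum(r["cost_ecu"] for r in rows)     # total cost
```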

## Verify

You should have a completed evaluation with a top-line score. Click any failing row to see the input, the model output, and the expected output side-by-side.

## What just happened

Stratix called the model on each input row, ran the scorer against each output, and aggregated the results into the top-line score. Every row is stored — you can re-open this evaluation later or compare it against a future run.
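
In pseudocode terms, the run looks roughly like this sketch. `call_model` is a stand-in for the real model call, and the scorer here is a simple exact-match grader; none of these names are Stratix APIs.

```python
def call_model(prompt: str) -> str:
    # Stand-in for the real model call.
    return {"What is 2 + 2?": "4"}.get(prompt, "")

def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_evaluation(dataset, scorer):
    results = []
    for row in dataset:
        output = call_model(row["input"])        # 1. call the model per row
        score = scorer(output, row["expected"])  # 2. score the output
        results.append({**row, "output": output, "score": score})
    overall = sum(r["score"] for r in results) / len(results)  # 3. aggregate
    return overall, results

dataset = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]
overall, results = run_evaluation(dataset, exact_match)
```

Because every per-row record is kept, re-running the same loop against a different model gives you two comparable result sets.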

## What to try next

* **Run the same evaluation against a different model.** Stratix's compare-models view shows both runs side-by-side.
* **Iterate on the prompt.** If your dataset includes a prompt template column, change the template and rerun.
* **Add a judge.** See [First judge](/2.-get-started/first-judge.md) for subjective dimensions.

## Where to next

* [First judge](/2.-get-started/first-judge.md)
* [First trace](/2.-get-started/first-trace.md)
* [Tutorial: First evaluation in 10 minutes](/8.-evaluate-score-the-outputs/01-first-evaluation.md)
* [Stratix Premium — Evaluations](/8.-evaluate-score-the-outputs/evaluations.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/2.-get-started/first-evaluation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
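
For example, a properly encoded `ask` URL can be built with Python's standard library. The question text is illustrative; the request itself is shown but left commented out:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

base = "https://docs.layerlens.ai/2.-get-started/first-evaluation.md"
question = "Which file formats does dataset upload accept?"

# urlencode handles percent-encoding of spaces and punctuation.
url = f"{base}?{urlencode({'ask': question})}"

# To actually send the request:
# with urlopen(url) as resp:
#     print(resp.read().decode("utf-8"))
```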
