# Tutorial 5: Optimize a judge with GEPA

**Time:** \~45 minutes (most of which is GEPA running) **Level:** Intermediate **You'll build:** A judge tuned against ≥30 labels, with measurable agreement-rate lift.

## What you'll learn

* How to prepare a labeled dataset for GEPA
* How to run a GEPA optimization
* How to interpret the before/after agreement rates
* When to re-optimize

## Prerequisites

* [ ] Completed [Tutorial 2](/8.-evaluate-score-the-outputs/02-first-judge.md) — you have a judge
* [ ] At least 30 labeled examples — input/output pairs with the human verdict you want the judge to produce
* [ ] ECU credits

## Step 1: Prepare the labeled dataset

Create a JSONL file with one row per example:

```jsonl
{"input": "...", "output": "...", "label": "good"}
{"input": "...", "output": "...", "label": "bad"}
```
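Before uploading, it's worth writing and sanity-checking the file with the standard library. A minimal sketch — the two rows are placeholders, not real data:

```python
import json

# Placeholder examples; replace with your real input/output pairs and human verdicts.
examples = [
    {"input": "What is 2+2?", "output": "4", "label": "good"},
    {"input": "What is 2+2?", "output": "5", "label": "bad"},
]

# One JSON object per line, newline-delimited.
with open("labels.jsonl", "w") as f:
    for row in examples:
        f.write(json.dumps(row) + "\n")

# Sanity check: every row parses and carries the three required fields.
with open("labels.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all({"input", "output", "label"} <= row.keys() for row in rows)
```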

Upload to Premium as a labeled dataset:

```python
from layerlens import Stratix
client = Stratix()

dataset = client.datasets.create_from_file("./labels.jsonl", labels_column="label")
print(dataset.id)
```

## Step 2: Inspect baseline agreement

Run the judge over the labeled set without optimization:

```python
baseline = client.judges.evaluate_against_labels(
 judge_id="judge-id",
 dataset_id=dataset.id,
)
print(f"Baseline agreement: {baseline.agreement_rate:.3f}")
```

Agreement below 80% means the judge disagrees with your human labels too often to be trusted on its own. GEPA usually closes much of that gap.
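The agreement rate is just the fraction of examples where the judge's verdict matches the human label. A self-contained illustration (the function and sample data are ours, not the SDK's):

```python
def agreement_rate(judge_verdicts, human_labels):
    """Fraction of examples where the judge's verdict matches the human label."""
    matches = sum(j == h for j, h in zip(judge_verdicts, human_labels))
    return matches / len(human_labels)

judge = ["good", "bad", "good", "good", "bad"]
human = ["good", "bad", "bad", "good", "bad"]
print(f"{agreement_rate(judge, human):.3f}")  # 4 of 5 match, prints 0.800
```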

## Step 3: Run GEPA

```python
opt = client.judge_optimizations.create(
    judge_id="judge-id",
    labeled_examples_dataset_id=dataset.id,
    iterations=20,
)
result = client.judge_optimizations.wait_for_completion(opt.id, timeout=1800)
```

GEPA explores rubric variations, scores each one against your labels, and keeps the best-performing candidate.
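Conceptually, the loop looks like this. A toy sketch, not the actual GEPA algorithm — here the "rubric variations" are stand-in scoring thresholds, and the labels and scores are made up:

```python
import random

random.seed(0)

labels = [1, 0, 1, 1, 0, 1, 0, 1]                    # human verdicts (1 = good)
scores = [0.9, 0.2, 0.7, 0.8, 0.4, 0.6, 0.3, 0.95]   # judge's raw scores

def agreement(threshold):
    """Agreement between thresholded verdicts and human labels."""
    verdicts = [1 if s >= threshold else 0 for s in scores]
    return sum(v == l for v, l in zip(verdicts, labels)) / len(labels)

# Explore candidate "rubrics" (thresholds), score each against the labels,
# and keep the best one -- the shape of the search, not its substance.
best = max((random.uniform(0, 1) for _ in range(20)), key=agreement)
print(f"best threshold {best:.2f}, agreement {agreement(best):.3f}")
```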

## Step 4: Read the result

```python
print(f"Before: {result.before:.3f}")
print(f"After: {result.after:.3f}")
print(f"Diff: +{result.after - result.before:.3f}")
print()
print("Updated rubric:")
print(result.updated_rubric)
```

## Step 5: Validate on held-out examples

If you have a held-out labeled set, run the optimized judge against it. If agreement is similar to the training result, you're not overfitting. If it's much lower, expand your training set or check for label inconsistencies.
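If you don't already have a separate held-out set, split your labels before optimizing. A minimal sketch with an 80/20 split — the rows here are in-memory placeholders for what you'd normally load from `labels.jsonl`:

```python
import random

random.seed(42)

# Placeholders for rows you'd normally load from labels.jsonl.
rows = [{"input": f"q{i}", "output": f"a{i}", "label": "good"} for i in range(30)]

random.shuffle(rows)
cut = int(len(rows) * 0.8)                 # 80% for GEPA, 20% held out
train, holdout = rows[:cut], rows[cut:]
print(len(train), len(holdout))            # prints: 24 6
```

Upload only the training split to GEPA, then score the optimized judge against the held-out split to check for overfitting.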

## When to re-optimize

* After labeling ≥30 more examples
* When your team's quality bar changes
* When the underlying judge model is upgraded

## What's next

* [Concept: Judges](/8.-evaluate-score-the-outputs/judges-1.md)
* [Stratix Premium — Judge optimization](/9.-improve-tune-the-system/judge-optimization.md)
* [Cookbook: GEPA recipes](/2.-get-started/all-cookbook-recipes.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/9.-improve-tune-the-system/05-gepa-optimize.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
