# Your first judge

A judge is an LLM that scores subjective dimensions (helpfulness, faithfulness, tone, safety) that a code-based grader can't measure. Building one takes about 15 minutes.

## Prerequisites

* [ ] A Stratix Premium account ([sign up](/2.-get-started/sign-up.md))
* [ ] A clear definition of what you want to grade (write it in plain English first)
* [ ] At least 10 example outputs labeled "good" or "bad" — more is better

## Steps

### 1. Open Judges

In the left rail, open **Agent Evaluation → Judges**, then click **New judge**.

### 2. Name and describe

* **Name:** e.g., "helpfulness-customer-support"
* **Description:** what this judge grades, in one sentence
* **Output type:** binary (pass/fail), score (e.g., 1-5), or labeled (e.g., "helpful" / "neutral" / "unhelpful")
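
Each output type implies a different verdict shape. The sketch below is a local illustration only, not the Stratix API: `OutputType` and `is_valid_verdict` are hypothetical names, and the 1-5 range and the three labels are simply the examples from the list above.

```python
from enum import Enum


class OutputType(Enum):
    BINARY = "binary"
    SCORE = "score"
    LABELED = "labeled"


# Assumed for illustration: the example range and labels from the list above.
SCORE_RANGE = range(1, 6)  # 1 through 5 inclusive
LABELS = {"helpful", "neutral", "unhelpful"}


def is_valid_verdict(output_type: OutputType, verdict: object) -> bool:
    """Return True if the verdict has the shape the output type expects."""
    if output_type is OutputType.BINARY:
        return verdict in ("pass", "fail")
    if output_type is OutputType.SCORE:
        # bool is a subclass of int, so exclude it explicitly.
        return (
            isinstance(verdict, int)
            and not isinstance(verdict, bool)
            and verdict in SCORE_RANGE
        )
    return isinstance(verdict, str) and verdict in LABELS
```

Deciding the verdict shape up front makes it easier to write the "output format" part of the rubric later.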

### 3. Pick a judging model

Choose the model that runs the rubric. The default is a balanced choice; pick a stronger model for harder rubrics.

### 4. Write the rubric

The rubric is the prompt the judging model uses. Best practices:

* State the dimension you're grading
* Describe what "good" looks like with examples
* Describe what "bad" looks like with examples
* Show the output format you want back

The Premium UI provides a starter template you can adapt.
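
As a rough illustration of the four best practices above, the sketch below assembles a rubric from its parts. It is not the Stratix starter template: `RUBRIC_TEMPLATE`, `build_rubric`, the layout, and the sample wording are all assumptions made for this example.

```python
# Hypothetical rubric layout; the real starter template may differ.
RUBRIC_TEMPLATE = """\
You are grading: {dimension}.

A GOOD response:
{good}

A BAD response:
{bad}

Reply with exactly one of: {outputs}
"""


def _bullets(items):
    """Render a list of strings as one bullet per line."""
    return "\n".join(f"- {item}" for item in items)


def build_rubric(dimension, good_examples, bad_examples, allowed_outputs):
    """Fill the template with the dimension, examples, and output format."""
    return RUBRIC_TEMPLATE.format(
        dimension=dimension,
        good=_bullets(good_examples),
        bad=_bullets(bad_examples),
        outputs=" | ".join(allowed_outputs),
    )


# Illustrative content only.
rubric = build_rubric(
    dimension="helpfulness of a customer-support reply",
    good_examples=["Answers the customer's actual question", "Gives a concrete next step"],
    bad_examples=["Ignores the question", "Replies with generic filler"],
    allowed_outputs=["pass", "fail"],
)
print(rubric)
```

Keeping the examples and the output format as separate inputs makes it easy to change one part of the rubric between test runs without rewriting the rest.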

### 5. Test on a few examples

Paste 3-5 sample outputs and run the judge. Read the verdicts. If they don't match your intuition, revise the rubric and run the judge again.
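
If you want a number to go with your intuition, you can compare the judge's verdicts against your own labels offline. This is a minimal sketch in plain Python; `agreement_rate` and `disagreements` are hypothetical helpers, and the verdicts and labels below are made up for illustration.

```python
def agreement_rate(judge_verdicts, human_labels):
    """Fraction of examples where the judge matches the human label."""
    if len(judge_verdicts) != len(human_labels):
        raise ValueError("verdicts and labels must be the same length")
    if not human_labels:
        raise ValueError("need at least one labeled example")
    matches = sum(j == h for j, h in zip(judge_verdicts, human_labels))
    return matches / len(human_labels)


def disagreements(judge_verdicts, human_labels):
    """Indices of the examples worth re-reading before editing the rubric."""
    return [
        i
        for i, (j, h) in enumerate(zip(judge_verdicts, human_labels))
        if j != h
    ]


# Made-up data: five sample outputs, copied by hand from a test run.
judge = ["good", "bad", "good", "good", "bad"]
human = ["good", "bad", "bad", "good", "bad"]

print(agreement_rate(judge, human))  # 0.8
print(disagreements(judge, human))   # [2]
```

The disagreeing examples are usually the most useful ones to read closely, since they show where the rubric's wording and your intent diverge.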

### 6. Optional: GEPA-optimize

If you have at least 30 labeled examples, run **GEPA optimization**. Stratix tunes the rubric to better match your labels. Agreement with human labels typically rises by 10-20 percentage points.

[More about GEPA optimization](/8.-evaluate-score-the-outputs/judges-1.md#judge-optimization-gepa)
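
The figure in this step is stated in percentage points, which is an absolute difference between two agreement rates rather than a relative increase. The sketch below shows the arithmetic; the helper names and the before/after rates are invented for illustration and do not come from a real Stratix run.

```python
MIN_LABELED_EXAMPLES = 30  # threshold stated in this step


def enough_labels(n_labeled: int) -> bool:
    """True if there are enough labeled examples to try optimization."""
    return n_labeled >= MIN_LABELED_EXAMPLES


def lift_in_percentage_points(before: float, after: float) -> float:
    """Absolute change in agreement rate, in percentage points."""
    return round((after - before) * 100, 1)


# Hypothetical rates: 70% agreement before, 85% after.
print(enough_labels(42))                      # True
print(lift_in_percentage_points(0.70, 0.85))  # 15.0
```

Measuring agreement on the same labeled set before and after optimization is what makes the two rates comparable.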

### 7. Save

The judge is now reusable in any evaluation, trace evaluation, or agentic evaluation in your org.

## Verify

You should be able to apply your judge to a single sample input/output and get back a verdict.

## What to try next

* **Apply the judge to your first evaluation.** Open the eval, add the judge, rerun.
* **Apply the judge to a trace evaluation.** Real production data, judged.
* **Build a second judge** for a different dimension. Stratix evaluations let you stack many judges.

## Where to next

* [Tutorial: Build your first judge](/8.-evaluate-score-the-outputs/02-first-judge.md)
* [Tutorial: Optimize a judge with GEPA](/9.-improve-tune-the-system/05-gepa-optimize.md)
* [Concept: Judges](/8.-evaluate-score-the-outputs/judges-1.md)
* [Stratix Premium — Judges](/8.-evaluate-score-the-outputs/judges.md)

