# Pattern: AI tutor

An EdTech provider operates an AI tutor that helps students through math, reading, and science. The product's value comes from **teaching**, not answering — a tutor that gives away the answer accelerates the assignment but undermines the learning. COPPA and FERPA frame the regulatory bar; the pedagogical dimension is what differentiates the product.

This pattern shows how to evaluate tutor pedagogical quality alongside accuracy and safety.

## What's at stake

| Risk dimension                                              | Magnitude                                            | Framework                         |
| ----------------------------------------------------------- | ---------------------------------------------------- | --------------------------------- |
| Learning-outcome harm from answer-giving instead of guiding | Long-term student-outcome impact                     | Education research                |
| COPPA violation (under-13 users)                            | Per-violation civil penalties                        | COPPA                             |
| FERPA violation (student records in trace bodies)           | Loss of federal funding eligibility, civil penalties | FERPA                             |
| Age-inappropriate content exposure                          | District contract termination, brand damage          | District procurement requirements |
| Accessibility (Section 508 / WCAG)                          | Procurement disqualification                         | Section 508 / WCAG 2.1 AA         |

## The evaluation pattern

A **multi-criteria evaluation** runs against curated tutoring-conversation traces.

1. **Pedagogical-quality judge** (GEPA-tuned against ≥50 educator-labeled examples — scored output, "guide vs. give") — penalize responses that hand the student the answer when the assignment expects the student to derive it.
2. **Subject-accuracy scorer** — math and science answers verified against a deterministic answer key; language and reading answers checked against rubric criteria.
3. **Age-appropriateness judge** (GEPA-tuned against ≥50 examples — scored output) — content matches the age band declared in the user's profile.
4. **PII-redaction scorer** (deterministic regex) — no student-record fields appear in any logged span (FERPA).
5. **Reading-level scorer** — explanations at or below the configured grade level for the student's age band.

> Don't have labels yet? See [Bootstrap a judge before GEPA](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/guides/bootstrap-judges.md) for the week-1 setup.

**Continuous trace evaluation:** sampled at 0.5% of tutoring sessions, daily. Pedagogical-quality trends visible to the curriculum-design team.

## Configuration in code

```python
# Python (SDK)
from layerlens import Stratix
client = Stratix()

pedagogical = client.judges.create(
 name="socratic-vs-give",
 evaluation_goal="Score 1-5: did the tutor guide the student to the answer (5) or hand it over (1)? Penalize answers that bypass the learning step the assignment expects.",
)

age_appropriate = client.judges.create(
 name="age-appropriate",
 evaluation_goal="Content matches the student's declared age band (input.age_band).",
)

pii_redaction = client.scorers.create_code(
 name="ferpa-pii-redaction",
 code="result = {'passed': not contains_student_record_fields(trace)}",
)

trace_eval = client.trace_evaluations.create(
 trace_set={"tags": {"feature": "tutor"}, "sample_rate": 0.005},
 scorers=[pii_redaction.id, subject_answer_key_id],
 judges=[pedagogical.id, age_appropriate.id],
 schedule="daily",
)
```

```typescript
// TypeScript (REST)
const r = await fetch("https://stratix.layerlens.ai/api/v1/trace-evaluations", {
 method: "POST",
 headers: {
 "X-API-Key": process.env.LAYERLENS_STRATIX_API_KEY!,
 "Content-Type": "application/json",
 },
 body: JSON.stringify({
 trace_set: { tags: { feature: "tutor" }, sample_rate: 0.005 },
 scorers: [piiRedactionId, subjectAnswerKeyId],
 judges: [pedagogicalId, ageAppropriateId],
 schedule: "daily",
 }),
});
```

## What you get

* Direct-answer rate measured per subject and grade band; product-led iteration drives it down over time.
* Age-appropriateness regressions blocked at the per-student session boundary.
* FERPA-compliant trace handling enforced by code graders.
* Auditor-ready evidence for district procurement and state-board reviews.

## Stratix capabilities used

* [Judges with GEPA optimization](/8.-evaluate-score-the-outputs/judges-1.md) — pedagogical-quality and age-appropriateness
* [Custom code graders](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/custom-code-scorer.md) — subject-accuracy answer-key check, PII-redaction regex
* [Trace evaluations](/8.-evaluate-score-the-outputs/trace-evaluations.md) — continuous sampled

## Replicate this

**Get started:** [Cookbook: education tutor helpfulness](https://github.com/LayerLens/gitbook-full/blob/main/04-use-cases/industry/education/cookbook/industry-education-tutor.md) is the runnable starter.

* [Industry → Education](/4.2-industry-use-cases/education.md)
* [Concept: Judges](/8.-evaluate-score-the-outputs/judges-1.md)
* [Workflow: Evaluate](/9.-improve-tune-the-system/workflow.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/4.2-industry-use-cases/pattern-8.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
