# Pattern: content moderation

A social platform, marketplace with user-generated content, or media-streaming service uses AI to moderate user content — text, images, video, and audio — for policy violations across hate speech, harassment, CSAM, violence, misinformation, and more. The cost of under-moderation includes regulatory enforcement (EU DSA, UK OSA, state-level US laws); the cost of over-moderation is creator churn and First-Amendment-style litigation.

This pattern shows how to evaluate cross-language, per-category moderation accuracy.

## What's at stake

| Risk dimension                          | Magnitude                          | Framework / source               |
| --------------------------------------- | ---------------------------------- | -------------------------------- |
| EU Digital Services Act non-compliance  | Up to 6% of global annual revenue  | EU DSA                           |
| UK Online Safety Act non-compliance     | Up to 10% of global annual revenue | UK OSA                           |
| Advertiser-boycott revenue impact       | Multi-quarter advertiser pauses    | Industry brand-safety reporting  |
| Creator churn from over-moderation      | Long-tail platform-health impact   | Public creator-platform research |

## The evaluation pattern

A **multi-criteria evaluation** runs over a labeled trace set spanning every supported language and policy category.

1. **Per-category classifier scorer** — recall by policy category (hate speech, CSAM, violence, harassment, misinformation, sexual content, self-harm, regulated goods). Recall must meet per-category minimums; CSAM and self-harm require near-100% recall (sketched in code below).
2. **Cross-language parity scorer** (custom code) — for each policy category, per-language accuracy must be within 5 percentage points of the English baseline. Disparities above the threshold are flagged as regressions.
3. **Borderline judge** (GEPA-tuned against ≥50 trust-and-safety-team-labeled examples — scored output) — handles satire, news commentary, cultural context, and reclaimed slurs, where false positives (removing legitimate content) cost more than false negatives.
4. **Latency scorer** — moderation decisions must complete under per-tier SLOs (e.g., 200ms p95 for live-stream, 5s p95 for upload).

> Don't have labels yet? See [Bootstrap a judge before GEPA](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/guides/bootstrap-judges.md) for the week-1 setup.
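
The per-category recall floors can be expressed with the same `client.scorers.create_code` interface used for the parity scorer in the configuration section below. The following is a minimal sketch: the floor values and the `category` / `violating` trace tags it reads are illustrative assumptions, not documented Stratix fields, and `traces`, `scores`, and `mean` are assumed to be provided by the code-scorer runtime, as in the parity example.

```python
# Illustrative sketch of a per-category recall-floor scorer; tag names and
# floor values are assumptions, not documented Stratix fields.
recall_floors = client.scorers.create_code(
    name="per-category-recall-floors",
    code="""
# Per-category recall minimums (illustrative; tune to your policy requirements).
floors = {'csam': 0.99, 'self_harm': 0.99, 'hate_speech': 0.95, 'violence': 0.95}

# Recall per category = share of ground-truth-violating traces scored correct.
# Assumes traces carry 'category' and 'violating' tags (hypothetical names).
recall = {}
for cat in floors:
    hits = [s for t, s in zip(traces, scores)
            if t.tags['category'] == cat and t.tags['violating'] == 'true']
    recall[cat] = mean(hits) if hits else None

# Categories absent from the hourly sample are skipped rather than failed.
result = {
    'passed': all(r >= floors[c] for c, r in recall.items() if r is not None),
    'recall': recall,
}
""",
)
```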

**Continuous trace evaluation:** sampled at 0.5% of moderation decisions, hourly. Per-language and per-category dashboards are visible to T\&S leadership; thresholds are wired to PagerDuty for critical-category drift.

## Configuration in code

```python
# Python (SDK)
from layerlens import Stratix

client = Stratix()

# Custom code scorer: per-language accuracy gap vs. the English baseline.
# `traces`, `scores`, and `supported_langs` are assumed to be injected by the
# code-scorer runtime; any gap above 5 percentage points fails the check.
cross_language = client.scorers.create_code(
    name="cross-language-parity",
    code="""
en_acc = mean(s for t, s in zip(traces, scores) if t.tags['lang'] == 'en')
gaps = {l: en_acc - mean(s for t, s in zip(traces, scores) if t.tags['lang'] == l) for l in supported_langs}
result = {'passed': max(abs(g) for g in gaps.values()) <= 0.05, 'gaps': gaps}
""",
)

# LLM judge for borderline content (satire, commentary, reclaimed slurs).
borderline = client.judges.create(
    name="borderline-content",
    evaluation_goal="Distinguish policy-violating content from satire, news commentary, cultural context, and reclaimed slurs. Score 1 (clearly fine) to 5 (clearly violating).",
)

# Continuous evaluation: 0.5% sample of moderation traces, scored hourly.
trace_eval = client.trace_evaluations.create(
    trace_set={"tags": {"feature": "moderation"}, "sample_rate": 0.005},
    scorers=[cross_language.id],
    judges=[borderline.id],
    schedule="hourly",
)
```
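
The latency scorer from item 4 of the evaluation pattern isn't shown above. Here is a minimal sketch, again via `create_code`, assuming each trace exposes a `latency_ms` field and a `tier` tag; neither field name nor the SLO values are documented on this page, so treat them as placeholders.

```python
# Illustrative sketch of a per-tier latency SLO scorer; the `latency_ms` field,
# the 'tier' tag, and the SLO values are placeholders, not documented fields.
latency = client.scorers.create_code(
    name="per-tier-latency-slo",
    code="""
# p95 moderation latency per delivery tier vs. illustrative SLOs (milliseconds).
slos = {'live_stream': 200, 'upload': 5000}

by_tier = {}
for t in traces:
    by_tier.setdefault(t.tags['tier'], []).append(t.latency_ms)

# Nearest-rank p95 per tier; tiers without an SLO are reported but not gated.
p95 = {tier: sorted(v)[int(0.95 * (len(v) - 1))] for tier, v in by_tier.items()}
result = {
    'passed': all(p95[tier] <= slo for tier, slo in slos.items() if tier in p95),
    'p95_ms': p95,
}
""",
)
```

Either sketch could then be added to the `scorers` list of the trace evaluation, alongside `cross_language.id`.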

```typescript
// TypeScript (REST)
const r = await fetch("https://stratix.layerlens.ai/api/v1/trace-evaluations", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.LAYERLENS_STRATIX_API_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    trace_set: { tags: { feature: "moderation" }, sample_rate: 0.005 },
    scorers: [crossLanguageId],
    judges: [borderlineId],
    schedule: "hourly",
  }),
});
```

## What you get

* Cross-language parity becomes a measured dimension on the T\&S dashboard.
* Per-category recall floors enforced — CSAM and self-harm regressions block release.
* Auditor-readable evaluation history for EU DSA / OSA risk-assessment requirements.
* Appeal-rate signal correlates with the judge's false-positive trend, so T\&S can investigate over-moderation upstream of the appeals queue.

## Stratix capabilities used

* [Custom code graders](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/custom-code-scorer.md) — cross-language parity
* [Judges with GEPA optimization](/8.-evaluate-score-the-outputs/judges-1.md) — borderline-content scoring
* [Trace evaluations](/8.-evaluate-score-the-outputs/trace-evaluations.md) — continuous sampled evaluation
* [Notifications](https://github.com/LayerLens/gitbook-full/blob/main/13-reference/sdk-python/notifications.md) — PagerDuty + Slack routing

## Replicate this

**Get started:** [Use case: Continuous evaluation](/4.1-general-use-cases/continuous-evaluation.md) describes the rolling-sample shape this pattern uses.

* [Industry → Media and entertainment](/4.2-industry-use-cases/media-entertainment.md)
* [Concept: Continuous evaluation](/7.-observe-see-whats-happening/continuous-evaluation.md)
* [Cookbook: catch hallucinations](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/catch-hallucinations.md) (analogous pattern for misinformation moderation)

