# Pattern: product description generation

A retail or marketplace platform uses AI to generate product titles, descriptions, and SEO copy from a structured catalog (specs, dimensions, compatibility, brand attributes). At scale (hundreds of thousands to millions of SKUs), AI replaces a content team — but description errors directly drive returns, regulatory exposure, and brand erosion.

This pattern shows how to evaluate generated descriptions for specification accuracy and brand voice before they ship.

## What's at stake

| Risk dimension                                         | Magnitude                                                    | Framework                                    |
| ------------------------------------------------------ | ------------------------------------------------------------ | -------------------------------------------- |
| Lift in return rate from inaccurate specs              | 15-30% return-rate increase documented                       | E-commerce returns research                  |
| FTC false-advertising exposure                         | Per-violation civil penalties + corrective-advertising costs | FTC Endorsement Guides + product claim rules |
| SEO penalties on duplicate / thin / inaccurate content | Ranking loss, organic traffic impact                         | Search-engine quality guidelines             |
| Brand-voice drift across categories                    | Cross-category brand erosion                                 | Internal brand-equity research               |

## The evaluation pattern

A **pre-publication evaluation** runs on every batch of AI-generated descriptions:

1. **Custom code grader (specification accuracy)** — extracts every claimed dimension, weight, compatibility, and material from the description; compares against the catalog row exactly. Any mismatch = fail.
2. **Substring scorer** — checks that required disclaimers (e.g., regulated-category warnings, age restrictions) are present wherever the category demands them.
3. **Brand-voice judge** — a GEPA-tuned judge, optimized against at least 50 examples labeled by the brand-marketing team, scores voice consistency across categories on a 1-5 scale.
4. **Plagiarism / uniqueness scorer** — generated text must stay below a maximum-similarity threshold against the existing catalog, so each description is unique. Items 2 and 4 can be written as code graders; both are sketched below.

> Don't have labels yet? See [Bootstrap a judge before GEPA](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/guides/bootstrap-judges.md) for the week-1 setup.
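
Items 2 and 4 don't need a judge. Here is a minimal sketch of both as code graders, assuming the same `create_code` shape used in the configuration below; the `REQUIRED_DISCLAIMERS` table, the 0.6 similarity cutoff, and `catalog.descriptions_for_category` are hypothetical placeholders for your own disclaimer rules and plagiarism logic:

```python
disclaimer_scorer = client.scorers.create_code(
    name="required-disclaimers",
    code="""
# Hypothetical per-category disclaimer table; substitute your own.
REQUIRED_DISCLAIMERS = {
    'toys': ['Choking hazard'],
    'supplements': ['These statements have not been evaluated by the FDA'],
}
required = REQUIRED_DISCLAIMERS.get(input['category'], [])
missing = [d for d in required if d not in output]
result = {'passed': not missing, 'missing': missing}
""",
)

uniqueness_scorer = client.scorers.create_code(
    name="catalog-uniqueness",
    code="""
# Hypothetical word-level Jaccard similarity; swap in your plagiarism logic.
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

# `catalog.descriptions_for_category` is a hypothetical lookup helper.
siblings = catalog.descriptions_for_category(input['category'])
max_sim = max((jaccard(output, s) for s in siblings), default=0.0)
result = {'passed': max_sim < 0.6, 'max_similarity': max_sim}
""",
)
```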

**Continuous trace evaluation:** samples 1% of newly published descriptions daily. Threshold alerts on specification-accuracy regressions route to the catalog-ops team (a hypothetical alert config is sketched after the configuration code below).

## Configuration in code

```python
# Python (SDK)
from layerlens import Stratix

client = Stratix()

# Code grader for step 1: extract every claimed spec from the generated
# description and compare it field-by-field against the catalog row.
# `extract_specs`, `catalog`, and `diff` are assumed to be available in
# the grader's execution environment.
spec_scorer = client.scorers.create_code(
    name="spec-accuracy",
    code="""
claimed = extract_specs(output)
truth = catalog.get(input['sku']).specs
result = {
    'passed': all(claimed.get(k) == v for k, v in truth.items()),
    'mismatches': diff(claimed, truth),
}
""",
)

# Judge for step 3: GEPA-tuned brand-voice consistency, scored 1-5.
brand_voice = client.judges.create(
    name="brand-voice",
    evaluation_goal=(
        "Score brand-voice consistency on a 1-5 scale. Voice should match "
        "the brand guidelines for category {category}."
    ),
)

# Continuous evaluation: sample 1% of production traces, scored daily.
trace_eval = client.trace_evaluations.create(
    trace_set={"tags": {"feature": "product-description"}, "sample_rate": 0.01},
    scorers=[spec_scorer.id],
    judges=[brand_voice.id],
    schedule="daily",
)
```
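
The heavy lifting in the spec grader is `extract_specs`. If your grader environment doesn't already provide an extractor, one hypothetical shape is a regex pass over the description for attribute claims; the pattern, attribute names, and units below are illustrative only:

```python
import re

# Hypothetical extractor: maps attribute-name claims ("weight: 1.2 kg",
# "height of 30 cm") to their claimed values. Real catalogs need
# category-specific patterns and unit normalization.
SPEC_PATTERN = re.compile(
    r"(?P<name>weight|height|width|depth|length)\s*(?:of|:)?\s*"
    r"(?P<value>\d+(?:\.\d+)?\s*(?:cm|mm|in|kg|lb|oz))",
    re.IGNORECASE,
)

def extract_specs(description: str) -> dict[str, str]:
    """Return {attribute: claimed value} pairs found in the description."""
    return {
        m.group("name").lower(): m.group("value").strip()
        for m in SPEC_PATTERN.finditer(description)
    }

# e.g. {'height': '30 cm', 'weight': '1.2 kg'}
print(extract_specs("Height of 30 cm, weight: 1.2 kg."))
```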

```typescript
// TypeScript (REST)
const r = await fetch("https://stratix.layerlens.ai/api/v1/trace-evaluations", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.LAYERLENS_STRATIX_API_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    trace_set: { tags: { feature: "product-description" }, sample_rate: 0.01 },
    scorers: [specScorerId],
    judges: [brandVoiceId],
    schedule: "daily",
  }),
});
```
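
The threshold alert routing to catalog-ops isn't shown above. A hypothetical sketch of how it might look with the Python SDK follows; `client.notifications.create` and every parameter here are assumptions, so check the Notifications reference for the actual API:

```python
# Hypothetical API shape; see the Notifications reference for the real one.
client.notifications.create(
    name="spec-accuracy-regression",
    trace_evaluation_id=trace_eval.id,
    # Alert when the daily pass rate on the spec grader drops below 99.5%,
    # i.e. the 0.5% specification-error budget described below.
    condition={"scorer": "spec-accuracy", "metric": "pass_rate", "lt": 0.995},
    channel="catalog-ops",  # route alerts to the catalog-ops team
)
```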

## What you get

* Specification-error rate stays under 0.5% (vs. 4-15% baseline reported in industry research).
* Return rate on AI-described products converges to or beats human-written baselines.
* Pre-publication block is a hard gate; descriptions that fail any code grader never reach the storefront.
* Brand-voice variance across categories is a measured dimension on the dashboard.

## Stratix capabilities used

* [Custom code graders](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/custom-code-scorer.md) — specification-extraction comparison
* [Judges with GEPA optimization](/8.-evaluate-score-the-outputs/judges-1.md) — brand voice
* [Trace evaluations](/8.-evaluate-score-the-outputs/trace-evaluations.md) — continuous sampled evaluation
* [Notifications](https://github.com/LayerLens/gitbook-full/blob/main/13-reference/sdk-python/notifications.md) — threshold alerts to catalog-ops

## Replicate this

**Get started:** [Cookbook: retail product Q\&A](https://github.com/LayerLens/gitbook-full/blob/main/04-use-cases/industry/retail-ecommerce/cookbook/industry-retail-product-qa.md) is the closest runnable starter.

* [Industry → Retail and e-commerce](/4.2-industry-use-cases/retail-ecommerce.md)
* [Workflow: Evaluate](/9.-improve-tune-the-system/workflow.md)
* [Use case: Continuous evaluation](/4.1-general-use-cases/continuous-evaluation.md)

