# Pattern: multilingual booking assistant

A global hotel chain, airline, or online travel agency (OTA) operates an AI booking assistant in 15-20+ languages. The assistant takes natural-language requests, parses dates and traveler preferences, and produces structured bookings. Date-format ambiguity (DD/MM vs. MM/DD), traveler-name transliteration, currency handling, and cultural appropriateness all vary by locale — and a wrong-day check-in is a measurable revenue and brand event.

This pattern shows how to evaluate cross-language booking accuracy.
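Date-format ambiguity is the canonical failure mode: the same slash-separated string names different days in different locales. A minimal sketch of locale-aware parsing, the kind of logic the booking-detail scorer must hold the assistant to — the locale lists here are illustrative assumptions, not a complete mapping:

```python
from datetime import date

# Illustrative locale groupings (assumption, not exhaustive):
DAY_FIRST_LOCALES = {"en-GB", "fr-FR", "de-DE", "es-ES"}   # DD/MM/YYYY
MONTH_FIRST_LOCALES = {"en-US"}                            # MM/DD/YYYY

def parse_slash_date(text: str, locale: str) -> date:
    """Parse a slash date according to the traveler's locale."""
    a, b, year = (int(part) for part in text.split("/"))
    if locale in DAY_FIRST_LOCALES:
        day, month = a, b
    elif locale in MONTH_FIRST_LOCALES:
        month, day = a, b
    else:
        raise ValueError(f"no date-order rule for locale {locale!r}")
    return date(year, month, day)

# The same string yields different check-in days per locale:
assert parse_slash_date("03/04/2025", "en-GB") == date(2025, 4, 3)
assert parse_slash_date("03/04/2025", "en-US") == date(2025, 3, 4)
```

A scorer that ignores the trace's locale tag would mark one of these two parses wrong — which is why date-format parsing is scored as its own dimension below.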

## What's at stake

| Risk dimension                                                        | Magnitude                                              | Framework                              |
| --------------------------------------------------------------------- | ------------------------------------------------------ | -------------------------------------- |
| Per-incident wrong-booking cost                                       | $50–$500 (cancellation + rebooking + service-recovery) | Hospitality industry benchmarks        |
| Brand impact from cultural-appropriateness failures                   | Localized review-platform damage                       | Public review / OTA platform reporting |
| Accessibility (Section 508 / WCAG) for travelers using assistive tech | Procurement disqualification (gov / corp travel)       | Section 508 / WCAG                     |
| Currency / payment compliance failures                                | PCI exposure if mishandled                             | PCI DSS                                |

## The evaluation pattern

A **per-language evaluation** runs against booking-conversation traces in every supported language.

1. **Booking-detail accuracy scorer (custom code)** — verifies that extracted dates, names, traveler counts, room/cabin types, dietary preferences, and accessibility needs match the labeled ground truth exactly. Date-format parsing is scored as a separate dimension.
2. **Cross-language parity scorer** — per-language booking-detail accuracy stays within 5 percentage points of the primary-language baseline.
3. **Cultural-appropriateness judge** (GEPA-tuned against ≥50 examples per language family, labeled by native-speaker reviewers; scores the output) — tone and cultural fit per locale.
4. **Currency / payment scorer** — currency conversions cited explicitly with timestamp; PCI fields never appear in trace bodies.
5. **Tone-consistency judge** — across multilingual conversation turns, tone and persona remain stable.

> Don't have labels yet? See [Bootstrap a judge before GEPA](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/guides/bootstrap-judges.md) for the week-1 setup.

**Continuous trace evaluation:** sampled at 1% of bookings, hourly during peak season. Per-language dashboards visible to localization and revenue-management teams.
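The currency/payment scorer's "PCI fields never appear in trace bodies" check can be sketched as a code grader that flags card-number-like strings. This is a minimal sketch under stated assumptions: the regex and Luhn filter are illustrative, not the Stratix implementation, and a production check would also cover CVV and expiry patterns:

```python
import re

# Rough PAN (card-number) pattern: 13-19 digits, optionally space/dash separated.
PAN_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum — cuts false positives from long numeric IDs."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def trace_body_is_pci_clean(body: str) -> bool:
    """Fail any trace body containing a Luhn-valid card-number candidate."""
    for match in PAN_RE.finditer(body):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            return False
    return True

# Timestamps and currency figures pass; a test PAN fails:
assert trace_body_is_pci_clean("Converted 200 EUR to 216.40 USD at 2025-06-01T12:00Z")
assert not trace_body_is_pci_clean("card 4111 1111 1111 1111")
```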

## Configuration in code

```python
# Python (SDK)
from layerlens import Stratix
client = Stratix()

booking_accuracy = client.scorers.create_code(
    name="booking-detail-accuracy",
    code="""
# `output` and `expected` are provided by the scorer runtime.
extracted = output['booking']
truth = expected['booking']
fields = ['check_in', 'check_out', 'guest_name', 'guests', 'room_type']
result = {
    'passed': all(extracted[f] == truth[f] for f in fields),
    # Date-format parsing reported as its own dimension.
    'date_format_ok': extracted['check_in'] == truth['check_in'],
}
""",
)

cross_language = client.scorers.create_code(
    name="locale-parity",
    code="""
from statistics import mean

# `traces`, `scores`, and `supported_langs` are provided by the scorer runtime.
en_acc = mean(s for t, s in zip(traces, scores) if t.tags['lang'] == 'en')
gaps = {
    l: en_acc - mean(s for t, s in zip(traces, scores) if t.tags['lang'] == l)
    for l in supported_langs
}
# Pass if every language is within 5 percentage points of the English baseline.
result = {'passed': max(abs(g) for g in gaps.values()) <= 0.05, 'gaps': gaps}
""",
)

cultural = client.judges.create(
    name="cultural-fit",
    evaluation_goal="Score 1-5: is the response culturally appropriate for the locale, including tone, formality, and idiom?",
)

trace_eval = client.trace_evaluations.create(
    trace_set={"tags": {"feature": "booking-assistant"}, "sample_rate": 0.01},
    scorers=[booking_accuracy.id, cross_language.id],
    judges=[cultural.id],
    schedule="hourly",
)
```

```typescript
// TypeScript (REST)
const r = await fetch("https://stratix.layerlens.ai/api/v1/trace-evaluations", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.LAYERLENS_STRATIX_API_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    trace_set: { tags: { feature: "booking-assistant" }, sample_rate: 0.01 },
    scorers: [bookingAccuracyId, crossLanguageId],
    judges: [culturalId],
    schedule: "hourly",
  }),
});
```

## What you get

* Per-language accuracy is measured, not assumed from aggregate metrics.
* Date-parsing and locale-specific failures detected in CI before they reach travelers.
* PCI compliance enforced by code graders; no payment data flows into trace bodies.
* Localization-team prioritization driven by data — which languages need investment and where.

## Stratix capabilities used

* [Custom code graders](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/custom-code-scorer.md) — cross-language parity, booking-detail extraction
* [Judges with GEPA optimization](/8.-evaluate-score-the-outputs/judges-1.md) — cultural appropriateness, tone consistency
* [Trace evaluations](/8.-evaluate-score-the-outputs/trace-evaluations.md) — continuous sampled evaluation of production traces

## Replicate this

**Get started:** [Cookbook: multilingual scoring](https://github.com/LayerLens/gitbook-full/blob/main/08-evaluate/cookbook/multilingual.md) is the runnable starter.

* [Industry → Travel and hospitality](/4.2-industry-use-cases/travel-hospitality.md)
* [Concept: Continuous evaluation](/7.-observe-see-whats-happening/continuous-evaluation.md)
* [Workflow: Evaluate](/9.-improve-tune-the-system/workflow.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.layerlens.ai/4.2-industry-use-cases/pattern-12.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
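The GET request above can be built with the standard library alone. A minimal sketch — the question string is a hypothetical example, and `urlencode` handles the query-string escaping:

```python
import urllib.parse

DOC_URL = "https://docs.layerlens.ai/4.2-industry-use-cases/pattern-12.md"

def build_ask_url(question: str) -> str:
    """Build the ?ask= query URL for this documentation page."""
    return DOC_URL + "?" + urllib.parse.urlencode({"ask": question})

url = build_ask_url("Which scorers does the booking-assistant pattern use?")
# Perform a plain HTTP GET on `url` with any client to retrieve the answer.
```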
