Measuring CSAT Across Languages: The Metrics That Matter

Aggregate CSAT hides what matters: your English CSAT might be 4.7 while German is 3.1, and your 4.4 average looks fine. Here's how to structure multilingual CSAT measurement correctly.

Ali Osman Delismen · May 24, 2026 · 8 min

CSAT scores are useful. CSAT scores broken down by language are essential. Aggregate CSAT hides the most important insights: which markets are underserved, which language-specific support gaps exist, and where AI deflection is working versus failing.

Here's how to structure multilingual CSAT measurement correctly.

Why Aggregate CSAT Lies

Imagine this scenario: your English CSAT is 4.7/5. Your German CSAT is 3.1/5. Because roughly four out of five responses come from English conversations, your overall CSAT is 4.4/5, which looks fine on a dashboard.

But you're actively losing German-speaking customers while your leadership sees a green metric. Aggregate CSAT is a coverage problem disguised as a performance metric.

Every support team with multiple languages should treat each language as a separate support operation from a metrics perspective.
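
To see how the scenario above produces a green number, here is a minimal sketch of the arithmetic. The scores come from the scenario; the response counts are hypothetical, chosen only to show that an English-heavy mix is enough to pull the aggregate up to 4.4.

```python
# Scores are from the scenario above; response counts are hypothetical.
segments = {
    "en": {"csat": 4.7, "responses": 1300},
    "de": {"csat": 3.1, "responses": 300},
}

def aggregate_csat(segments):
    """Response-weighted mean CSAT across all language segments."""
    total = sum(s["responses"] for s in segments.values())
    return sum(s["csat"] * s["responses"] for s in segments.values()) / total

overall = aggregate_csat(segments)
print(f"overall: {overall:.1f}")
for lang, s in segments.items():
    print(f"{lang}: {s['csat']:.1f} ({s['responses']} responses)")
```

The aggregate is a volume-weighted average, so the smaller a language's share of responses, the less its score can move the headline number.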

The Core CSAT Framework for Multilingual Support

Tier 1: Per-Language Baseline

For each language you support, track:

  • CSAT score: 1–5 or thumbs up/down, sent in the customer's language
  • Response rate: % of conversations that receive a CSAT response
  • Sample size: are you collecting enough responses for the score to be statistically meaningful?
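
As a rough sketch, the three numbers above could be computed per language like this. The function name, the `min_responses` cut-off of 30, and the normal-approximation margin of error are all assumptions for illustration, not a recommended standard.

```python
import math
from statistics import mean, stdev

def language_baseline(scores, conversations_surveyed, min_responses=30):
    """Summarise one language: mean CSAT, response rate, rough 95% margin of error.

    scores: list of 1-5 ratings received for this language.
    conversations_surveyed: number of conversations that were sent a survey.
    min_responses: hypothetical cut-off below which the score is flagged as noisy.
    """
    n = len(scores)
    avg = mean(scores) if n else None
    # Normal-approximation margin of error; needs at least two responses.
    margin = 1.96 * stdev(scores) / math.sqrt(n) if n >= 2 else None
    return {
        "csat": avg,
        "response_rate": n / conversations_surveyed if conversations_surveyed else 0.0,
        "responses": n,
        "margin_95": margin,
        "enough_data": n >= min_responses,
    }

# Hypothetical week of German responses: 6 ratings from 40 surveyed conversations.
print(language_baseline([5, 4, 4, 3, 5, 2], conversations_surveyed=40))
```

A wide margin on a small language is a signal to collect more responses before reacting to the score.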

A common mistake: sending CSAT surveys in English to non-English speakers. Survey response rates drop 40–60% when the survey language doesn't match the conversation language.

Tier 2: AI vs. Human CSAT Split

For each language, measure CSAT separately for:

  • AI-deflected conversations: resolved by AI without human escalation
  • Human-handled conversations: escalated to a human agent
  • Mixed conversations: started by AI, escalated mid-conversation

This split reveals whether your AI is maintaining quality parity with human agents. If AI CSAT is 3.8 and human CSAT is 4.6 in German, your AI is underperforming specifically for German, most likely because of a knowledge base (KB) coverage or translation quality issue.
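
One way to compute this split is sketched below. The record shape, the `"ai"` / `"human"` / `"mixed"` labels, and the sample data are hypothetical; adapt them to whatever your helpdesk exports.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (language, handling, score).
responses = [
    ("de", "ai", 4), ("de", "ai", 3), ("de", "human", 5), ("de", "human", 4),
    ("en", "ai", 5), ("en", "ai", 4), ("en", "human", 5), ("en", "mixed", 4),
]

def csat_by_handling(responses):
    """Mean CSAT for every (language, handling) pair that has responses."""
    buckets = defaultdict(list)
    for lang, handling, score in responses:
        buckets[(lang, handling)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

def parity_gap(split, lang):
    """Human CSAT minus AI CSAT for one language; None if either side has no data."""
    ai, human = split.get((lang, "ai")), split.get((lang, "human"))
    return None if ai is None or human is None else human - ai

split = csat_by_handling(responses)
for lang in ("de", "en"):
    print(lang, parity_gap(split, lang))
```

A gap that is much larger in one language than in the others points at that language's AI experience rather than at the AI overall.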

Tier 3: Issue Category CSAT

Break CSAT down by issue category within each language:

  • Billing questions
  • Technical support
  • Feature requests
  • Account management
  • Onboarding

This reveals category-specific gaps. You might find that billing support in French is excellent (4.8) but technical support in French is poor (3.2) — pointing to a specific knowledge base gap.
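
A small sketch of that breakdown: group scores by language and category, skip cells with too few responses, and report the weakest category per language. The data and the `min_responses` threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (language, category, score) records.
rows = [
    ("fr", "billing", 5), ("fr", "billing", 5), ("fr", "billing", 4),
    ("fr", "technical", 3), ("fr", "technical", 4), ("fr", "technical", 3),
    ("fr", "onboarding", 4),
]

def weakest_categories(rows, min_responses=3):
    """Per language, the lowest-scoring category that has enough responses."""
    cells = defaultdict(list)
    for lang, category, score in rows:
        cells[(lang, category)].append(score)
    weakest = {}
    for (lang, category), scores in cells.items():
        if len(scores) < min_responses:
            continue  # too few responses to trust this cell
        avg = mean(scores)
        if lang not in weakest or avg < weakest[lang][1]:
            weakest[lang] = (category, avg)
    return weakest

print(weakest_categories(rows))
```

Slicing by language and category shrinks each cell quickly, which is why the sketch ignores cells below the minimum instead of ranking them.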

CSAT Survey Design for Multilingual Teams

Language Matching

Always send the CSAT survey in the conversation language. This requires:

  • Detecting the conversation language (not just the account language)
  • Having survey templates translated for every supported language
  • Routing survey responses back to the correct language segment
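
The three requirements above might look like this in code. The template dictionary, function name, and fallback order are assumptions, and the translated strings are illustrative only; real survey wording should come from your translators.

```python
# Hypothetical survey templates keyed by language code (illustrative wording).
SURVEY_TEMPLATES = {
    "en": "Did we resolve your issue today?",
    "de": "Konnten wir Ihr Anliegen heute lösen?",
    "fr": "Avons-nous résolu votre problème aujourd'hui ?",
}

def pick_survey(conversation_lang, account_lang, default="en"):
    """Prefer the detected conversation language, then the account language, then a default.

    Returns (segment, text). The response is attributed to the conversation
    language when one was detected, so it lands in the right segment even if
    a fallback template had to be used.
    """
    for lang in (conversation_lang, account_lang, default):
        if lang in SURVEY_TEMPLATES:
            return conversation_lang or lang, SURVEY_TEMPLATES[lang]
    raise KeyError(f"no survey template for default language {default!r}")

print(pick_survey("de", "en"))  # German conversation on an English account
print(pick_survey("ja", "fr"))  # no Japanese template yet: falls back to French
```

Keeping the segment separate from the template language also makes missing translations visible: any segment that keeps receiving fallback templates is a language still waiting for its own survey.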

Timing

Optimal CSAT survey timing varies by conversation type:

  • Resolved conversations: immediately on resolution, with a follow-up 24 hours later
  • AI-deflected: 2 hours after the conversation ends (gives time to verify the answer worked)
  • Escalated: 1 hour after human resolution
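
As a sketch, the timing rules above can live in one small lookup table. The delays are the ones listed; the dictionary keys and function name are hypothetical.

```python
from datetime import datetime, timedelta

# Delays taken from the list above; the keys are hypothetical labels.
SURVEY_DELAYS = {
    "resolved": [timedelta(0), timedelta(hours=24)],  # immediate, then 24h follow-up
    "ai_deflected": [timedelta(hours=2)],
    "escalated": [timedelta(hours=1)],
}

def survey_send_times(conversation_type, ended_at):
    """Return the datetimes at which surveys should go out for one conversation."""
    return [ended_at + delay for delay in SURVEY_DELAYS[conversation_type]]

ended = datetime(2026, 5, 24, 10, 0)
for kind in SURVEY_DELAYS:
    print(kind, survey_send_times(kind, ended))
```

Keeping the delays in data rather than in branching logic makes it easy to tune them per language later if response rates differ.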

Question Design

Keep it short. One question works better than five:

"Did we resolve your issue today?" (Yes / No / Partially)

Optional follow-up open text for "No" and "Partially" responses only. Anything more and response rates drop.
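
A minimal sketch of that flow, assuming answers are stored as lowercase strings (the names here are hypothetical): show the open-text prompt only for "no" and "partially", and summarize the answer mix per language.

```python
from collections import Counter

ANSWERS = ("yes", "no", "partially")
FOLLOW_UP_ANSWERS = {"no", "partially"}

def needs_follow_up(answer):
    """True when the optional open-text prompt should be shown."""
    return answer in FOLLOW_UP_ANSWERS

def summarize(answers):
    """Share of each answer among valid responses; empty dict if there are none."""
    counts = Counter(a for a in answers if a in ANSWERS)
    total = sum(counts.values())
    return {a: counts[a] / total for a in ANSWERS} if total else {}

print(summarize(["yes", "yes", "no", "partially"]))
```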

Closing the Loop: CSAT to Improvement

CSAT data is only valuable if it drives action:

  1. Weekly review per language: what's the trend? What dropped?
  2. Low-CSAT conversation audit: read the actual conversations behind low scores
  3. KB gap identification: map low CSAT to missing or poor-quality articles
  4. AI response review: for low-CSAT AI conversations, review the actual AI responses
  5. Translation quality audit: low CSAT in a language may signal translation issues, not content issues
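
The weekly review in step 1 can start from something as simple as the sketch below, which flags languages whose latest weekly CSAT fell by more than a chosen threshold. The 0.2-point threshold and the sample data are assumptions; the flagged languages are the ones whose conversations go into the audits in steps 2 to 5.

```python
def flag_drops(weekly_csat, threshold=0.2):
    """Return languages whose latest weekly CSAT fell by more than `threshold` points.

    weekly_csat maps language -> list of weekly mean scores, oldest first.
    """
    flagged = {}
    for lang, series in weekly_csat.items():
        if len(series) < 2:
            continue  # need at least two weeks to compute a change
        change = series[-1] - series[-2]
        if change < -threshold:
            flagged[lang] = round(change, 2)
    return flagged

# Hypothetical weekly means per language.
weekly = {"en": [4.6, 4.7, 4.7], "de": [3.6, 3.5, 3.1], "fr": [4.2]}
print(flag_drops(weekly))
```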

Teams that close this loop — using CSAT to identify specific KB gaps, then filling them — see 0.3–0.7 point CSAT improvements per quarter per language.
