CSAT scores are useful. CSAT scores broken down by language are essential. Aggregate CSAT hides the most important insights: which markets are underserved, which language-specific support gaps exist, and where AI deflection is working versus failing.
Here's how to structure multilingual CSAT measurement correctly.
Why Aggregate CSAT Lies
Imagine this scenario: your English CSAT is 4.7/5. Your German CSAT is 3.1/5. With roughly four English responses for every German one, your overall CSAT is 4.4/5, which looks fine on a dashboard.
But you're actively losing German-speaking customers while your leadership sees a green metric. The aggregate number presents a coverage problem as a healthy performance metric.
For metrics purposes, every support team that works in multiple languages should treat each language as a separate support operation.
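The arithmetic behind that scenario can be sketched as a volume-weighted average. The per-language scores come from the scenario; the response counts (120 English, 30 German) are hypothetical, picked only so that English outweighs German about four to one.

```python
# Scores are from the scenario above; the response counts are assumed for illustration.
responses = {
    "en": {"csat": 4.7, "count": 120},
    "de": {"csat": 3.1, "count": 30},
}

def aggregate_csat(per_language):
    """Volume-weighted mean of per-language CSAT scores."""
    total = sum(seg["count"] for seg in per_language.values())
    weighted = sum(seg["csat"] * seg["count"] for seg in per_language.values())
    return weighted / total

overall = aggregate_csat(responses)
print(round(overall, 1))  # the dashboard number; German's 3.1 is invisible in it
```

The larger the dominant language's share of responses, the less a weak language moves the aggregate.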
The Core CSAT Framework for Multilingual Support
Tier 1: Per-Language Baseline
For each language you support, track:
- CSAT score: a 1–5 rating or thumbs up/down, collected through a survey sent in the customer's language
- Response rate: % of conversations that receive a CSAT response
- Sample size: the number of survey responses behind each score, so you can tell whether a difference between languages is meaningful or just noise
A common mistake: sending CSAT surveys in English to non-English speakers. Survey response rates drop 40–60% when the survey language doesn't match the conversation language.
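As a minimal sketch, the three Tier 1 numbers can be computed from a list of conversation records. The record fields (`language`, `score`) and the 30-response threshold are assumptions for illustration, not a standard; a `score` of `None` means the customer did not answer the survey.

```python
from collections import defaultdict
from statistics import mean

MIN_RESPONSES = 30  # assumed cutoff; tune it to your own volume

def per_language_baseline(conversations, min_responses=MIN_RESPONSES):
    """Return CSAT, response rate, and sample size for each language."""
    by_lang = defaultdict(list)
    for conv in conversations:
        by_lang[conv["language"]].append(conv.get("score"))

    report = {}
    for lang, scores in by_lang.items():
        answered = [s for s in scores if s is not None]
        report[lang] = {
            "csat": round(mean(answered), 2) if answered else None,
            "response_rate": len(answered) / len(scores),
            "responses": len(answered),
            "enough_data": len(answered) >= min_responses,
        }
    return report

# Hypothetical records
conversations = [
    {"language": "en", "score": 5},
    {"language": "en", "score": 4},
    {"language": "en", "score": None},
    {"language": "en", "score": None},
    {"language": "de", "score": 3},
    {"language": "de", "score": None},
]
baseline = per_language_baseline(conversations)
print(baseline["en"])
```

The `enough_data` flag is a blunt guard: it keeps a score built on a handful of responses from being read as a trend.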
Tier 2: AI vs. Human CSAT Split
For each language, measure CSAT separately for:
- AI-deflected conversations: resolved by AI without human escalation
- Human-handled conversations: escalated to a human agent
- Mixed conversations: started by AI, escalated mid-conversation
This split reveals whether your AI is maintaining quality parity with human agents. If AI CSAT is 3.8 and human CSAT is 4.6 in German, while the gap in your other languages is much smaller, your AI is underperforming specifically for German. The likely cause is a knowledge base (KB) coverage or translation quality issue.
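One way to make the parity check concrete is to compute the human-minus-AI gap for each language and flag the outliers. In this sketch the German figures are the ones from the example; the English figures and the 0.5-point threshold are invented for contrast. Mixed conversations are left out for brevity.

```python
# German scores are from the example above; English scores are hypothetical.
csat = {
    "de": {"ai": 3.8, "human": 4.6},
    "en": {"ai": 4.5, "human": 4.7},
}
GAP_THRESHOLD = 0.5  # assumed: gaps above this are worth investigating

def parity_gaps(csat_by_language):
    """Human CSAT minus AI CSAT for each language."""
    return {
        lang: round(scores["human"] - scores["ai"], 2)
        for lang, scores in csat_by_language.items()
    }

gaps = parity_gaps(csat)
flagged = [lang for lang, gap in gaps.items() if gap > GAP_THRESHOLD]
print(gaps, flagged)
```

Comparing gaps across languages, rather than looking at one language alone, is what separates a language-specific problem from an AI that trails human agents everywhere.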
Tier 3: Issue Category CSAT
Break CSAT down by issue category within each language:
- Billing questions
- Technical support
- Feature requests
- Account management
- Onboarding
This reveals category-specific gaps. You might find that billing support in French is excellent (4.8) but technical support in French is poor (3.2), which points to a gap in the French technical content of your knowledge base.
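A rough sketch of the Tier 3 breakdown: group individual ratings by (language, category) and list the cells whose average falls below a cutoff. The sample ratings and the 3.5 cutoff are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def weakest_categories(ratings, threshold=3.5):
    """Return (language, category) cells with average CSAT below threshold, worst first."""
    cells = defaultdict(list)
    for lang, category, score in ratings:
        cells[(lang, category)].append(score)
    averages = {key: mean(scores) for key, scores in cells.items()}
    low = [key for key, avg in averages.items() if avg < threshold]
    return sorted(low, key=lambda key: averages[key])

# Hypothetical ratings as (language, category, score)
ratings = [
    ("fr", "billing", 5), ("fr", "billing", 5), ("fr", "billing", 4),
    ("fr", "technical", 3), ("fr", "technical", 3), ("fr", "technical", 4),
]
print(weakest_categories(ratings))
```

In practice each cell needs enough responses to be trusted, so a sample-size check per cell matters even more here than at the language level.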
CSAT Survey Design for Multilingual Teams
Language Matching
Always send the CSAT survey in the conversation language. This requires:
- Detecting the conversation language (not just the account language)
- Having survey templates translated for every supported language
- Routing survey responses back to the correct language segment
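The three requirements above might be wired together roughly as follows. The field names (`detected_language`, `account_language`), the fallback order, and the translated strings are all assumptions for illustration; real templates should be reviewed by native speakers, and language detection itself is out of scope here.

```python
# Illustrative templates only; have translations reviewed before use.
SURVEY_TEMPLATES = {
    "en": "Did we resolve your issue today?",
    "de": "Haben wir Ihr Anliegen heute gelöst?",
    "fr": "Avons-nous résolu votre problème aujourd'hui ?",
}
DEFAULT_LANGUAGE = "en"

def pick_survey_language(conversation, templates=SURVEY_TEMPLATES):
    """Prefer the conversation's language, then the account's, then the default."""
    candidates = (
        conversation.get("detected_language"),
        conversation.get("account_language"),
    )
    for candidate in candidates:
        if candidate in templates:
            return candidate
    return DEFAULT_LANGUAGE

def build_survey(conversation):
    """Build a survey tagged with the language segment its response belongs to."""
    lang = pick_survey_language(conversation)
    return {"language_segment": lang, "question": SURVEY_TEMPLATES[lang]}

survey = build_survey({"detected_language": "de", "account_language": "en"})
print(survey)
```

Storing `language_segment` on the survey itself means the response can be routed back to the right language without re-detecting anything.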
Timing
Optimal CSAT survey timing varies by conversation type:
- Resolved conversations: immediately on resolution + 24h follow-up
- AI-deflected: 2 hours after the conversation ends (gives time to verify the answer worked)
- Escalated: 1 hour after human resolution
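These timing rules translate naturally into a small lookup table. The sketch below uses the delays listed above; the type keys (`resolved`, `ai_deflected`, `escalated`) are hypothetical labels.

```python
from datetime import datetime, timedelta

# Delays after the conversation ends, per the list above. Key names are assumed.
SURVEY_DELAYS = {
    "resolved": [timedelta(0), timedelta(hours=24)],  # immediate + 24h follow-up
    "ai_deflected": [timedelta(hours=2)],
    "escalated": [timedelta(hours=1)],
}

def survey_send_times(conversation_type, ended_at):
    """Return the datetimes at which surveys should be sent."""
    return [ended_at + delay for delay in SURVEY_DELAYS[conversation_type]]

ended = datetime(2024, 1, 1, 12, 0)
print(survey_send_times("ai_deflected", ended))
```

Keeping the delays in one table makes them easy to adjust per language later if response rates differ.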
Question Design
Keep it short. One question works better than five:
"Did we resolve your issue today?" (Yes / No / Partially)
Offer an optional open-text follow-up for "No" and "Partially" responses only. Anything longer and response rates drop.
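The branching is simple enough to sketch in a few lines. The answer options come from the question above; the step names (`ask_open_text`, `done`) are hypothetical.

```python
# Answer options are from the survey question; step names are assumed.
ANSWERS = ("Yes", "No", "Partially")
FOLLOW_UP_ANSWERS = {"No", "Partially"}

def next_step(answer):
    """Decide what the survey does after the single question is answered."""
    if answer not in ANSWERS:
        raise ValueError(f"unexpected answer: {answer!r}")
    return "ask_open_text" if answer in FOLLOW_UP_ANSWERS else "done"

for answer in ANSWERS:
    print(answer, "->", next_step(answer))
```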
Closing the Loop: CSAT to Improvement
CSAT data is only valuable if it drives action:
- Weekly review per language: what's the trend? What dropped?
- Low-CSAT conversation audit: read the actual conversations behind low scores
- KB gap identification: map low CSAT to missing or poor-quality articles
- AI response review: for low-CSAT AI conversations, review the actual AI responses
- Translation quality audit: low CSAT in a language may signal translation issues, not content issues
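The weekly review can start from a simple check that flags any language whose CSAT fell noticeably since the previous week. This is a sketch under assumptions: the weekly series are invented, and the 0.2-point drop threshold is arbitrary.

```python
def flag_drops(weekly_csat, min_drop=0.2):
    """Return languages whose latest weekly CSAT fell by at least min_drop."""
    flagged = {}
    for lang, series in weekly_csat.items():
        if len(series) < 2:
            continue  # not enough history to compare
        change = round(series[-1] - series[-2], 2)
        if change <= -min_drop:
            flagged[lang] = change
    return flagged

# Hypothetical weekly averages, oldest first
weekly = {
    "en": [4.6, 4.7, 4.7],
    "de": [3.9, 3.8, 3.4],
}
print(flag_drops(weekly))
```

A flagged language is the cue to run the audits above: read the low-scoring conversations, then check KB coverage, AI responses, and translation quality.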
Teams that close this loop, using CSAT to identify specific KB gaps and then filling them, see 0.3–0.7 point CSAT improvements per quarter per language.