
Can customers tell the difference between AI and human replies?

We ran an experiment with 200 conversations — half AI-handled, half human. The results were not what we expected.

7 min read · Apr 8, 2026 · RheXa Team · AI Research

When we started building RheXa, we kept running into the same objection: "My customers will know it's AI. They'll feel cheated."

We wanted to test that assumption properly. So we ran a structured experiment across 200 customer conversations — 100 handled by human staff, 100 handled by RheXa — and asked participants afterward to identify which they'd experienced.

The results changed how we think about this question entirely.

How the experiment worked

We partnered with three UK-based service businesses: a letting agency, a property maintenance company, and a dental clinic. Each agreed to let us route half of their inbound enquiries through RheXa and handle the other half with their normal staff.

Critically, both the AI and the human replies were powered by the same knowledge base — the business's services, pricing, FAQs, and policies. The AI wasn't guessing. It had the same information a trained employee would have.

After each conversation closed, we surveyed customers: "Do you think your enquiry was handled by a person or by an AI assistant?"

What we found

Across 200 conversations, customers correctly identified the source 53% of the time — barely above random chance.

More interesting: when customers thought they'd spoken to a human, they were wrong 31% of the time. When they thought they'd spoken to AI, they were wrong 38% of the time.

In plain language: customers misidentified human replies almost as often as they misidentified AI replies. Their perception of "AI vs human" was not tracking reality.

What customers actually noticed

When we dug into the surveys, the factors that made customers think they were talking to AI had almost nothing to do with language quality:

  • Response speed: Replies that arrived in under 30 seconds were often flagged as AI, even when they came from a human who happened to be at their desk.
  • Consistent formatting: Structured, well-organised replies felt more "robotic" to some customers.
  • No typos: Ironically, perfectly spelled replies raised AI suspicion. One customer said it was "too grammatically correct".

None of these are actually meaningful signals. A fast, well-formatted, typo-free reply is simply a good reply.

Satisfaction scores told a different story

We also tracked customer satisfaction. AI-handled conversations scored an average of 4.3 out of 5. Human-handled conversations scored 4.1 out of 5.

The difference wasn't statistically significant — but it was in the "wrong" direction from what we expected. The AI conversations rated marginally higher, primarily because of response time. Customers who got an answer within seconds were more satisfied, regardless of whether it came from a person or a model.
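A quick way to sanity-check whether a gap like 4.3 vs 4.1 is significant is a two-sample z-test on the raw ratings. A minimal sketch, using made-up scores (not the experiment's actual data) with a spread typical of 1-to-5 ratings:

```python
from statistics import NormalDist, mean, stdev

def mean_diff_significance(a, b):
    """Two-sample z-test on two lists of ratings.

    Uses the normal approximation, which is reasonable for ~100
    ratings per arm. Returns (difference in means, two-sided p-value).
    """
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    diff = mean(a) - mean(b)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p

# Hypothetical ratings, 100 per arm, centred on the reported
# means of 4.3 (AI) and 4.1 (human) -- NOT the real survey data.
ai_scores = [5, 5, 4, 3, 5, 4, 2, 5, 5, 5] * 10      # mean 4.3
human_scores = [5, 4, 4, 2, 5, 3, 5, 4, 4, 5] * 10   # mean 4.1

diff, p = mean_diff_significance(ai_scores, human_scores)
print(f"difference = {diff:.2f}, p = {p:.3f}")
```

With the spread you typically see in 1-to-5 ratings, a 0.2-point gap across 100 conversations per arm lands well above the conventional p < 0.05 cutoff, which is consistent with calling the difference not significant.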

Where AI still struggles

We're not claiming AI is a perfect replacement for human judgment. The experiment revealed clear failure modes:

  • Emotionally charged conversations: Complaints involving distress, urgency, or frustration were better handled by humans. Not because the AI replied poorly — but because customers in those states wanted to feel heard, not just answered.
  • Novel edge cases: When a customer's situation didn't match anything in the knowledge base, the AI would correctly decline to speculate — but customers sometimes found the handoff to a human jarring.
  • Negotiation: Anything involving back-and-forth on pricing or exceptions stayed better with humans.

This is why RheXa uses confidence scoring. When the AI's confidence score falls below the 0.85 threshold, the conversation routes to a human. The experiment confirmed that threshold is roughly right.
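The routing logic above can be sketched in a few lines. This is an illustrative sketch, not RheXa's actual implementation; the names and the `Draft` structure are assumptions for the example, with only the 0.85 threshold taken from the text:

```python
from dataclasses import dataclass

THRESHOLD = 0.85  # the cutoff described in the article

@dataclass
class Draft:
    text: str
    confidence: float  # model's self-reported score in [0, 1]

def route(draft: Draft) -> str:
    """Send the AI reply only when confidence clears the threshold;
    otherwise hand the conversation to a human."""
    if draft.confidence >= THRESHOLD:
        return "send_ai_reply"
    return "handoff_to_human"

print(route(Draft("Checkout is at 10am.", 0.93)))       # send_ai_reply
print(route(Draft("Unusual edge case here.", 0.61)))    # handoff_to_human
```

The point of a single scalar threshold is that it makes the handoff behaviour predictable and tunable: raising it trades automation rate for fewer low-confidence sends.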

The real question isn't detection — it's quality

After running this experiment, we stopped asking "can customers tell it's AI?" and started asking "are customers getting good answers quickly?"

That's the metric that matters. A customer who gets an accurate, helpful reply in 20 seconds doesn't care whether it came from a person in Manchester or a model in a data centre. They got what they needed.

Disclosure is important — customers have a right to know they may be interacting with AI, and RheXa's terms require businesses to disclose this in their privacy policy. But the anxiety that AI replies will feel cheap or robotic? Our data doesn't support that fear.

Done well, AI replies don't feel like AI. They feel like a business that actually cares enough to respond.


