We ran an experiment with 200 conversations — half AI-handled, half human. The results were not what we expected.
When we started building RheXa, we kept running into the same objection: "My customers will know it's AI. They'll feel cheated."
We wanted to test that assumption properly. So we ran a structured experiment across 200 customer conversations — 100 handled by human staff, 100 handled by RheXa — and asked participants afterward to identify which they'd experienced.
The results changed how we think about this question entirely.
We partnered with three UK-based service businesses: a letting agency, a property maintenance company, and a dental clinic. Each agreed to let us route half of their inbound enquiries through RheXa and handle the other half with their normal staff.
Critically, both the AI and the human replies were powered by the same knowledge base — the business's services, pricing, FAQs, and policies. The AI wasn't guessing. It had the same information a trained employee would have.
After each conversation closed, we surveyed customers: "Do you think your enquiry was handled by a person or by an AI assistant?"
Across 200 conversations, customers correctly identified the source 53% of the time — barely above random chance.
More interesting: when customers thought they'd spoken to a human, they were wrong 31% of the time. When they thought they'd spoken to AI, they were wrong 38% of the time.
In plain language: customers were nearly as likely to misidentify a human reply as an AI reply. The perception of "AI vs human" was not tracking reality.
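For readers who want to check the "barely above random chance" claim, here's a quick sketch of an exact two-sided binomial test against 50% guessing, in pure Python (the function name and the 106/200 split, i.e. 53% of 200, are ours):

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: probability of an outcome
    at least as extreme (no more likely) than observing k of n."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    # Sum the probabilities of every outcome no more likely than k
    # (small tolerance guards against floating-point ties).
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))

correct = 106   # 53% of 200 conversations identified correctly
total = 200
p_value = binom_two_sided_p(correct, total)
print(f"p = {p_value:.3f}")  # well above 0.05: consistent with guessing
```

A result like 106/200 is the kind of split you'd expect from coin-flipping, which is exactly the article's point.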
When we dug into the surveys, the factors that made customers think they were talking to AI had almost nothing to do with language quality. The most common tells were speed (a reply that arrived within seconds), clean formatting, and the absence of typos.

None of these are actually meaningful signals. A fast, well-formatted, typo-free reply is simply a good reply.
We also tracked customer satisfaction. AI-handled conversations scored an average of 4.3 out of 5. Human-handled conversations scored 4.1 out of 5.
The difference wasn't statistically significant — but it was in the "wrong" direction from what we expected. The AI conversations rated marginally higher, primarily because of response time. Customers who got an answer within seconds were more satisfied, regardless of whether it came from a person or a model.
We're not claiming AI is a perfect replacement for human judgment; the experiment also revealed clear failure modes.
This is why RheXa uses confidence scoring. When the AI isn't certain (a confidence score below the 0.85 threshold), it routes the conversation to a human. The experiment confirmed that threshold is roughly right.
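The routing rule can be sketched in a few lines. This is an illustrative sketch, not RheXa's internal implementation; only the 0.85 cut-off comes from the text, and all names here are ours:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # the threshold described in the article

@dataclass
class Draft:
    text: str
    confidence: float  # model's self-reported confidence, 0.0 to 1.0

def route(draft: Draft) -> str:
    """Send confident drafts automatically; escalate the rest.

    Hypothetical routing logic: anything at or above the threshold
    goes straight to the customer, anything below is queued for staff.
    """
    if draft.confidence >= CONFIDENCE_THRESHOLD:
        return "send"
    return "human_review"

print(route(Draft("Our standard checkout time is 10am.", 0.97)))  # send
print(route(Draft("The boiler warranty might still apply.", 0.61)))  # human_review
```

The design choice worth noting is that the fallback is silent from the customer's perspective: a low-confidence draft becomes a slightly slower human reply, not an error message.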
After running this experiment, we stopped asking "can customers tell it's AI?" and started asking "are customers getting good answers quickly?"
That's the metric that matters. A customer who gets an accurate, helpful reply in 20 seconds doesn't care whether it came from a person in Manchester or a model in a data centre. They got what they needed.
Disclosure is important — customers have a right to know they may be interacting with AI, and RheXa's terms require businesses to disclose this in their privacy policy. But the anxiety that AI replies will feel cheap or robotic? Our data doesn't support that fear.
Done well, AI replies don't feel like AI. They feel like a business that actually cares enough to respond.
Connect WhatsApp and Gmail or Outlook in ten minutes. AI replies in your tone — with a knowledge base that knows your business.
Start your 14-day free trial →