Every AI reply gets a confidence score. Below 0.85, it stops. Here's the engineering behind that number and why we chose it.
One of the questions we get most often from technical users is: how does RheXa know when it doesn't know something?
The answer is confidence scoring — a system that runs on every AI-generated reply before it's sent. If the score falls below 0.85, the reply is blocked and the conversation is escalated to a human. If it's 0.85 or above, it sends.
Here's how that number was chosen and what the system actually does.
Language models are fluent. That's their greatest strength and their greatest risk. They can produce grammatically perfect, confidently worded replies about things they're completely wrong about. There's no built-in hesitation, no "I'm not sure about this."
Without an external check, a customer could ask "do you cover SE22?" and the AI might confidently reply "yes, we cover SE22" — even if that postcode isn't in the knowledge base. It filled a gap with a plausible answer. The answer is wrong. The customer shows up expecting service you don't offer.
A confidence gate prevents this. It forces the system to evaluate how certain it actually is before it sends anything.
RheXa's confidence score is a composite of three signals:
1. Retrieval similarity score
When the customer's question is converted to a vector and compared against the knowledge base, the closest matching chunks are assigned similarity scores between 0 and 1. A score of 1.0 means the query is semantically identical to that chunk. A score of 0.4 means there's a distant relationship.
If the top retrieved chunk has a similarity score below 0.6, that's a strong signal that the knowledge base doesn't contain a good answer to this question.
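The retrieval signal reduces to a few lines of code. This is a minimal sketch, not RheXa's production implementation — the 0.6 floor comes from the text above; the function names and vector shapes are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieval_signal(query_vec, chunk_vecs, floor=0.6):
    # Top similarity across retrieved chunks, plus whether it clears the floor.
    top = max(cosine_similarity(query_vec, c) for c in chunk_vecs)
    return top, top >= floor
```

If `retrieval_signal` returns `False` for its second value, the knowledge base likely has no good answer, and that drags the composite score down.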
2. Intent coverage — how much of the question is addressed
We parse the customer's question into intent components. "How much does Invisalign cost and how long does treatment take?" has two intent components: price and duration. If the retrieved content covers price but not duration, coverage is 50%. Lower coverage means lower confidence.
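In code, the coverage signal is a simple ratio. A hypothetical sketch — the parsing of a question into intent components is assumed to happen upstream:

```python
def coverage_score(intent_components, covered):
    # Fraction of the question's intent components that the
    # retrieved content actually addresses.
    if not intent_components:
        return 0.0
    hits = sum(1 for c in intent_components if c in covered)
    return hits / len(intent_components)
```

For the Invisalign example above: `coverage_score(["price", "duration"], {"price"})` returns `0.5`.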
3. Language model self-assessment
After generating a draft reply, RheXa prompts the language model to evaluate its own reply: "On a scale of 0–1, how confident are you that this reply is accurate given only the provided context?" This self-assessment is imperfect — models tend to be overconfident — but calibrated against the other signals, it adds useful information.
The three signals are weighted and combined into a final confidence score between 0 and 1.
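A weighted blend might look like the following. The weights here are illustrative placeholders, not RheXa's actual calibration:

```python
def confidence_score(retrieval, coverage, self_assessment,
                     weights=(0.45, 0.30, 0.25)):
    # Weighted combination of the three signals, clamped to [0, 1].
    # These weights are assumptions for illustration only.
    w_r, w_c, w_s = weights
    raw = w_r * retrieval + w_c * coverage + w_s * self_assessment
    return max(0.0, min(1.0, raw))
```

In practice the weights would be tuned against labeled data, the same way the threshold was.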
We tested multiple thresholds during the beta period, tracking two metrics: false positives (AI sent a wrong reply that should have been blocked) and false negatives (AI blocked a reply that was actually correct).
At 0.9: We blocked 34% of all replies. Most of them were correct. Staff spent a lot of time reviewing messages that the AI could have handled fine. False negative rate too high.
At 0.75: We blocked only 8% of replies. But the wrong-reply rate climbed to 4.2% of sent messages. For a business handling 200 messages a week, that's 8 incorrect replies per week going out without review. Too risky.
At 0.85: We blocked 17% of replies. The wrong-reply rate on sent messages dropped to 0.6%. Staff reviewed about 3–4 escalations per day on average for a 200-message-per-week business. That felt like the right balance: most conversations handled autonomously, edge cases caught by humans.
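The sweep above boils down to replaying human-labeled replies against each candidate threshold. A simplified sketch, assuming each sample is a (confidence, was_correct) pair:

```python
def evaluate_threshold(samples, threshold):
    # samples: (confidence, was_correct) pairs from labeled beta replies.
    blocked = [ok for score, ok in samples if score < threshold]
    sent = [ok for score, ok in samples if score >= threshold]
    return {
        "blocked_rate": len(blocked) / len(samples),
        # False negatives: blocked replies that were actually correct.
        "false_negatives": sum(blocked),
        # Wrong-reply rate: share of sent replies that were incorrect.
        "wrong_reply_rate": sum(1 for ok in sent if not ok) / max(len(sent), 1),
    }
```

Running this across thresholds from 0.75 to 0.9 is what surfaced the trade-off described above.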
0.85 isn't a magical number — it's a calibrated one. And it's configurable: if your business has a lower tolerance for errors (medical, legal, financial sectors), you can raise the threshold. If you're in a lower-stakes domain and want more automation, you can lower it slightly. The default is 0.85 because it works well across most service businesses.
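The gate itself is then a one-line comparison against a per-business threshold. The sector values below are hypothetical examples of raising the bar; only the 0.85 default comes from the text:

```python
# Hypothetical per-sector thresholds; only the 0.85 default is from the text.
THRESHOLDS = {"medical": 0.92, "legal": 0.92, "default": 0.85}

def should_send(score, sector="default"):
    # Send autonomously only if confidence clears the configured threshold.
    return score >= THRESHOLDS.get(sector, THRESHOLDS["default"])
```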
When the confidence score falls below 0.85, two things happen simultaneously: the draft reply is blocked, and the conversation is escalated to your team.
Your team sees exactly what the AI was going to say, why it was uncertain, and what it searched for in the knowledge base. You can approve the draft with one click, edit it, or write a custom reply. Either way, the customer gets a response.
Here's a secondary benefit that's easy to miss: the pattern of blocked conversations is a diagnostic tool for your knowledge base.
If you see 15 blocked conversations in a week all related to the question "do you do weekend appointments?", that's a clear signal to add a clear, unambiguous section about your weekend availability to the knowledge base. After you do, those conversations will stop being blocked.
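Surfacing those recurring gaps is a counting exercise. A minimal sketch, assuming blocked conversations have already been tagged with a topic by an upstream classifier (hypothetical):

```python
from collections import Counter

def top_gaps(blocked_topics, n=5):
    # Most frequent topics among blocked conversations. Repeat offenders
    # point at sections missing from the knowledge base.
    return Counter(blocked_topics).most_common(n)
```

Fifteen blocked conversations tagged "weekend availability" would float straight to the top of this list.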
The confidence gate isn't just a safety net. It's feedback. It tells you exactly where your knowledge base has gaps, so you can close them.
The 0.85 threshold reflects a philosophy: an AI that sometimes says "I'm not sure, let me get a human to help you" is more trustworthy than one that always has a confident answer. Confidence in the face of uncertainty isn't a feature. It's a bug.
The goal isn't to automate everything. The goal is to automate the things the AI can handle correctly, and pass everything else to the people who can.
Connect WhatsApp and Gmail or Outlook in ten minutes. AI replies in your tone — with a knowledge base that knows your business.
Start your 14-day free trial →