How webhook delivery works in voice AI stacks — and where it breaks
A voice AI webhook chain moves through four sequential handoffs, each of which is a potential failure point. Understanding this chain of handoffs is the prerequisite for knowing where to look when data goes missing.
Handoff 1: Provider fires the event. When a Twilio call ends, Twilio sends an HTTP POST to your configured status callback URL with the call details. When an ElevenLabs TTS generation completes, ElevenLabs sends a webhook event to your configured endpoint. The provider's responsibility ends when they receive a 2xx response from your endpoint — or exhaust their retry attempts.
Handoff 2: Your endpoint receives and processes the event. Your server receives the POST, validates it (checking the Twilio signature header or ElevenLabs webhook secret), and processes the data — parsing the payload, writing to your database, triggering downstream logic.
Handoff 3: Your endpoint sends data to downstream systems. Your webhook handler calls the HubSpot API, the Salesforce API, or writes to a PostgreSQL database. Each of these calls can fail independently — rate limits, authentication expiry, schema mismatches.
Handoff 4: Downstream systems acknowledge receipt. This handoff is often ignored — most teams assume that a 200 response from the HubSpot API means the data was saved. In practice, CRM APIs can return 200 for deduplication scenarios where the record was not actually created because it already exists.
Most webhook failure investigations focus on Handoff 1 because it is the most visible — providers expose delivery logs. Handoffs 2, 3, and 4 are nearly invisible unless you have built explicit logging for them.
The 4 webhook failure types and how to identify each
Provider delivery failure occurs when the provider cannot reach your endpoint — you return a 4xx or 5xx response, or you do not respond within the timeout window. Twilio will retry status callbacks 3 times with a 5-second wait between attempts. ElevenLabs retries webhook events up to 5 times with exponential backoff. If all retries fail, the event is dropped. Detection: check the provider's delivery log in their developer console. Twilio shows status callback delivery in the Call resource — the StatusCallbackEvent and StatusCallbackMethod fields, plus any error codes logged against the call. ElevenLabs shows webhook delivery attempts in the API keys section.
Endpoint receipt but processing failure occurs when your server receives the webhook and returns 200, but the processing logic fails after the response is sent. This is the most invisible failure mode because the provider considers the event successfully delivered. Your data was never written. Detection: application-level logging in your webhook handler that explicitly logs success/failure at each processing step, with the event ID and timestamp. Without this logging, this failure type is undetectable until a count-compare reveals missing data.
Endpoint-to-CRM failure occurs when your webhook handler successfully processes the event but the downstream API call fails. Rate limits are the most common cause — HubSpot's API has strict rate limits (100 requests per 10 seconds on the free tier, 150 on paid tiers) that production call volumes can exceed. Authentication expiry is the second most common cause — OAuth tokens expire and must be refreshed. Detection: explicit error logging on all CRM API calls, with the HTTP status code and response body from the CRM API included in the log entry.
Duplicate delivery causes the inverse problem — the same event is processed twice, creating duplicate records. Twilio and ElevenLabs both retry events when they do not receive a timely 2xx response. If your server processed the event but was slow to respond, you may have received the event, processed it successfully, and then received a retry that you also processed. Detection: count discrepancies where your record count exceeds the provider's call count.
How to detect webhook failures proactively
Reactive detection — discovering webhook failures when a customer reports missing data or a sales team notices a gap in the CRM — is the mode most teams operate in. It is also the most expensive mode, because by the time the failure is discovered, days or weeks of data are missing.
The simplest proactive detection method is a count-compare check: once per day, pull the total call count from Twilio for the prior 24 hours and compare it to the total records created in your CRM or database for the same window. If the numbers diverge by more than 2–3% (accounting for calls that did not result in a record by design), you have a webhook delivery gap. This check takes under 10 lines of code to implement and can run as a scheduled job.
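The comparison itself is trivial; a minimal sketch of the gap check, assuming you have already pulled the two counts for the window (e.g. via the Twilio API and a query against your own database):

```python
def has_delivery_gap(provider_calls: int, crm_records: int,
                     tolerance: float = 0.03) -> bool:
    """Flag a gap when CRM records fall short of provider calls by more
    than `tolerance` (3% here, allowing for calls that intentionally
    create no record)."""
    if provider_calls == 0:
        return False
    shortfall = (provider_calls - crm_records) / provider_calls
    return shortfall > tolerance
```

Run this in a daily scheduled job and alert when it returns `True`; the threshold is the one tunable, and it depends on how many of your calls are not supposed to produce a record.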
Twilio's delivery logs are the most detailed source of truth for Handoff 1 failures. The Twilio API exposes webhook delivery attempts via the Call resource — any call where the status callback returned a non-2xx response will have an error code logged against it. Pulling the calls whose status callbacks logged error codes each day gives you a precise count of delivery failures.
For Handoff 2 and 3 failures, the only reliable detection method is application-level structured logging with event IDs. Log three events per webhook: receipt (with the event ID and timestamp), processing complete (with the outcome — success or failure code), and downstream API response (with the status code from the CRM or database). Query these logs daily for events where receipt was logged but processing complete was not. This is your processing failure rate.
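A minimal structured-logging helper along these lines (the stage names and field choices are illustrative, not a standard):

```python
import json
import logging
import time

logger = logging.getLogger("webhook")

def log_stage(stage: str, event_id: str, **fields) -> dict:
    """Emit one JSON log line per pipeline stage: 'receipt',
    'processing_complete', or 'downstream_response'."""
    entry = {"stage": stage, "event_id": event_id, "ts": time.time(), **fields}
    logger.info(json.dumps(entry))
    return entry

# Usage inside a handler:
# log_stage("receipt", call_sid)
# log_stage("processing_complete", call_sid, outcome="success")
# log_stage("downstream_response", call_sid, status_code=200)
```

Because every line carries the same `event_id`, the daily query is a simple anti-join: event IDs that appear at the `receipt` stage but never at `processing_complete`.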
Building a reliable webhook pipeline from scratch
A reliable webhook pipeline requires four components that are absent from most initial implementations.
Idempotency key implementation prevents duplicate processing. The deduplication key is the provider's call identifier — Twilio's CallSid or ElevenLabs' history item ID. Before processing any webhook event, check whether the key already exists in a processed-events table. If it does, return 200 (to prevent further retries) and skip processing. If it does not, insert the key and proceed. This check must be atomic (a database upsert or a Redis SET NX operation) to prevent race conditions under concurrent delivery.
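A sketch of the atomic claim using SQLite's `INSERT OR IGNORE`; in production the same shape works as a PostgreSQL `INSERT ... ON CONFLICT DO NOTHING` or a Redis `SET key value NX`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")

def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Atomically claim an event. rowcount == 1 means we own it;
    0 means a concurrent (or earlier) delivery already claimed it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event_id,),
    )
    conn.commit()
    return cur.rowcount == 1

# In the handler: if not claim_event(conn, call_sid): return 200  # duplicate
```

The insert and the existence check are a single statement, which is what makes this safe under concurrent delivery; a separate `SELECT` followed by an `INSERT` is not.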
Synchronous 200 + async processing is the pattern that prevents timeouts from causing duplicate delivery. Receive the webhook → validate the signature → immediately enqueue the event payload in a job queue (Redis, SQS, or a database table) → return 200. A separate worker process dequeues and processes events. This decouples the provider's delivery confirmation from your processing time, eliminating the most common cause of retry-induced duplicates.
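The pattern reduces to a handful of lines. This sketch uses an in-process `queue.Queue` as a stand-in for Redis or SQS, and the `signature_ok` flag and `process` callable are placeholders for your own validation and processing logic:

```python
import queue

jobs: "queue.Queue[dict]" = queue.Queue()  # stand-in for Redis, SQS, or a DB table

def handle_webhook(payload: dict, signature_ok: bool) -> int:
    """Validate, enqueue, acknowledge. No slow I/O before the 200."""
    if not signature_ok:
        return 403
    jobs.put(payload)  # hand the event to the worker
    return 200         # provider sees success within milliseconds

def worker(process) -> None:
    """A separate worker drains the queue and does the slow work."""
    while True:
        payload = jobs.get()
        process(payload)  # database writes, CRM calls, etc.
        jobs.task_done()
```

The handler's response time is now independent of CRM latency, so a slow HubSpot call can no longer push you past the provider's timeout and trigger a retry.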
Retry queue with exponential backoff handles transient failures in downstream systems. When a CRM API call fails with a retryable error (429 rate limit, 503 service unavailable), the event goes back into the retry queue with a delay — 30 seconds, then 2 minutes, then 10 minutes, then 1 hour. After 4 retry attempts, the event moves to the dead-letter queue. Exponential backoff prevents a CRM rate limit event from immediately flooding the queue with retry attempts.
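A sketch of the routing logic with the delay schedule from above; the retry queue and DLQ are shown as plain lists for brevity:

```python
RETRY_DELAYS = [30, 120, 600, 3600]  # 30s, 2m, 10m, 1h
RETRYABLE_STATUSES = {429, 503}

def route_failure(event, status: int, attempt: int,
                  retry_queue: list, dlq: list) -> None:
    """Re-enqueue retryable failures with backoff; dead-letter the rest."""
    if status in RETRYABLE_STATUSES and attempt < len(RETRY_DELAYS):
        retry_queue.append((event, attempt + 1, RETRY_DELAYS[attempt]))
    else:
        dlq.append(event)
```

Non-retryable errors (a 400 from a schema mismatch, a 401 from a revoked token) skip the backoff schedule entirely and go straight to the dead-letter queue, since retrying them cannot succeed.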
Dead-letter queue stores events that have permanently failed. Alert when DLQ depth exceeds zero — every event in the DLQ represents lost data that requires manual intervention. Include the original event payload, the error that caused the failure, and the timestamp of the last retry attempt in the DLQ entry. Recovery from a DLQ requires identifying the root cause of the permanent failure, fixing it, and replaying the events.
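A minimal shape for a DLQ entry and a replay helper; the field names are illustrative:

```python
import time

def to_dlq_entry(payload: dict, error, attempts: int) -> dict:
    """Capture everything needed to diagnose and later replay the event."""
    return {
        "payload": payload,
        "error": str(error),
        "attempts": attempts,
        "last_attempt_ts": time.time(),
    }

def replay(dlq: list, process) -> list:
    """After the root cause is fixed, reprocess the DLQ; return what still fails."""
    still_failing = []
    for entry in dlq:
        try:
            process(entry["payload"])
        except Exception as exc:
            entry["error"] = str(exc)
            still_failing.append(entry)
    return still_failing
```

Replaying against the same idempotency table described above makes the operation safe to run twice: events that did make it through are skipped.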
Quick fixes for common webhook failures
For Twilio status callback timeouts: your callback URL must respond within 15 seconds. If your handler does synchronous database writes, CRM API calls, or any other I/O before responding, measure the response time. Add timing logs around each operation. Move everything after the initial validation to async processing as described above.
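A minimal timing helper for instrumenting each operation inside the handler; the label names are up to you:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str, log=print):
    """Log how long the wrapped block took, e.g. 'db_write took 1.204s'."""
    start = time.monotonic()
    try:
        yield
    finally:
        log(f"{label} took {time.monotonic() - start:.3f}s")

# Usage:
# with timed("db_write"):
#     save_call_record(payload)   # hypothetical synchronous write
```

Wrapping each I/O step this way usually identifies the budget-eating operation within one day of traffic.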
For Twilio signature validation failures: if you are behind a load balancer or reverse proxy, the X-Twilio-Signature header is validated against the full URL including protocol, domain, path, and query string — exactly as Twilio constructed it. If your proxy changes the URL in any way (stripping HTTPS to HTTP, changing the path, adding query parameters), the signature validation will fail. Log the full URL your handler receives and compare it to the URL you configured in Twilio.
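Twilio's helper libraries provide a `RequestValidator` for this check; the scheme itself (full URL plus POST params concatenated in sorted key order, HMAC-SHA1 keyed with your auth token, base64-encoded) can be sketched with the standard library to show why any URL rewrite breaks validation:

```python
import base64
import hashlib
import hmac

def twilio_signature(auth_token: str, url: str, post_params: dict) -> str:
    """Recompute X-Twilio-Signature: URL + sorted key/value pairs,
    HMAC-SHA1 keyed with the auth token, base64-encoded."""
    payload = url + "".join(k + post_params[k] for k in sorted(post_params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def signatures_match(auth_token: str, url: str, post_params: dict,
                     header_value: str) -> bool:
    return hmac.compare_digest(
        twilio_signature(auth_token, url, post_params), header_value
    )
```

The same request validated against `http://...` instead of `https://...` produces a different signature, which is exactly the proxy failure described above.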
For ElevenLabs webhook delivery failures: the most common cause is your endpoint not being publicly accessible. If you are testing locally with a tool like ngrok, the ngrok tunnel URL changes on each restart — update the webhook URL in ElevenLabs every time the tunnel restarts, or use a fixed subdomain (ngrok paid tier). In production, verify that your webhook URL is accessible from the public internet and that your firewall allows inbound traffic from ElevenLabs' delivery IP ranges.
For HubSpot rate limit errors (429): HubSpot enforces rate limits per OAuth token, not per account. If you have multiple webhook handlers using the same OAuth token, they share the same rate limit budget. Either implement client-side rate limiting (a token bucket in Redis) to spread requests across the rate limit window, or switch to the HubSpot batch API endpoints that process multiple records in a single request.
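A sketch of the client-side limiter. This in-process version tracks the bucket with `time.monotonic`; a production version would keep the bucket state in Redis so that every handler sharing the OAuth token draws from one budget:

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per `per` seconds, refilling smoothly."""

    def __init__(self, rate: int, per: float):
        self.capacity = rate
        self.tokens = float(rate)
        self.per = per
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.updated) * self.capacity / self.per,
        )
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Callers that get `False` back should sleep (or re-enqueue with a delay) rather than fire the request and burn a 429 against the shared limit.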
How Sherlock detects webhook delivery failures automatically across your stack
The count-compare check and application-level logging described above are the right detection mechanisms. They are also tedious to implement correctly and require maintenance as your stack changes — new providers, new CRM integrations, new call types that should or should not generate records.
Sherlock runs the count-compare and delivery monitoring automatically across your connected providers. When the Twilio call count diverges from your CRM record count by more than the configured threshold, Sherlock posts a case file in Slack: the size of the gap, the time window it covers, the specific calls that appear in Twilio but not in your downstream system, and the most likely failure type based on the pattern.
For webhook delivery failures that Sherlock can identify from provider delivery logs (Handoff 1 failures), the case file includes the specific error codes Twilio logged against the failed callbacks and the first checks for resolving them. For Handoff 2 and 3 failures (invisible without application-level logging), Sherlock surfaces the gap and the probable location of the failure based on timing correlation.
The free tier includes webhook delivery monitoring across all connected providers. Connect your Twilio and ElevenLabs accounts at [usesherlock.ai](https://usesherlock.ai/?utm_source=blog&utm_medium=content&utm_campaign=webhook-failures-guide) to enable automatic detection on your stack.
Ready to investigate your own calls?
Connect Sherlock to your voice providers in under 2 minutes. Free to start — 100 credits, no credit card.