Tutorials · February 20, 2026 · 7 min read · by Jorge

Vapi Webhook Debugging Guide: Why Webhooks Fail and How to Fix Them

A complete guide to debugging Vapi webhook failures — delivery timeouts, retry patterns, silent failures, and how to correlate webhook events with call outcomes.

TL;DR — The short answer

1. Vapi webhook failures fall into three distinct categories: delivery failures (Vapi cannot reach your endpoint), processing failures (your handler receives the event but fails silently), and correlation failures (your handler runs but cannot match the event to the right record). Each requires a different debugging approach.
2. The fast-ack pattern (returning HTTP 200 immediately and processing asynchronously) is the most important architectural change for high-volume Vapi deployments and eliminates the majority of timeout-induced delivery failures.
3. Idempotency keys based on Vapi event IDs are required for correct webhook handling because Vapi's retry logic means any event can be delivered multiple times under normal operating conditions.
4. Correlating Vapi call events with Twilio CallSids uses the `call.phoneCallProviderId` field and is the correct key for cross-provider incident investigation.


Vapi webhook architecture: understanding the delivery mechanism

Vapi's webhook system delivers call lifecycle events to a server URL you configure under Settings > Server URL in the Vapi dashboard. Every Vapi phone number and assistant inherits the account-level server URL by default, but the URL can be overridden at the phone number level and at the assistant level, so a multi-product deployment can route each product's events to its own handler.

The event types that matter for production debugging:

call.started: fires when Vapi initiates the call. Contains the Vapi call ID, phone number, and — if Twilio is your telephony provider — the phoneCallProviderId field with the Twilio CallSid. This is the earliest possible moment to record a call attempt in your CRM or analytics system.

call.ended: fires when the call terminates for any reason. Contains call duration, end reason (customer-ended, assistant-ended, error), and a summary of the conversation if transcript processing is enabled. This is the event most teams use for CRM updates, conversion tracking, and analytics writes.

function-call: fires when the AI assistant executes a tool call defined in your assistant configuration. Your server must respond to this event synchronously (within the timeout window) with the function result. This is the only Vapi webhook event type where a slow response directly degrades the caller's experience — the call literally waits for your function-call response before the AI can continue.

transcript: fires after call completion with the full transcript. Useful for CRM enrichment and analysis but not time-sensitive.

Vapi's delivery mechanism uses HTTPS POST with a JSON body. The Content-Type header is application/json. Your endpoint must return a 2xx status within 20 seconds. Non-2xx responses and timeouts are both treated as delivery failures and trigger the retry schedule.
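As a rough sketch of how the event types above differ in handling, the following TypeScript models them as a union and decides which ones can be deferred. The `type` and `call.id` field names follow the handler example later in this guide. The rest of the payload shape, and the `handlingModeFor` helper itself, are assumptions for illustration, so check them against a real delivery before relying on them.

```typescript
// Event names as used in this guide; the payload shape is an assumption.
type VapiEventType = "call.started" | "call.ended" | "function-call" | "transcript";

interface VapiEventPayload {
  type: VapiEventType;
  call: { id: string; phoneCallProviderId?: string };
}

type HandlingMode = "respond-inline" | "ack-then-queue";

// function-call is the only event where the caller waits on our response,
// so it must be answered inline. Everything else can be acknowledged and queued.
function handlingModeFor(payload: VapiEventPayload): HandlingMode {
  return payload.type === "function-call" ? "respond-inline" : "ack-then-queue";
}
```

Making this decision at the top of the handler keeps slow work out of the request path for every event that does not need a synchronous answer.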

The three failure modes: delivery, processing, and correlation

Debugging Vapi webhook failures requires correctly identifying which of the three failure modes you are dealing with. They have different symptoms, different evidence sources, and different fixes.

Delivery failures occur when Vapi cannot successfully POST to your endpoint. Symptoms: the event appears in the Vapi dashboard call logs as 'webhook delivery failed', or you observe a call in Vapi with no corresponding record in your downstream system and the call happened during a period where your server was unavailable. Root causes: your server is down or unreachable, your SSL certificate is invalid, your server returned a non-2xx status, or your server exceeded the 20-second timeout. Evidence source: Vapi call log webhook delivery status, your server access logs. Fix: verify server availability, certificate validity, and handler response time.

Processing failures occur when Vapi successfully delivers the event (your server returns 200) but your server's subsequent processing fails. Symptoms: delivery shows as successful in Vapi dashboard, but your CRM has no record and your application has no error alert. Root causes: silent exception handling in your webhook handler, missing environment variables in production, CRM API authentication failures, or database connection failures. Evidence source: your application logs — specifically, any log entries between 'webhook received' and 'processing complete'. If there are no log entries in that range, your exception handler is swallowing errors. Fix: add structured logging at each processing step, ensure exceptions are logged before being swallowed, add an error tracking service like Sentry to catch uncaught exceptions in webhook handlers.

Correlation failures occur when your handler processes the event correctly but cannot match it to the right CRM record or analytics session. Symptoms: processing logs show success, CRM shows a new record was created, but it is a duplicate or is associated with the wrong contact. Root causes: race conditions between call.started and call.ended handlers running simultaneously, missing or incorrect lookup key (phone number format mismatch — E.164 in Vapi, local format in your CRM), or asynchronous CRM writes that completed in the wrong order. Fix: use Vapi call ID as the primary lookup key rather than phone number; ensure your database schema stores the Vapi call ID for lookup; handle the case where call.ended arrives before call.started's CRM write has committed (Vapi events are sometimes delivered out of order).
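One way to make the correlation fix concrete is to merge every event into a single record keyed by Vapi call ID, creating the record on whichever event arrives first. The sketch below uses an in-memory `Map` as a stand-in for your database. The `CallRecord` and `CallLifecycleEvent` shapes are hypothetical and do not reflect Vapi's payload schema.

```typescript
// Hypothetical call record keyed by Vapi call ID; field names are illustrative.
interface CallRecord {
  callId: string;
  startedAt?: string;
  endedAt?: string;
  endReason?: string;
}

type CallLifecycleEvent =
  | { type: "call.started"; callId: string; at: string }
  | { type: "call.ended"; callId: string; at: string; endReason: string };

// Merge an event into the store. The record is created by whichever event
// arrives first, so call.ended before call.started does not create a duplicate.
function applyEvent(store: Map<string, CallRecord>, ev: CallLifecycleEvent): CallRecord {
  const record: CallRecord = store.get(ev.callId) ?? { callId: ev.callId };
  if (ev.type === "call.started") {
    record.startedAt = ev.at;
  } else {
    record.endedAt = ev.at;
    record.endReason = ev.endReason;
  }
  store.set(ev.callId, record);
  return record;
}
```

In a real database the same idea is usually expressed as an upsert on a unique call ID column, which also protects against two handlers writing at the same time.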

The fast-ack pattern: the most important architectural change

The single most impactful change you can make to a Vapi webhook handler is to implement the fast-ack pattern: return HTTP 200 immediately upon receiving the request, before any processing logic runs, and then perform all processing asynchronously.

The standard (wrong) architecture:

```
POST /webhook/vapi
1. Parse webhook body
2. Look up CRM contact by phone number (CRM API call: 300-800ms)
3. Update contact record (CRM API call: 200-500ms)
4. Write to database (DB write: 10-50ms)
5. Send Slack notification (HTTP call: 100-300ms)
6. Return 200 OK
Total: 600-1,650ms
```

The fast-ack architecture:

```
POST /webhook/vapi
1. Parse and validate webhook body (5-10ms)
2. Enqueue processing job (Redis/queue: 5-15ms)
3. Return 200 OK (total: 10-25ms)

[Async worker]
4. Look up CRM contact
5. Update contact record
6. Write to database
7. Send Slack notification
```

The fast-ack pattern eliminates delivery failures caused by slow processing, but it introduces a new requirement: your async worker must be robust. Add dead letter queues for jobs that fail repeatedly. Add job-level logging so you can diagnose failures in the async path. Add a monitoring alert for job queue depth — a growing queue indicates your workers are falling behind and processing delays are accumulating.
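The fast-ack flow and its worker-side requirements can be sketched without any infrastructure. The example below is a minimal illustration, assuming an in-memory array in place of Redis or a managed queue. `InMemoryJobQueue` and `fastAck` are hypothetical names, and the retry limit is whatever you pass in.

```typescript
interface QueuedJob {
  body: unknown;
  attempts: number;
}

class InMemoryJobQueue {
  readonly pending: QueuedJob[] = [];
  readonly deadLetter: QueuedJob[] = [];
  private readonly maxAttempts: number;

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(body: unknown): void {
    this.pending.push({ body, attempts: 0 });
  }

  // Queue depth is the number to alert on: steady growth means workers are behind.
  depth(): number {
    return this.pending.length;
  }

  // Process one job. A failed job is retried until maxAttempts, then dead-lettered.
  processNext(worker: (body: unknown) => void): void {
    const job = this.pending.shift();
    if (!job) return;
    try {
      worker(job.body);
    } catch {
      job.attempts += 1;
      const destination = job.attempts >= this.maxAttempts ? this.deadLetter : this.pending;
      destination.push(job);
    }
  }
}

// Fast-ack: validate, enqueue, acknowledge. No CRM or database work happens here.
function fastAck(rawBody: string, queue: InMemoryJobQueue): { status: number } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { status: 400 }; // malformed JSON is rejected rather than queued
  }
  queue.enqueue(parsed);
  return { status: 200 };
}
```

A production queue would persist jobs and add backoff between attempts. The point of the sketch is only that the request path ends at `enqueue`, and that failures end up somewhere inspectable instead of vanishing.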

For the function-call event type, fast-ack is not applicable: Vapi waits synchronously for your response with the function result. For this event type, your handler must be fast by construction — use in-memory lookups or cached data where possible, and keep function implementations under 3 seconds to stay well within the timeout window with margin for network variance.
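For function-call handlers, one simple way to stay fast is to serve lookups from memory and refresh them outside the request path. The following is a minimal TTL cache sketch. `TtlCache` is a hypothetical helper, not part of any Vapi SDK, and the clock is injectable only so the behaviour can be tested deterministically.

```typescript
// Minimal TTL cache with an injectable clock.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

On a cache miss the handler still has to decide between a slow lookup and a fallback answer, so choose the TTL with the acceptable staleness of the data in mind.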

Idempotency: handling Vapi's retry-induced duplicate deliveries

Vapi's retry schedule means that duplicate deliveries are part of normal operation: any transient unavailability on your server, or any 5xx response, causes the same event to be sent again. Your webhook handler must be idempotent by design, not as an afterthought.

Vapi includes a unique event identifier in each webhook payload. The field path varies by event type but is consistently at the call.id level for call events. Because several events share the same call ID, combine it with the event type to form your idempotency key.

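Implementation using Redis:

```typescript
import Redis from 'ioredis';

// Minimal payload shape used by this handler; real payloads carry more fields.
interface VapiWebhookPayload {
  type: string;
  call: { id: string };
}

const redis = new Redis(); // connects to localhost:6379 by default

// Placeholder for your own logic (CRM update, database write, and so on).
async function processVapiEvent(payload: VapiWebhookPayload): Promise<void> {}

async function handleVapiWebhook(payload: VapiWebhookPayload) {
  // One key per call ID and event type
  const idempotencyKey = `vapi:event:${payload.call.id}:${payload.type}`;

  // Atomic check-and-set: returns null if key already exists
  const acquired = await redis.set(idempotencyKey, '1', 'EX', 86400, 'NX');

  if (!acquired) {
    // Already processed: return 200 to prevent further retries
    return { status: 200, body: { message: 'duplicate, ignored' } };
  }

  // Process the event
  await processVapiEvent(payload);
  return { status: 200, body: { message: 'processed' } };
}
```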

The TTL on the idempotency key should be set to at least Vapi's maximum retry window plus a safety margin. Vapi's final retry attempt is at 30 minutes; a 24-hour TTL is a reasonable default that handles all retries while not accumulating excessive Redis memory.

Critical detail: the check-and-set operation must be atomic. Using GET followed by SET in two operations creates a race condition where two concurrent deliveries of the same event can both read 'key not found' before either writes. Redis's SET NX (set if not exists) is atomic and correct. The Node.js equivalent using ioredis is redis.set(key, value, 'EX', ttl, 'NX'): the key and value, followed by the EX expiry and the NX option.
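To unit test handler logic without a Redis instance, a small in-memory stand-in with the same claim-once semantics can help. The sketch below is an assumption-laden substitute: `InMemoryIdempotencyStore` is a hypothetical class, not an ioredis API, and it is only safe within a single Node.js process because the check and the write happen in one synchronous call.

```typescript
// In-memory stand-in for Redis "SET key value EX ttl NX". For tests only.
class InMemoryIdempotencyStore {
  private readonly expiryByKey = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns true if the key was claimed, false if a live claim already exists.
  claim(key: string, ttlSeconds: number): boolean {
    const expiresAt = this.expiryByKey.get(key);
    if (expiresAt !== undefined && expiresAt > this.now()) {
      return false;
    }
    this.expiryByKey.set(key, this.now() + ttlSeconds * 1000);
    return true;
  }
}
```

Across multiple servers this guarantee disappears, which is why the shared store in production needs a genuinely atomic operation such as SET NX.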

Correlating Vapi events with Twilio and ElevenLabs data

In production voice AI deployments, Vapi sits between Twilio (telephony) and ElevenLabs (TTS) in the stack. A failed call investigation may require pulling logs from all three providers and aligning them on the same incident timeline. The correlation mechanism differs for each provider pair.

Vapi to Twilio correlation: Vapi exposes call.phoneCallProviderId on every call object when your Vapi account is configured to use Twilio as the underlying telephony provider. This field contains the Twilio CallSid (CA-prefixed, 34 characters). Store this field in your webhook handler when you receive call.started and use it as the lookup key when querying the Twilio REST API for the corresponding call record. This correlation is reliable — it is a direct foreign key relationship, not a timestamp estimation.
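Before using the stored value as a lookup key, it is worth checking that it really looks like a Twilio CallSid. The sketch below validates the format described above and builds the Twilio REST URL for a single call resource. The helper names are hypothetical, and authentication (HTTP basic auth with your account credentials) is left out.

```typescript
// Twilio Call SIDs are "CA" followed by 32 hexadecimal characters (34 total).
const TWILIO_CALL_SID = /^CA[0-9a-fA-F]{32}$/;

function isTwilioCallSid(value: string | undefined): value is string {
  return value !== undefined && TWILIO_CALL_SID.test(value);
}

// Build the Twilio REST URL for fetching a single call resource.
function twilioCallUrl(accountSid: string, callSid: string): string {
  if (!isTwilioCallSid(callSid)) {
    throw new Error(`Not a Twilio CallSid: ${callSid}`);
  }
  return `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls/${callSid}.json`;
}
```

A value that fails the check usually means the call was not placed through Twilio, so it is better to log and skip the Twilio lookup than to query with a bad key.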

Vapi to ElevenLabs correlation: there is no direct foreign key between Vapi call IDs and ElevenLabs session IDs. The correlation must be done by timestamp. When Vapi fires call.started, record the timestamp. Query the ElevenLabs history API for sessions with created_at within ±1,000ms of the Vapi event timestamp, filtered by your ElevenLabs agent ID. In high-volume deployments where multiple calls start within the same second, use additional context — the voice model in use, the first few characters of the TTS input — to disambiguate.

The timestamp drift between Vapi and ElevenLabs events is consistent but not fixed. Vapi timestamps events at call orchestration time; ElevenLabs timestamps the session at WebSocket connection establishment, which can be 200–600ms later. Use ±1,000ms as a minimum window.
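The window matching described above can be sketched as a small pure function. The `TtsSession` shape is a hypothetical, already-normalized view of whatever the ElevenLabs history API returns, with timestamps converted to epoch milliseconds. It does not reflect the actual response schema.

```typescript
// Hypothetical normalized session row; field names are assumptions.
interface TtsSession {
  sessionId: string;
  createdAtMs: number;
}

// Return sessions within ±windowMs of the Vapi event time, closest first.
function candidateSessions(
  vapiStartedAtMs: number,
  sessions: TtsSession[],
  windowMs = 1000,
): TtsSession[] {
  const distance = (s: TtsSession): number => Math.abs(s.createdAtMs - vapiStartedAtMs);
  return sessions
    .filter((s) => distance(s) <= windowMs)
    .sort((a, b) => distance(a) - distance(b));
}
```

If the function returns more than one candidate, the match is ambiguous and needs the extra context mentioned above (voice model, first characters of the TTS input) to resolve.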

For incident investigation that spans all three providers simultaneously, manual correlation is feasible for individual calls but impractical at volume. Sherlock Calls automates the three-way alignment: given a Vapi call ID, Sherlock retrieves the Twilio CallSid via the phoneCallProviderId field, queries both Twilio and ElevenLabs APIs in parallel, aligns the three event timelines, and returns a sourced cross-provider case file in Slack within 60 seconds.

Testing Vapi webhooks in production without breaking live traffic

The safest approach to testing webhook handler changes in production is a shadow deployment pattern: route a copy of incoming Vapi webhook events to a new handler in parallel with the existing handler, without the new handler's behavior affecting production.

For Vapi specifically: the platform does not natively support sending the same event to multiple URLs. The common workaround is to implement a webhook fan-out proxy: your production webhook endpoint receives Vapi events and forwards them to multiple downstream handlers — one for production processing, one for your new handler under test. The fan-out proxy returns 200 to Vapi immediately (fast-ack), regardless of the downstream handlers' status.
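A fan-out proxy along these lines can be sketched as follows. The `FanOutTarget` interface and the `fanOut` function are hypothetical. In practice each `send` would be an HTTP POST to a downstream handler, which is stubbed out here so the control flow stays visible.

```typescript
interface FanOutTarget {
  label: string;
  send: (body: string) => Promise<void>;
}

// Forward the raw body to every target without waiting, then acknowledge.
// Failures are reported through onError and never change the response to Vapi.
function fanOut(
  body: string,
  targets: FanOutTarget[],
  onError: (label: string, err: unknown) => void,
): { status: number } {
  for (const target of targets) {
    try {
      target.send(body).catch((err: unknown) => onError(target.label, err));
    } catch (err) {
      onError(target.label, err); // send threw synchronously
    }
  }
  return { status: 200 };
}
```

Because the shadow handler's errors only reach `onError`, a bug in the handler under test cannot cause Vapi to retry deliveries against production.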

For local testing during development, the full testing loop is:

1. Run ngrok http 3000 to expose your local server.
2. Update the Server URL in your Vapi test assistant's configuration (not the production assistant).
3. Create test calls via the Vapi dashboard's phone call UI or via the Vapi REST API with your test assistant ID.
4. Observe webhook delivery in your local server logs.
5. Use Vapi's call log to verify delivery status and response code.

For regression testing webhook handler logic without live calls, use the Vapi API's /call endpoint to retrieve historical call event payloads and replay them against your handler directly via curl or your test runner. This avoids the need to create real calls for every test scenario.
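Once payloads have been saved, a replay harness can be very small. The sketch below assumes the payloads are already loaded as parsed JSON values and that your handler logic is callable as a plain function. `replayPayloads` and `ReplayResult` are hypothetical names.

```typescript
interface ReplayResult {
  index: number;
  ok: boolean;
  error?: string;
}

// Run each recorded payload through the handler and collect per-payload results
// instead of stopping at the first failure.
function replayPayloads(
  payloads: unknown[],
  handler: (payload: unknown) => void,
): ReplayResult[] {
  return payloads.map((payload, index): ReplayResult => {
    try {
      handler(payload);
      return { index, ok: true };
    } catch (err) {
      return { index, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  });
}
```

Collecting every result, rather than failing fast, makes it easier to see whether a handler change breaks one payload shape or all of them.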

The most important production monitoring addition: log the Vapi webhook delivery status alongside every call record in your database. When a call has a Vapi call ID but no corresponding CRM record, the webhook delivery status log is the first place to look. Without it, you are in the position of knowing a failure occurred but having no evidence trail for why.

How Sherlock detects Vapi webhook failures across your stack

The count-compare check is the baseline detection mechanism for Vapi webhook failures at scale: Vapi call count (from the Vapi API) minus CRM-recorded call count (from your CRM API or database) equals your unprocessed event count. When this delta exceeds your baseline noise level — account for calls that legitimately do not create CRM records, such as unanswered outbound dials — you have evidence of a systemic webhook failure.
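The count-compare check is easiest to act on when it works from call IDs rather than bare counts, because the difference then names the affected calls. A minimal sketch, assuming you have already fetched both ID lists, with the function names and the baseline rate as hypothetical parameters:

```typescript
// IDs present in Vapi but absent downstream are the unprocessed events.
function missingCallIds(vapiCallIds: string[], crmCallIds: string[]): string[] {
  const recorded = new Set(crmCallIds);
  return vapiCallIds.filter((id) => !recorded.has(id));
}

// Flag a systemic failure only when the gap exceeds the expected noise
// (for example, unanswered outbound dials that never create a CRM record).
function exceedsBaseline(missingCount: number, totalCalls: number, baselineRate: number): boolean {
  if (totalCalls === 0) return false;
  return missingCount / totalCalls > baselineRate;
}
```

For example, 2 missing calls out of 4 is a 50% gap, which exceeds a 10% baseline and would warrant investigation.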

But the count-compare check only tells you that a failure occurred. Root cause identification requires correlating the Vapi call log, the delivery status for each affected event, your server response time logs, and your application-level processing logs — four data sources across three systems.

Sherlock runs this correlation automatically. When Vapi call count diverges from downstream record count, Sherlock identifies the specific Vapi call IDs with missing records, checks their webhook delivery status in the Vapi API, cross-references with Twilio CallSids via phoneCallProviderId, and posts a case file in Slack with the failure window, affected call count, and the most likely root cause based on the pattern — delivery timeout vs. processing failure vs. CRM authentication error.

For teams running Vapi in production, the free tier at [usesherlock.ai](https://usesherlock.ai/?utm_source=blog&utm_medium=content&utm_campaign=vapi-webhook-guide) includes cross-provider call count monitoring and automatic divergence alerts. Connect your Vapi and CRM accounts to enable it.


Frequently asked questions

What is Vapi's webhook timeout window?

Vapi requires your webhook endpoint to return a 2xx HTTP response within 20 seconds of the delivery attempt. If your endpoint does not respond within 20 seconds, Vapi marks the delivery as failed and proceeds to its retry schedule. The retry schedule as of 2026 is: immediate retry after failure, then exponential backoff at 30 seconds, 2 minutes, 10 minutes, and 30 minutes — a maximum of 5 delivery attempts per event. After 5 failures, the event is dropped and logged as undeliverable in the Vapi dashboard under Call Logs. Note that Vapi's 20-second timeout is per delivery attempt — your endpoint must consistently respond under that ceiling, not just on average. A handler that averages 5 seconds but occasionally spikes to 25 seconds will produce sporadic delivery failures that are difficult to reproduce.
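The average-versus-spike point can be checked against your own response time logs. The sketch below summarizes a list of handler durations against the 20-second ceiling stated in this guide. `summarizeLatency` is a hypothetical helper and the sample values in the usage note are made up for illustration.

```typescript
const VAPI_TIMEOUT_MS = 20_000; // per-attempt ceiling stated in this guide

interface LatencySummary {
  meanMs: number;
  maxMs: number;
  overTimeout: number;
}

// Summarize handler response times. The mean can look healthy while the tail fails.
function summarizeLatency(samplesMs: number[], timeoutMs = VAPI_TIMEOUT_MS): LatencySummary {
  if (samplesMs.length === 0) {
    return { meanMs: 0, maxMs: 0, overTimeout: 0 };
  }
  const total = samplesMs.reduce((sum, ms) => sum + ms, 0);
  return {
    meanMs: total / samplesMs.length,
    maxMs: Math.max(...samplesMs),
    overTimeout: samplesMs.filter((ms) => ms >= timeoutMs).length,
  };
}
```

With samples of 1,000, 2,000, 2,000, and 25,000 ms, the mean is 7,500 ms, comfortably under the ceiling, yet one of the four deliveries would have timed out.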

How do I test Vapi webhooks locally during development?

The standard approach is ngrok: run `ngrok http <your_local_port>` to create a public tunnel to your local development server, copy the generated HTTPS URL, and paste it into the Server URL field in your Vapi dashboard under Settings. The HTTPS URL is required — Vapi rejects HTTP webhook URLs. When using ngrok's free tier, the tunnel URL changes on every restart; update the Vapi dashboard each time. For a more stable local setup, use ngrok's paid fixed-subdomain feature or consider Cloudflare Tunnel, which supports fixed hostnames on the free tier. Verify your webhook is being received by adding a log statement at the very top of your handler, before any processing — this confirms delivery is reaching your server before you debug processing logic.

What should I do when a Vapi webhook fires but my CRM is not updated?

A webhook that fires but does not update the CRM is a processing failure, the second of the three failure modes described above: the webhook was delivered successfully to your server, but your server's processing failed silently. Diagnose this by adding structured logging at each processing step: log the incoming webhook payload, log the parsed data, log the CRM API call and its response, and log success or failure at the end. Compare the Vapi delivery log timestamp with your application logs to confirm delivery reached your server. If delivery is confirmed but the CRM is not updated, check your CRM API client for silent error swallowing; many HTTP client libraries catch exceptions by default and log them without re-raising. Verify also that your server's environment variables for CRM credentials are correctly set in the deployment environment, not just locally. A misconfigured environment variable is the most common cause of CRM writes that succeed in development and fail silently in production.
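Step-level logging of this kind can be enforced with a small wrapper so that no step can fail without leaving a trace. This is a minimal synchronous sketch. `runLoggedSteps` and the `StepLog` shape are hypothetical, and a real handler would use async steps and your logging library of choice.

```typescript
interface StepLog {
  step: string;
  outcome: "ok" | "failed";
  detail?: string;
}

// Run named steps in order, recording an entry for each. A failure is logged
// and then rethrown, so it can never disappear silently.
function runLoggedSteps(
  steps: Array<[string, () => void]>,
  log: (entry: StepLog) => void,
): void {
  for (const [step, run] of steps) {
    try {
      run();
      log({ step, outcome: "ok" });
    } catch (err) {
      log({ step, outcome: "failed", detail: err instanceof Error ? err.message : String(err) });
      throw err;
    }
  }
}
```

When a CRM write goes missing, the last logged step tells you where processing stopped, and the rethrown error still reaches your error tracker.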

How do I implement idempotent Vapi webhook handling?

Vapi may deliver the same event multiple times due to network retries or Vapi's own retry logic. Your webhook handler must be idempotent — processing the same event twice should produce the same result as processing it once. The standard pattern is to use Vapi's event ID (present in the webhook payload) as an idempotency key: before processing any event, check whether you have already processed an event with that ID. If yes, return 200 immediately without processing. Store processed event IDs in Redis with a TTL of 24 hours (or your maximum retry window, whichever is longer). The check-then-process operation must be atomic — use a Redis SET NX (set if not exists) command to prevent race conditions when your webhook handler runs on multiple servers simultaneously.

How do I correlate Vapi call events with Twilio call SIDs?

Vapi exposes a `call.phoneCallProviderId` field on call objects that contains the underlying Twilio CallSid when Vapi is configured to use your Twilio account for telephony. This is the correct correlation key. In your webhook handler, extract this field from every call.ended or call.failed event and log it alongside the Vapi call ID. To look up a Vapi call in Twilio: use the Twilio REST API with the extracted CallSid. The inverse lookup (finding the Vapi call from a Twilio CallSid) requires querying the Vapi REST API with the `call.phoneCallProviderId` filter. Event timestamps also drift between Vapi and Twilio by roughly 200–500ms, so when you must correlate by time rather than ID, use a ±1,000ms window.
