How to Audit a Voice AI Platform Before Committing

Most agencies pick a voice AI platform based on a demo call and a pricing page, then discover the real limitations six months later when a client in healthcare asks about HIPAA or the platform has its third outage in a month. A structured audit before signing prevents costly migrations down the line. This checklist covers 10 areas that separate production-grade platforms from demo-stage ones, with specific red and green flags for each. Use it regardless of which platform you are evaluating.

The 10 audit points are: architecture (native vs wrapper), pricing transparency, compliance certifications, uptime SLA, latency benchmarks, support channels, white-label depth, API quality, money-back guarantee, and company stability. A structured evaluation against these checkpoints is what separates a platform you can build a business on from one that strands your clients when it raises prices, loses a certification, or shuts down. This article walks through each checkpoint so you can apply that rigor to your voice AI decision. For the broader framework this audit sits inside, see the white-label voice AI platform guide for agencies.

Updated for June 2026: Refreshed the company-stability examples to reflect that Air.ai is now defunct (the FTC settled its case in March 2026) and that PlayAI's platform was wound down after Meta's 2025 acqui-hire.

1. Architecture: Native Platform vs Wrapper

Native voice AI platforms own their entire technology stack, from telephony to speech processing to the AI engine. Wrapper platforms add a dashboard on top of someone else's infrastructure, typically Vapi or Retell. This distinction determines who you call when something breaks.

A native platform means one vendor, one support relationship, one point of accountability. When a call fails, the platform's engineering team can diagnose and fix it because they built the system. A wrapper platform means your support ticket gets bounced between the wrapper company, the underlying API provider, and the telephony layer. Nobody owns the full problem.

The reliability math matters too. If each layer in a wrapper stack has 99.5% uptime and the layers fail independently, five layers compound to roughly 97.5% effective uptime (0.995^5 ≈ 0.975), which translates to about 18 hours of downtime in a 30-day month. A native platform with a single stack does not face this compounding.
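To rerun this calculation for a specific vendor's stack, a minimal sketch in Python follows. It assumes layer failures are independent and a 30-day month; the function names are illustrative, and the five 99.5% layers are the example figures from this section, not measurements of any real provider.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month


def compound_uptime(layer_uptimes):
    """Multiply per-layer availabilities; valid only if failures are independent."""
    total = 1.0
    for uptime in layer_uptimes:
        total *= uptime
    return total


def monthly_downtime_hours(uptime):
    """Expected hours of downtime in a 30-day month at the given availability."""
    return (1.0 - uptime) * HOURS_PER_MONTH


wrapper_stack = [0.995] * 5  # five layers at 99.5% each, as in the example
effective = compound_uptime(wrapper_stack)
print(f"effective uptime: {effective:.2%}")
print(f"downtime per month: {monthly_downtime_hours(effective):.1f} hours")
```

Swap in the uptime figures each layer actually publishes. If outages across layers tend to coincide, the real number will be somewhat better than this estimate.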

Green flags: The vendor states explicitly that they own their infrastructure. They can explain their architecture without referencing third-party API providers as core dependencies. They publish uptime data from their own monitoring.

Red flags: The vendor's documentation references Vapi, Retell, or Twilio as required dependencies. Support responses include "we've escalated to our provider." The platform went down the same day a known Vapi or Retell outage occurred.

What to check: Ask the vendor directly, "Do you own your voice AI infrastructure, or do you build on top of another provider?" Then verify the answer by checking their status page history against known Vapi/Retell outage dates.

2. Pricing Transparency

A platform's pricing page should tell you exactly what you will pay at 5, 10, 20, and 50 clients without requiring a sales call. The most common pricing traps in voice AI are hidden per-seat fees that scale with client count, tiered plans that force expensive upgrades at arbitrary thresholds, and separate charges for compliance features that should be included.

Calculate your total cost at three scales: current client count, 2x current, and 5x current. Include base plan fee, per-minute usage, per-seat or per-sub-account fees, phone number costs, compliance add-ons, and integration fees. If the pricing page does not give you enough information to complete this calculation, that is itself a red flag.
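If you prefer a script to a spreadsheet, the three-scale calculation can be sketched roughly as below. Every number here is a hypothetical placeholder, not any real vendor's rate, and the `PricingPlan` fields and the simple linear pricing model are assumptions. Tiered plans or expiring minute bundles would need extra logic.

```python
from dataclasses import dataclass


@dataclass
class PricingPlan:
    # Every field is a placeholder to fill in from the vendor's pricing page.
    base_fee: float          # monthly plan fee
    per_minute: float        # usage rate per call minute
    per_seat: float          # fee per client sub-account (0 if unlimited)
    phone_number: float      # monthly cost per phone number
    compliance_addon: float  # flat monthly compliance fee (0 if included)
    integration_fee: float   # flat monthly integration fee (0 if included)


def monthly_cost(plan, clients, minutes_per_client, numbers_per_client=1):
    """Total monthly cost for a client count under simple linear pricing."""
    usage = plan.per_minute * minutes_per_client * clients
    seats = plan.per_seat * clients
    numbers = plan.phone_number * numbers_per_client * clients
    fixed = plan.base_fee + plan.compliance_addon + plan.integration_fee
    return fixed + usage + seats + numbers


# Hypothetical figures for illustration only.
plan = PricingPlan(base_fee=200.0, per_minute=0.10, per_seat=25.0,
                   phone_number=2.0, compliance_addon=500.0, integration_fee=0.0)
current_clients = 8
for multiple in (1, 2, 5):  # current, 2x current, 5x current
    clients = current_clients * multiple
    total = monthly_cost(plan, clients, minutes_per_client=500)
    print(f"{clients:>3} clients: ${total:,.2f}/month "
          f"(${total / clients:,.2f} per client)")
```

Any field you cannot fill in from the public pricing page is a gap to note in your audit.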

Green flags: Published per-minute rates. Clear sub-account pricing (or unlimited sub-accounts). No separate charges for compliance certifications. Annual pricing discount disclosed upfront.

Red flags: "Contact sales for pricing." Per-seat fees that compound with client count. Compliance (HIPAA, SOC 2) listed as paid add-ons. Usage rates that change between tiers without clear documentation. Minute bundles that expire monthly with no rollover.

What to check: Build a spreadsheet with three scenarios. If you cannot fill in every cell from the pricing page alone, the platform fails the transparency test. Compare your results against published white-label pricing breakdowns to verify your numbers are in the right range.

3. Compliance Certifications

Compliance is not a feature you can add later. If the platform does not offer HIPAA compliance backed by a signed Business Associate Agreement (BAA), a current SOC 2 Type II audit report, GDPR data processing agreements, and TCPA compliance tooling, you cannot serve clients in healthcare, legal, financial services, or any industry that handles sensitive data. That cuts out some of the highest-value, highest-retention agency verticals. Note that HIPAA has no government-issued certification, so a signed BAA and supporting documentation are the evidence to ask for.

The difference between "we are working toward compliance" and "here is our SOC 2 Type II audit report" is the difference between a promise and a fact. Ask for documentation, not claims.

Green flags: Published compliance certifications on the website. BAA available for signing without negotiation. SOC 2 Type II audit report (not just Type I) available on request. Compliance included on all plans at no extra cost.

Red flags: Compliance listed as "coming soon" or "available on Enterprise plans." HIPAA offered as a paid add-on ($500+/month is common). No BAA template available. The vendor cannot name their auditing firm. GDPR mentioned without a published Data Processing Agreement.

What to check: Request the SOC 2 Type II audit report, the BAA template, and the GDPR Data Processing Agreement. If any of these take more than 48 hours to produce, the vendor likely does not have them. Cross-reference with compliance comparison guides to understand what should be standard.

4. Uptime SLA

An uptime SLA without financial penalties is a marketing statement, not a guarantee. Look for a contractual commitment that specifies measurement methodology (how uptime is calculated), exclusion windows (planned maintenance), and service credits or refunds when the SLA is breached.

The standard for production voice AI is 99.9% uptime (about 8.8 hours of downtime per year). Enterprise-grade platforms guarantee 99.99% (52.6 minutes per year). Wrapper platforms rarely publish SLAs because they cannot guarantee uptime across dependencies they do not control.
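To translate any advertised uptime percentage into a downtime budget, a short Python sketch is enough. It assumes a 365-day year and counts every hour equally, whereas a real SLA may exclude planned maintenance windows. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(uptime_percent):
    """Maximum downtime per year permitted by an uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


for sla in (99.5, 99.9, 99.99):
    hours = annual_downtime_hours(sla)
    print(f"{sla}% uptime allows {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) per year")
```

Compare the result with the incident durations on the vendor's status page to see whether the published history fits inside the budget the SLA implies.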

Green flags: Financially backed SLA with defined service credits. Public status page with historical uptime data. Uptime measured at the call-completion level, not just server availability. Planned maintenance windows disclosed in advance.

Red flags: "99.9% uptime" on the marketing page with no contractual backing. No public status page. Uptime measured by server ping rather than successful call completion. History of extended outages visible on community forums but not acknowledged on the status page.

What to check: Visit the vendor's status page and review the last 90 days. Count incidents, duration, and resolution time. Ask for the SLA document (not the marketing page) and look for the service credit table.

5. Latency Benchmarks

Voice AI latency is the time between when a caller finishes speaking and when the AI begins responding. Anything under 2 seconds feels natural in a phone conversation. Anything over 3 seconds creates awkward pauses that make callers hang up or lose confidence in the interaction.

Latency has two components: AI processing time (how fast the model generates a response) and telephony overhead (the time it takes for audio to travel through the phone network). Vendors that quote only AI processing latency without telephony overhead are giving you an incomplete number. End-to-end latency, measured from the caller's last word to the first word of the AI response, is what matters.

Green flags: Published end-to-end latency numbers (not just AI inference time). Latency benchmarks under 2 seconds. Latency data measured under load, not just single-call testing. The ability to test latency yourself during a trial period.

Red flags: Latency claims that reference only "AI response time" without telephony. Numbers below 500ms that seem too good to be true (telephony alone adds 200-400ms). No latency data published at all. Demo calls that sound fast but production calls that lag.

What to check: During your trial, make 20 test calls at different times of day (including peak hours) and time the response delays yourself. Compare your measurements against published latency benchmarks across platforms.
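Once you have your 20 timings, a few lines of Python can summarize them against the 2-second and 3-second thresholds described above. The readings below are made-up stopwatch values for illustration, and the summary fields are an assumption about what is worth tracking, not a standard benchmark format.

```python
import statistics

# Hypothetical stopwatch readings in seconds; replace with your own 20 calls.
delays = [1.4, 1.6, 1.3, 2.1, 1.8, 1.5, 1.7, 3.2, 1.6, 1.9,
          1.4, 1.5, 2.4, 1.6, 1.8, 1.3, 1.7, 2.0, 1.5, 3.4]

NATURAL_LIMIT = 2.0  # under this feels natural
AWKWARD_LIMIT = 3.0  # over this creates awkward pauses


def summarize(samples):
    """Return median, worst case, and the share of calls in each latency band."""
    n = len(samples)
    return {
        "median": statistics.median(samples),
        "worst": max(samples),
        "natural_share": sum(s < NATURAL_LIMIT for s in samples) / n,
        "awkward_share": sum(s > AWKWARD_LIMIT for s in samples) / n,
    }


report = summarize(delays)
print(f"median delay: {report['median']:.2f}s, worst: {report['worst']:.2f}s")
print(f"under {NATURAL_LIMIT:.0f}s: {report['natural_share']:.0%}, "
      f"over {AWKWARD_LIMIT:.0f}s: {report['awkward_share']:.0%}")
```

Look at the worst case and the share of slow calls as well as the median. A platform can post a good typical number while still lagging on the peak-hour calls your clients' customers will remember.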

6. Support Channels

Support quality determines how fast you can resolve client-facing issues. An agency with 10 clients running voice AI agents cannot afford to wait 48 hours for an email response when a client's agent stops booking appointments correctly. The support structure should match the urgency of voice AI failures, which are immediately visible to your clients' customers.

Green flags: Dedicated support channel (Slack, direct line) for agency plans. Response times measured in hours, not days. Access to engineering staff, not just tier-1 support reading scripts. Community resources (playbooks, templates, live Q&A) that reduce support dependency.

Red flags: Email-only support on all plans. Discord server as the primary "support" channel (community help is not vendor support). "Priority support" that still takes 24-48 hours. No escalation path to engineering. Support staff cannot reproduce or diagnose technical issues.

What to check: Submit a technical support ticket during your trial and measure the response time, resolution quality, and whether the responder understood the issue. Check community forums or Discord for unresolved complaints older than 7 days.

7. White-Label Depth

White-labeling is not binary. Some platforms let you swap a logo. Others give you a fully branded client experience under your own domain with custom emails, branded dashboards, and minute markup controls. The depth of white-labeling directly affects whether clients perceive your agency as the provider or as a reseller of someone else's product, which in turn shapes how you can differentiate your agency in a crowded market.

Green flags: Custom domain support (clients visit youragency.com, not platform.com). Custom branded emails from your domain. Branded client dashboards with your logo and colors. Custom minute markup (you set your own per-minute price). Clients never see the underlying platform brand anywhere.

Red flags: Logo swap only, no custom domain. Platform branding visible in emails, invoices, or dashboard footers. No client-facing dashboard (clients must contact you for everything). White-label features locked to the highest pricing tier. Branding customization requires CSS or code changes rather than a settings panel.

What to check: Sign up for a trial and walk through the full client experience: onboarding, dashboard login, email notifications, call reports. Note every place you see the platform's brand instead of yours. Compare depth against what full white-label branding should include.

8. API Quality

API quality determines whether you can automate client onboarding, build custom integrations, and scale beyond what the platform's dashboard supports. A platform without an API locks you into manual workflows for every new client. A platform with a poorly documented API wastes engineering time on trial-and-error integration.

Green flags: REST API with complete documentation. Webhook support for real-time event notifications. Authentication via API keys with role-based access. SDKs or code examples in common languages. Rate limits that are documented and reasonable for agency scale. Native integrations with major CRMs (GoHighLevel, HubSpot) and calendar systems.

Red flags: "API coming soon." Documentation that is incomplete, outdated, or auto-generated without examples. No webhook support (you must poll for updates). API access locked to Enterprise plans. No sandbox or test environment. Zapier listed as the primary integration method with no native API.

What to check: Open the API documentation during your evaluation. Try to answer three questions from the docs alone: How do I create a new sub-account? How do I deploy an agent? How do I retrieve call data? If any of these require a support ticket to answer, the API documentation is insufficient.

9. Money-Back Guarantee

A money-back guarantee signals that the vendor is confident enough in their product to let you test it with real client deployments and walk away if it does not work. A 10-minute free trial tells you almost nothing about how the platform performs under real conditions. You need weeks, not minutes, to evaluate voice AI for agency use.

Green flags: 28-day or longer money-back guarantee with no questions asked. Full-featured access during the guarantee period (not a limited sandbox). Refund process that does not require negotiation or manager approval. No contracts or lock-in periods.

Red flags: Free trial limited to a few minutes or days, which is not enough for real testing. No money-back guarantee at all. "Satisfaction guarantee" with vague terms. Contracts that require 30-60 day cancellation notice. Refund requires contacting a specific department with a written request.

What to check: Read the terms of service, not just the marketing page. Look for the refund process, exclusions, and timeline. A vendor that makes the guarantee easy to claim is more confident in their retention than one that buries the process.

10. Company Stability

Voice AI is a young market, and recent history shows how fast a platform can vanish from under the agencies that built on it. PlayAI's voice platform was wound down after Meta acqui-hired the team in July 2025, with the API cut off within weeks and the platform fully retired on December 31, 2025. Air.ai is now defunct: the FTC settled its case in March 2026, imposing an $18M monetary judgment (largely suspended for inability to pay) over allegations the company bilked customers out of roughly $19M with false earnings claims, and the platform is no longer operating. Other platforms simply raise prices dramatically (Voicerr went from $28/month to $199-$299/month). Building your agency on an unstable platform means rebuilding from scratch when the platform disappears or becomes unaffordable. The missed-call math behind agency client retention only works if the platform underneath your clients is still there next quarter.

Green flags: Company has been operating for 2+ years with consistent product development. Revenue model is sustainable (not VC-funded with no path to profitability). Transparent communication about product roadmap. Published case studies with named clients. Growing team with public hiring activity.

Red flags: Company is less than 12 months old with no disclosed funding or revenue. Pricing changed dramatically in the past year. Key team members have left recently. No published case studies. The platform's social media accounts went quiet for weeks. Platform stability warning signs include rapid feature announcements without shipping, frequent "pivots," and community complaints about broken promises.

What to check: Search for the company name plus "shutdown," "lawsuit," "price increase," and "outage" to surface issues the marketing site will not show you. Check LinkedIn for team growth or contraction. Review the company's blog or changelog for consistent product updates over the past 12 months.

The Audit Scoring Framework

Score each of the 10 areas on a 1-3 scale: 1 (red flags present), 2 (acceptable but not strong), 3 (green flags confirmed). A platform scoring below 20 out of 30 has structural risks that will surface within 6-12 months of agency operation. A platform scoring 25+ is production-grade for agency use.

Audit Area	Weight	What a Score of 3 Looks Like
Architecture	High	Native platform, owns infrastructure
Pricing Transparency	High	Full cost calculable from public page
Compliance	High	HIPAA/SOC 2/GDPR included, documentation available
Uptime SLA	Medium	Financially backed, public status page
Latency	Medium	Sub-2s end-to-end, published benchmarks
Support	Medium	Dedicated channel, hours not days
White-Label Depth	Medium	Custom domain, branded everything
API Quality	Medium	Complete docs, webhooks, native integrations
Money-Back Guarantee	Low	28+ days, no questions asked
Company Stability	High	2+ years operating, sustainable model

Weight "High" areas as dealbreakers: if architecture, pricing transparency, compliance, or company stability scores a 1, do not proceed regardless of the total score. A platform with excellent latency and support but no compliance certifications is not a production option for agencies serving regulated verticals. Once you are live, the same rigor applies to your own delivery: track the client success metrics that prove voice AI ROI so you can defend the platform choice you made during this audit.
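The scoring rules above (1-3 per area, thresholds at 20 and 25, and an automatic stop on any High-weight area that scores 1) can be sketched in Python as follows. The area keys and verdict strings are illustrative names. The framework does not name the 20-24 band, so the "borderline" label is an assumption.

```python
DEALBREAKER_AREAS = {
    "architecture", "pricing_transparency", "compliance", "company_stability",
}
OTHER_AREAS = {
    "uptime_sla", "latency", "support", "white_label_depth",
    "api_quality", "money_back_guarantee",
}
ALL_AREAS = DEALBREAKER_AREAS | OTHER_AREAS


def audit_verdict(scores):
    """Return (total, verdict) for a dict mapping each audit area to 1, 2, or 3."""
    if set(scores) != ALL_AREAS:
        raise ValueError("score every one of the 10 audit areas exactly once")
    if any(value not in (1, 2, 3) for value in scores.values()):
        raise ValueError("each score must be 1, 2, or 3")

    total = sum(scores.values())
    failed = sorted(area for area in DEALBREAKER_AREAS if scores[area] == 1)
    if failed:
        return total, "do not proceed (dealbreaker: " + ", ".join(failed) + ")"
    if total >= 25:
        return total, "production-grade"
    if total < 20:
        return total, "structural risk"
    # The 20-24 band is not named in the framework; this label is an assumption.
    return total, "borderline"


example = dict.fromkeys(ALL_AREAS, 3)
example["compliance"] = 1  # strong everywhere except compliance
print(audit_verdict(example))
```

The example shows why the dealbreaker check runs before the total is compared: a platform can reach 28 out of 30 and still be ruled out.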

Trillet scores well on this framework: native architecture, $0.12/min with unlimited sub-accounts on the Agency plan ($299/month), HIPAA/SOC 2 Type II/GDPR/TCPA included free, 28-day money-back guarantee, and 2.5M+ calls processed across 12,000+ active agents. That said, no platform is perfect for every agency. Trillet's $299/month Agency plan may be more than what a solo operator with 2 clients needs, and its white-label features require the Agency plan rather than the $99/month Studio entry point. Run the audit on every platform you consider, including Trillet. To put Trillet through your own checklist, explore the Trillet white-label platform and the full white-label voice AI platform guide for agencies.

Frequently Asked Questions

How long should I evaluate a voice AI platform before committing?

A minimum of 14 days with at least 3 real client deployments (or simulated client setups). You need enough time to test call quality at different times of day, verify support response times, and put the white-label experience in front of an actual client. A 10-minute free trial or single demo call is not sufficient for an agency-level evaluation.

What is the most important factor when choosing a voice AI platform for an agency?

Architecture (native vs wrapper) is the most consequential long-term decision because it determines uptime reliability, support quality, and pricing stability. An agency can adapt to different pricing models or support structures, but it cannot fix a wrapper platform's upstream dependency failures. Start with architecture, then evaluate compliance and pricing transparency.

Should I choose the cheapest voice AI platform?

Not necessarily. The cheapest per-month platform often has hidden costs that exceed more expensive alternatives at scale: per-seat fees, compliance add-ons, limited sub-accounts requiring plan upgrades, and support gaps that cost you client relationships. Calculate total cost of ownership at your target client count, not just the base monthly fee.

How do I verify a vendor's compliance claims?

Request three documents: the SOC 2 Type II audit report (from a named auditing firm), the HIPAA Business Associate Agreement (BAA) template, and the GDPR Data Processing Agreement. Legitimate certifications can be produced within 24-48 hours. If the vendor says compliance is "in progress" or cannot provide documentation, treat the certification as non-existent for your evaluation.

What happens if my voice AI platform shuts down?

You migrate your clients to a new platform, which typically takes 1-2 weeks for a 10-client agency. The disruption is real but manageable if you have documented your agent configurations and client settings. Agencies using call forwarding (rather than number porting) can migrate without their clients' customers noticing any change. The key is having a contingency plan before you need one.
