
  • WD-42 (7 hours ago)
    What is the application of this that makes anything better for anyone? All I can think of is more spammers, scammers, horrible customer support lines.
    • Kim_Bruning (26 minutes ago)
      This might be good for back-office, hands-free tool access for employees who are on the road (they shouldn't be looking at the screen, and they might be limited to voice calling due to coverage issues besides). Aka: a really weird terminal.
    • dmos62 (1 hour ago)
      Good customer support lines? Is there a reason why it can't provide good support? I often use ChatGPT's voice function.
      • WD-42 (1 hour ago)
        How? Businesses will use this to justify removing what few actual human support staff they have left. Nobody, and I mean it, nobody calls customer support because they want to talk to a computer. It's the last resort for problems that usually can't be resolved via the already existing technical flows available on a computer.
        • dmos62 (7 minutes ago)
          That's not true. I recently called to make an appointment. I don't care if it's an AI. I would actually prefer it, because I wouldn't feel bad about taking a long time to pick the best time. Don't you think you're being a bit dogmatic about this?
        • b112 (30 minutes ago)
          Beyond true.
          I wonder what Amazon's goals are, as an example. Currently, at least on the .ca website, there is no way to even get to a chat to fix problems. All their spider text of help options now always leads back to the return page.
          So it's call them (and you can only find the number via Google).
          I suspect they're so dysfunctional that they don't understand why there's a massive uptick in calls, so then they slap AI onto the phone too.
          And so now that's slow and AI drivel. I guess soon I'll just have to do chargebacks, e.g. if a package is missing or whatever.
      • TZubiri (1 hour ago)
        The expectation on customer support lines is that customers want to speak to humans. These aren't just semantics that are unwritten and open to change: by using a human-like voice agent on a customer support line, you are pretending that it is a human, which is a scam or fraud.
        If you really believe the support can be good, then use a robotic text-to-speech voice and don't pretend it's a human. Make it clear to users that they are not talking to a human; phone is a protocol with the semantics that you speak to a human. Use something else.
        The bottom line is that you have clients who signed up under the belief that they could call a phone number and speak to a human. Businesses are performing a short-term switcheroo at the expense of their clients; it's a scam.
    • asdfsfds (5 hours ago)
      Microsoft customer support saar
      • mcny (4 hours ago)
        It is funny you mention Microsoft's customer support, because it is a publicly known issue at this point that if you are a Microsoft employee or a v-dash (vendor), the first level of support you talk to is basically something you have to overcome to get any help at all.
    • anhner (5 hours ago)
      > What is the application of this?
      Spammers, scammers and horrible customer support lines.
  • aftbit (12 hours ago)
    This opens up new possibilities for interactive phone services. Retro-futuristic for sure.
  • wild_egg (12 hours ago)
    The baseline configurations all note <2s and <3s times. I haven't tried any voice AI stuff yet, but a 3s latency waiting on a reply seems rage-inducing if you're actually trying to accomplish something.
    Is that really where SOTA is right now?
    • dnackoul (9 hours ago)
      I've generally observed latency of 500ms to 1s with modern LLM-based voice agents making real calls. That's good enough to have real conversations.
      I attended VAPI Con earlier this year, and a lot of the discussion centered on how interruptions and turn detection are the next frontier in making voice agents smoother conversationalists. Knowing when to speak is a hard problem even for humans, but when you listen to a lot of voice agent calls, the friction point right now tends to be either interrupting too often or waiting too long to respond.
      The major players are clearly working on this. Deepgram announced a new SOTA turn-detection model (Flux) at the conference. Feels like an area where we'll see even more progress in the next year.
      • gessha (57 minutes ago)
        I wonder if it's possible to do the Apple trick of hiding latency using animations. The audio equivalent would be the chime that Siri plays after receiving a request.
      • hogrug (3 hours ago)
        I think interruptions had better be the top priority. I find text LLMs rage-inducing with their BS verbiage that takes multiple prompts to reduce, and they still break promises like "one sentence" by dropping punctuation. I can't imagine a world where I have to listen to one of these things.
    • daneel_w (2 hours ago)
      From my personal experience building a few AI IVR demos with Asterisk in early 2025, testing STT/TTS/inference products from a handful of different vendors, a reliable maximum latency of 2-3 seconds sounds like a definite improvement. Just a year ago I saw times from 3 to 8 seconds even on short inputs rendering short outputs. One half of this is of course over-committed resources, but clearly the execution performance of these models is improving.
    • russdill (7 hours ago)
      Been experimenting with having a local Home Assistant agent include a Qwen 0.5B model to provide a quick response indicating that the agent is "thinking" about the request. It seems to work OK for the use case, but it feels like it'd get really repetitive for a two-way conversation. Another way to handle this would be to have the small model provide the first 3-5 words of a (non-committal) response and feed that in as part of the prompt to the larger model.
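      (A minimal sketch of that second idea, assuming both models are served behind OpenAI-compatible endpoints such as llama.cpp, vLLM, or Ollama; the URLs, model names, and the speak() step are placeholders, not anything Home Assistant ships.)

```python
# Hypothetical sketch: a tiny model produces the first few words of a
# non-committal reply so the caller hears something immediately, and the
# larger model is prompted to continue from that opener.
import asyncio
from openai import AsyncOpenAI

small = AsyncOpenAI(base_url="http://localhost:8081/v1", api_key="none")  # e.g. Qwen 0.5B
large = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="none")  # e.g. the main model

async def answer(user_text: str) -> str:
    # 1. Small model: only the opening 3-5 words, not an actual answer.
    opener_resp = await small.chat.completions.create(
        model="qwen2.5-0.5b-instruct",
        messages=[
            {"role": "system", "content": "Reply with only the first 3-5 words of a "
                                          "friendly, non-committal response. Do not answer."},
            {"role": "user", "content": user_text},
        ],
        max_tokens=8,
    )
    opener = opener_resp.choices[0].message.content.strip()
    # speak(opener)  # hand the opener to TTS right away while the big model works

    # 2. Large model: continue the reply, with the opener fed in as part of the prompt.
    full_resp = await large.chat.completions.create(
        model="main-model",
        messages=[
            {"role": "system", "content": f'You already said: "{opener}". '
                                          "Continue the reply naturally without repeating it."},
            {"role": "user", "content": user_text},
        ],
        max_tokens=200,
    )
    return f"{opener} {full_resp.choices[0].message.content}"

# asyncio.run(answer("Turn off the living room lights in ten minutes"))
```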
    • duckkg5 (11 hours ago)
      Absolutely not.
      500-1000ms is borderline acceptable.
      Sub-300ms is closer to SOTA.
      2000ms or more means people will hang up.
      • matt-p (52 minutes ago)
        160ms is essentially optimal and you can get down to about 200ms AFAIK.
      • fragmede (9 hours ago)
        Play "Just a second, one moment please <sounds of typing>".wav as soon as the input goes quiet.
        The ChatGPT app has an audio version of the spinner icon when you ask it a question and it needs a second before answering.
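        (A rough sketch of that trick, assuming an asyncio-based agent; play_wav() and generate_reply() are hypothetical placeholders for the caller-facing playback path and the STT/LLM/TTS call.)

```python
# Sketch: when end-of-speech is detected, start a canned "one moment" clip
# concurrently with the real reply, then cut the filler off once the answer
# is ready so the caller is never listening to silence.
import asyncio

async def play_wav(path: str) -> None:
    """Placeholder: stream a pre-recorded clip to the caller (e.g. via your PBX)."""
    await asyncio.sleep(3)  # pretend the clip is ~3s long

async def generate_reply(user_text: str) -> str:
    """Placeholder: STT result in, LLM + TTS out."""
    await asyncio.sleep(1.5)  # simulated model latency
    return "Sure, I can help with that."

async def on_end_of_speech(user_text: str) -> str:
    filler = asyncio.create_task(play_wav("one_moment_please.wav"))
    reply = await generate_reply(user_text)  # runs while the filler plays
    filler.cancel()                          # stop the filler as soon as we can speak
    return reply

# asyncio.run(on_end_of_speech("I'd like to change my delivery address"))
```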
    • mohsen1 (5 hours ago)
      Just try Gemini Live on your phone. That's state of the art.
    • coderintherye (11 hours ago)
      Microsoft Foundry's realtime voice API (which itself is wrapping AI models from the major players) has response times in the milliseconds.
    • wellthisisgreat (11 hours ago)
      No, there are models with sub-second latency for sure
    • bharrison (2 hours ago)
      Perhaps you didn't read that these are "production-ready golden baselines validated for enterprise deployment."
      How does their golden nature not dispel these concerns for you?
    • echelon (8 hours ago)
      Sesame was the fastest model for a bit. Not sure what that team is doing anymore; they kind of went radio silent.
      https://app.sesame.com/
  • looneysquash (11 hours ago)
    That seems like bad news for Allison (the voice of Asterisk's stock prompts). Though I know she already had some TTS voices available, so maybe not.
  • eugene3306 (11 hours ago)
    I've created an Asterisk Codex Skill, but it turns out there is a ten-second timeout for scripts.
  • krater23 (13 hours ago)
    Please don't. I had a talk with a shitty AI bot on a FedEx line. It's absolute crap. Just give me a "Type 1 for x, type 2 for y". Then I don't need to guess what the possibilities are.
    • EvanAnderson (12 hours ago)
      Voice-controlled phone systems are hugely rage-inducing for me. I am often in a loud setting with background chatter. Muting my audio and using a touchtone keypad is so much more accurate and easy than having to find a quiet place and worrying that somebody is going to say something that the voice response system detects.
    • 9x39 (12 hours ago)
      One problem is that once you're deep into building a phone IVR workflow beyond X or Y (yes, these are intentional), callers don't care about some deep and featured input menu. They just mash 0 or pick a random option and demand a human finish the job and transfer them, understandably.
      When you're committed to phone intent complexity (hell), the AI-assisted options are sort of less bad, since you don't have to explain the menu to callers; they just make demands.
      • tartoran (11 hours ago)
        What if the goal is to keep gaslighting you until you give up your demands?
        • 9x39 (11 hours ago)
          Most voice agents for large companies are, as we know, a calculated game to deter customers from expensive humans, but not always.
          Sort of like how Jira can be a streamlined tool or a prison of 50-step workflows: it's all up to the designer.
        • 8note (10 hours ago)
          You bought something from the wrong company, and you aren't gonna get helped by phone, bot, or person.
    • russdill (7 hours ago)
      The problem here is that if it's something a voice assistant can solve, I can solve it from my account. I'm calling because I need to speak to an actual human.
      • hectormalot (7 hours ago)
        I'm in this business, and used to think the same. It turns out this is a minority of callers. Some examples:
        - A client we're working with advertises in TV commercials, and a few percent of their calls are people trying to cancel their TV subscriptions, even though the client is in healthcare.
        - In the troubleshooting flow for a client with a physical product, 40% of calls are resolved after the "did you try turning it off and on again" step.
        - A health insurance client gets 25% of its call volume for something that is available self-service (and very visible as well), yet people still call.
        - A client in the travel space gets a lot of calls about "does my accommodation include X", and employees just use their public website to answer those questions (i.e., it's clearly available for self-service).
        One of the things we tend to prioritize in the initial conversation is to determine which segment you fall into and route accordingly.
        • mcny (4 hours ago)
          (Reposting because something ate your newlines; I've added comments inline.)
          > I'm in this business, and used to think the same. It turns out this is a minority of callers. Some examples:
          > - A client we're working with advertises in TV commercials, and a few percent of their calls are people trying to cancel their TV subscriptions, even though the client is in healthcare.
          I guess these are probably desperate people who are trying to get to someone, anyone. In my opinion, the best thing people can do is get a really good credit card and do a chargeback for things like this.
          > - In the troubleshooting flow for a client with a physical product, 40% of calls are resolved after the "did you try turning it off and on again" step.
          I bought a Chinese wifi mesh router and it literally picks a time between two am and five am and reboots itself every night, by default. You can turn this behavior off, but it was interesting that it does this by default.
          > - A health insurance client gets 25% of its call volume for something that is available self-service (and very visible as well), yet people still call.
          In my defense, I've been on the other side of this. I try to avoid calling, but whenever I use self-service, it feels like my settings never stick and always switch back to what they want the next billing cycle. If I have to waste time each month, you have to waste time each month.
          > - A client in the travel space gets a lot of calls about "does my accommodation include X", and employees just use their public website to answer those questions (i.e., it's clearly available for self-service).
          These public websites are regularly out of date. Having someone who is actually on site confirm that yes, they have non-smoking rooms or ice machines that aren't broken is valuable.
          > One of the things we tend to prioritize in the initial conversation is to determine which segment you fall into and route accordingly.
    • cyberax (10 hours ago)
      Well, the future is here: https://www.youtube.com/watch?v=HbDnxzrbxn4
  • johnebgd (13 hours ago)
    I welcome the spam calls from our asterisk overlords.
    • haroldp (9 hours ago)
      I was more thinking I could add it to my Asterisk server to honey-pot the spam callers into an infinite time-wasting cycle.
      • Daviey (5 hours ago)
        "Hello, this is Lenny" - a well-known Asterisk configuration from 20 years ago.
    • VladVladikoff (12 hours ago)
      I’m honestly surprised it hasn’t been more prevalent yet. I still get call centre type spam calls where you can hear all the background noise of the rest of the call centre.
      • userbinator (12 hours ago)
        Is the background noise real, or is it also AI-generated to make you think that it's a human?
        • tartoran (11 hours ago)
          The background noise is a recording for sure; no AI needed, just a background-noise audio file played in a loop would do.
          • VladVladikoff (11 hours ago)
            Why, though? It adds nothing positive; it only makes me sure it is a scam call.
            • the_af (11 hours ago)
              I assume it's to make it seem like an actual call center rather than a scam. I recently got two phone scam attempts (credit card related) that sounded exactly like this.
              • ldenoue (10 hours ago)
                I built a voice AI stack, and background noise can be really helpful to a restaurant AI, for example. Italian background music or café ambience is part of the brand. It's not meant to make the caller believe this is not a bot, but only to keep the AI call on-brand.
                • grim_io (9 hours ago)
                  You can call it whatever you like, but to me this is deceptive.
                  Where is the difference between this and Indian support staff pretending to be in your vicinity by telling you about the local weather? Your version is arguably even worse, because it can plausibly fool people more competently.
              • SoftTalker (10 hours ago)
                You actually answer unknown callers?
                • Loughla (10 hours ago)
                  Yes. I own a business.
                  • mcny (4 hours ago)
                    Also, it only takes one legitimate collect call from jail from a loved one and now I'm all in favor of reform in our jail system.
                    No, it does not cost over thirty dollars to allow someone accused to call their loved ones. We pay taxes. I want my government to use the taxes and provide these calls for free.
                • the_af (10 hours ago)
                  Yes. Sometimes it's a legit call. Not often, though.
                  Example of a legit call: the pizza delivery guy decided to call my phone instead of ringing the bell, for whatever reason.
                  • mcny (4 hours ago)
                    I worked DoorDash for a couple of days and there were multiple people who wrote in all caps not to ring the doorbell. Why? I have no idea.
  • nextworddev (13 hours ago)
    Can I connect this to Twilio?
    • kwindla (12 hours ago)
      One easy way to build voice agents and connect them to Twilio is the Pipecat open source framework. Pipecat supports a wide variety of network transports, including the Twilio MediaStream WebSocket protocol, so you don't have to bounce through a SIP server. Here's a getting-started doc. [1]
      (If you do need SIP, this Asterisk project looks really great.)
      Pipecat has 90 or so integrations with all the models/services people use for voice AI these days. NVIDIA, AWS, all the foundation labs, all the voice AI labs, most of the video AI labs, and lots of other people use/contribute to Pipecat. And there's lots of interesting stuff in the ecosystem, like the open source, open data, open training code Smart Turn audio turn detection model [2], and the Pipecat Flows state machine library [3].
      [1] https://docs.pipecat.ai/guides/telephony/twilio-websockets
      [2] https://github.com/pipecat-ai/smart-turn
      [3] https://github.com/pipecat-ai/pipecat-flows/
      Disclaimer: I spend a lot of my time working on Pipecat, and also writing about both voice AI in general and Pipecat in particular. For example: https://voiceaiandvoiceagents.com/
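      (For the curious, a condensed sketch of the Twilio WebSocket path with Pipecat, following the pattern in the getting-started doc above. Import paths and parameter names have moved between Pipecat releases, so treat the class names and arguments here as approximate and defer to the doc; the service choices and the voice_id are placeholders.)

```python
# Twilio streams call audio to this WebSocket; Pipecat pipes it through
# STT -> LLM -> TTS and streams synthesized audio back on the same socket.
import os
from fastapi import FastAPI, WebSocket
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.serializers.twilio import TwilioFrameSerializer
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.network.fastapi_websocket import (
    FastAPIWebsocketParams, FastAPIWebsocketTransport)

app = FastAPI()

@app.websocket("/ws")
async def twilio_media_stream(ws: WebSocket):
    await ws.accept()
    await ws.receive_text()              # Twilio's "connected" message
    start = await ws.receive_json()      # the "start" message carries the stream SID
    stream_sid = start["start"]["streamSid"]

    transport = FastAPIWebsocketTransport(
        websocket=ws,
        params=FastAPIWebsocketParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            serializer=TwilioFrameSerializer(stream_sid=stream_sid),
        ),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o-mini")
    tts = CartesiaTTSService(api_key=os.environ["CARTESIA_API_KEY"], voice_id="...")

    context = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful phone agent. Keep replies short."}])
    agg = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(),   # caller audio in
        stt,                 # speech -> text
        agg.user(),          # add user turns to the LLM context
        llm,                 # text -> reply
        tts,                 # reply -> speech
        transport.output(),  # audio back to the caller
        agg.assistant(),     # record what the bot said
    ])
    await PipelineRunner().run(PipelineTask(pipeline))
```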
      • ldenoue (10 hours ago)
        The problem with Pipecat and LiveKit (the two major stacks for building voice AI) is deployment at scale.
        That's why I created a stack entirely in Cloudflare Workers and Durable Objects in JavaScript.
        Providers like AssemblyAI and Deepgram now integrate VAD in their realtime APIs, so our voice AI only needs networking (no CPU anymore).
        • nextworddev (10 hours ago)
          Let me get this straight: you are storing convo threads / context in DOs (Durable Objects)?
          E.g. Deepgram (STT) via websocket -> DO -> LLM API -> TTS?
      • nextworddev (11 hours ago)
        This is good stuff.
        In your opinion, how close is Pipecat + OSS to replacing proprietary infra from Vapi, Retell, Sierra, etc.?
    • ldenoue (10 hours ago)
      I developed a stack on Cloudflare Workers where latency is super low and it is cheap to run at scale thanks to Cloudflare pricing.
      It runs at around 50 cents per hour using AssemblyAI or Deepgram as the STT, Gemini Flash as the LLM, and InWorld.ai as the TTS (for me it's on par with ElevenLabs and super fast).
      • pugio (9 hours ago)
        Do you have anything written up about how you're doing this? Curious to learn more...
    • VladVladikoff (12 hours ago)
      Technically yes, Twilio has SIP trunks.