43 comments

  • lxe12 hours ago
    I built something similar for Linux (yapyap — push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, parakeet is near-instant for short utterances.<p>The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call — grab the focused field's content, nearby labels, and the window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.<p>The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today, but it doesn't have a rug-pull failure mode.
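To make the accessibility-API idea concrete, here is a rough sketch of the prompt-builder step. Everything in it (the function name, field names, and prompt wording) is made up for illustration; it is not from yapyap or any tool in this thread:

```python
# Sketch: turn accessibility-API context into a small text prompt for a
# local ~3B model, instead of screenshotting for a cloud VLM.
# All field names and prompt wording here are hypothetical.

def build_context_prompt(transcript: str, window_title: str,
                         field_text: str, nearby_labels: list[str]) -> str:
    """Assemble a compact correction prompt from focused-field context."""
    context = "\n".join([
        f"Window title: {window_title}",
        f"Focused field contents: {field_text}",
        f"Nearby labels: {', '.join(nearby_labels)}",
    ])
    return (
        "Fix only the spelling of names and technical terms in the "
        "transcript below, using the on-screen context. Return the "
        "corrected transcript and nothing else.\n\n"
        f"{context}\n\nTranscript: {transcript}"
    )

prompt = build_context_prompt(
    transcript="hi kolia, thanks for the realtime voice chat repo",
    window_title="Re: RealtimeVoiceChat - Mail",
    field_text="Hi Kolja,",
    nearby_labels=["To: Kolja B.", "Subject: Re: RealtimeVoiceChat"],
)
```

The whole thing is a few hundred tokens of plain text, which is why a small local model can turn it around in milliseconds.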
    • wolvoleo8 hours ago
      Yeah, local works really well. I tried this other tool: <a href="https:&#x2F;&#x2F;github.com&#x2F;KoljaB&#x2F;RealtimeVoiceChat" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;KoljaB&#x2F;RealtimeVoiceChat</a>, which lets you live-chat with a (local) LLM. With local Whisper and a local LLM (8B Llama in my case) it works phenomenally, and it responds so quickly that it feels like it&#x27;s interrupting me.<p>Too bad that tool no longer seems to be developed; I&#x27;m looking for something similar. But it&#x27;s really nice to see what&#x27;s possible with local models.
    • Wowfunhappy11 hours ago
      &gt; The &quot;local is too slow&quot; argument doesn&#x27;t hold up anymore if you have any GPU at all.<p>By &quot;any GPU&quot; you mean a physical, dedicated GPU card, right?<p>That&#x27;s not a small requirement, especially on Macs.
      • arach11 hours ago
        My M1 16GB Mini and M2 16GB Air both deliver insane local transcription performance without eating up much memory - the M line + Parakeet is seriously fast, and you get privacy for free
        • ghrl9 hours ago
          Yeah, that model is amazing. It even runs reasonably well on my mid-range Android phone with this quite simple but very useful application, as long as I don&#x27;t speak for too long, or pause every once in a while to let it transcribe. I do have handy.computer on my Mac too.<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=46640855">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=46640855</a><p>I find the model works surprisingly well and in my opinion surpasses all other models I&#x27;ve tried. Finally a model that can mostly understand my not-so-perfect English and handle switching languages mid-sentence (compare that to Gemini&#x27;s voice input, which is literally THE WORST: always trying to transcribe in the wrong language, and even when the language is correct it produces the most utter crap imaginable).
          • arach2 hours ago
            Ack for dictations, but Gemini voice is fun for interactive voice experiments -&gt; <a href="https:&#x2F;&#x2F;hud.arach.dev&#x2F;" rel="nofollow">https:&#x2F;&#x2F;hud.arach.dev&#x2F;</a> honestly blown away by how much Gemini could assist with basically no dev effort
      • grosswait5 hours ago
        No. Give it a try; I think you’ll be surprised
    • wazoox8 hours ago
      I&#x27;ve installed murmure on my 2013 Mac, and it transcribes at 1073 words&#x2F;minute. I don&#x27;t know about you, but that&#x27;s plenty faster than me :D
  • digitalbase19 hours ago
    Was searching for this this morning and settled on <a href="https:&#x2F;&#x2F;handy.computer&#x2F;" rel="nofollow">https:&#x2F;&#x2F;handy.computer&#x2F;</a>
    • d4rkp4ttern15 hours ago
      Big fan of Handy, and it&#x27;s cross-platform as well. Parakeet V3 gives the best experience, with very fast and accurate-enough transcriptions when talking to AIs that can read between the lines. It does have stuttering issues, though. My primary use of these is when talking to coding agents.<p>But a few weeks ago someone on HN pointed me to Hex, which also supports Parakeet V3 and, incredibly enough, is even faster than Handy because it&#x27;s a native macOS-only app that leverages CoreML&#x2F;Neural Engine for extremely quick transcriptions. Long ramblings transcribed in under a second!<p>It&#x27;s now my favorite fully local STT for macOS:<p><a href="https:&#x2F;&#x2F;github.com&#x2F;kitlangton&#x2F;Hex" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;kitlangton&#x2F;Hex</a>
      • james2doyle50 minutes ago
        I was having the same journey but landed on <a href="https:&#x2F;&#x2F;github.com&#x2F;hoomanaskari&#x2F;mac-dictate-anywhere" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;hoomanaskari&#x2F;mac-dictate-anywhere</a>
      • Barbing12 hours ago
        I installed a few different STT apps at the same time that used Parakeet and I think they disagreed with each other. But Hex otherwise would’ve won for me I think. Wanna reformat the Mac &amp; try again (been a while anyway).<p>My comment on this from a month back: <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=46637040">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=46637040</a>
        • arach10 hours ago
           Hex is great and I&#x27;m not trying to pull you away from them - would love to get your POV when you give these a spin next time. Email or DM me
    • zachlatta19 hours ago
      I just learned about Handy in this thread and it looks great!<p>I think the biggest difference between FreeFlow and Handy is that FreeFlow implements what Monologue calls &quot;deep context&quot;, where it post-processes the raw transcription with context from your currently open window.<p>This fixes misspelled names if you&#x27;re replying to an email &#x2F; makes sure technical terms are spelled right &#x2F; etc.<p>The original hope for FreeFlow was for it to use all local models like Handy does, but with the post-processing step the pipeline took 5-10 seconds instead of &lt;1 second with Groq.
      • sipjca15 hours ago
        There&#x27;s an open PR in the repo, soon to be merged, which adds this support. Post-processing is an optional feature if you want to use it, and when using it, end-to-end latency can still easily be under 3 seconds
        • zachlatta14 hours ago
          That’s awesome! The specific thing that was causing the long latency was the image LLM call to describe the current context. I’m not sure if you’ve tested Handy’s post-processing with images or if there’s a technique to get image calls to be faster locally.<p>Thank you for making Handy! It looks amazing and I wish I found it before making FreeFlow.
      • k92947 hours ago
        You can try ottex for this use case - it has both context capture (app screenshots) and native LLM support, meaning it can send audio AND a screenshot directly to Gemini 3 Flash to produce the bespoke result.
      • lemming17 hours ago
        Could you go into a little more detail about the deep context - what does it grab, and which model is used to process it? Are you also using a groq model for the transcription?
        • zachlatta15 hours ago
          It takes a screenshot of the current window and sends it to Llama on Groq, asking it to describe what you’re doing and pull out key info like names with their spelling.<p>You can go to Settings &gt; Run Logs in FreeFlow to see the full pipeline run on each request, with the exact prompt and LLM response, to see exactly what is sent &#x2F; returned.
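For anyone curious what that pipeline looks like in code, here is a rough two-stage sketch. The function names and prompts are mine, not FreeFlow's actual implementation; the LLM calls are injected as plain callables, so you could back them with Groq, a local model, or stubs:

```python
# Rough shape of the two-stage post-processing pipeline described above:
# stage 1 describes the screen, stage 2 rewrites the transcript.
# Names and prompts are illustrative, not FreeFlow's actual code.
from typing import Callable

def describe_context(screenshot_png: bytes,
                     vision_llm: Callable[[bytes, str], str]) -> str:
    # Stage 1: ask a vision model what's on screen and which names appear.
    return vision_llm(
        screenshot_png,
        "Describe what the user is doing and list any key info, like "
        "names with their exact spelling.",
    )

def post_process(transcript: str, context: str,
                 text_llm: Callable[[str], str]) -> str:
    # Stage 2: rewrite the raw transcript using that description.
    return text_llm(
        f"Context: {context}\nTranscript: {transcript}\n"
        "Fix spellings using the context; return only the transcript."
    )

def pipeline(transcript, screenshot_png, vision_llm, text_llm):
    return post_process(transcript,
                        describe_context(screenshot_png, vision_llm),
                        text_llm)
```

In real use the two callables would wrap an API client or a local model; the run logs mentioned above show the real prompts FreeFlow sends.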
      • stavros19 hours ago
        As a very happy Handy user, it doesn&#x27;t do that indeed. It will be interesting to see if it works better, I&#x27;ll give FreeFlow a shot, thanks!
    • jimmySixDOF7 hours ago
      I didn&#x27;t try Handy, but I&#x27;ve been using Whisper-Key: it&#x27;s super simple, gets out of your way, all local, a single-file executable (portable, so zero install too). That&#x27;s for Windows; idk about a Mac version.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;PinW&#x2F;whisper-key-local" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;PinW&#x2F;whisper-key-local</a>
      • hackernewds6 hours ago
        the astroturfing here off topic of op post is unbearable
    • odiroot4 hours ago
      Not sure if it&#x27;s just me but Handy crashes on my Arch setup. Never mind which version I run. Could be something with Wayland or Pipewire but didn&#x27;t see anything obvious in the logs.
      • arach3 hours ago
        <a href="https:&#x2F;&#x2F;github.com&#x2F;goodroot&#x2F;hyprwhspr" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;goodroot&#x2F;hyprwhspr</a> - have you tried this? I have a nice new 64GB Linux machine waiting to be set up so I can kick the tires on this.<p>Pretty sure it&#x27;s awesome - sorry OP for mentioning another project, we&#x27;re all learning here :)
    • arach12 hours ago
      Handy&#x27;s great! I find the latency to be just a bit too much for my taste. Like half the people on this thread, I built my own, but with a bit more emphasis on speed:<p><a href="https:&#x2F;&#x2F;usetalkie.com" rel="nofollow">https:&#x2F;&#x2F;usetalkie.com</a>
    • gurjeet14 hours ago
      Thanks for the recommendation! I picked the smallest model (Moonshine Base @ 58MB), and it works great for transcribing English.<p>Surprisingly, it produced a better output (at least I liked its version) than the recommended but heavy model (Parakeet V3 @ 478 MB).
      • sipjca12 hours ago
        Great feedback :) also support for the v2 versions of the moonshine models should be out today!
    • vogtb18 hours ago
      Handy rocks. I recently had minor surgery on my shoulder that required me to be in a sling for about a month, and I thought I&#x27;d give Handy a try for dictating notes and so on. It works phenomenally well for most speech-to-text use cases - homonyms included.
    • smcleod13 hours ago
      Handy is nothing short of fantastic, really brilliant when combined with Parakeet v2!
    • irrationalfab17 hours ago
      Handy is genuinely great and it supports Parakeet V3. It’s starting to change how I &quot;type&quot; on my computer.
    • hendersoon19 hours ago
      Yes, I also use Handy. It supports local transcription via Nvidia Parakeet TDT2, which is extremely fast and accurate. I also use gemini 2.5 flash lite for post-processing via the free AI studio API (post-processing is optional and can also use a locally-hosted LM).
    • stavros19 hours ago
      I use handy as well, and love it.
  • p0w3n3d19 hours ago
    There&#x27;s also an offline-running software called VoiceInk for macos. No need for groq or external AI.<p><a href="https:&#x2F;&#x2F;github.com&#x2F;Beingpax&#x2F;VoiceInk" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;Beingpax&#x2F;VoiceInk</a>
    • sahildeepreel10 minutes ago
      came to recommend voiceink too: <a href="https:&#x2F;&#x2F;tryvoiceink.com&#x2F;" rel="nofollow">https:&#x2F;&#x2F;tryvoiceink.com&#x2F;</a><p>just bought the one-time licence. this is the future of AI pricing - local models and one-time fee.
    • james2doyle50 minutes ago
      Haven’t seen this one, but I settled on this offline-running one: <a href="https:&#x2F;&#x2F;github.com&#x2F;hoomanaskari&#x2F;mac-dictate-anywhere" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;hoomanaskari&#x2F;mac-dictate-anywhere</a>
    • jiehong2 hours ago
      I used to use VoiceInk, but I found Spokenly [0] to be easier to use for post-processing the output, and more stable overall (local version with Parakeet or whisper is free).<p>[0]: <a href="https:&#x2F;&#x2F;spokenly.app&#x2F;" rel="nofollow">https:&#x2F;&#x2F;spokenly.app&#x2F;</a>
    • parhamn19 hours ago
      +1, my experience improved quite a bit when I switched to the parakeet model, they should definitely use that as the default.
    • arach12 hours ago
      <a href="https:&#x2F;&#x2F;usetalkie.com" rel="nofollow">https:&#x2F;&#x2F;usetalkie.com</a> - Parakeet and incredibly fast and made for devs
    • zackify19 hours ago
      My favorite too. I use the parakeet model
  • sathish31616 hours ago
    To build your own STT (speech-to-text) with a local model and modify it, just ask Claude Code to build it for you with this workflow.<p>F12 -&gt; sox for recording -&gt; temp.wav -&gt; faster-whisper -&gt; pbcopy -&gt; notify-send to know what&#x27;s happening<p><a href="https:&#x2F;&#x2F;github.com&#x2F;sathish316&#x2F;soupawhisper" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;sathish316&#x2F;soupawhisper</a><p>I found a Linux version with a similar workflow and forked it to build the Mac version. It took less than 15 mins to ask Claude to modify it as per my needs.<p>F12 press → arecord (ALSA) → temp.wav → faster-whisper → xclip + xdotool<p><a href="https:&#x2F;&#x2F;github.com&#x2F;ksred&#x2F;soupawhisper" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;ksred&#x2F;soupawhisper</a><p>Thanks to faster-whisper and local models using quantization, I use it everywhere I was previously using Superwhisper: Docs, Terminal, etc.
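The same pipeline, sketched in Python with the external commands injected so you can see the shape without installing anything. Note that faster-whisper ships as a library, not a CLI, so the `faster-whisper` command below is a placeholder for however you invoke it (e.g. a small wrapper script):

```python
# Sketch of the hotkey pipeline above: record -> transcribe -> clipboard
# -> notify. The command runner is a parameter so the flow is testable
# without sox or a whisper install; command names follow the comment's
# workflow and are placeholders, not a guaranteed CLI surface.
import subprocess

def dictate(run=subprocess.run, wav="/tmp/dictation.wav"):
    # Record from the default input device (capped at 10s here).
    run(["sox", "-d", wav, "trim", "0", "10"])
    # Transcribe; "faster-whisper" stands in for your wrapper script.
    result = run(["faster-whisper", wav], capture_output=True, text=True)
    text = result.stdout.strip()
    # Copy to the macOS clipboard and pop a notification.
    run(["pbcopy"], input=text, text=True)
    run(["osascript", "-e", f'display notification "{text}"'])
    return text
```

On Linux the last two steps become xclip and notify-send, as in the forked repo.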
    • archb3 hours ago
      Yeah, it&#x27;s really that simple. I have tried various applications as well and keep coming back to my custom script, because when a new voice model drops on Hugging Face I can switch to it immediately, rather than wait for that application&#x27;s developer to support the new model.
  • dkhenry4 hours ago
    Just speech-to-text seems largely solved on pretty much every compute platform. However, I have found a huge gap between independent words being transcribed and formatted text ready for an editor or further processing.<p>If you look at how authors dictate their works (which they have done for millennia), just getting the words written down is only the first step, and it&#x27;s by far the easiest. I have been helping build a tool, <a href="https:&#x2F;&#x2F;bookscribe.ai" rel="nofollow">https:&#x2F;&#x2F;bookscribe.ai</a>, that not only does the transcription but can then post-process it to make it actually usable for longer-form content.
    • pegasus4 hours ago
      Aqua Voice does (at least some of) that as well.
  • vesterde19 hours ago
    Since many are asking about apps with similar capabilities: I’m very happy with MacWhisper. Has Parakeet, near-instant transcription of my lengthy monologues. All local.<p>Edit: Ah, but Parakeet I think isn’t available for free. A very worthwhile single-purchase app nonetheless!
    • SOLAR_FIELDS12 hours ago
      I actually got MacWhisper originally for speech to text so I could talk to my machine like a crazy person. I realized I didn&#x27;t like doing that but the actual killer feature for buying it that I really enjoy is the fully local transcription of meetings, with a nice little button to start recording that pops up when you launch zoom, teams, etc. It means I can safely record meetings and encrypt them locally and keep internal notes without handing off all of that to some nebulous cloud platform.<p>I had previously used Hyprnote to record meetings in this way - and indeed I still use that as a backup, it&#x27;s a great free option - but the meeting prompting to record and better transcription offered by Macwhisper is a much better experience.
      • arach12 hours ago
        I initially built Talkie to talk to it like a crazy person when I was on long runs and ideas would pop into my head haha<p>Been a power user of SuperWhisper and Wispr Flow for a long time and eventually decided to unify those flows - memos &amp; dictations, everything is a file and local first, BYOK
  • strokirk10 hours ago
    Do any of these solutions work reliably for non-English languages? I’ve had a lot of issues trying to transcribe Swedish with all the products I’ve used so far.
    • u_sama8 hours ago
      Parakeet doesn&#x27;t work ? <a href="https:&#x2F;&#x2F;huggingface.co&#x2F;nvidia&#x2F;parakeet-tdt-0.6b-v3" rel="nofollow">https:&#x2F;&#x2F;huggingface.co&#x2F;nvidia&#x2F;parakeet-tdt-0.6b-v3</a><p>If you are willing to use a service for transcriptions, Mistral (which is also European) works rather nicely if they support your language <a href="https:&#x2F;&#x2F;docs.mistral.ai&#x2F;capabilities&#x2F;audio_transcription#transcription" rel="nofollow">https:&#x2F;&#x2F;docs.mistral.ai&#x2F;capabilities&#x2F;audio_transcription#tra...</a>
    • k92947 hours ago
      Try ottex with Gemini 3 Flash as the transcription model. I&#x27;m bilingual as well and frequently switch between languages - Gemini handles this perfectly, even when I speak two languages in one transcription.
  • kombinar20 hours ago
    Sounds like there&#x27;s plenty of interest in these kinds of tools. I&#x27;m not a huge fan of API transcription given the great local models.<p>I built <a href="https:&#x2F;&#x2F;github.com&#x2F;bwarzecha&#x2F;Axii" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;bwarzecha&#x2F;Axii</a> to keep EVERYTHING local and be fully open source - it can easily be used at any company. No data sent anywhere.
  • k92947 hours ago
    I&#x27;m building in the same space, working on <a href="https:&#x2F;&#x2F;ottex.ai" rel="nofollow">https:&#x2F;&#x2F;ottex.ai</a> - it&#x27;s a free STT app with local models and BYOK support (OpenRouter, Groq, Mistral, and more).<p>The top feature is the per-app custom settings - you can pick different models and instructions for different apps and websites.<p>- I use the Parakeet fast model when working with Claude Code (VS Code app).<p>- And I use a smart one when I draft notes in Obsidian; I have a prompt to clean up my rambling and format the result with proper Markdown, very convenient.<p>One more cool thing is that it lets me use LLMs with audio input modalities directly (not as text post-processing), e.g. it sends the audio to Gemini and prompts it to transcribe, format, etc., in one run. I find it a bit slow to work with CC, but it is the absolute best in terms of accuracy, understanding, and formatting. It is the only model I trust to understand what I meant and produce the correct result, even when I use multiple languages, tech terms, etc.
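The per-app routing described above boils down to a lookup table keyed on the frontmost app. A minimal sketch (the app names, model ids, and default are invented for illustration, not ottex's actual config):

```python
# Hypothetical per-app transcription settings: frontmost app name
# selects a model and an optional rewrite prompt.
APP_PROFILES = {
    "Code":     {"model": "parakeet-tdt-0.6b-v3", "prompt": None},
    "Obsidian": {"model": "gemini-3-flash",
                 "prompt": "Clean up the rambling and format as Markdown."},
}
DEFAULT_PROFILE = {"model": "parakeet-tdt-0.6b-v3", "prompt": None}

def profile_for(frontmost_app: str) -> dict:
    """Look up the transcription settings for the active application."""
    return APP_PROFILES.get(frontmost_app, DEFAULT_PROFILE)
```

The interesting part in a real app is reliably detecting the frontmost app (e.g. via NSWorkspace on macOS); the routing itself stays this simple.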
  • zuInnp3 hours ago
    I am a huge fan of OpenSuperWhisper (<a href="https:&#x2F;&#x2F;github.com&#x2F;Starmel&#x2F;OpenSuperWhisper" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;Starmel&#x2F;OpenSuperWhisper</a>). Works local and is more than enough for me.
  • threekindwords16 hours ago
    i&#x27;ve used macwhisper (paid), superwhisper (paid), and handy (free) but now prefer hex (free):<p><a href="https:&#x2F;&#x2F;github.com&#x2F;kitlangton&#x2F;Hex" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;kitlangton&#x2F;Hex</a><p>for me it strikes the balance of good, fast, and cheap for everyday transcription. macwhisper is overkill, superwhisper too clever, and handy too buggy. hex fits just right for me (so far)
    • shostack14 hours ago
      Tried to use it, installed, enabled permissions, downloaded the parakeet model for English and then it crashed every time I released the button after dictating. Completely unusable.
  • arach2 hours ago
    Combine this with a nice shortcut keyboard and then you&#x27;re really flying - my favorites are XP Pen and DOIO 16
  • seyz5 hours ago
    The moat here is local inference. whisper.cpp + Metal gives you &lt;500ms latency on an M1 with the small model. No API costs, no privacy concerns. Ship that and you&#x27;ve got something the paid tools can&#x27;t match. The UI is already solid; the edge is in going fully offline.
  • rabf17 hours ago
    <a href="https:&#x2F;&#x2F;github.com&#x2F;rabfulton&#x2F;Auriscribe" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;rabfulton&#x2F;Auriscribe</a><p>My take for X11 Linux systems. Small and low dependency except for the model download.
  • stranded229 hours ago
    looks good although Mistral Voxtral would be a good choice, wouldn&#x27;t it?<p><a href="https:&#x2F;&#x2F;mistral.ai&#x2F;news&#x2F;voxtral-transcribe-2" rel="nofollow">https:&#x2F;&#x2F;mistral.ai&#x2F;news&#x2F;voxtral-transcribe-2</a>
  • drooby17 hours ago
    I just vibe coded my own NaturalReader replacement. The subscription was $110&#x2F;year... and I just canceled it.<p>Chatterbox TTS (from Resemble AI) does the voice generation, WhisperX gives word-level timestamps so you can click any word to jump, and FastAPI ties it all together with SSE streaming so audio starts playing before the whole thing is done generating.<p>There&#x27;s a ~5s buffer up front while the first chunk generates, but after that each chunk streams in faster than realtime, so playback rarely stalls.<p>It took about 4 hours today... wild.
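The "audio starts before generation finishes" trick mostly comes down to how you split the text before handing it to the TTS model: get a small first chunk out fast, then stream the rest. A sketch of that chunking step (the greedy sentence-packing rule here is my own, not the actual app's):

```python
# Greedily pack sentences into chunks so the first chunk is short and
# can be synthesized quickly, hiding the rest of the generation time.
import re

def sentence_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text on sentence boundaries into chunks of ~max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        # Start a new chunk if adding this sentence would overflow.
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then goes to the TTS model in order, and the SSE endpoint emits audio as each one completes.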
    • bawana5 hours ago
      do you have a github?
  • Fidelix20 hours ago
    MacOS only. May this help you skip a click.
    • properbrew10 hours ago
      If you&#x27;re looking for free STT you can use Whistle across Windows&#x2F;Mac&#x2F;Linux and Android (iOS released soon)<p><a href="https:&#x2F;&#x2F;blazingbanana.com&#x2F;work&#x2F;whistle" rel="nofollow">https:&#x2F;&#x2F;blazingbanana.com&#x2F;work&#x2F;whistle</a>
    • spelk19 hours ago
      Whispering [0] is Windows compatible and has gotten a lot better on Windows despite being extremely rough around the edges at first.<p>[0] <a href="https:&#x2F;&#x2F;github.com&#x2F;EpicenterHQ&#x2F;epicenter" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;EpicenterHQ&#x2F;epicenter</a>
    • 9999gold17 hours ago
      Not sure why you got downvoted. I wish this was a tag or something.
  • knob17 hours ago
    This thread is a beautiful intro into our near future. Yet more and more custom coded software. Takes me back to the days of late 90s. Loving this!
  • muratsu18 hours ago
    For those using something like this daily: what key combinations do you use to record and cancel? I’m using my caps lock right now but was curious about others
    • qingcharles16 hours ago
      Someone told me the other day I should use a foot pedal, and then I remembered I already had an Elgato one under my desk connected with my Stream Deck. I got it very cheap used on eBay. So, that&#x27;s an option too.
    • Doman17 hours ago
      Scroll Lock is a really good key for that, in my opinion. If your keyboard does not have it exposed, then you can use a remapping program like <a href="https:&#x2F;&#x2F;github.com&#x2F;jtroo&#x2F;kanata" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;jtroo&#x2F;kanata</a>
    • michaelbuckbee16 hours ago
      I have a Stream Deck and made a dedicated button for this. So I tap the button, speak, and then tap it again, and it pastes into wherever my cursor was.<p>And then I set the button right below that as the enter key, so it feels mostly hands-off the keyboard.
    • atestu16 hours ago
      Right option. Push to talk
      • tacotime15 hours ago
        I also use the right option key on Mac, never miss it.
    • adanto684015 hours ago
      Great question. I&#x27;d love to know if anyone has had any success with handheld buttons&#x2F;bluetooth remotes or similar, too.
    • Brajeshwar15 hours ago
      Can you please teach me how to use the CAPS LOCK key as a push-to-talk?
  • vittore13 hours ago
    For macOS I found <a href="https:&#x2F;&#x2F;github.com&#x2F;rselbach&#x2F;jabber" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;rselbach&#x2F;jabber</a> and have lately been using that, but on iOS I still need a replacement.
    • Void_11 hours ago
      Not free, but Whisper Memos (<a href="https:&#x2F;&#x2F;whispermemos.com&#x2F;" rel="nofollow">https:&#x2F;&#x2F;whispermemos.com&#x2F;</a>) is about half the price
  • yrral9 hours ago
    Does anyone know of any macOS transcription apps that do speech-to-text live? E.g., the text outputs as you are talking. Older tech like macOS dictation and Dragon does this, but it seems like there's nothing available that uses the new, better models.
  • spelk19 hours ago
    Does anyone know of an effective alternative for Android?
    • windthrown12 hours ago
      I installed Whisper+ through FDroid and it works well for my basic needs. Only 30s at a time but you can append multiple recordings to the same transcript: <a href="https:&#x2F;&#x2F;github.com&#x2F;woheller69&#x2F;whisperIMEplus" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;woheller69&#x2F;whisperIMEplus</a>
    • uncharted918 hours ago
      I have been using VoiceFlow. It works incredibly well and uses Groq to transcribe using the Whisper V3 Turbo model. You can also use it in an offline scenario with an on-device model, but I am mostly connected to the internet whenever I am transcribing.
    • jskherman19 hours ago
      Check out the FUTO Keyboard or FUTO Voice Input apps. So far they only use the Whisper models, though.
    • xnx18 hours ago
      Does the Android keyboard transcription not work for your needs?
      • baseh59 minutes ago
        For Android I find Google GBoard transcription most accurate and pretty solid.
  • corlinp18 hours ago
    I created Voibe which takes a slightly different direction and uses gpt-4o-transcribe with a configurable custom prompt to achieve maximum accuracy (much better than Whisper). Requires your own OpenAI API key.<p><a href="https:&#x2F;&#x2F;github.com&#x2F;corlinp&#x2F;voibe" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;corlinp&#x2F;voibe</a><p>I do see the name has since been taken by a paid service... shame.
  • arcologies198520 hours ago
    Could you make it use Parakeet? That&#x27;s an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.
    • zachlatta20 hours ago
      I love this idea, and originally planned to build it using local models, but to have post-processing (that&#x27;s where you get correctly spelled names when replying to emails &#x2F; etc), you need to have a local LLM too.<p>If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of &lt;1s). I also had concerns around battery life.<p>Some day!
    • s0l20 hours ago
      <a href="https:&#x2F;&#x2F;github.com&#x2F;cjpais&#x2F;Handy" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;cjpais&#x2F;Handy</a><p>It’s free and offline
      • zachlatta19 hours ago
        Wow, Handy looks really great and super polished. Demo at <a href="https:&#x2F;&#x2F;handy.computer&#x2F;" rel="nofollow">https:&#x2F;&#x2F;handy.computer&#x2F;</a>
  • yuppiepuppie10 hours ago
    Quick question: what’s the state of vibe coding with Xcode? I remember there were some issues months ago trying to get a seamless integration working. Has it improved?
    • b-star3 hours ago
      I just vibe coded a small app in 2 hours (mostly due to making additional adjustments). Claude Code used Xcode CLI. No issues besides the fact that it’s not notarized and you have to trust it through Gatekeeper
  • johnbatch17 hours ago
    Do any of these works as an iOS keyboard to replace the awful voice transcription Apple is currently shipping?
    • arach12 hours ago
      <a href="https:&#x2F;&#x2F;x.com&#x2F;usetalkieapp&#x2F;status&#x2F;2022341320090775647&#x2F;photo&#x2F;1" rel="nofollow">https:&#x2F;&#x2F;x.com&#x2F;usetalkieapp&#x2F;status&#x2F;2022341320090775647&#x2F;photo&#x2F;...</a><p>native app uses Parakeet (v2 or V3) on iOS
    • copperx15 hours ago
      utter (utter.to) does.
  • Laurenz13376 hours ago
    I don't understand who this is for, honestly. Unless you don't have hands, why would you want to talk to your computer? Maybe I'm just autistic, but I would always prefer typing over speaking out loud and having that turned into text.
    • hackernewds6 hours ago
      you shouldn&#x27;t use autism as a generic insult as you have here
  • baxtr19 hours ago
    Is there a tool that preserves the audio? I want both, the transcript and the audio.
    • heyalexej18 hours ago
      Quick glance: FreeFlow already saves WAV recordings for every transcript to ~&#x2F;Lib..&#x2F;App..&#x2F;FreeFlow&#x2F;audio&#x2F;, with UUIDs linking them to pipeline history entries in Core Data. Audio files are automatically deleted, though, when their associated history entries are deleted. Should be a quick fix. I recently did the same for hyprvoice, for debugging and auditing.
  • sonu2720 hours ago
    Nice! I vibe coded the same thing this weekend, but for OpenAI, and it&#x27;s less polished: <a href="https:&#x2F;&#x2F;github.com&#x2F;sonu27&#x2F;voicebardictate" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;sonu27&#x2F;voicebardictate</a>
    • manmal19 hours ago
      Also look into Voxtral; their new model is good and half the price, if you can live without streaming.
  • hodanli18 hours ago
    title lacks: for Mac
  • lemming19 hours ago
    Is it possible to customise the key binding? Most of these services let you customise the binding, and also support toggle for push-to-talk mode.
  • dan_wood9 hours ago
    But SuperWhisper is free with Parakeet as a local model?
    • hnrodey9 hours ago
      That’s how I have it running too
  • dcreater15 hours ago
    Why do people feel the need to market as &quot;free alternative to xyz&quot; when it&#x27;s a basic utility? I take it as an instant signal that the dev is a copycat, mostly interested in getting stars and eyeballs rather than making a genuinely useful, high-quality product.<p>Just use Handy: <a href="https:&#x2F;&#x2F;github.com&#x2F;cjpais&#x2F;Handy" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;cjpais&#x2F;Handy</a>
    • egonschiele15 hours ago
      Really good to know Handy exists; it&#x27;s the first I&#x27;m hearing about it. I use a speech-to-text app that I built for myself, and I know at least one co-worker pays $10 a month for (I think) Wispr. I think it&#x27;s possible there was no intention to market, and the creator simply didn&#x27;t know about Handy, just like me.
  • wazoox7 hours ago
    Murmure is multiplatform, uses Parakeet, and can connect to your local LLM (using Ollama). <a href="https:&#x2F;&#x2F;murmure.al1x-ai.com&#x2F;" rel="nofollow">https:&#x2F;&#x2F;murmure.al1x-ai.com&#x2F;</a>
  • SomaticPirate18 hours ago
    Seeing this thread, it sounds like a blog post comparing the offerings would be useful
    • copperx16 hours ago
      Good idea at first glance, but it would get outdated in hours.
  • ndgold5 hours ago
    Vowen
  • DevX10117 hours ago
    Anything similar for iOS?
  • Zopieux16 hours ago
    Saved you a click: Mac only and actually Grok; local inference too slow.<p>Won&#x27;t be free when xAI starts charging.
    • properbrew10 hours ago
      If you&#x27;re looking for free STT you can use Whistle across Windows&#x2F;Mac&#x2F;Linux and Android (iOS released soon)<p><a href="https:&#x2F;&#x2F;blazingbanana.com&#x2F;work&#x2F;whistle" rel="nofollow">https:&#x2F;&#x2F;blazingbanana.com&#x2F;work&#x2F;whistle</a>
    • setnone11 hours ago
      groq ≠ grok
  • _blackhawk_18 hours ago
    Spokenly?
  • copperx16 hours ago
    Utter uses your OpenAI key (~$1&#x2F;month). <a href="https:&#x2F;&#x2F;utter.to&#x2F;" rel="nofollow">https:&#x2F;&#x2F;utter.to&#x2F;</a>. Has an iPhone app.
  • anvevoice13 hours ago
    [dead]