I’ve tried several, including this one, and I’ve settled on VoiceInk (local, one-time payment), and with Parakeet V3 it’s stunningly fast (near-instant) and accurate enough to talk to LLMs/code-agents, in the sense that the slight drop in accuracy relative to Whisper Turbo3 is immaterial since they can “read between the lines” anyway.<p>My regular cycle is to talk informally to the CLI agent and ask it to “say back to me what you understood”, and it almost always produces a nice clean and clear version. This simultaneously works as confirmation of its understanding and also as a sort of spec which likely helps keep the agent on track.
I have dystonia, which often stiffens my arms in a way that makes it impossible for me to type on a keyboard. STT apps like SuperWhisper have proven to be very helpful for me in such situations. I am hoping to get a similar experience out of "Handy" (very apt naming from my perspective).<p>I do, however, wonder if there is a way all these STT tools can get to the next level. The generated text should not be just a verbatim copy of what I said; depending on the context, it should elaborate. For example, if my cursor is actively inside an editor/IDE with some code, my coding-related verbal prompts should actually generate the right/desired code in that IDE.<p>Perhaps this is a bit of combining STT with computer-use.
I made something called `ultraplan`. It's a CLI tool that records multi-modal context (audio transcription via local Whisper, screenshots, clipboard content, etc.) into a timeline that AI agents like Claude Code can consume.<p>I have a Claude skill `/record` that runs the CLI, which starts a new recording. I debug, research, etc., then say "finito" (or choose your own stopword). It outputs a markdown file with your transcribed speech interleaved with screenshots and text that you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.<p>When the session ends, Claude reads the timeline (e.g. looks at the screenshots) and gets to work.<p>I can clean it up and push to GitHub if anyone would get use out of it.
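To give a flavour of it, here's a toy sketch of the timeline loop. This is not the actual ultraplan code: `next_phrase` stands in for whatever local Whisper loop you run, and the screenshot call is macOS-only.

```python
# Toy sketch of the "recording -> markdown timeline" idea (not the real tool).
import datetime
import subprocess
from pathlib import Path

def take_screenshot(out_dir: Path) -> Path:
    path = out_dir / f"shot-{datetime.datetime.now():%H%M%S}.png"
    subprocess.run(["screencapture", "-x", str(path)], check=True)  # macOS only
    return path

def record_timeline(next_phrase, out_dir: Path, stopword: str = "finito") -> Path:
    """Consume transcribed phrases until the stopword, building a markdown timeline."""
    out_dir.mkdir(exist_ok=True)
    lines = [f"# Session {datetime.datetime.now():%Y-%m-%d %H:%M}"]
    for phrase in next_phrase():           # phrases from your local Whisper loop
        text = phrase.strip().lower()
        if text == stopword:               # "finito" ends the recording
            break
        if text == "marco":                # "marco" grabs a screenshot hands-free
            lines.append(f"![screenshot]({take_screenshot(out_dir)})")
        else:
            lines.append(phrase.strip())
    timeline = out_dir / "timeline.md"
    timeline.write_text("\n\n".join(lines))
    return timeline                        # point Claude Code at this file
```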
There’s lots of existing work on “coding by voice” <i>long</i> before LLMs were a thing. For example (from 2013):
<a href="http://xahlee.info/emacs/emacs/using_voice_to_code.html" rel="nofollow">http://xahlee.info/emacs/emacs/using_voice_to_code.html</a>
and the associated HN discussion (“Using Voice to Code Faster than Keyboard”): <a href="https://news.ycombinator.com/item?id=6203805">https://news.ycombinator.com/item?id=6203805</a><p>There’s also more recent-ish research, like <a href="https://dl.acm.org/doi/fullHtml/10.1145/3571884.3597130" rel="nofollow">https://dl.acm.org/doi/fullHtml/10.1145/3571884.3597130</a>
I totally agree with you, and largely what you're describing is one of the reasons I made Handy open source. I really want to see something like this and see someone experiment with making it happen. I did hear of some people playing with small local models (Moondream, Qwen) to get more context from the computer itself.<p>I initially had a ton of keyboard shortcuts in Handy for myself when I had a broken finger and was in a cast. It let me play with the simplest form of this contextual thing, as shortcuts could effectively be mapped to certain apps with very clear use cases.
What you're describing is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and turn it into a set of actions. With a CLI it's trivial: you can have your verbal command translated into working shell commands. With a GUI it's slightly more complicated because the LLM agent needs to know what you see on the screen, etc.<p>That CLI bit I mentioned earlier is already possible. For instance, on macOS there's an app called MacWhisper that can send dictation output to an OpenAI-compatible endpoint.
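A minimal sketch of that CLI idea, assuming a local OpenAI-compatible server; the base URL and model name are placeholders, and any generated command should obviously be reviewed before running it:

```python
# Sketch: dictated request -> LLM -> single shell command (review before running!).
from openai import OpenAI

# Placeholder endpoint/model; point this at whatever OpenAI-compatible server you run.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

def voice_to_shell(transcript: str) -> str:
    resp = client.chat.completions.create(
        model="qwen2.5-coder",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Translate the user's spoken request into exactly one POSIX "
                        "shell command. Reply with the command only, no explanation."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content.strip()

# e.g. voice_to_shell("show me the ten largest files in this directory")
```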
Love it. I had been searching for an STT app for weeks. Every single app was either paid as a one-off or had a monthly subscription. It felt a bit ridiculous having to pay when it's all powered by such small models on the back end, so I decided to build my own. But then I found "Handy" and it's been a really amazing partner for me. Super fast, super simple, doesn't get in my way, and it's constantly updated. I just love it. Thanks a lot for making it!<p>P.S. The post-processing that you are talking about, wouldn't it be awesome.
Explain to me why a speech-to-text app has 50% of its code in TypeScript...?
This looks great! What’s missing for me to switch from something like Wispr Flow is the ability to provide a dictionary for commonly mistaken words (name of your company, people, code libraries).
It has something called "Custom Words", which might be what you are describing. I haven't properly tested this feature yet.
There’s a PR for this which will be pulled in soon enough. I can kick off a build of the PR if you want to download a pre-release version.
Okay, so it's more like direct text replacements:<p><a href="https://github.com/cjpais/Handy/actions/runs/21025848728" rel="nofollow">https://github.com/cjpais/Handy/actions/runs/21025848728</a><p>There is also LLM post-processing, which can do this, and the built-in dictionary feature.
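For anyone wondering, the text-replacement approach is essentially a substitution map applied after transcription. A toy sketch (not Handy's actual implementation; the word list here is made up):

```python
# Toy post-processing pass: fix commonly misheard terms after transcription.
import re

REPLACEMENTS = {              # made-up examples; fill in your own problem words
    "cloud code": "Claude Code",
    "pie torch": "PyTorch",
    "parakeet tree": "Parakeet V3",
}

def apply_replacements(text: str) -> str:
    for wrong, right in REPLACEMENTS.items():
        text = re.sub(re.escape(wrong), right, text, flags=re.IGNORECASE)
    return text

print(apply_replacements("open the cloud code docs for pie torch"))
```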
I dig that some models have the ability to say how sure they are of each word. Manually entering a bunch of special words is OK, but I want to be able to review the output and see which words the model was less sure of, so I can go find out what I might need to add.
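For what it's worth, this is already exposed by some of the underlying libraries; faster-whisper, for example, reports a per-word probability, so something like the sketch below can flag words worth reviewing (the model size and the 0.6 threshold are arbitrary choices):

```python
# Sketch: surface low-confidence words from a transcription for manual review.
from faster_whisper import WhisperModel

model = WhisperModel("base.en")
segments, _info = model.transcribe("dictation.wav", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        if word.probability < 0.6:   # flag words the model was unsure about
            print(f"{word.word.strip()!r} at {word.start:.1f}s "
                  f"(p={word.probability:.2f}) - maybe add it to your custom words")
```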
Nice. I spent most of Christmas vibe coding with Google Antigravity with one hand while holding a sleeping baby in the other. macOS's built-in dictation is OK, but it struggles with technical language.
Has anyone compared this with <a href="https://github.com/HeroTools/open-whispr" rel="nofollow">https://github.com/HeroTools/open-whispr</a> already? From the description they seem very similar.<p>Handy's first release was June 2025, OpenWhispr's a month later. Handy has ~11k GitHub stars, OpenWhispr has ~730.
I did try it, but the ease of installing Handy as just a macOS app is so much simpler than needing to constantly run npm commands. I think at the time I was checking it, which was a couple of months ago, they did not have the Parakeet model (a non-Whisper model), so I decided against it. If I remember correctly, the UI was also not the smoothest.<p>Handy's UI is so clean and minimalistic that you always know what to do or where to go. Yes, it lacks some advanced features, but honestly, I've been using it for two months now and I've never looked back or searched for any other STT app.
The OP asked if someone had compared both, which usually means actually trying both, not just installing one and skimming through the other's README file. So, in summary, you didn't try both and didn't answer the OP.
It's incredibly fast on my M1 MacBook Air and more accurate than the native speech-to-text.<p>The UI is well thought out, with just the right amount of settings for my usage.<p>Incredible!<p>Btw, do you know what « discharging the model » does? It's set to never by default; I tried to check if it has an impact on RAM or CPU, but it doesn't seem to do anything.
The Parakeet V3 model is really great!
A question because I'm not using speech-to-text, but find it intriguing (especially since it's now possible to do locally and for free).<p>How have your computing habits changed as a result of having this? When do you typically use this instead of typing on the keyboard?
I use it all the time with coding agents, especially if I'm running multiple terminals. It's way faster to talk than type. The only problem is that it looks awkward if there are others around.
Part of my job is to give feedback to people using Word Comments. Using STT, it's been a breeze. The time saving really is great. Thing is, I only do this when working at home with no one around. So really only when WFH.
I just set this up today. I had Whispering app set up on my Windows computer, but it really wasn't working well on my Ubuntu computer that I just set up. I found Handy randomly. It was the last app I needed to go Linux full-time. Thank you!
This looks and works great! A settings option to keep no recording history at all would be terrific.
As a Mac user, am I missing something? macOS has Dictation built-in, when you short press F5 it should start transcribing your spoken words into text in real time. It even does non-English languages.
Besides being trash, as others have said, there's a trade-off with real-time word-by-word transcription: there's no opportunity for an AI to holistically correct/clean up the transcription.
It's trash if:<p>- you're not a native speaker or have an accent<p>- you're using the AirPods mic<p>- your surroundings are noisy<p>- you use novel words like 'claude code'<p>- you mumble a bit
Does this (or open-whispr) work well with languages other than English?
On an M4 MacBook Air, there was enough lag to make it unusable for me. I'd hit the shortcut and start speaking, but there was always a 1-2 second delay before it would actually start transcribing, even when the icon was displayed.
Curious if you were using AirPods or other Bluetooth headphones for this?<p>If so, there should be a "keep microphone on" or similar setting in the config that may help with this. Alternatively, I set my microphone to my MacBook mic so that my headphones aren't involved at all, and there is much less latency on activation.
Yes, I've got the same situation. I've kind of learned to wait for one or two seconds before talking. I am using it with AirPods, so maybe it's indeed the Bluetooth thing.
What microphone are you using?
Does anyone have a similar mobile application that works locally and is not too expensive? Mostly looking to transcribe voice messages sent over Signal which does not offer this OOTB
I have been using this one from Futo for quite some time and love it: <a href="https://keyboard.futo.org/" rel="nofollow">https://keyboard.futo.org/</a><p>They also have a voice input only version if you still would like to keep your typing keyboard: <a href="https://voiceinput.futo.org/" rel="nofollow">https://voiceinput.futo.org/</a>
There is one single app I've been able to find that offers Parakeet-v3 for free locally and it's called Spokenly. They have paid cloud models available as well, but the local Parakeet-v3 implementation is totally free and is the best STT has to offer these days regardless. Super fast and accurate. I consider single-user STT basically a solved problem at this point.
Use it daily. Looks and works great.
This is so handy, thank you very much. Good work!!
Is it deployed locally or does it send data to your servers?
There's a slightly awkward naming overlap with an existing product.
Which one? I did a quick search but that didn't turn up anything, so perhaps it's a partial word overlap or something.<p>I did find the project's "user-facing" home page [1], which was nice. I found it rather hard to find a link from that to the code on GitHub, which was surprising.<p>[1]: <a href="https://handy.computer/" rel="nofollow">https://handy.computer/</a>
This is a slightly German-centric comment.
Big Handy fan!
Looks interesting. Why does it need a GUI at all?
As an alternative to Wisprflow, Superwhisper and so on. It works really well compared to the commercial competitors but with a local model.
It doesn't! It just makes it more accessible to more people, I feel. There's a CLI version for Mac, handy-cli, which I wrote first.
Ah, that was a typo: you meant "GPU" (Graphics Processing Unit), not "GUI" (which of course is Graphical User Interface), since that's what's listed in the system requirements. An existing comment already explains it implicitly, thanks!
Do I hear a CLI request? There are tons of CLI speech-to-text tools, by the way; really glad to see this one. Excellent competitors (Superwhisper, MacWhisper, etc.) are closed/paid.
Because local AI models run well on a GPU, better than on a CPU
So more people can use it?
It would be nice if the output could be piped directly into Claude Code.
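In the meantime, something like this works as a rough stand-in, assuming the `claude` CLI's non-interactive `-p`/`--print` mode; the STT command here is hypothetical, so substitute whatever writes your transcript to stdout:

```python
# Rough sketch: grab one transcription, hand it to Claude Code non-interactively.
import subprocess

transcript = subprocess.run(
    ["my-stt-cli", "--once"],        # hypothetical command; use your own STT CLI
    capture_output=True, text=True, check=True,
).stdout.strip()

subprocess.run(["claude", "-p", transcript], check=True)  # Claude Code print mode
```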
Crashes on Tahoe 26.3 Beta 1 :(