Pushing Talk_: own your voice, own what it says

It recorded. It transcribed, word for word, perfectly. Then the text vanished.

Nothing reached the field I was typing in. No error, no crash, no red flag. The recording worked, the transcription was flawless, and the one thing that mattered, the words landing where I wanted them, never happened.

That is the most expensive kind of failure, because it looks like success right up until the last inch. Then the strangest part. I had granted the permission it needed. The settings pane showed a tick. And it still did nothing.

I granted the permission. The system said yes. And nothing happened.

Why I built it

Every cloud dictation tool sends your voice to a server you do not own. I dictate client names, half-formed ideas, things I would never want sitting in a log I cannot see. I did not want my voice to be someone else's data.

So I built one that keeps it on the machine I own. Permissions, privacy and trust are not plumbing you hide behind a product. On a tool that listens to your voice, they are the product. Every bug in this story is, underneath, a trust question wearing a technical costume.

What I built

It is called Pushing Talk_. Hold a key, speak, release, and your words land as text in whatever field you were already typing in. That is the entire interaction. It runs on macOS, and it is free.

The part that matters sits underneath. The speech-to-text runs locally, on my machine and on yours, using a local Whisper model. No cloud. No account. No network call. Your voice never leaves your Mac.

Own what it says

Owning the app is half of it. The other half is owning what it produces.

Cloud dictation rents you a transcription and keeps the original. Your words land in a log on a server you do not control, useful to whoever owns it, gone to you. When the dictation runs on my own machine, the words stay mine, and that changes what they are. They stop being throwaway text and start being a record.

So I point them at a store I own. A private, searchable record of everything I talk out, that lives on my machine and answers only to me. No vendor sees it. No model trains on it.

That record is the part I am most excited about. A spoken thought is the cheapest thing I make and usually the first thing I lose. Kept and searchable, it becomes something I can hand to my own assistant later. Find the thread I was pulling on last week. Draft from the note I talked out in the car. Connect two ideas I spoke a month apart. The voice is the input. The thing I own is the memory.

None of that works if your dictation lives on rented infrastructure. A private store of your own thinking is exactly what a cloud product cannot let you keep. That is the whole reason to build it local.

How I built it

I did not type it out by hand in one sitting. I brainstormed the outcome, turned it into a plan, and the plan became the build.

The code was written by Codex sessions running in parallel, each handed a tight slice of the work. Claude held the spec, dispatched the slices, reviewed what came back, and I made the calls. The judgment that ships is a separate job from the hands that build, and that split has its own write-up in the Codex Loop.

I never leave that loop. I decide what "done" means and I approve the result. The models accelerate the middle. They do not get to define the outcome. The system around the AI is the intelligence, not the model.

Build with Codex. Ship with Claude.

But built is not the same as working. I declared this thing finished three times before it actually worked, and each near-miss taught me a debugging principle that has nothing to do with dictation. Here is where it got hard.

Beat one: instrument before you fix

The symptom was "no paste," so the instinct is to open the paste code and start patching. I resisted that. The visible failure is almost never the cause.

My first real move was not a fix. It was a light. I put a diagnostic log at every stage of the pipeline. Key pressed. Audio captured. Transcription complete. Text ready. Insertion attempted. Each stage stamped, so the path the data actually took became visible instead of imagined.

The log answered the question in one run. Every stage up to insertion was clean, and the insertion itself never landed. Everything before the last step was innocent.

A five-stage pipeline: Key, Audio, Transcribe and Text all ticked, then a broken arrow with a cross into an empty Insert box. — Every stage green, except the handoff into the field.

This is the oldest lesson in debugging, and it is table stakes. You have heard it before, so I will not dwell on it. The interesting part is what the light revealed.

Beat two: the bug was not in the code

To paste text into another app on macOS, an app needs an accessibility permission, the grant that lets one app put text into another. You allow it once, and from then on it should just work.

I had granted it. The setting showed the tick. And it did nothing.

The cause was the operating system's permission system, not my code. Earlier builds had each registered their own entry. Four stale permission entries sat in the list, ghosts of versions that no longer existed. On macOS, a permission is tied to a specific signed identity, and the build I was running matched none of them. The tick in the settings pane belonged to a build that was gone.

So a permission that read as granted was, in effect, refused. The fix lived in app signing, five steps upstream from the symptom. The bug is rarely where you look first. And the cause itself is the thesis showing up as a defect. The system refused a capability because it could not confirm my app was who it claimed to be.

Permissions
A permission can read as granted and still point at nobody.
Pushing Talk_v1✓
Pushing Talk_v2✓
Pushing Talk_v3✓
Pushing Talk_v4✓
Pushing Talk_running nowasking
The tick is real. It just belongs to builds that are gone, so the live app gets nothing.

Beat three: then it pasted three times

I cleared the permission, and the text finally arrived. Three times. The same sentence, stamped into the field in triplicate.

The paste logic I had written used a fallback design. Try one method of inserting text, and if it cannot confirm that method worked, try the next. The problem was confirmation. For some apps it could not read the field back to check whether anything had landed. With no signal that it had succeeded, the "stop on success" condition never fired, so every method ran in turn. Three methods, three pastes.

A system that cannot confirm its own success will repeat itself. Unverifiable actions retry, duplicate, and corrupt, not because the code is careless but because it has no signal to stop on. The fix was not a better paste method. It was a real confirmation step. Design the check, not just the action.

Paste
A system that cannot confirm success will repeat itself.
The field is empty.

Beat four: the fix had become the bug

Now the tool worked. It recorded, transcribed locally, pasted once, into the right place. By any normal standard, finished.

Finished is where the embarrassing things hide.

I ran the working tool through a proper security review, the way an attacker would, a panel of reviewers each attacking from a different angle. They did not find a flaw in my original design. They found a flaw in my repair.

Remember the diagnostic log from beat one, the hero of the story. It had been writing previews of the transcribed text to a plain file on disk. And the raw audio captured for each dictation was never being deleted after transcription. For a tool whose whole promise is that your voice never leaves your machine, the scaffolding I built to debug it had quietly turned the inside of the machine into a place where your words sat in the clear.

Neither was in my original product. Both arrived as helpers. I added the log to debug. I kept the audio to transcribe. The things I added to make the tool work had become the way it failed its one promise.

The leak was not in the product. It was in the repair. Audit what you added, not just what was already there.

Call it the self-inflicted bug. The failure you introduce while fixing a different one, hiding inside the thing you trust most because you just added it.

Diagnostics write data. Retries repeat actions. Caches hold copies. The instruments you reach for under pressure are the additions least likely to get a second look, because they feel like helpers, not features. They are features. They ship.

What you added
your Mac
🎙 your voice
debug log
(empty)
audio clips (0)
(none kept)
Clean. Your voice stays in the box.
Audit what you added, not just what was already there.

I caught and closed both in review, before any release. The log now stores no transcript content. The raw audio is deleted the moment transcription finishes. I made the promise true on the bench, instead of breaking it later in a stranger's hands.

The question that found it was not "does it work." A working tool had already convinced me, its maker. It was "if I wanted to leak something through this, where would I go," asked by reviewers who did not build it and had no reason to flatter it.

The payoff

Pushing Talk_ is the example. These six principles travel to any system you build.

Instrument before you fix. You already know this one. You cannot fix what you cannot see, so make the invisible visible before you touch a line.
The bug is rarely where you look first. The symptom and the cause were five steps apart. Follow the evidence, not the instinct.
A system that cannot confirm success will repeat itself. Unverifiable actions duplicate and corrupt. Design the check, not just the action.
The fix can become the next bug. The self-inflicted bug is real. Audit what you added with the same suspicion you bring to the original code.
"Working" is not "done." Done is "I tried to break it and could not." A real review of a finished thing is where the embarrassing stuff surfaces, on your terms instead of a stranger's.
Permissions and trust are part of the product. On-device, signed, least-privilege. That is a feature you can name, not plumbing you bury.

Read those together and one idea remains. Stop prompting, start defining outcomes. Decide what "done" actually means, then instrument toward it, including the helpers you brought in to get there.

The close is not a feature list. It is the point. I built it free. I signed it, so the system and the person running it both know exactly what it is. It is least-privilege. The speech-to-text is local, so your voice never leaves your machine, the raw audio is deleted the moment it is used, and the debug log holds no transcript content, because once it did, and that was the bug.

Trust is not what you do after the product works. On a thing that hears your voice, trust is the product. Everything else is just the button you hold down.

What did you add to your last build to help it, and when did you last go back and check it was not quietly hurting?

Get the beta

Pushing Talk_ works, it is free, and right now the only way to get it is to ask.

It is in private beta while I tune the last edges. Message me in the Clief Notes community and I will send you a copy. Tell me what you dictate into all day, because that is the part I am still shaping.

Message me on Skool

The Codex Loop: A Native App Dev Playbook

Stop Prompting. Start Defining Outcomes.