How to Write Better AI Prompts with Voice
Most people write short AI prompts. Not because they don't have more to say, but because typing is slow enough that they self-edit before they start. You think "I should explain the context" and then type "fix the bug" because the full explanation would take 45 seconds to type and you've got momentum to maintain.
Voice removes this bottleneck. When you speak a prompt instead of typing it, three things happen: you say more, you explain more naturally, and you give the AI better instructions without trying harder. This guide covers why voice input produces better AI prompts and how to do it effectively.
The prompt length problem
Anyone who's spent time with AI tools knows that more detailed prompts produce better results. A prompt with context, constraints, and examples outperforms a vague request almost every time. But in practice, the friction of typing long prompts means most people don't bother.
Here's a real example. You're building a Next.js app and the pricing page needs work. Both prompts ask for the same thing:
Fix the layout on the pricing page.
The pricing page has three plan cards that look fine on desktop but on mobile they're stacking with no spacing between them and the feature comparison table is overflowing off the right side of the screen. Can you make the cards stack with some breathing room on mobile and make the table horizontally scrollable? Also the "most popular" badge on the Pro card is getting cut off.
The typed version takes a few seconds but gives the AI almost nothing to work with — it'll ask follow-up questions or guess wrong. The spoken version takes under 30 seconds and gives the AI everything it needs in one shot. Most people wouldn't bother typing nearly 70 words into a prompt box, but they'd happily say them.
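The gap is easy to work out from typical rates — most people speak at roughly 130-150 words per minute and type at 40-60. A quick back-of-the-envelope sketch (the rates are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope prompt timing, assuming ~140 words/min spoken
# and ~50 words/min typed (typical rates, not measured data).

def seconds_to_enter(words: int, words_per_minute: float) -> float:
    """Time in seconds to produce `words` at a given rate."""
    return words / words_per_minute * 60

short_prompt = 7      # "Fix the layout on the pricing page."
detailed_prompt = 67  # the full mobile-layout description above

print(f"short, typed:     {seconds_to_enter(short_prompt, 50):.0f}s")     # ~8s
print(f"detailed, typed:  {seconds_to_enter(detailed_prompt, 50):.0f}s")  # ~80s
print(f"detailed, spoken: {seconds_to_enter(detailed_prompt, 140):.0f}s") # ~29s
```

Speaking the detailed prompt costs about as much time as typing the vague one three times over — and the detailed version usually saves a round of follow-up questions on top of that.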
Why voice produces better prompts
You describe instead of command
When you type, you tend to write commands: "fix this," "add that," "change this to that." When you speak, you tend to describe: "the problem is X, I think it's because of Y, and what I want is Z."
Description is exactly what AI tools need. The more context you provide about why you want something, the better the result. Voice naturally shifts you from terse commands to conversational description.
You don't self-edit mid-thought
Typing introduces a constant editing loop. You write half a sentence, delete a word, rephrase, reconsider. The editing takes longer than the writing — and by the time you hit Enter, you've compressed your original thought into the shortest possible version.
Speaking bypasses this. You start talking, and the thought comes out in full. Yes, it includes filler words and restarts — but AI tools handle imperfect language well. A rambling 80-word prompt with some "ums" gives Claude or ChatGPT far more to work with than a polished 10-word command.
You front-load context naturally
When you explain something verbally, you instinctively start with background before getting to the request. "So the user dashboard has this feature where..." before "can you change the sort order." This is exactly the structure that produces the best AI results — context first, request second.
When typing, people tend to skip the context and jump straight to the request because the context feels like "too much work to type."
Techniques for voice prompting
Just start talking
You don't need to plan what you're going to say before you hold the key. Some of the best prompts come from thinking out loud — you start describing the problem, realize the root cause mid-sentence, and end up giving the AI context you wouldn't have thought to include in a typed prompt.
Stream of consciousness works. AI tools are good at extracting meaning from unstructured speech. A long prompt that wanders through the problem space often outperforms a crisp 15-word instruction, because the "rambling" is actually you working through the details in real time.
Include everything that's on your mind — the half-formed theory about why something broke, the constraint you're not sure matters, the thing you tried that didn't work. All of it is useful context, and none of it costs you extra effort when you're just talking. The AI can ignore what's irrelevant, but it can't use context you didn't provide.
Start with context, end with the request
Structure your spoken prompts like a conversation:
- What exists: "We have a checkout flow with three steps..."
- What's happening: "...and users are dropping off on the second step where they enter their shipping address..."
- Why: "...I think it's because the form resets when they go back to change their cart..."
- What you want: "...can you make the form persist its state when the user navigates between steps?"
This takes about 20 seconds to say. An AI tool receiving this prompt can immediately understand the flow, the problem, and what the fix should preserve — no follow-up questions needed.
Don't clean up for the AI
A common instinct is to speak "properly" for the transcription — carefully enunciating, avoiding contractions, speaking in complete sentences. Don't bother.
AI tools understand casual speech perfectly well. "Yeah so the thing is the login page, when you hit it on mobile it's like, the form is all squished and the submit button is off screen" is a perfectly usable prompt. The AI knows what you mean, and the meaning is what matters.
If you're using Doing, you can set up a Cleanup Skill that removes filler words and tightens grammar automatically — but for AI coding tools, raw speech usually works better because it preserves nuance.
Include your taste
AI tools tend toward the median — the most generic, safe version of whatever you ask for. The output gets dramatically better when you include your aesthetic preferences, design opinions, and taste. But typing out "I want it to feel minimal and confident, not playful, with tight spacing and no rounded corners" feels precious. Saying it out loud feels natural — it's just how you'd describe what you want to a designer sitting next to you.
This applies to everything: UI direction, writing tone, architecture preferences, naming conventions. The more specific you are about how you want something, not just what you want, the less generic the output. Voice makes it easy to include this kind of nuance because it costs you nothing extra — you're already talking.
Use voice for the first prompt, type for follow-ups
The highest-value moment for voice is the first prompt in a conversation — where you're setting up context and explaining the problem. Follow-up prompts ("yes, but also handle the edge case where...") are shorter and typing may be faster.
That said, if you're iterating rapidly with an AI agent, voice with YOLO Mode (auto-submit) lets you maintain a spoken conversation with the tool at the speed of natural speech.
Describe what you see
When the issue is visual — a broken layout, an unexpected chart, an error dialog — describe what you're looking at while looking at it. "I'm on the settings page and there's a white gap between the header and the first section, it's about 40 pixels, and it only shows up when the sidebar is collapsed."
With Doing, you can also capture a screenshot during recording to give the AI both your verbal description and the visual context.
Real patterns from daily voice prompting
After months of using voice input for AI-assisted development, a few natural patterns emerge:
The context-setter, then rapid-fire iteration
A common workflow is one long opening prompt that sets up context (60-150 words), followed by a series of short iterative follow-ups (5-20 words each):
The pricing page has three plan cards that look fine on desktop but on mobile they're stacking weird — no spacing, the feature table overflows, and the "most popular" badge is cut off. Can you fix the mobile layout and make the table scroll horizontally?
Then the follow-ups are quick reactions: "Make the badge bigger." "Add more padding between the cards." "Now check how it looks on tablet."
Voice is perfect for that opening prompt. The follow-ups can be voice or typed — whatever's faster in the moment.
"Help me think through this"
Some of the most valuable voice prompts aren't instructions at all — they're invitations to brainstorm:
What other ideas do you have about improving this feature, given everything you know about our users? Consider our competition and what makes us truly unique. Don't write any code, just brainstorm with me.
Or:
Help me think through how we would build something like this — ask me any clarifying questions and then let's figure out the approach.
A related pattern is asking the AI to interview you — instead of trying to organize your thoughts into a prompt, let the AI ask the questions and you just answer:
I need to write a PRD for this feature but I'm not sure where to start. Can you interview me about it? Ask me questions one at a time and I'll talk through my answers, then you can pull it together into a doc.
This works especially well with voice because answering questions out loud is effortless. You end up providing far more detail than you would have if you'd tried to write the PRD from scratch.
These prompts are hard to type because they feel too open-ended to bother writing out. But they're natural to say, and they produce some of the best AI output because you're giving the model permission to think broadly rather than execute narrowly.
Relaying user feedback as a story
When a user reports a bug or makes a feature request, the most effective prompt is just... retelling what happened:
One of my users has been really heavily using the app and said that when he's speaking for longer sessions, like 30 or 60 seconds, some of the end of the conversation is being cut off, even though he still had the hotkey held down. Can you take a look and see if there's any obvious reasons why this might be happening?
This is exactly how you'd describe the problem to a coworker. Voice makes it natural. Typing it out would feel like writing a bug report, so most people would compress it to "audio getting cut off for long recordings" and lose all the useful context.
Describing what you see on screen
A huge portion of real voice prompts are describing visual state:
The layout is all puffy at the top, there's a lot of space above the cards.
The cards stack fine on desktop but on mobile they're overlapping with no spacing.
You're looking at the screen while talking, so you naturally reference what you see. Pair this with screenshot capture and the AI gets both your narration and the visual.
When to use voice vs. typing
Voice input isn't better for everything. Here's when each shines:
Voice is better for:
- First prompts that need context and explanation
- Describing bugs, requirements, or desired behavior
- Brainstorming and thinking out loud
- Long-form writing (Slack messages, PRDs, code review comments)
- Anything you'd explain faster than you'd type
Typing is better for:
- Short follow-ups ("yes," "try a different approach," "also add a test")
- Precise code snippets or syntax
- Copy-pasting existing content
- Situations where you can't speak (meetings, libraries, shared offices)
Most developers who adopt voice input end up using a mix: dictate the meaty prompts, type the quick ones.
Getting started
If you want to try voice-driven AI prompting:
- Download Doing — free trial of 100 transcriptions, no account required
- Pick a hotkey — fn for Mac keyboards, Option+Space for external keyboards
- Enable YOLO Mode if you use terminal-based agents (Claude Code, Codex) — it auto-submits after pasting
- Start with one workflow — pick whatever AI tool you use most and commit to voice-prompting it for a day
The shift feels awkward for about 30 minutes. After that, typing prompts starts to feel like the slow way.
FAQ
Do AI tools actually understand spoken prompts?
Yes. Large language models like Claude, GPT, and Gemini are trained on natural language. They handle filler words, restarts, and conversational phrasing without issue. In practice, a rambling spoken prompt with full context outperforms a terse typed prompt almost every time.
Should I clean up my prompts before sending them?
For AI coding tools (Claude Code, Cursor, Codex), no — raw speech works well and preserves context. For human-facing output (Slack, email), yes — use a post-processing Skill to clean up filler words and tone. Doing lets you set this per app so it happens automatically.
How much faster is voice than typing for prompts?
Most people speak at 130-150 words per minute and type at 40-60. For a prompt with real context (50-100 words), voice takes about 30 seconds while typing takes 1-2 minutes. The gap widens for longer prompts and narrows for short follow-ups.
Does voice input work well for non-English prompts?
Doing supports 99 languages with auto-detection. The default Parakeet engine handles multilingual transcription locally on your Mac. For AI tools that accept non-English prompts, you can dictate in any supported language.