How to Write Better AI Prompts with Voice
Most people write short AI prompts. Not because they don't have more to say, but because typing is slow enough that they self-edit before they start. You think "I should explain the context" and then type "fix the bug" because the full explanation would take 45 seconds to type and you've got momentum to maintain.
Voice removes this bottleneck. When you speak a prompt instead of typing it, three things happen: you say more, you explain more naturally, and you give the AI better instructions without trying harder. This guide covers why voice input produces better AI prompts and how to do it effectively.
The prompt length problem
Anyone who's spent time with AI tools knows that more detailed prompts produce better results. A prompt with context, constraints, and examples outperforms a vague request almost every time. But in practice, the friction of typing long prompts means most people don't bother.
Here's a real example. You're building a Next.js app and the pricing page needs work. Both prompts ask for the same thing:
Fix the layout on the pricing page.
The pricing page has three plan cards that look fine on desktop but on mobile they're stacking with no spacing between them and the feature comparison table is overflowing off the right side of the screen. Can you make the cards stack with some breathing room on mobile and make the table horizontally scrollable? Also the "most popular" badge on the Pro card is getting cut off.
The typed version takes a few seconds but gives the AI almost nothing to work with — it'll ask follow-up questions or guess wrong. The spoken version takes under 30 seconds and gives the AI everything it needs in one shot. Most people wouldn't bother typing nearly 70 words into a prompt box, but they'd happily say them.
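The gap is easy to work out from typical rates — most people speak at roughly 130-150 words per minute and type at 40-60. A quick back-of-the-envelope sketch (the rates are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope prompt timing, assuming ~140 words/min spoken
# and ~50 words/min typed (typical rates, not measured data).

def seconds_to_enter(words: int, words_per_minute: float) -> float:
    """Time in seconds to produce `words` at a given rate."""
    return words / words_per_minute * 60

short_prompt = 7      # "Fix the layout on the pricing page."
detailed_prompt = 67  # the full mobile-layout description above

print(f"short, typed:     {seconds_to_enter(short_prompt, 50):.0f}s")     # ~8s
print(f"detailed, typed:  {seconds_to_enter(detailed_prompt, 50):.0f}s")  # ~80s
print(f"detailed, spoken: {seconds_to_enter(detailed_prompt, 140):.0f}s") # ~29s
```

Speaking the detailed prompt costs about as much time as typing the vague one three times over — and the detailed version usually saves a round of follow-up questions on top of that.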
Why voice produces better prompts
You describe instead of command
When you type, you tend to write commands: "fix this," "add that," "change this to that." When you speak, you tend to describe: "the problem is X, I think it's because of Y, and what I want is Z."
Description is exactly what AI tools need. The more context you provide about why you want something, the better the result. Voice naturally shifts you from terse commands to conversational description.
You don't self-edit mid-thought
Typing introduces a constant editing loop. You write half a sentence, delete a word, rephrase, reconsider. The editing takes longer than the writing — and by the time you hit Enter, you've compressed your original thought into the shortest possible version.
Speaking bypasses this. You start talking, and the thought comes out in full. Yes, it includes filler words and restarts — but AI tools handle imperfect language well. A rambling 80-word prompt with some "ums" gives Claude or ChatGPT far more to work with than a polished 10-word command.
You front-load context naturally
When you explain something verbally, you instinctively start with background before getting to the request. "So the user dashboard has this feature where..." before "can you change the sort order." This is exactly the structure that produces the best AI results — context first, request second.
When typing, people tend to skip the context and jump straight to the request because the context feels like "too much work to type."
Techniques for voice prompting
Just start talking
You don't need to plan what you're going to say before you hold the key. Some of the best prompts come from thinking out loud — you start describing the problem, realize the root cause mid-sentence, and end up giving the AI context you wouldn't have thought to include in a typed prompt.
Stream of consciousness works. AI tools are good at extracting meaning from unstructured speech. A long prompt that wanders through the problem space often outperforms a crisp 15-word instruction, because the "rambling" is actually you working through the details in real time.
Include everything that's on your mind — the half-formed theory about why something broke, the constraint you're not sure matters, the thing you tried that didn't work. All of it is useful context, and none of it costs you extra effort when you're just talking. The AI can ignore what's irrelevant, but it can't use context you didn't provide.
Start with context, end with the request
Structure your spoken prompts like a conversation:
- What exists: "We have a checkout flow with three steps..."
- What's happening: "...and users are dropping off on the second step where they enter their shipping address..."
- Why: "...I think it's because the form resets when they go back to change their cart..."
- What you want: "...can you make the form persist its state when the user navigates between steps?"
This takes about 20 seconds to say. An AI tool receiving this prompt can immediately understand the flow, the problem, and what the fix should preserve — no follow-up questions needed.
Don't clean up for the AI
A common instinct is to speak "properly" for the transcription — carefully enunciating, avoiding contractions, speaking in complete sentences. Don't bother.
AI tools understand casual speech perfectly well. "Yeah so the thing is the login page, when you hit it on mobile it's like, the form is all squished and the submit button is off screen" is a perfectly usable prompt. The AI knows what you mean, and the meaning is what matters.
If you're using Doing, you can set up a Cleanup Skill that removes filler words and tightens grammar automatically — but for AI coding tools, raw speech usually works better because it preserves nuance.
Include your taste
AI tools tend toward the median — the most generic, safe version of whatever you ask for. The output gets dramatically better when you include your aesthetic preferences, design opinions, and taste. But typing out "I want it to feel minimal and confident, not playful, with tight spacing and no rounded corners" feels precious. Saying it out loud feels natural — it's just how you'd describe what you want to a designer sitting next to you.
This applies to everything: UI direction, writing tone, architecture preferences, naming conventions. The more specific you are about how you want something, not just what you want, the less generic the output. Voice makes it easy to include this kind of nuance because it costs you nothing extra — you're already talking.
Use voice for the first prompt, type for follow-ups
The highest-value moment for voice is the first prompt in a conversation — where you're setting up context and explaining the problem. Follow-up prompts ("yes, but also handle the edge case where...") are shorter and typing may be faster.
That said, if you're iterating rapidly with an AI agent, voice with YOLO Mode (auto-submit) lets you maintain a spoken conversation with the tool at the speed of natural speech.
Describe what you see
When the issue is visual — a broken layout, an unexpected chart, an error dialog — describe what you're looking at while looking at it. "I'm on the settings page and there's a white gap between the header and the first section, it's about 40 pixels, and it only shows up when the sidebar is collapsed."
With Doing, you can also capture a screenshot during recording to give the AI both your verbal description and the visual context.
Real patterns from daily voice prompting
After months of using voice input for AI-assisted development, a few natural patterns emerge:
The context-setter, then rapid-fire iteration
A common workflow is one long opening prompt that sets up context (60-150 words), followed by a series of short iterative follow-ups (5-20 words each):
The pricing page has three plan cards that look fine on desktop but on mobile they're stacking weird — no spacing, the feature table overflows, and the "most popular" badge is cut off. Can you fix the mobile layout and make the table scroll horizontally?
Then the follow-ups are quick reactions: "Make the badge bigger." "Add more padding between the cards." "Now check how it looks on tablet."
Voice is perfect for that opening prompt. The follow-ups can be voice or typed — whatever's faster in the moment.
"Help me think through this"
Some of the most valuable voice prompts aren't instructions at all — they're invitations to brainstorm:
What other ideas do you have about improving this feature, given everything you know about our users? Consider our competition and what makes us truly unique. Don't write any code, just brainstorm with me.
Or:
Help me think through how we would build something like this — ask me any clarifying questions and then let's figure out the approach.
A related pattern is asking the AI to interview you — instead of trying to organize your thoughts into a prompt, let the AI ask the questions and you just answer:
I need to write a PRD for this feature but I'm not sure where to start. Can you interview me about it? Ask me questions one at a time and I'll talk through my answers, then you can pull it together into a doc.
This works especially well with voice because answering questions out loud is effortless. You end up providing far more detail than you would have if you'd tried to write the PRD from scratch.
These prompts are hard to type because they feel too open-ended to bother writing out. But they're natural to say, and they produce some of the best AI output because you're giving the model permission to think broadly rather than execute narrowly.
Relaying user feedback as a story
When a user reports a bug or makes a feature request, the most effective prompt is just... retelling what happened:
One of my users has been really heavily using the app and said that when he's speaking for longer sessions, like 30 or 60 seconds, some of the end of the conversation is being cut off, even though he still had the hotkey held down. Can you take a look and see if there's any obvious reasons why this might be happening?
This is exactly how you'd describe the problem to a coworker. Voice makes it natural. Typing it out would feel like writing a bug report, so most people would compress it to "audio getting cut off for long recordings" and lose all the useful context.
Describing what you see on screen
A huge portion of real voice prompts are describing visual state:
The layout is all puffy at the top, there's a lot of space above the cards.
The cards stack fine on desktop but on mobile they're overlapping with no spacing.
You're looking at the screen while talking, so you naturally reference what you see. Pair this with screenshot capture and the AI gets both your narration and the visual.
When to use voice vs. typing
Voice input isn't better for everything. Here's when each shines:
Voice is better for:
- First prompts that need context and explanation
- Describing bugs, requirements, or desired behavior
- Brainstorming and thinking out loud
- Long-form writing (Slack messages, PRDs, code review comments)
- Anything you'd explain faster than you'd type
Typing is better for:
- Short follow-ups ("yes," "try a different approach," "also add a test")
- Precise code snippets or syntax
- Copy-pasting existing content
- Situations where you can't speak (meetings, libraries, shared offices)
Most developers who adopt voice input end up using a mix: dictate the meaty prompts, type the quick ones.
Getting started
If you want to try voice-driven AI prompting:
- Download Doing — free trial of 100 transcriptions, no account required
- Pick a hotkey — fn for Mac keyboards, Option+Space for external keyboards
- Enable YOLO Mode if you use terminal-based agents (Claude Code, Codex) — it auto-submits after pasting
- Start with one workflow — pick whatever AI tool you use most and commit to voice-prompting it for a day
The shift feels awkward for about 30 minutes. After that, typing prompts starts to feel like the slow way.
FAQ
Do AI tools actually understand spoken prompts?
Yes. Large language models like Claude, GPT, and Gemini are trained on natural language. They handle filler words, restarts, and conversational phrasing without issue. In practice, a rambling spoken prompt with full context outperforms a terse typed prompt almost every time.
Should I clean up my prompts before sending them?
For AI coding tools (Claude Code, Cursor, Codex), no — raw speech works well and preserves context. For human-facing output (Slack, email), yes — use a post-processing Skill to clean up filler words and tone. Doing lets you set this per app so it happens automatically.
How much faster is voice than typing for prompts?
Most people speak at 130-150 words per minute and type at 40-60. For a prompt with real context (50-100 words), voice takes about 30 seconds while typing takes 1-2 minutes. The gap widens for longer prompts and narrows for short follow-ups.
Does voice input work well for non-English prompts?
Doing supports 99 languages with auto-detection. The default Parakeet engine handles multilingual transcription locally on your Mac. For AI tools that accept non-English prompts, you can dictate in any supported language.