doing.
All guides
7 min read·By Brian Ellin

Doing Skills: AI Post-Processing for Voice Transcription

You dictate a Slack message and it comes out littered with "um," "like," and half-finished sentences. You could clean it up manually. Or you could let a Skill do it automatically before the text ever hits the clipboard.

Skills are Doing's system for AI post-processing of voice transcriptions. They're simple markdown files containing LLM prompts that transform your raw speech into polished text — automatically, per app, with no extra steps.

What Skills do

A Skill sits between your transcription and your clipboard. After Doing converts your speech to text, a Skill runs that text through an AI model with a specific instruction: remove filler words, rewrite for a professional tone, condense to a summary, or whatever you define.

The flow looks like this:

  1. You speak into Doing
  2. The transcription engine converts speech to text
  3. A Skill sends that text to an LLM with a system prompt
  4. The processed result is what gets pasted

Without Skills, you get raw transcription — accurate, but exactly what you said, ums and all. With Skills, you get text that's ready to use.

Built-in Skills

Doing ships with several Skills out of the box:

SkillWhat it does
CleanupRemoves filler words (um, uh, like, you know) and fixes false starts. Your ideas, minus the verbal tics.
FormalizeRewrites in a professional, polished tone. Good for emails and documentation.
SummarizeCondenses to 2–3 sentences. Useful for turning a rambling voice note into a quick summary.
EmojiConverts your text to emoji only. Mostly for fun.

The practical ones are Cleanup, Formalize, and Summarize. Emoji is there to show what's possible — and because it's fun to demo.

Skill chaining

Skills can be chained together. When you assign multiple Skills to an app, the output of the first becomes the input to the next.

For example, Cleanup → Formalize first strips filler words, then rewrites the cleaned text in a professional tone. The order matters — formalizing raw speech with all the filler still in it produces a different (worse) result than formalizing already-cleaned text.

You configure chains as comma-separated lists in your per-app settings. More on that below.

Setting up Skills per app

The most useful thing about Skills is per-app automation. You probably want different processing for different contexts — formal cleanup for email, basic cleanup for Slack, nothing for quick notes.

In Doing settings, you can map Skills to specific applications:

AppSkill chainWhy
SlackCleanupStrip filler words, keep your natural voice
MailCleanup, FormalizeClean up, then polish for professional tone
Google DocsCleanupLight touch — you'll edit anyway
Obsidian(none)Raw transcription for personal notes
All Apps (default)CleanupSensible fallback for everything else

The All Apps default catches anything without a specific mapping. If you set it to Cleanup, every app gets filler word removal unless you override it.

Tip

I keep Cleanup as my All Apps default and only override for email (where I add Formalize) and personal notes (where I want raw text). Start simple — you can always add more specific mappings later.

Writing your own Skills

This is where it gets interesting. Skills are just markdown files with a YAML header, stored in a folder on your Mac. No code, no plugins, no build step.

Doing's Skill format is based on the same SKILL.md format used by Claude Code and other AI tools. If you've written a custom skill for Claude Code, the structure will feel familiar: YAML frontmatter for metadata, markdown body for the prompt. Doing uses a simplified version focused on voice transcription — you define a name, an optional description, and the LLM instructions. No tool permissions or subagent config needed.

The file format

Each Skill lives in its own folder inside ~/.config/doing/skills/:

~/.config/doing/skills/
├── cleanup/
│   └── SKILL.md
├── formalize/
│   └── SKILL.md
├── email_draft/
│   └── SKILL.md
└── meeting_notes/
  └── SKILL.md

A SKILL.md file has two parts: YAML frontmatter with the name and description, and a body that becomes the LLM system prompt.

---
name: Email Draft
description: Format transcription as a professional email
---
Convert the following transcription into a well-structured
professional email.

Include:
- A clear subject line at the top
- Organized body paragraphs
- Professional closing

Keep the original meaning and key details intact.
Output only the email text, no preamble or explanation.

That's it. The name field is required. The description is optional but shows up in the settings UI. Everything after the second --- is the prompt that gets sent to the AI model, with your transcription as the user message.

Tips for writing good Skill prompts

Be specific about output format. If you want just the processed text with no chatbot-style preamble ("Here's your cleaned up text:"), say so explicitly. "Output only the result, no preamble or explanation" works well.

Keep prompts focused. A Skill that tries to do too many things at once produces inconsistent results. Better to chain two focused Skills than write one that cleans up and formalizes and adds formatting.

Test with real transcriptions. What sounds clear to type often comes out differently when spoken. Test your prompts with actual voice input, not typed sentences.

Example: a meeting notes Skill

---
name: Meeting Notes
description: Structure transcription as meeting notes
---
Format the following transcription as structured meeting notes.

Use this format:
- Date and participants (if mentioned)
- Key decisions (bulleted)
- Action items (bulleted, with owners if mentioned)
- Open questions

Be concise. Drop filler and repetition.
Output only the formatted notes.

Example: a commit message Skill

---
name: Commit Message
description: Turn description into a git commit message
---
Convert the following into a conventional git commit message.

Use the format: type(scope): description

Where type is one of: feat, fix, docs, refactor, test, chore.
Keep the subject line under 72 characters.
Add a body paragraph only if the change needs explanation.
Output only the commit message.

Once you create the file, enable the Skill in Doing settings and it's immediately available.

Choosing an LLM provider

Skills need an AI model to run. Doing supports several backends:

ProviderModelCostNotes
Apple on-deviceApple Foundation Model (~3B)FreemacOS 26+, Apple Silicon only. ~2000 character limit. No data leaves your Mac.
OpenAIgpt-4o-mini~$0.006/minFast, good quality. Requires API key.
Google Geminigemini-2.0-flash~$0.006/minSimilar quality and cost to OpenAI.
AssemblyAIClaude (via gateway)Per usageUses Anthropic's Claude models.

In Auto mode, Doing picks the best available provider — it'll prefer a cloud provider if you have an API key configured, and fall back to Apple's on-device model if not.

Tip

The Apple on-device model is surprisingly capable for simple Skills like Cleanup and Summarize, and it's completely free and private. Try it before paying for an API key — it might be all you need.

For longer transcriptions or complex prompts (like the meeting notes example), a cloud provider will give better results. The on-device model's ~2000 character context window means it can struggle with longer input.

Tips from daily use

Start with Cleanup as your default. It's the highest-impact, lowest-risk Skill. Filler word removal makes almost every transcription better, and it never changes the meaning of what you said.

Don't over-process. It's tempting to chain three Skills together for every app. In practice, simpler chains produce more predictable results. Cleanup alone handles 80% of what most people need.

Use Formalize selectively. Formalizing everything makes your communication sound stiff. Save it for contexts where tone matters — client emails, documentation, PRDs — and let your natural voice come through everywhere else.

Keep custom Skills simple. The best Skills have a single, clear job. A five-line prompt that does one thing well beats a fifty-line prompt that tries to handle every edge case.

Raw mode is fine for notes. Not everything needs processing. For personal notes, journals, and brainstorming, raw transcription captures your thinking better than a cleaned-up version.


FAQ

Do Skills work offline?

Only with Apple's on-device model. If you're using OpenAI, Gemini, or AssemblyAI as your Skills provider, you need an internet connection. The transcription itself can still happen offline (depending on your transcription engine), but Skill processing requires the LLM backend.

Can I use Skills without writing markdown files?

Yes — the built-in Skills work out of the box with no file editing. Just enable them in settings and assign them to apps. Custom SKILL.md files are only needed if you want to create your own.

What happens if the Skill fails?

Doing falls back to your raw transcription. If the LLM provider is down, times out, or the input exceeds the context window, you still get your text — just without the Skill processing. It never blocks your workflow.

Can I share Skills with my team?

A Skill is just a folder with a markdown file. Copy the folder, send it to a coworker, drop it in their ~/.config/doing/skills/ directory — done. No install, no package manager, no accounts.

Do Skills see my previous transcriptions?

No. Each Skill invocation is stateless. The LLM receives only the system prompt (your Skill) and the current transcription as input. No conversation history, no memory between sessions.


Try Skills for free

Skills are included with Doing — no separate purchase or add-on required. The first 100 transcriptions are free with no account needed, and that includes Skill processing.

If you're already using Doing for transcription, open Settings and enable a Skill to try it. If you're new, grab the app and start with Cleanup as your All Apps default. You'll wonder how you ever tolerated pasting raw transcriptions.

doing.

Fast voice transcription for Mac

Developer-tuned dictation with AI post-processing, transcript history, and everything running locally. No subscription.

100 free transcriptions · macOS only · $49 once
doing. Voice transcription for Mac