Doing Skills: AI Post-Processing for Voice Transcription
You dictate a Slack message and it comes out littered with "um," "like," and half-finished sentences. You could clean it up manually. Or you could let a Skill do it automatically before the text ever hits the clipboard.
Skills are Doing's system for AI post-processing of voice transcriptions. They're simple markdown files containing LLM prompts that transform your raw speech into polished text — automatically, per app, with no extra steps.
What Skills do
A Skill sits between your transcription and your clipboard. After Doing converts your speech to text, a Skill runs that text through an AI model with a specific instruction: remove filler words, rewrite for a professional tone, condense to a summary, or whatever you define.
The flow looks like this:
- You speak into Doing
- The transcription engine converts speech to text
- A Skill sends that text to an LLM with a system prompt
- The processed result is what gets pasted
Without Skills, you get raw transcription — accurate, but exactly what you said, ums and all. With Skills, you get text that's ready to use.
Built-in Skills
Doing ships with several Skills out of the box:
| Skill | What it does |
|---|---|
| Cleanup | Removes filler words (um, uh, like, you know) and fixes false starts. Your ideas, minus the verbal tics. |
| Formalize | Rewrites in a professional, polished tone. Good for emails and documentation. |
| Summarize | Condenses to 2–3 sentences. Useful for turning a rambling voice note into a quick summary. |
| Emoji | Converts your text to emoji only. Mostly for fun. |
The practical ones are Cleanup, Formalize, and Summarize. Emoji is there to show what's possible — and because it's fun to demo.
Skill chaining
Skills can be chained together. When you assign multiple Skills to an app, the output of the first becomes the input to the next.
For example, Cleanup → Formalize first strips filler words, then rewrites the cleaned text in a professional tone. The order matters — formalizing raw speech with all the filler still in it produces a different (worse) result than formalizing already-cleaned text.
You configure chains as comma-separated lists in your per-app settings. More on that below.
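Chaining is a left-to-right fold: each Skill's output becomes the next Skill's input. The sketch below assumes a hypothetical `run_skill(name, text)` callable and shows how a comma-separated setting such as `Cleanup, Formalize` could be parsed and applied in order. The toy Skills are plain string functions standing in for LLM calls.

```python
from functools import reduce
from typing import Callable


def parse_chain(setting: str) -> list[str]:
    # "Cleanup, Formalize" -> ["Cleanup", "Formalize"]; blank entries are
    # ignored, so an empty setting means no Skills (raw transcription).
    return [name.strip() for name in setting.split(",") if name.strip()]


def run_chain(
    text: str,
    chain: list[str],
    run_skill: Callable[[str, str], str],  # hypothetical: (skill name, input) -> output
) -> str:
    # Order matters: each Skill receives the previous Skill's output.
    return reduce(lambda acc, name: run_skill(name, acc), chain, text)


if __name__ == "__main__":
    toy = {
        "Cleanup": lambda t: t.replace("um ", ""),
        "Formalize": lambda t: t[0].upper() + t[1:] + ".",
    }
    run = lambda name, t: toy[name](t)
    print(run_chain("um ship it today", parse_chain("Cleanup, Formalize"), run))
    # Ship it today.
    print(run_chain("um ship it today", parse_chain("Formalize, Cleanup"), run))
    # Um ship it today.   (formalizing first capitalizes "um", so Cleanup misses it)
```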
Setting up Skills per app
The most useful thing about Skills is per-app automation. You probably want different processing for different contexts — formal cleanup for email, basic cleanup for Slack, nothing for quick notes.
In Doing settings, you can map Skills to specific applications:
| App | Skill chain | Why |
|---|---|---|
| Slack | Cleanup | Strip filler words, keep your natural voice |
| Email | Cleanup, Formalize | Clean up, then polish for professional tone |
| Google Docs | Cleanup | Light touch — you'll edit anyway |
| Obsidian | (none) | Raw transcription for personal notes |
| All Apps (default) | Cleanup | Sensible fallback for everything else |
The All Apps default catches anything without a specific mapping. If you set it to Cleanup, every app gets filler word removal unless you override it.
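The lookup rule is simple: an explicit app mapping wins, even an empty one, and anything unmapped gets the All Apps chain. A minimal sketch, assuming settings keyed by app display name (how Doing actually identifies apps isn't described here); `SETTINGS` and `skills_for` are hypothetical names.

```python
# Hypothetical per-app settings mirroring part of the table above.
# An empty list means "no Skills": paste the raw transcription.
SETTINGS: dict[str, list[str]] = {
    "Slack": ["Cleanup"],
    "Google Docs": ["Cleanup"],
    "Obsidian": [],
    "All Apps": ["Cleanup"],
}


def skills_for(app: str, settings: dict[str, list[str]] = SETTINGS) -> list[str]:
    # An explicit mapping wins, even when it is empty (e.g. Obsidian = raw).
    if app in settings:
        return settings[app]
    # Otherwise fall back to the All Apps default, or nothing if unset.
    return settings.get("All Apps", [])


if __name__ == "__main__":
    print(skills_for("Slack"))     # ['Cleanup']
    print(skills_for("Obsidian"))  # []
    print(skills_for("Terminal"))  # ['Cleanup']  (unmapped, so All Apps applies)
```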
Tip
I keep Cleanup as my All Apps default and only override for email (where I add Formalize) and personal notes (where I want raw text). Start simple — you can always add more specific mappings later.
Writing your own Skills
This is where it gets interesting. Skills are just markdown files with a YAML header, stored in a folder on your Mac. No code, no plugins, no build step.
Doing's Skill format is based on the same SKILL.md format used by Claude Code and other AI tools. If you've written a custom skill for Claude Code, the structure will feel familiar: YAML frontmatter for metadata, markdown body for the prompt. Doing uses a simplified version focused on voice transcription — you define a name, an optional description, and the LLM instructions. No tool permissions or subagent config needed.
The file format
Each Skill lives in its own folder inside ~/.config/doing/skills/:
```
~/.config/doing/skills/
├── cleanup/
│   └── SKILL.md
├── formalize/
│   └── SKILL.md
├── email_draft/
│   └── SKILL.md
└── meeting_notes/
    └── SKILL.md
```
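Because each Skill is a folder containing a SKILL.md, listing installed Skills amounts to a one-level directory scan. This is a minimal sketch under that assumption, not a description of how Doing itself loads Skills; `find_skill_files` is a hypothetical helper.

```python
from pathlib import Path

# Skills location described above; expanduser() resolves "~" to your home folder.
SKILLS_DIR = Path("~/.config/doing/skills").expanduser()


def find_skill_files(skills_dir: Path = SKILLS_DIR) -> dict[str, Path]:
    """Map each Skill folder name (e.g. "email_draft") to its SKILL.md path."""
    if not skills_dir.is_dir():
        return {}
    # One level deep: <skills_dir>/<folder>/SKILL.md; folders without one are skipped.
    return {p.parent.name: p for p in sorted(skills_dir.glob("*/SKILL.md"))}


if __name__ == "__main__":
    for folder, path in find_skill_files().items():
        print(f"{folder}: {path}")
```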
A SKILL.md file has two parts: YAML frontmatter with the name and description, and a body that becomes the LLM system prompt.
```markdown
---
name: Email Draft
description: Format transcription as a professional email
---
Convert the following transcription into a well-structured
professional email.

Include:
- A clear subject line at the top
- Organized body paragraphs
- Professional closing

Keep the original meaning and key details intact.
Output only the email text, no preamble or explanation.
```
That's it. The name field is required. The description is optional but shows up in the settings UI. Everything after the second --- is the prompt that gets sent to the AI model, with your transcription as the user message.
Tips for writing good Skill prompts
Be specific about output format. If you want just the processed text with no chatbot-style preamble ("Here's your cleaned up text:"), say so explicitly. "Output only the result, no preamble or explanation" works well.
Keep prompts focused. A Skill that tries to do too many things at once produces inconsistent results. Better to chain two focused Skills than write one that cleans up and formalizes and adds formatting.
Test with real transcriptions. Real speech is messier than anything you'd type, so a prompt that handles typed test sentences well may stumble on an actual dictation. Test your prompts with actual voice input, not typed sentences.
Example: a meeting notes Skill
```markdown
---
name: Meeting Notes
description: Structure transcription as meeting notes
---
Format the following transcription as structured meeting notes.

Use this format:
- Date and participants (if mentioned)
- Key decisions (bulleted)
- Action items (bulleted, with owners if mentioned)
- Open questions

Be concise. Drop filler and repetition.
Output only the formatted notes.
```
Example: a commit message Skill
```markdown
---
name: Commit Message
description: Turn description into a git commit message
---
Convert the following into a conventional git commit message.

Use the format: type(scope): description
Where type is one of: feat, fix, docs, refactor, test, chore.
Keep the subject line under 72 characters.
Add a body paragraph only if the change needs explanation.

Output only the commit message.
```
Once you create the file, enable the Skill in Doing settings and it's immediately available.
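Because model output can drift, it can help to check a Skill's result against the format its prompt asks for while you test it on real dictations. For the Commit Message Skill, a hypothetical checker might look like this. It enforces the rules stated in that prompt plus git's blank-line convention, and it is not part of Doing; `check_commit_message` is an illustrative name.

```python
import re

# Commit types allowed by the Commit Message Skill prompt.
TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")
# Subject shape "type(scope): description". Scope is optional here, as in the
# Conventional Commits spec; make the group mandatory to match the prompt exactly.
SUBJECT = re.compile(rf"^(?:{'|'.join(TYPES)})(?:\([^()\s]+\))?: \S.*$")


def check_commit_message(message: str) -> list[str]:
    """Return problems found in a generated commit message (empty list = OK)."""
    lines = message.strip().splitlines()
    subject = lines[0] if lines else ""
    problems: list[str] = []
    if not SUBJECT.match(subject):
        problems.append("subject does not match 'type(scope): description'")
    if len(subject) >= 72:
        problems.append(f"subject is {len(subject)} characters; keep it under 72")
    # Git convention: an optional body is separated from the subject by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line between subject and body")
    return problems


if __name__ == "__main__":
    print(check_commit_message("feat(skills): add commit message Skill"))  # []
```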
Choosing an LLM provider
Skills need an AI model to run. Doing supports several backends:
| Provider | Model | Cost | Notes |
|---|---|---|---|
| Apple on-device | Apple Foundation Model (~3B) | Free | macOS 26+, Apple Silicon only. ~2000 character limit. No data leaves your Mac. |
| OpenAI | gpt-4o-mini | ~$0.006/min | Fast, good quality. Requires API key. |
| Google Gemini | gemini-2.0-flash | ~$0.006/min | Similar quality and cost to OpenAI. |
| AssemblyAI | Claude (via gateway) | Per usage | Uses Anthropic's Claude models. |
In Auto mode, Doing picks the best available provider — it'll prefer a cloud provider if you have an API key configured, and fall back to Apple's on-device model if not.
Tip
The Apple on-device model is surprisingly capable for simple Skills like Cleanup and Summarize, and it's completely free and private. Try it before paying for an API key — it might be all you need.
For longer transcriptions or complex prompts (like the meeting notes example), a cloud provider will give better results. The on-device model's ~2000 character context window means it can struggle with longer input.
Tips from daily use
Start with Cleanup as your default. It's the highest-impact, lowest-risk Skill. Filler word removal makes almost every transcription better, and because it only strips verbal tics, it's designed to leave the meaning of what you said intact.
Don't over-process. It's tempting to chain three Skills together for every app. In practice, simpler chains produce more predictable results. Cleanup alone handles 80% of what most people need.
Use Formalize selectively. Formalizing everything makes your communication sound stiff. Save it for contexts where tone matters — client emails, documentation, PRDs — and let your natural voice come through everywhere else.
Keep custom Skills simple. The best Skills have a single, clear job. A five-line prompt that does one thing well beats a fifty-line prompt that tries to handle every edge case.
Raw mode is fine for notes. Not everything needs processing. For personal notes, journals, and brainstorming, raw transcription captures your thinking better than a cleaned-up version.
FAQ
Do Skills work offline?
Only with Apple's on-device model. If you're using OpenAI, Gemini, or AssemblyAI as your Skills provider, you need an internet connection. The transcription itself can still happen offline (depending on your transcription engine), but Skill processing requires the LLM backend.
Can I use Skills without writing markdown files?
Yes — the built-in Skills work out of the box with no file editing. Just enable them in settings and assign them to apps. Custom SKILL.md files are only needed if you want to create your own.
What happens if the Skill fails?
Doing falls back to your raw transcription. If the LLM provider is down, times out, or the input exceeds the context window, you still get your text — just without the Skill processing. It never blocks your workflow.
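The fallback rule, where any failure yields the raw transcription, can be expressed as a small wrapper. This is a minimal sketch assuming a hypothetical `run_skill` callable and an arbitrary 10-second timeout (the post gives no figure); it is not Doing's actual error handling.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def apply_skill_or_raw(
    raw: str,
    run_skill: Callable[[str], str],  # hypothetical: sends raw text through the LLM
    timeout_s: float = 10.0,          # assumed timeout; not specified by Doing
) -> str:
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_skill, raw)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return raw  # provider too slow: keep the raw transcription
    except Exception:
        return raw  # provider down, input too long for the model, etc.
    finally:
        # Don't block on a hung call; the worker thread is simply abandoned.
        pool.shutdown(wait=False)
```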
Can I share Skills with my team?
A Skill is just a folder with a markdown file. Copy the folder, send it to a coworker, drop it in their ~/.config/doing/skills/ directory — done. No install, no package manager, no accounts.
Do Skills see my previous transcriptions?
No. Each Skill invocation is stateless. The LLM receives only the system prompt (your Skill) and the current transcription as input. No conversation history, no memory between sessions.
Try Skills for free
Skills are included with Doing — no separate purchase or add-on required. The first 100 transcriptions are free with no account needed, and that includes Skill processing.
If you're already using Doing for transcription, open Settings and enable a Skill to try it. If you're new, grab the app and start with Cleanup as your All Apps default. You'll wonder how you ever tolerated pasting raw transcriptions.