Set up a Claude DOCX sub-agent and reduce cost by 50%
Or how to set up a subagent in Claude Code and Cowork.
- A subagent has its own context and can use a different model and effort level. If a task doesn't require advanced reasoning, you may be able to reduce token usage by delegating it to a subagent.
- Our law firm uses Opus 4.8 (effort: max) only for reviewing and offloads the docx interaction to Sonnet 4.6 (effort: medium). This has cut the cost of the docx interaction by about 5x.
- Opus 4.8 (effort: max) only sees plain-text content and only does the review. Anecdotally, the output quality seems higher than with the previous setup, where Opus's context was littered with docx details.
- To use a subagent, you only need a few simple sentences in your prompt, as we show below.
- While you can use a subagent with Anthropic's docx skill, our law firm has developed a Claude Cowork/Code docx plugin (free, btw) that uses 2-5x fewer tokens.
- The main reason is that our docx plugin supports bidirectional translation between HTML and docx. Claude just works on simple HTML instead of the docx format, which is famously bloated and unnecessarily complex. Our plugin is already set up to use a subagent, so you don't really need an extra prompt.
- Lastly, we show how to structure your plugin to ensure Claude uses your subagent properly.
I was reading about sub-agents in Claude Cowork the other day. One detail that struck me was about how a sub-agent had its own context and could use a different model.
Our law firm's current setup uses Claude Cowork to review a contract, then redline and comment on the docx file directly. We use Opus with max effort. It is quite expensive.
With a sub-agent, I was thinking that it might be possible to use Opus (effort: max) for review and use Sonnet (effort: medium) to redline and comment on docx instead. Sonnet will additionally be responsible for ensuring the styles (e.g. bold, italic, bullet point setting) are appropriate.
Well, it turned out to be possible with small prompt changes!
There are 2 main benefits:
- The docx skill won't be loaded into Opus (effort: max) at all, because Opus (effort: max) is only used for review. This drastically reduces the cost.
- Operating the docx skill and its MCP is relatively simple. This kind of task doesn't need Opus; Sonnet, which is cheaper, does fine here. This reduces the cost further, since Sonnet (effort: medium) is ~5x cheaper than Opus (effort: max).
With these 2 benefits, your work will have higher quality, because you are reserving Opus for the tasks that require complex reasoning, with no irrelevant noise in its context (e.g. instructions on how to read docx). It'll also cost you much less (if you use the API) or use much less quota (if you are on a Claude subscription).
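To make the arithmetic concrete, here is a rough sketch of the cost split. The token counts and the price unit are made-up illustrative numbers, not measurements or real prices; the only figure taken from this post is the ~5x price ratio. It also ignores the small overhead of the hand-off messages between the two agents.

```python
# Illustrative cost model. All token counts are hypothetical.
OPUS_PRICE = 5.0    # relative cost per token (arbitrary unit, assumption)
SONNET_PRICE = 1.0  # ~5x cheaper, the ratio quoted in this post


def session_cost(review_tokens, docx_tokens, use_subagent):
    """Return the relative cost of one contract-review session."""
    if use_subagent:
        # Opus only reviews plain text; Sonnet handles the docx work.
        return review_tokens * OPUS_PRICE + docx_tokens * SONNET_PRICE
    # Everything, including the docx handling, runs in the Opus context.
    return (review_tokens + docx_tokens) * OPUS_PRICE


review_tokens = 20_000  # hypothetical
docx_tokens = 60_000    # hypothetical

before = session_cost(review_tokens, docx_tokens, use_subagent=False)
after = session_cost(review_tokens, docx_tokens, use_subagent=True)
print(f"before={before:.0f} after={after:.0f} saving={1 - after / before:.0%}")
# prints: before=400000 after=160000 saving=60%
```

The overall saving depends on how much of the session is docx handling: the docx part gets ~5x cheaper, while the review part costs the same.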
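Here's how the responsibilities are divided:
- Main agent, Opus (effort: max): reviews the plain-text content and decides what to edit, redline, and comment on.
- Subagent, Sonnet (effort: medium): interacts with the docx file, i.e. fetches the plain-text content, applies the edits, redlines, and comments, and keeps the styles appropriate.
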
If you are using Anthropic's docx skill, you can add the simple prompt below to your CLAUDE.md:
You MUST use a subagent with Sonnet (effort: medium) in order to interact with a docx file using the docx skill.
The subagent is NOT responsible for reviewing and understanding the content.
If you need the content of the docx file, you should ask the subagent to fetch the plain-text content for you.
After you review the plain-text content and want to edit, redline, and/or comment, you should ask the subagent to perform those operations accordingly.
There are 2 issues with using Anthropic's docx skill:
- Sometimes Claude simply ignores the instruction to use a subagent. It's difficult to know why it decides not to.
- Anthropic's docx skill uses the CLI, unpacks docx files using zip, and writes code. This is a complex task that sometimes fails.
Our team at LegalRabbit ran into the second issue quite often in the past, so much so that we've developed our own docx engine for Claude.
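For context on why the unpacking step is fiddly: a .docx file is a zip archive of XML parts, with the main text in word/document.xml. The sketch below builds a tiny stand-in archive in memory and reads it back. It is not a valid Word document and not the code the skill generates; it only illustrates the layout Claude has to deal with.

```python
import io
import zipfile

# Build a tiny stand-in archive in memory. This is NOT a valid Word file:
# a real .docx has more parts, e.g. [Content_Types].xml and _rels/.rels.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document><w:body/></w:document>")
    zf.writestr("word/styles.xml", "<w:styles/>")

# Reading it back is what "unpacking the docx" boils down to.
with zipfile.ZipFile(buf) as zf:
    parts = zf.namelist()
    body_xml = zf.read("word/document.xml").decode("utf-8")

print(parts)
print(body_xml)
```

Every edit, redline, or comment then means rewriting XML inside these parts and zipping everything back up, which is where things tend to go wrong.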
Our Claude plugin consumes 2-5x fewer tokens than Anthropic's docx skill because it translates the bloated docx format into simple HTML. Claude just operates on the simple HTML layer, which is much cheaper.
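To give a feel for the difference, the sketch below compares one bold paragraph written as WordprocessingML (the XML inside a docx) with the equivalent HTML. Both fragments are hand-written for illustration and are not output from our plugin, and character count is only a crude proxy for token count.

```python
# The same bold paragraph as a WordprocessingML fragment and as HTML.
# Both strings are hand-written examples, not plugin output.
DOCX_RUN = (
    '<w:p><w:pPr><w:jc w:val="left"/></w:pPr>'
    '<w:r><w:rPr><w:b/><w:bCs/></w:rPr>'
    '<w:t xml:space="preserve">Governing Law</w:t></w:r></w:p>'
)
HTML_RUN = "<p><b>Governing Law</b></p>"

# Character count is a rough stand-in for tokens, not an exact measure.
ratio = len(DOCX_RUN) / len(HTML_RUN)
print(f"docx: {len(DOCX_RUN)} chars, html: {len(HTML_RUN)} chars, ratio: {ratio:.1f}x")
```

Real documents add namespaces, revision IDs, and style references on top of this, so the gap on an actual contract can look quite different from this toy fragment.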
The plugin is here: https://github.com/LegalRabbit-AI/legalrabbit-docx-claude-plugin (it's free)
Now, even with our plugin, which contains the docx skill and the docx MCP, we still run into the first issue.
Claude sometimes just decides to operate the skill and MCP within the plugin directly.
We've invented 2 mechanisms to ensure that Claude utilizes the subagent and reads the MCP instruction before using the MCP:
- Our MCP requires Claude to invoke the initialization tool with a designated password, and the password is defined in the MCP instruction. This means Claude needs to load the MCP instruction before operating the MCP. (see here)
- The plugin's skill, defined at ./skills/legalrabbit-docx/SKILL.md, simply says that Claude should use our subagent defined at ./agents/legalrabbit-docx-subagent.md. The MCP instruction is then put into the agent's instruction instead. (see here)
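The first mechanism can be sketched in plain Python as below. This is not our plugin's actual code and does not use any MCP SDK; the class, method names, and passphrase are all hypothetical. It only shows the gating idea: every tool refuses to work until the initialization call has been made with a secret that appears only in the instruction text.

```python
# Hypothetical sketch of a passphrase-gated tool set (plain Python, no MCP SDK).
# The instruction text and passphrase below are made up for illustration.
INSTRUCTIONS = "Before any other tool, call initialize(passphrase='open-sesame')."


class GatedDocxTools:
    def __init__(self, passphrase):
        self._passphrase = passphrase
        self._ready = False

    def initialize(self, passphrase):
        """Unlock the other tools only if the caller has read the instructions."""
        if passphrase != self._passphrase:
            return "error: wrong passphrase; read the server instructions first"
        self._ready = True
        return "ok"

    def read_document(self, path):
        """Stand-in for a real docx tool; refuses to run before initialize()."""
        if not self._ready:
            return "error: call initialize with the passphrase from the instructions"
        return f"(plain text of {path} would be returned here)"


tools = GatedDocxTools("open-sesame")
print(tools.read_document("contract.docx"))  # refused: not initialized yet
print(tools.initialize("open-sesame"))       # prints: ok
print(tools.read_document("contract.docx"))  # now allowed
```

Returning an error message that points back at the instructions, rather than failing silently, gives the model a clear next step when it skips the initialization.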
It has been working out quite well so far!
One issue with Cowork
Claude Code seems to support subagents better in terms of visibility. The logs are shown properly when the hand-off between the main agent and the subagent occurs.
Meanwhile, Cowork tries to show the logs too, but the UI is buggy, e.g. it does not stream the logs correctly. I have to look at audit.jsonl directly to verify that Cowork uses our subagent correctly.
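Checking the log by hand gets tedious, so a small script can help. The sketch below scans JSONL text for records that mention the subagent's name. I'm not documenting the audit.jsonl schema here: the field names in the sample are invented, and the script deliberately avoids relying on any particular field by searching each whole record.

```python
import json

SUBAGENT = "legalrabbit-docx-subagent"  # name taken from the agent file above


def lines_mentioning(jsonl_text, needle):
    """Return (line_number, record) for every JSON line that mentions needle."""
    hits = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial lines, e.g. a log still being written
        if needle in json.dumps(record):
            hits.append((number, record))
    return hits


# Made-up sample. These field names are NOT the real audit.jsonl schema.
sample = "\n".join([
    json.dumps({"event": "tool_call", "agent": "main", "tool": "Read"}),
    json.dumps({"event": "handoff", "agent": SUBAGENT}),
])

for number, record in lines_mentioning(sample, SUBAGENT):
    print(number, record)
```

To run it against a real log, read the file's text and pass it in place of `sample`. If no line mentions the subagent, the main agent most likely did the docx work itself.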