Subagents are independent agents called as tools by a main agent — they have their own model, prompt, and tools, executing autonomously in an isolated context before returning results.
When a single agent's task is too complex, or you need to isolate different capabilities to specialized agents, subagents come into play. The core idea of subagents is simple: encapsulate one agent as a tool for another agent to call. The main agent handles orchestration and decision-making, while the subagent executes autonomously in an isolated context window, ultimately returning only a refined result. This pattern can handle tasks that require extensive context exploration while keeping the main agent's context clean and focused.
The subagent implementation pattern is straightforward: define an agent, then wrap it as a tool for the main agent to call:
import { createAgent, createModel, tool } from 'deepseek-kit'
import { z } from 'zod'
const model = createModel({ model: 'deepseek-v4-flash' })
const researchAgent = createAgent({
model,
system: 'You are a research assistant. Investigate the task thoroughly and clearly summarize your findings in your final response.',
tools: [searchTool, readFileTool],
})
const researchTool = tool({
name: 'research',
description: 'Research a topic or question in depth',
schema: z.object({
task: z.string().describe('The task to research'),
}),
execute: async (input) => {
const result = await researchAgent.generate({
prompt: input.task,
})
return result.text
},
})
const mainAgent = createAgent({
model,
system: 'You are an assistant that can delegate research tasks to a specialized research agent.',
tools: [researchTool],
})
const result = await mainAgent.generate({
messages: [{ role: 'user', content: 'Help me research the impact of quantum computing on cryptography.' }],
})
console.log(result.text)
Execution flow:
The main agent receives the user request and determines that in-depth research is needed
The main agent calls the research tool, passing the research task
The research agent executes autonomously in an isolated context (possibly calling search and reading tools multiple times)
The research agent returns a summarized result
The main agent generates the final response based on the research result
The core advantage of subagents is context isolation. Each time a subagent is called, it starts a fresh context window — it does not inherit the main agent's conversation history. This means:
Subagents can use hundreds of thousands of tokens for exploration without polluting the main agent's context
The main agent only receives the refined result returned by the subagent
A subagent might consume 100,000 tokens for exploration and reasoning, but the main agent only consumes the returned summary text — perhaps just 1,000 tokens.
By default, subagents don't inherit the main agent's conversation history. If you need the subagent to be aware of previous conversation context, you can manually pass messages:
Passing the full conversation history diminishes the benefits of context isolation. Use this judiciously — only pass history when the subagent genuinely needs to understand the conversation background.
When multiple research tasks are independent of each other, you can launch multiple subagents in parallel within a single tool, significantly reducing total latency:
import { createAgent, createModel, tool } from 'deepseek-kit'
import { z } from 'zod'
const model = createModel({ model: 'deepseek-v4-flash' })
const researchAgent = createAgent({
model,
system: 'You are a research assistant. Investigate the task thoroughly and clearly summarize your findings in your final response.',
tools: [searchTool],
})
const compareTool = tool({
name: 'compareTopics',
description: 'Research multiple topics in parallel and return a comparison',
schema: z.object({
topics: z.array(z.string()).describe('List of topics to compare'),
}),
execute: async (input) => {
const results = await Promise.all(
input.topics.map(topic =>
researchAgent.generate({ prompt: `Please research the following topic in depth and summarize: ${topic}` }),
),
)
return results.map((r, i) => `## ${input.topics[i]}\n${r.text}`).join('\n\n')
},
})
const mainAgent = createAgent({
model,
system: 'You are an assistant that can research multiple topics in parallel and perform comparative analysis.',
tools: [compareTool],
})
const result = await mainAgent.generate({
messages: [{ role: 'user', content: 'Compare Python, JavaScript, and Rust for web development.' }],
})
In parallel mode, three subagents execute simultaneously — the total latency depends on the slowest one, not the sum of all three.
The subagent's execution process is part of the main agent's tool call. In streaming output, you can observe subagents being triggered in tool-call events:
const stream = mainAgent.stream({
prompt: 'Help me research the application prospects of quantum computing.',
})
for await (const event of stream) {
switch (event.type) {
case 'text-delta':
process.stdout.write(event.textDelta)
break
case 'tool-call':
console.log(`\nCalling subagent: ${event.toolCalls.map(t => t.function.name).join(', ')}`)
break
case 'finish':
console.log('\nDone!')
break
}
}
Note: The subagent's internal execution process (including its tool calls and text generation) is not pushed as stream events to the main agent. The main agent only receives the final result after the subagent completes.
A subagent's system prompt should explicitly instruct it to output a clear summary upon completion. Because the main agent can only see the subagent's return value, if the subagent only returns "Done," the main agent won't have useful information:
const researchAgent = createAgent({
model,
system: `You are a research assistant. Complete the task autonomously.
Important: After completing your research, clearly summarize your findings as your final response.
This summary will be returned to the main agent, so please include all relevant information.`,
tools: [searchTool, readFileTool],
})
Context Isolation — Each subagent call starts with a fresh context and doesn't inherit the main agent's conversation history. If you need to pass context, manually provide it via the messages parameter
Latency Stacking — Subagent execution time stacks onto the main agent's total latency. For simple tasks, handling them directly in the main agent is more efficient
Error Propagation — Uncaught errors in subagents are returned to the main agent as tool execution failures and don't directly interrupt the main agent's loop
Streaming Limitations — Subagent internal stream events don't propagate to the main agent's stream. The main agent can only get results after the subagent completes