Models

Large language models (LLMs) are the engine that powers agent execution.

A model drives the agent's decision-making: it determines which tools to call, how to interpret their results, and when to give the final answer. DeepSeek, with its openness and cost-efficiency, has become one of the preferred LLMs for agent development.

Creating a Model

Everything starts with createModel(). You only need to specify the model name to create a model instance; the API key is read automatically from the DEEPSEEK_API_KEY environment variable:

import { createModel } from 'deepseek-kit'

const model = createModel({
  model: 'deepseek-v4-flash',
})

If you need to explicitly pass an API key or use a custom endpoint, you can configure apiKey and baseURL:

const model = createModel({
  model: 'deepseek-v4-flash',
  apiKey: 'your-api-key',
  baseURL: 'https://api.deepseek.com',
})
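
The fallback order described above (explicit option first, then the environment variable) can be pictured with a small sketch. resolveConnection, ConnectionOptions, and Env are hypothetical names used only for illustration; they are not part of deepseek-kit, and the library's actual resolution logic may differ. The default URL is the one listed in the API reference.

```typescript
// Hypothetical helper, not part of deepseek-kit: illustrates the documented
// fallback order (explicit option first, then DEEPSEEK_API_KEY).
interface ConnectionOptions {
  apiKey?: string
  baseURL?: string
}

type Env = Record<string, string | undefined>

function resolveConnection(options: ConnectionOptions, env: Env): Required<ConnectionOptions> {
  const apiKey = options.apiKey ?? env['DEEPSEEK_API_KEY']
  if (!apiKey) {
    // Failing early gives a clearer message than a rejected HTTP request.
    throw new Error('No API key: pass apiKey or set DEEPSEEK_API_KEY')
  }
  return {
    apiKey,
    baseURL: options.baseURL ?? 'https://api.deepseek.com',
  }
}

// An explicit key wins over the environment; the base URL falls back to the default.
const resolved = resolveConnection({ apiKey: 'explicit-key' }, { DEEPSEEK_API_KEY: 'env-key' })
```

In a Node.js program, process.env would be passed as the second argument.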

Sending Requests

Once you've created a model, you can use invoke() to send a complete chat completion request and get the model's full response:

const completion = await model.invoke({
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
})

console.log(completion.choices[0].message.content)
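
To continue a conversation, the reply can be appended to the message list before the next invoke() call. The sketch below uses local types that mirror only the fields read in the example above. The appendReply helper is hypothetical, and the 'assistant' role is an assumption based on the usual chat-completion message format rather than something this page confirms.

```typescript
// Local types that mirror only the fields used in the invoke() example above.
// The 'assistant' role is an assumption based on the usual chat-completion format.
type ChatMessage = { role: 'user' | 'assistant'; content: string }
type CompletionLike = { choices: { message: { content: string | null } }[] }

// Hypothetical helper: returns a new message list with the model's reply appended.
function appendReply(messages: ChatMessage[], completion: CompletionLike): ChatMessage[] {
  const reply = completion.choices[0]?.message.content ?? ''
  return [...messages, { role: 'assistant', content: reply }]
}

// Stand-in for the value returned by `await model.invoke({ messages })`.
const fakeCompletion: CompletionLike = {
  choices: [{ message: { content: 'Hi! How can I help?' } }],
}

const firstTurn: ChatMessage[] = [{ role: 'user', content: 'Hello!' }]
const nextTurn: ChatMessage[] = [
  ...appendReply(firstTurn, fakeCompletion),
  { role: 'user', content: 'Summarize our chat so far.' },
]
// nextTurn can now be passed as `messages` in the following invoke() call.
```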

If you want to receive the model's output in real time, you can use invokeStream() for streaming requests:

for await (const chunk of model.invokeStream({
  messages: [{ role: 'user', content: 'Hello!' }],
})) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content)
  }
}
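
When the full text is needed after streaming finishes, the chunks can be accumulated as they arrive. This is a minimal sketch: ChunkLike mirrors only the fields read in the loop above, collectText is a hypothetical helper, and fakeStream stands in for model.invokeStream() so the example runs without network access.

```typescript
// Local type mirroring only the chunk fields read in the streaming loop above.
type ChunkLike = { choices: { delta?: { content?: string | null } }[] }

// Hypothetical helper: consumes a stream of chunks and returns the joined text.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = ''
  for await (const chunk of stream) {
    // Chunks without content (for example, an empty delta) contribute nothing.
    text += chunk.choices[0]?.delta?.content ?? ''
  }
  return text
}

// Stand-in for model.invokeStream(...): yields three chunks, one without content.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: 'Hel' } }] }
  yield { choices: [{ delta: {} }] }
  yield { choices: [{ delta: { content: 'lo!' } }] }
}

const collected: Promise<string> = collectText(fakeStream())
```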

Enabling Thinking Mode

DeepSeek models have thinking mode enabled by default: the model reasons before it answers, which suits complex problems. The default reasoning effort is 'high'. You can disable thinking mode, or raise the reasoning effort to 'max', as needed:

// Default configuration (thinking mode enabled, reasoning effort 'high')
const defaultModel = createModel({
  model: 'deepseek-v4-flash',
})

// Disable thinking mode
const noThinkingModel = createModel({
  model: 'deepseek-v4-flash',
  thinking: { type: 'disabled' },
})

// Raise the reasoning effort
const maxEffortModel = createModel({
  model: 'deepseek-v4-flash',
  reasoningEffort: 'max',
})
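
One way to apply these options is to choose them per kind of task. In the sketch below, the option names and values come from the examples above, while the task categories and the mapping between tasks and options are assumptions made for illustration; thinkingOptionsFor is a hypothetical helper.

```typescript
// Option fields and values are taken from the examples above; the task
// categories and the mapping itself are assumptions for illustration.
type ThinkingOptions = {
  thinking?: { type: 'enabled' | 'disabled' }
  reasoningEffort?: 'high' | 'max'
}

type TaskKind = 'chat' | 'analysis' | 'proof'

function thinkingOptionsFor(task: TaskKind): ThinkingOptions {
  switch (task) {
    case 'chat':
      // Short conversational replies: turn the reasoning phase off.
      return { thinking: { type: 'disabled' } }
    case 'analysis':
      // Rely on the documented defaults (thinking enabled, effort 'high').
      return {}
    case 'proof':
      // Hardest problems: request the maximum reasoning effort.
      return { reasoningEffort: 'max' }
  }
}

const proofOptions = thinkingOptionsFor('proof')
// Spread into createModel: createModel({ model: 'deepseek-v4-flash', ...proofOptions })
```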

Cloning Model Configuration

When you need to create instances with different configurations based on the same model, you can use withConfig() to avoid repeated initialization:

const flashModel = createModel({ model: 'deepseek-v4-flash' })
const proModel = flashModel.withConfig({ model: 'deepseek-v4-pro' })

withConfig() merges the new configuration with the current instance's configuration and returns a new model instance; the original instance is not affected.
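
The copy-on-configure behaviour can be illustrated with a stand-alone class. SketchModel and SketchOptions are hypothetical and are not deepseek-kit's implementation; they only show one way a withConfig()-style method can merge overrides into a copy while leaving the original untouched.

```typescript
// Not deepseek-kit's implementation: a minimal class showing the
// "merge into a copy, leave the original untouched" behaviour.
type SketchOptions = { model: string; temperature?: number }

class SketchModel {
  readonly options: Readonly<SketchOptions>

  constructor(options: SketchOptions) {
    // Copy and freeze so later changes to the caller's object cannot leak in.
    this.options = Object.freeze({ ...options })
  }

  withConfig(overrides: Partial<SketchOptions>): SketchModel {
    // Later keys win, so overrides replace matching base options.
    return new SketchModel({ ...this.options, ...overrides })
  }
}

const flashSketch = new SketchModel({ model: 'deepseek-v4-flash', temperature: 0.2 })
const proSketch = flashSketch.withConfig({ model: 'deepseek-v4-pro' })
```

Here proSketch keeps the temperature set on flashSketch and changes only the model name.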

API Reference

Parameters

model: Model (required)
  Model identifier. Supports deepseek-v4-flash, deepseek-v4-pro, or a custom string.
apiKey: string (default: DEEPSEEK_API_KEY environment variable)
  DeepSeek API key.
baseURL: string (default: https://api.deepseek.com)
  API base URL.
userId: string
  Optional user identifier.
thinking: { type: 'enabled' | 'disabled' }
  Enable or disable thinking mode.
reasoningEffort: 'high' | 'max'
  Reasoning effort level.
maxTokens: number
  Maximum number of tokens to generate.
temperature: number
  Sampling temperature (0-2).
topP: number
  Nucleus sampling parameter.
streamOptions: { include_usage: boolean }
  Streaming options.
timeout: number (default: 60000)
  Request timeout in milliseconds.
maxRetries: number (default: 3)
  Maximum retry count for 429/500/503 errors.
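
The retry rule behind maxRetries can be sketched as a small decision function. The status codes and the default of 3 come from the parameter list above; the exponential backoff schedule and the names shouldRetry and backoffDelayMs are assumptions, since this page does not describe how long the library waits between attempts.

```typescript
// Status codes come from the maxRetries description above; the exponential
// backoff schedule is an assumption, since the source does not specify one.
const RETRYABLE_STATUS = new Set([429, 500, 503])

function shouldRetry(statusCode: number, attempt: number, maxRetries = 3): boolean {
  // `attempt` counts retries already made (0 before the first retry).
  return RETRYABLE_STATUS.has(statusCode) && attempt < maxRetries
}

function backoffDelayMs(attempt: number, baseMs = 500): number {
  // 500 ms, 1000 ms, 2000 ms, ... doubling with each retry.
  return baseMs * 2 ** attempt
}

// Delays that this sketch would use for the default of three retries.
const retryDelays = [0, 1, 2].map((attempt) => backoffDelayMs(attempt))
```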

Methods

invoke(params: InvokeParams): Promise<ChatCompletion>
  Send a chat completion request and return the full response.
invokeStream(params: InvokeParams): AsyncGenerator<ChatCompletionChunk>
  Send a streaming chat completion request and return an async generator of chunks.
fim(params: FIMParams): Promise<FIMResponse>
  Fill-in-the-Middle code completion.
list(): Promise<ListModelsResponse>
  Get the list of available models.
balance(): Promise<UserBalanceResponse>
  Query the account balance.
withConfig(options: Partial<ModelOptions>): DeepSeekModel
  Create a new model instance with merged configuration.