Models

Large language models are the reasoning engine of an agent. They drive the agent's decision-making: which tools to call, how to interpret the results, and when to return the final answer. DeepSeek is open, innovative, and cost-efficient, which has made it one of the preferred LLMs for agent development.

Creating a Model

Everything starts with createModel(). To create a model instance you only need to specify the model name; the API key is read automatically from the DEEPSEEK_API_KEY environment variable:

import { createModel } from 'deepseek-kit'

const model = createModel({
  model: 'deepseek-v4-flash',
})

If you need to explicitly pass an API key or use a custom endpoint, you can configure apiKey and baseURL:

const model = createModel({
  model: 'deepseek-v4-flash',
  apiKey: 'your-api-key',
  baseURL: 'https://api.deepseek.com',
})
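
The lookup order described above (an explicit apiKey if given, otherwise the DEEPSEEK_API_KEY environment variable) can be sketched as follows. This is only an illustration of the fallback, not the library's actual implementation: resolveApiKey, KeyOptions, and the env parameter are hypothetical names, and throwing an error when no key is found is an assumption.

```typescript
// Hypothetical helper, not part of deepseek-kit: it only illustrates the fallback order.
interface KeyOptions {
  apiKey?: string
}

type Env = Record<string, string | undefined>

function resolveApiKey(options: KeyOptions, env: Env): string {
  // An explicitly passed key takes precedence over the environment.
  const key = options.apiKey ?? env['DEEPSEEK_API_KEY']
  if (!key) {
    // Assumption: a missing key is treated as a configuration error.
    throw new Error('No API key: pass apiKey or set DEEPSEEK_API_KEY')
  }
  return key
}
```

In a Node.js program, process.env would be passed as the second argument. Reading the key from the environment keeps it out of source control, which is usually preferable to hardcoding it as in the example above.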

Sending Requests

Once you've created a model, you can use invoke() to send a complete chat completion request and get the model's full response:

const completion = await model.invoke({
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
})

console.log(completion.choices[0].message.content)

If you want to receive the model's output in real time, you can use invokeStream() for streaming requests:

for await (const chunk of model.invokeStream({
  messages: [{ role: 'user', content: 'Hello!' }],
})) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content)
  }
}
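
If you also need the complete text once the stream ends, the deltas can be accumulated as they arrive. The sketch below is a minimal, library-independent illustration: StreamChunk is an assumed shape inferred from the fields used in the loop above, and appendDelta and collectText are hypothetical helper names.

```typescript
// Minimal chunk shape, assumed from the fields read in the streaming loop.
interface StreamChunk {
  choices: Array<{ delta?: { content?: string | null } }>
}

// Append one chunk's text (if any) to the text collected so far.
function appendDelta(soFar: string, chunk: StreamChunk): string {
  return soFar + (chunk.choices[0]?.delta?.content ?? '')
}

// Drain any async iterable of chunks and return the concatenated text.
async function collectText(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = ''
  for await (const chunk of stream) {
    text = appendDelta(text, chunk)
  }
  return text
}
```

Chunks without a content delta (for example an empty delta or an empty choices array) contribute nothing, so the helper does not need a separate guard for them.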

Enabling Thinking Mode

DeepSeek models have thinking mode enabled by default. The model performs deep reasoning before answering, which is ideal for handling complex problems. The default reasoning effort is 'high'. You can disable thinking mode or adjust the reasoning effort as needed:

// Default configuration (thinking mode enabled, reasoning effort 'high')
const defaultModel = createModel({
  model: 'deepseek-v4-flash',
})

// Disable thinking mode
const fastModel = createModel({
  model: 'deepseek-v4-flash',
  thinking: { type: 'disabled' },
})

// Raise the reasoning effort from 'high' to 'max'
const deepModel = createModel({
  model: 'deepseek-v4-flash',
  reasoningEffort: 'max',
})

Cloning Model Configuration

When you need to create instances with different configurations based on the same model, you can use withConfig() to avoid repeated initialization:

const flashModel = createModel({ model: 'deepseek-v4-flash' })
const proModel = flashModel.withConfig({ model: 'deepseek-v4-pro' })

withConfig() merges the new configuration with the current instance's configuration and returns a new model instance. The original instance is not modified.
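
The copy-on-configure behavior can be illustrated with a small stand-in class. This is a sketch under assumptions, not deepseek-kit's real implementation: SketchModel and SketchConfig are hypothetical, and the merge shown is shallow (whether the library merges nested options such as thinking more deeply is not stated here).

```typescript
// Illustrative stand-in for a model instance; not the library's real class.
interface SketchConfig {
  model: string
  temperature?: number
  maxTokens?: number
}

class SketchModel {
  readonly config: Readonly<SketchConfig>

  constructor(config: SketchConfig) {
    // Copy and freeze so later changes to the caller's object cannot leak in.
    this.config = Object.freeze({ ...config })
  }

  // Keys in `overrides` replace existing keys; `this` is never mutated.
  withConfig(overrides: Partial<SketchConfig>): SketchModel {
    return new SketchModel({ ...this.config, ...overrides })
  }
}

const flash = new SketchModel({ model: 'deepseek-v4-flash', temperature: 0.2 })
const pro = flash.withConfig({ model: 'deepseek-v4-pro' })
```

Here pro inherits temperature from flash while using a different model name, and flash keeps its original configuration.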

API Reference

Parameters

model: Model (required)
Model identifier. Supports deepseek-v4-flash, deepseek-v4-pro, or a custom string.

apiKey: string (default: DEEPSEEK_API_KEY environment variable)
DeepSeek API key.

baseURL: string (default: https://api.deepseek.com)
API base URL.

userId: string
Optional user identifier.

thinking: { type: 'enabled' | 'disabled' }
Enable or disable thinking mode.

reasoningEffort: 'high' | 'max'
Reasoning effort level.

maxTokens: number
Maximum number of tokens to generate.

temperature: number
Sampling temperature (0-2).

topP: number
Nucleus sampling parameter.

streamOptions: { include_usage: boolean }
Streaming options.

timeout: number (default: 60000)
Request timeout in milliseconds.

maxRetries: number (default: 3)
Maximum retry count for 429/500/503 errors.

strict: boolean (default: false)
Enable strict mode for all tool calls. When enabled, the SDK automatically uses the Beta endpoint and passes strict: true for all tools in requests. See Strict Mode for details.
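
To make the maxRetries entry concrete, the sketch below shows one way a retry policy for 429/500/503 responses could work. It is not the SDK's implementation: shouldRetry, backoffDelayMs, and withRetries are hypothetical names, and the exponential backoff between attempts is an assumption, since the reference does not say how long the SDK waits.

```typescript
// Status codes the reference lists as retryable.
const RETRYABLE_STATUS = new Set([429, 500, 503])

// Decide whether another attempt is allowed.
// `attempt` is the number of retries already made (0 before the first retry).
function shouldRetry(status: number, attempt: number, maxRetries = 3): boolean {
  return RETRYABLE_STATUS.has(status) && attempt < maxRetries
}

// Assumption: exponential backoff (500 ms, 1 s, 2 s, ...).
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt
}

// Generic retry loop around any function that resolves to an object with a status.
async function withRetries<T extends { status: number }>(
  send: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await send()
    if (!shouldRetry(response.status, attempt, maxRetries)) return response
    await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)))
  }
}
```

With maxRetries set to 3, send is called at most four times: the initial request plus three retries. Other status codes, such as 400, are returned immediately.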