DeepSeek-V3.1: A Friendly, No-Jargon Guide for First-Time Users
Written by an Engineer Who Still Reads Manuals First
If you have ever unboxed a new laptop and reached for the quick-start card before pressing the power button, treat this article the same way.
Below you will find nothing more—and nothing less—than the official DeepSeek-V3.1 documentation, rewritten in plain English for curious readers who have at least a junior-college background but do not live inside research papers.
1. What Exactly Is DeepSeek-V3.1?
DeepSeek-V3.1 is one neural network that can behave like two different assistants:
- Non-Thinking Mode – gives quick, direct answers (think of a helpful customer-service rep).
- Thinking Mode – shows its scratch-work before answering (think of a student who hands in both the exam and the draft paper).
You switch between the two by changing a short text template—no extra downloads, no extra fees.
2. New Pricing (Effective September 6, 2025, 00:00 Beijing Time)
| Usage Event | Cost per Million Tokens |
|---|---|
| Input with cache hit | 0.5 CNY |
| Input with cache miss | 4 CNY |
| Output (any case) | 12 CNY |
Quick mental math

- A 3,000-character Chinese article ≈ 4k tokens, so 250 such articles ≈ 1 million input tokens.
- Sending them costs about 4 CNY on a cache miss, or 0.5 CNY on a hit; if each reply runs about 1,000 tokens, receiving all 250 replies costs about 3 CNY.
- In other words, a cache hit cuts the input cost to roughly one-eighth (see the calculator below).
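If you like to sanity-check bills in code, here is a tiny calculator built from the table above. The article and reply lengths are just the assumptions from the bullets, not official figures:

```python
# Back-of-the-envelope pricing, using the per-million-token rates above.
RATE_INPUT_HIT = 0.5   # CNY per million input tokens (cache hit)
RATE_INPUT_MISS = 4.0  # CNY per million input tokens (cache miss)
RATE_OUTPUT = 12.0     # CNY per million output tokens

def cost_cny(input_tokens: int, output_tokens: int, cache_hit: bool = False) -> float:
    """Estimate the bill for one batch of requests, in CNY."""
    rate_in = RATE_INPUT_HIT if cache_hit else RATE_INPUT_MISS
    return (input_tokens * rate_in + output_tokens * RATE_OUTPUT) / 1_000_000

# 250 articles of ~4,000 tokens each, with ~1,000-token replies:
print(cost_cny(250 * 4_000, 250 * 1_000))                  # miss: 7.0 CNY total
print(cost_cny(250 * 4_000, 250 * 1_000, cache_hit=True))  # hit:  3.5 CNY total
```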
3. Performance at a Glance
The table below condenses the official benchmark sheet so you can see where the model shines without decoding acronyms.
| Task Area | Benchmark Example | Non-Thinking | Thinking | Prior V3 | R1-0528 |
|---|---|---|---|---|---|
| General knowledge | MMLU-Redux (%) | 91.8 | 93.7 | 90.5 | 93.4 |
| Graduate-level reasoning | GPQA-Diamond (%) | 74.9 | 80.1 | 68.4 | 81.0 |
| Math competition | AIME 2024 (%) | 66.3 | 93.1 | 59.4 | 91.4 |
| Code generation | LiveCodeBench (%) | 56.4 | 74.8 | 43.0 | 73.3 |
| Software-engineering agent | SWE Verified (%) | 66.0 | — | 45.4 | 44.6 |
| Web-search agent | BrowseComp (%) | — | 30.0 | — | 8.9 |
“—” means the test was not run or not applicable.
Take-away: If your task involves multi-step reasoning, code, or advanced math, switch to Thinking mode; on the math and code benchmarks above, the jump is roughly 18–27 percentage points.
4. How the Two Modes Work
DeepSeek-V3.1 reads special tokens that act like stage directions.
4.1 Non-Thinking Mode Templates
Single-turn conversation
```
<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>
```

The extra </think> token tells the model, “Skip the scratch-work.”
Multi-turn conversation
```
<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>...<|User|>{query}<|Assistant|></think>{response}
```
Each time the user adds a new query, repeat the entire history and append the same closing tag.
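If you ever assemble these strings by hand instead of relying on a chat template, a minimal sketch could look like this. The helper name build_prompt is ours, not part of any official API:

```python
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"

def build_prompt(system: str, turns: list[tuple[str, str]], query: str,
                 thinking: bool = False) -> str:
    """Assemble a DeepSeek-V3.1 prompt string by hand.

    turns -- completed (query, response) pairs from earlier in the chat
    query -- the new user message awaiting a reply
    """
    tag = "<think>" if thinking else "</think>"
    prompt = BOS + system
    for past_query, past_response in turns:
        # History turns always use the non-thinking closing tag.
        prompt += f"<|User|>{past_query}<|Assistant|></think>{past_response}{EOS}"
    # Only the final, open turn gets the mode-selecting tag.
    prompt += f"<|User|>{query}<|Assistant|>{tag}"
    return prompt
```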
4.2 Thinking Mode Templates
Single-turn
```
<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|><think>
```

The <think> token invites the model to draft its reasoning first.
Multi-turn
Same history structure as above, but the final turn uses <think> instead of </think>.
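Reusing the hypothetical build_prompt helper from Section 4.1, switching modes is a one-flag change:

```python
history = [("What is the capital of France?", "Paris.")]

# Non-Thinking mode: the final turn ends with </think>
print(build_prompt("You are a helpful assistant.", history, "And its population?"))

# Thinking mode: identical history, but the final turn opens with <think>
print(build_prompt("You are a helpful assistant.", history,
                   "And its population?", thinking=True))
```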
5. Calling External Tools
DeepSeek-V3.1 can “pick up the phone” and call functions you provide.
5.1 ToolCall (Non-Thinking Mode Only)
1. Describe the tool inside the system prompt:

```
## Tools

You have access to the following tools:

### get_weather
Description: Returns current weather
Parameters: {"location": {"type": "string"}}
```

2. The model answers with a precise syntax:

```
<|tool▁calls▁begin|><|tool▁call▁begin|>get_weather<|tool▁sep|>{"location": "Beijing"}<|tool▁call▁end|><|tool▁calls▁end|>
```

3. Your code parses the JSON, runs the function, and feeds the result back into the chat history, as in the sketch below.
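Here is a minimal sketch of step 3. The regex assumes the exact token layout shown above, and get_weather is just the toy function from the example:

```python
import json
import re

def get_weather(location: str) -> str:
    # Stand-in for your real weather API call.
    return f"Sunny, 25°C in {location}"

TOOLS = {"get_weather": get_weather}

# Matches one tool call: name, separator token, JSON arguments.
CALL_RE = re.compile(
    r"<\|tool▁call▁begin\|>(\w+)<\|tool▁sep\|>(.*?)<\|tool▁call▁end\|>",
    re.DOTALL,
)

def run_tool_calls(model_output: str) -> list[str]:
    """Find every tool call in the model's reply and execute it."""
    results = []
    for name, raw_args in CALL_RE.findall(model_output):
        args = json.loads(raw_args)
        results.append(TOOLS[name](**args))
    return results

reply = ('<|tool▁calls▁begin|><|tool▁call▁begin|>get_weather'
         '<|tool▁sep|>{"location": "Beijing"}<|tool▁call▁end|><|tool▁calls▁end|>')
print(run_tool_calls(reply))  # ['Sunny, 25°C in Beijing']
```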
5.2 Search-Agent (Thinking Mode)
For tasks that require multiple searches—e.g., “Compare the 2025 GDP forecasts from three sources”—use the Search-Agent flow:
1. The model proposes a search query.
2. You run the search and return the top snippets.
3. The model reviews the results, refines the query, or writes the final answer (see the loop sketch below).
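In code, that loop might look like the sketch below. This is our illustration, not DeepSeek's official agent code: run_search and the <search> convention are placeholders for your own search backend and output parsing.

```python
def run_search(query: str) -> str:
    """Stand-in for your search backend; return top snippets as text."""
    raise NotImplementedError

def search_agent(chat, first_query: str, max_rounds: int = 5) -> str:
    """Alternate model turns and search turns until the model answers.

    `chat` is any callable that sends a message to DeepSeek-V3.1 in
    Thinking mode and returns its reply as a string.
    """
    message = first_query
    for _ in range(max_rounds):
        reply = chat(message)
        if reply.startswith("<search>"):            # model proposes a query
            query = reply.removeprefix("<search>").strip()
            message = run_search(query)             # feed snippets back
        else:
            return reply                            # model wrote the final answer
    return reply
```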
Example trajectories are provided in:

- assets/search_tool_trajectory.html
- assets/search_python_tool_trajectory.html
6. Running the Model Locally
The model architecture is identical to DeepSeek-V3; any script that runs V3 will run V3.1 unchanged.
6.1 Hardware Checklist
| Precision | VRAM Needed | Notes |
|---|---|---|
| FP8 (official) | 80 GB | Highest speed |
| BF16 | 160 GB | Fallback if FP8 not supported |
| 8-bit quantized | 48 GB | Community scripts via bitsandbytes |
| 4-bit quantized | 24 GB | Experimental; quality may drop |
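For the quantized rows, a 4-bit load through Hugging Face's BitsAndBytesConfig is sketched below. Treat it as a starting point: the 24 GB figure is the table's estimate for community builds, not a guarantee.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/DeepSeek-V3.1"

# 4-bit NF4 quantization with BF16 compute; quality may drop (see table).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```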
6.2 Minimal Python Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/DeepSeek-V3.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 fallback; see the hardware checklist
    device_map="auto",           # spread layers across available devices
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum tunneling like I'm five."},
]

# thinking=True asks the chat template to emit the Thinking-mode tag.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    thinking=True,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
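In Thinking mode the decoded text may still contain the scratch-work. Assuming the reasoning block ends with a </think> tag, as the templates in Section 4 suggest, you can keep only the final answer like this (decode without skipping special tokens so the tag survives):

```python
text = tokenizer.decode(outputs[0], skip_special_tokens=False)

# Keep only what follows the last closing </think> tag, if present.
answer = text.rsplit("</think>", 1)[-1].strip()
print(answer)
```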
7. Frequently Asked Questions (FAQ)
Q1: Can I download only the weights and skip the transformers library?
Yes, but you must parse tokenizer_config.json and implement the inference loop yourself. Not recommended for beginners.
Q2: Does Thinking mode cost more?
Yes. You are billed by the token, and the scratch-work adds tokens. Whether the extra cost is justified depends on the task—math proofs and multi-file code edits usually see large accuracy gains.
Q3: How does the cache hit work?
The platform hashes the exact prompt (including spaces and line breaks). Any change creates a new hash.
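DeepSeek has not published the hash function, but the effect is easy to illustrate: even a single trailing space produces an unrelated digest, so the cached entry no longer matches.

```python
import hashlib

a = "Explain quantum tunneling like I'm five."
b = "Explain quantum tunneling like I'm five. "  # one extra trailing space

print(hashlib.sha256(a.encode()).hexdigest()[:16])
print(hashlib.sha256(b.encode()).hexdigest()[:16])  # completely different
```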
Q4: Is a commercial license required?
No. Weights are released under the MIT License. You may use, modify, or redistribute them commercially as long as you keep the license file.
Q5: Are there community channels?
Yes. The official repo footer contains Discord and WeChat QR codes.
8. Quick Reference Card
| What You Need | Where to Find It |
|---|---|
| Model weights | Hugging Face repo |
| Source code | DeepSeek-V3 GitHub |
| License | MIT, included in the repo |
| Contact email | service@deepseek.com |
9. One-Page Decision Tree
```
Task type?
├─ Simple Q&A → Non-Thinking mode
├─ Multi-step math / code → Thinking mode
├─ Needs web search → Thinking mode + Search-Agent
└─ Needs external API → Non-Thinking mode + ToolCall
```
10. Final Word
DeepSeek-V3.1 is not magic. It is a well-documented, openly licensed tool that lets you choose between speed and depth without juggling two separate models.
If you have read this far, the next step is concrete: head to the Hugging Face page, download the files, and run the short Python snippet above.
The best way to understand a new model is to make it say “Hello, world.”