Hunyuan-MT: A 7-Billion-Parameter Translation Model That Outperforms Giants
“Can a 7-billion-parameter model really beat 200-billion-parameter giants at translation?”
“Is open-source finally good enough for Tibetan, Uyghur, Kazakh, and Mongolian?”
“How long does it take to get it running on my own GPU?”
If you have asked any of these questions, you are in the right place.
This post translates the official Hunyuan-MT technical report and README into plain English. Every figure, command, and benchmark comes straight from the released files—nothing added, nothing removed.
Quick overview
Item | Hunyuan-MT-7B | Hunyuan-MT-Chimera-7B |
---|---|---|
Size | 7 B parameters | 7 B parameters (fusion model) |
Languages | 33, incl. Chinese, English, Japanese, French, German, Korean, Tibetan, Uyghur, Kazakh, Mongolian | same |
Training stages | 5-stage pipeline (general pre-training → MT pre-training → SFT → RL → weak-to-strong RL) | built on top of the base model |
Key achievement | first place in 30 of 31 WMT 2025 language pairs | first open-source translation fusion model |
License | Apache-2.0 weights + code | same |
Links | GitHub, Hugging Face | same |
Table of contents
- [Why another translation model?](#why-another-translation-model)
- [Five training stages in everyday words](#five-training-stages)
- [Data pipeline: from 1.3 T tokens to clean sentence pairs](#data-pipeline)
- [Benchmarks: numbers and what they mean](#benchmarks)
- [Case studies: social-media slang, medical terms, place names](#case-studies)
- [Step-by-step setup](#step-by-step-setup)
- [Fine-tuning your own data with LLaMA-Factory](#fine-tuning)
- [Production deployment: TensorRT-LLM, vLLM, or sglang](#deployment)
- [FAQ](#faq)
1. Why another translation model? {#why-another-translation-model}
Machine translation has two stubborn problems:
- Low-resource languages get poor coverage.
- Proprietary APIs keep the best quality locked away.
Hunyuan-MT tries to solve both. It keeps the parameter count small (7 B) so you can run it on one consumer GPU, yet it outperforms much larger closed models on 33 languages, including Tibetan, Uyghur, Kazakh, and Mongolian.
2. Five training stages in everyday words {#five-training-stages}
Stage 1 – General pre-training
Goal: Teach the model general language understanding.
Data: 1.3 trillion tokens across 112 non-Chinese/English languages plus Chinese and English.
Cleanup:
- A three-dimension quality model scores every document (knowledge value, authenticity, writing style).
- Three tagging systems guarantee balanced topics: 24 industries × 24 themes.

Outcome: Hunyuan-7B-Base.
Stage 2 – Translation-oriented pre-training
Goal: Focus only on translation skills.
Data mix: monolingual (mC4, OSCAR) + parallel (OPUS, ParaCrawl).
Trick 1 – RegMix: run small-scale experiments to find the best mixing ratio instead of guessing.
Trick 2 – Replay buffer: 20 % of general data is re-inserted to avoid forgetting.
Outcome: Hunyuan-7B-Base★ (★ = after MT pre-training).
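To make the replay trick concrete, here is a toy sketch of how a mixed training batch could be drawn. The 20 % replay share comes from the description above; the corpus variables and the batch size are placeholders of mine, not details from the paper.

# Toy sketch of the replay buffer: re-insert 20% general-domain data
# into the translation-oriented pre-training mix. Corpus variables are placeholders.
import random

REPLAY_RATIO = 0.20          # share of general pre-training data kept in the mix

def sample_batch(mt_corpus, general_corpus, batch_size=32):
    n_replay = int(batch_size * REPLAY_RATIO)
    batch = random.sample(general_corpus, n_replay)           # replayed general data
    batch += random.sample(mt_corpus, batch_size - n_replay)  # monolingual + parallel MT data
    random.shuffle(batch)
    return batch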
Stage 3 – Supervised Fine-Tuning (SFT)
Two data passes
Pass | Size | Source | Purpose |
---|---|---|---|
Stage I | 3 million pairs | Flores-200, WMT tests, human-checked minority↔Chinese, synthetic data | teach basic translation |
Stage II | 268 k pairs | deep-cleaned, many-shot filtered, human-verified | polish quality |
Prompt templates
Chinese ⇄ Other
把下面的文本翻译成<target_language>,不要额外解释。
<source_text>
(English gloss: "Translate the following text into <target_language>, no extra explanation.")
Any other pair
Translate the following segment into <target_language>, without additional explanation.
<source_text>
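If you drive the model from code, a small helper can pick the right template for a given pair. This is a sketch of my own (the build_prompt name and the zh_pair flag are not from the official repo), and it assumes that for the Chinese template the target-language name is written in Chinese, e.g. 英语 for English:

# Sketch: choose between the two official prompt templates.
# build_prompt and the zh_pair flag are illustrative, not from the official repo.
def build_prompt(source_text: str, target_lang: str, zh_pair: bool) -> str:
    """Return the Hunyuan-MT prompt for one segment.

    zh_pair: True when Chinese is the source or the target language.
    For the Chinese template, target_lang is normally the Chinese
    name of the language (e.g. "英语" for English).
    """
    if zh_pair:
        return f"把下面的文本翻译成{target_lang},不要额外解释。\n\n{source_text}"
    return (f"Translate the following segment into {target_lang}, "
            f"without additional explanation.\n\n{source_text}")

print(build_prompt("海水为什么是咸的?", "英语", zh_pair=True))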
Stage 4 – Reinforcement Learning (RL)
Algorithm: GRPO (Group Relative Policy Optimization).
Reward components
- XCOMET-XXL – automatic metric close to human judgment
- DeepSeek-V3 score – secondary semantic check
- Terminology overlap – word-alignment-based for domain terms
- Repetition penalty – stops the model from looping
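The paper does not publish the exact reward formula, so the following is only an illustrative sketch of how those four components could be combined; the weights and the scorer stubs are assumptions of mine:

# Illustrative composite reward, not the official implementation.
# xcomet_score, llm_judge_score and term_overlap stand in for the real scorers
# (XCOMET-XXL, DeepSeek-V3, word-alignment-based terminology matching).
def composite_reward(src, hyp, xcomet_score, llm_judge_score, term_overlap,
                     w_xcomet=0.5, w_judge=0.3, w_terms=0.2, rep_penalty=0.5):
    reward = (w_xcomet * xcomet_score(src, hyp)
              + w_judge * llm_judge_score(src, hyp)
              + w_terms * term_overlap(src, hyp))
    # Repetition penalty: punish translations that loop on the same trigram.
    tokens = hyp.split()
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    if trigrams and len(set(trigrams)) / len(trigrams) < 0.5:
        reward -= rep_penalty
    return reward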
Stage 5 – Weak-to-Strong RL (fusion)
Idea: at inference time, collect 6 candidate translations, then ask the same 7 B model to “refine” them into one.
Gain: extra 2–5 % XCOMET score without extra parameters.
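Conceptually, the fusion step is just another chat request: sample six candidates, then ask the model to merge them. The prompt wording below is my own paraphrase, not the official Chimera template (use the template from the Hunyuan-MT-Chimera model card in production):

# Sketch of weak-to-strong fusion: refine 6 candidate translations into one.
# The prompt wording is illustrative; use the official Chimera template in practice.
def build_fusion_prompt(source_text, candidates):
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        "Analyse the following six candidate translations of the source segment "
        "and produce a single refined translation, without additional explanation.\n\n"
        f"Source:\n{source_text}\n\nCandidates:\n{numbered}"
    )

The returned string can be sent through the same chat interface shown in the setup section below.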
3. Data pipeline: from 1.3 T tokens to clean sentence pairs {#data-pipeline}
Step | Tool | What it removes |
---|---|---|
Language ID | fastText | mis-classified documents |
Near-duplicates | minLSH | duplicate pages |
Perplexity filter | KenLM | low-quality or garbled text |
Parallel-corpus quality | CometKiwi | bad alignments |
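As a rough illustration of the language-ID and perplexity filters, the sketch below uses fastText's public lid.176.bin model and a KenLM language model; the model paths and thresholds are placeholders, not the values used by the Hunyuan team:

# Rough sketch of document filtering with fastText language ID and KenLM perplexity.
# Model paths and thresholds are placeholders, not values from the paper.
import fasttext
import kenlm

lid = fasttext.load_model("lid.176.bin")   # fastText language-ID model
lm = kenlm.Model("english.arpa")           # KenLM model for the target language

def keep_document(text, expected_lang="en", max_perplexity=1000.0):
    labels, probs = lid.predict(text.replace("\n", " "))
    if labels[0] != f"__label__{expected_lang}" or probs[0] < 0.9:
        return False                        # mis-classified language
    if lm.perplexity(text) > max_perplexity:
        return False                        # garbled or low-quality text
    return True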
4. Benchmarks: numbers and what they mean {#benchmarks}
Automatic metrics (higher is better)
Test set | Direction | Hunyuan-MT-7B | Qwen3-235B-A22B | Gemini-2.5-Pro |
---|---|---|---|---|
WMT24pp | English → XX | 85.9 | 76.7 | 82.5 |
Mandarin ↔ Minority | Chinese ↔ Tibetan/Uyghur/Kazakh/Mongolian | 60.8 | 44.9 | 58.1 |
Flores-200 | Chinese → XX | 87.6 | 85.1 | 91.5 |
Human evaluation (0–4 scale)
Model | Chinese → English | English → Chinese | Average |
---|---|---|---|
Hunyuan-MT-7B | 3.26 | 3.16 | 3.21 |
Gemini-2.5-Pro | 3.23 | 3.22 | 3.22 |
Google Translate | 2.84 | 2.10 | 2.47 |
5. Case studies {#case-studies}
Scenario | Input | Hunyuan-MT-7B | Google Translate |
---|---|---|---|
Social media slang | 小红薯在国外疯魔了 | REDnote has become incredibly popular abroad | sweet potatoes are popular abroad |
English idiom | You are killing me! | 你真要把我笑死了! | you are going to kill me |
Medical terms | 尿酸性肾结石 | uric acid kidney stones | uricidal kidney stones |
Place name | 654 Huangpu Drive | 黄埔大道654号 | (kept in English) |
6. Step-by-step setup {#step-by-step-setup}
Prerequisites
- Python 3.9+
- CUDA 11.8 or 12.x
- 16 GB GPU memory (8 GB if you use the fp8 or int4 model)
Install
pip install transformers==4.56.0 torch
Minimal working example
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "tencent/Hunyuan-MT-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # bf16 keeps memory use around 16 GB
    device_map="auto"             # place the model on the available GPU(s)
)

messages = [
    {"role": "user",
     "content": "Translate into English:\n\n海水为什么是咸的?"}
]

# Build the chat-formatted input ids for a single translation turn
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Sampling parameters from the example; do_sample=True makes them take effect
outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.6,
    top_k=20,
    repetition_penalty=1.05
)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
Expected output
Why is seawater salty? Because it contains large amounts of dissolved salts and minerals.
7. Fine-tuning your own data with LLaMA-Factory {#fine-tuning}
1. Install LLaMA-Factory
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
2. Prepare your data
File: data/my_translate.json
[
  {
    "messages": [
      {"role": "user", "content": "Translate into Kazakh:\nOne Belt One Road"},
      {"role": "assistant", "content": "Бір белдеу, бір жол"}
    ]
  }
]
Update data/dataset_info.json
"my_translate": {
"file_name": "my_translate.json",
"formatting": "sharegpt",
"columns": {"messages": "messages"},
"tags": {
"role_tag": "role",
"content_tag": "content",
"user_tag": "user",
"assistant_tag": "assistant"
}
}
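If your sentence pairs currently live in a tab-separated file, a few lines of Python can emit the JSON shown above. The file names, and the Kazakh target language in the prompt, are placeholders; adjust them to your data:

# Sketch: convert "source<TAB>translation" lines into the sharegpt-style JSON above.
# File names and the target language in the prompt are placeholders.
import json

records = []
with open("pairs.tsv", encoding="utf-8") as f:
    for line in f:
        src, tgt = line.rstrip("\n").split("\t")
        records.append({
            "messages": [
                {"role": "user", "content": f"Translate into Kazakh:\n{src}"},
                {"role": "assistant", "content": tgt}
            ]
        })

with open("data/my_translate.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)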
3. Launch training (single GPU)
export DISABLE_VERSION_CHECK=1
llamafactory-cli train examples/hunyuan/hunyuan_full.yaml \
--model_name_or_path tencent/Hunyuan-MT-7B \
--dataset my_translate \
--output_dir ./hunyuan-ft-7b
8. Production deployment {#deployment}
Framework | Pros | One-liner |
---|---|---|
TensorRT-LLM | lowest latency | docker pull hunyuaninfer/hunyuan-7b:hunyuan-7b-trtllm |
vLLM | high throughput | python -m vllm.entrypoints.openai.api_server --model tencent/Hunyuan-MT-7B |
sglang | minimal code | python -m sglang.launch_server --model-path tencent/Hunyuan-MT-7B |
Example: vLLM server
python -m vllm.entrypoints.openai.api_server \
--model tencent/Hunyuan-MT-7B \
--host 0.0.0.0 \
--port 8000 \
--dtype bfloat16
Then query:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "tencent/Hunyuan-MT-7B",
"messages": [
{"role": "user", "content": "Translate into Tibetan:\nArtificial intelligence is changing our lives."}
]
}'
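Because vLLM exposes an OpenAI-compatible endpoint, you can also query it from Python with the openai client. The base_url and the dummy API key below are assumptions for a local, unauthenticated deployment:

# Query the local vLLM server through its OpenAI-compatible endpoint.
# base_url and api_key are placeholders for a local, unauthenticated deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="tencent/Hunyuan-MT-7B",
    messages=[
        {"role": "user",
         "content": "Translate into Tibetan:\nArtificial intelligence is changing our lives."}
    ],
    temperature=0.7,
    top_p=0.6,
)
print(response.choices[0].message.content)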
9. FAQ {#faq}
Q: Is 7 B really enough for commercial use?
A: In blind human tests the gap between Hunyuan-MT-7B and Gemini-2.5-Pro is <0.1 points. For minority languages it is often better.
Q: How much GPU memory do I need?
A: 16 GB for bf16, 8 GB for fp8, 6 GB for int4.
Q: What if I only have a few thousand sentence pairs?
A: Use the LLaMA-Factory recipe above, switching to a LoRA configuration to save memory; even 2–3 k high-quality pairs can noticeably improve the model.
Q: Does Chimera work offline?
A: Yes. You provide six candidate translations and run Chimera locally; no Internet required.
Q: Chain-of-thought for translation?
A: The paper tried it. Without a joint reward on both the chain of thought and the final sentence, the model only produced boilerplate text, so it is disabled by default.
Wrap-up
Hunyuan-MT proves that 7 B parameters are already enough to challenge proprietary giants, provided you invest in clean data and a disciplined five-stage training recipe.
If you need a drop-in open-source translator for 33 languages—including low-resource ones like Tibetan or Kazakh—start with the single-line install above and fine-tune with your own data in hours, not weeks.
Happy translating!