Hunyuan-MT: A 7-Billion-Parameter Translation Model That Outperforms Giants

“Can a 7-billion-parameter model really beat 200-billion-parameter giants at translation?”
“Is open-source finally good enough for Tibetan, Uyghur, Kazakh, and Mongolian?”
“How long does it take to get it running on my own GPU?”

If you have asked any of these questions, you are in the right place.
This post translates the official Hunyuan-MT technical report and README into plain English. Every figure, command, and benchmark comes straight from the released files—nothing added, nothing removed.


Quick overview

| Item | Hunyuan-MT-7B | Hunyuan-MT-Chimera-7B |
| --- | --- | --- |
| Size | 7 B parameters | 7 B parameters (fusion model) |
| Languages | 33, incl. Chinese, English, Japanese, French, German, Korean, Tibetan, Uyghur, Kazakh, Mongolian | same |
| Training stages | 5-stage pipeline (general pre-training → MT pre-training → SFT → RL → weak-to-strong RL) | built on top of the base model |
| Key achievement | first place in 30 of 31 WMT 2025 language pairs | first open-source translation fusion model |
| License | Apache-2.0 weights + code | same |
| Links | GitHub / Hugging Face | same |

Table of contents

  1. Why another translation model?
  2. Five training stages in everyday words
  3. Data pipeline: from 1.3 T tokens to clean sentence pairs
  4. Benchmarks: numbers and what they mean
  5. Case studies: social-media slang, medical terms, place names
  6. Step-by-step setup
  7. Fine-tuning your own data with LLaMA-Factory
  8. Production deployment: TensorRT-LLM, vLLM, or sglang
  9. FAQ

1. Why another translation model? {#why-another-translation-model}

Machine translation has two stubborn problems:

  • Low-resource languages get poor coverage.
  • Proprietary APIs keep the best quality locked away.

Hunyuan-MT tries to solve both. It keeps the parameter count small (7 B) so you can run it on one consumer GPU, yet it outperforms much larger closed models on 33 languages, including Tibetan, Uyghur, Kazakh, and Mongolian.


2. Five training stages in everyday words {#five-training-stages}

Stage 1 – General pre-training

Goal: Teach the model general language understanding.
Data: 1.3 trillion tokens covering Chinese, English, and 112 other languages.
Cleanup:

  • A quality model scores every document along three dimensions: knowledge value, authenticity, and writing style.
  • Three tagging systems guarantee balanced topics: 24 industries × 24 themes.

Outcome: Hunyuan-7B-Base.

Stage 2 – Translation-oriented pre-training

Goal: Focus only on translation skills.
Data mix: monolingual (mC4, OSCAR) + parallel (OPUS, ParaCrawl).
Trick 1 – RegMix: run small-scale experiments to find the best mixing ratio instead of guessing.
Trick 2 – Replay buffer: 20 % of general data is re-inserted to avoid forgetting.

Outcome: Hunyuan-7B-Base★ (★ = after MT pre-training).
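
The report does not publish the mixing code itself; below is a minimal sketch of what a replay-style mixture could look like, assuming plain Python lists of documents and reading the 20 % as the share of general data in the final mix (one plausible interpretation).

import random

def build_mt_pretraining_mix(mt_docs, general_docs, replay_share=0.2, seed=0):
    """Mix MT-oriented data with replayed general data (illustrative only).

    mt_docs:       monolingual + parallel documents (mC4, OSCAR, OPUS, ParaCrawl)
    general_docs:  documents from the Stage-1 general corpus
    replay_share:  fraction of the final mix drawn from general data
    """
    rng = random.Random(seed)
    # How many general documents are needed so they form `replay_share` of the mix.
    n_replay = int(len(mt_docs) * replay_share / (1 - replay_share))
    replayed = rng.sample(general_docs, min(n_replay, len(general_docs)))
    mix = mt_docs + replayed
    rng.shuffle(mix)
    return mix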

Stage 3 – Supervised Fine-Tuning (SFT)

Two data passes

| Pass | Size | Source | Purpose |
| --- | --- | --- | --- |
| Stage I | 3 million pairs | Flores-200, WMT tests, human-checked minority↔Chinese, synthetic data | teach basic translation |
| Stage II | 268 k pairs | deep-cleaned, many-shot filtered, human-verified | polish quality |

Prompt templates
Chinese⇄Other

把下面的文本翻译成<target_language>,不要额外解释。
<source_text>

(The Chinese instruction reads: "Translate the text below into <target_language>, without extra explanation.")

Any other pair

Translate the following segment into <target_language>, without additional explanation.
<source_text>
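
For convenience, here is a tiny helper (not from the official repo) that fills in whichever template applies; the exact whitespace between instruction and source text, and whether the target-language name should itself be written in Chinese, are assumptions.

def build_prompt(source_text: str, target_language: str, chinese_involved: bool) -> str:
    """Fill the prompt templates quoted above."""
    if chinese_involved:
        # Chinese template: "Translate the text below into <target_language>, no extra explanation."
        return f"把下面的文本翻译成{target_language},不要额外解释。\n\n{source_text}"
    return (f"Translate the following segment into {target_language}, "
            f"without additional explanation.\n\n{source_text}")

# Example
print(build_prompt("海水为什么是咸的?", "English", chinese_involved=True))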

Stage 4 – Reinforcement Learning (RL)

Algorithm: GRPO (Group Relative Policy Optimization).
Reward components

  1. XCOMET-XXL – automatic metric close to human judgment
  2. DeepSeek-V3 score – secondary semantic check
  3. Terminology overlap – word-alignment-based for domain terms
  4. Repetition penalty – stops the model from looping
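
The post names these four reward signals but not how they are weighted; the sketch below shows one way such a composite reward could be assembled. The weights and the `xcomet_score` / `llm_judge_score` callables are placeholders, not the official implementation.

def composite_reward(source, hypothesis, required_terms,
                     xcomet_score, llm_judge_score,
                     w_quality=0.6, w_judge=0.2, w_term=0.2,
                     loop_penalty=0.5):
    """Combine the four reward components listed above (weights are illustrative)."""
    quality = xcomet_score(source, hypothesis)           # 1) XCOMET-XXL quality estimate
    judge = llm_judge_score(source, hypothesis)          # 2) LLM-as-judge semantic check
    term = (sum(t in hypothesis for t in required_terms) / len(required_terms)
            if required_terms else 1.0)                  # 3) terminology overlap
    reward = w_quality * quality + w_judge * judge + w_term * term
    tokens = hypothesis.split()
    if tokens and len(set(tokens)) / len(tokens) < 0.3:  # 4) crude repetition check
        reward -= loop_penalty
    return reward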

Stage 5 – Weak-to-Strong RL (fusion)

Idea: at inference time, collect 6 candidate translations, then ask the same 7 B model to “refine” them into one.
Gain: extra 2–5 % XCOMET score without extra parameters.
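
A minimal sketch of that inference-time fusion loop, reusing the Hugging Face objects from Section 6. The fusion prompt wording below is paraphrased for illustration; the released Chimera model ships with its own prompt format, so treat this as a sketch rather than the official recipe.

def fuse_translations(model, tokenizer, source_text, target_language="English",
                      num_candidates=6):
    """Generate several candidates, then ask the model to refine them into one."""
    def chat(prompt):
        inputs = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        out = model.generate(inputs, max_new_tokens=256, do_sample=True,
                             temperature=0.7, top_p=0.6, top_k=20,
                             repetition_penalty=1.05)
        # Decode only the newly generated tokens.
        return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

    translate_prompt = (f"Translate the following segment into {target_language}, "
                        f"without additional explanation.\n\n{source_text}")
    candidates = [chat(translate_prompt) for _ in range(num_candidates)]

    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    fusion_prompt = (f"Here are {num_candidates} candidate translations of the same segment "
                     f"into {target_language}. Produce one refined translation that keeps the "
                     f"best parts of each, without additional explanation.\n\n"
                     f"Source:\n{source_text}\n\nCandidates:\n{numbered}")
    return chat(fusion_prompt)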


3. Data pipeline: from 1.3 T tokens to clean sentence pairs {#data-pipeline}

| Step | Tool | What it removes |
| --- | --- | --- |
| Language ID | fastText | mis-classified documents |
| Near-duplicates | minLSH | duplicate pages |
| Perplexity filter | KenLM | low-quality or garbled text |
| Parallel-corpus quality | CometKiwi | bad alignments |
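
The exact thresholds and model files are not given in the post; the snippet below sketches only the language-ID and perplexity filters, with placeholder paths (`lid.176.bin`, `lm.arpa`) and thresholds, leaving minLSH deduplication and CometKiwi scoring out for brevity.

import fasttext  # language identification
import kenlm     # n-gram LM for perplexity filtering

lid_model = fasttext.load_model("lid.176.bin")  # placeholder path to a fastText LID model
lm = kenlm.Model("lm.arpa")                     # placeholder path to a per-language KenLM model

def keep_document(text: str, expected_lang: str, max_perplexity: float = 1000.0) -> bool:
    """Drop documents that fail language ID or look garbled (illustrative thresholds)."""
    labels, probs = lid_model.predict(text.replace("\n", " "))  # fastText rejects newlines
    if labels[0] != f"__label__{expected_lang}" or probs[0] < 0.8:
        return False  # mis-classified document
    if lm.perplexity(text) > max_perplexity:
        return False  # low-quality or garbled text
    return True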

4. Benchmarks: numbers and what they mean {#benchmarks}

Automatic metrics (higher is better)

| Test set | Direction | Hunyuan-MT-7B | Qwen3-235B-A22B | Gemini-2.5-Pro |
| --- | --- | --- | --- | --- |
| WMT24pp | English → XX | 85.9 | 76.7 | 82.5 |
| Mandarin ↔ Minority | Chinese ↔ Tibetan/Uyghur/Kazakh/Mongolian | 60.8 | 44.9 | 58.1 |
| Flores-200 | Chinese → XX | 87.6 | 85.1 | 91.5 |

Human evaluation (0–4 scale)

| Model | Chinese → English | English → Chinese | Average |
| --- | --- | --- | --- |
| Hunyuan-MT-7B | 3.26 | 3.16 | 3.21 |
| Gemini-2.5-Pro | 3.23 | 3.22 | 3.22 |
| Google Translate | 2.84 | 2.10 | 2.47 |

5. Case studies {#case-studies}

| Scenario | Input | Hunyuan-MT-7B | Google Translate |
| --- | --- | --- | --- |
| Social-media slang | 小红薯在国外疯魔了 | REDnote has become incredibly popular abroad | sweet potatoes are popular abroad |
| English idiom | You are killing me! | 你真要把我笑死了! ("You're really going to make me die laughing!") | you are going to kill me |
| Medical terms | 尿酸性肾结石 | uric acid kidney stones | uricidal kidney stones |
| Place name | 654 Huangpu Drive | 黄埔大道654号 (No. 654, Huangpu Avenue) | (kept in English) |

6. Step-by-step setup {#step-by-step-setup}

Prerequisites

  • Python 3.9+
  • CUDA 11.8 or 12.x
  • 16 GB GPU memory (8 GB if you use the fp8 or int4 model)

Install

pip install transformers==4.56.0 torch

Minimal working example

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "tencent/Hunyuan-MT-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # bf16 weights need roughly 16 GB of GPU memory
    device_map="auto"             # place layers on available GPUs automatically
)

messages = [
    {"role": "user",
     "content": "Translate into English:\n\n海水为什么是咸的?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=True,          # enable sampling so temperature/top_p/top_k take effect
    temperature=0.7,
    top_p=0.6,
    top_k=20,
    repetition_penalty=1.05
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Expected output

Why is seawater salty? Because it contains large amounts of dissolved salts and minerals.
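
If you only have around 8 GB of VRAM, the quantized variants mentioned in the prerequisites are the way to go. As a generic fallback that is not specific to the Hunyuan release, you can also quantize the bf16 checkpoint on the fly with bitsandbytes (`pip install bitsandbytes`); a minimal sketch:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "tencent/Hunyuan-MT-7B"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit weights, roughly 6 GB of GPU memory
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 for quality
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
# The generation code from the example above works unchanged.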

7. Fine-tuning your own data with LLaMA-Factory {#fine-tuning}

1. Install LLaMA-Factory

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .

2. Prepare your data

File: data/my_translate.json

[
  {
    "messages": [
      {"role": "user", "content": "Translate into Kazakh:\nOne Belt One Road"},
      {"role": "assistant", "content": "Бір белдеу, бір жол"}
    ]
  }
]

Update data/dataset_info.json

"my_translate": {
  "file_name": "my_translate.json",
  "formatting": "sharegpt",
  "columns": {"messages": "messages"},
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
}

3. Launch training (single GPU)

export DISABLE_VERSION_CHECK=1
llamafactory-cli train examples/hunyuan/hunyuan_full.yaml \
  --model_name_or_path tencent/Hunyuan-MT-7B \
  --dataset my_translate \
  --output_dir ./hunyuan-ft-7b
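
The FAQ below mentions a LoRA option. Assuming your LLaMA-Factory version accepts the standard LoRA parameters (`finetuning_type`, `lora_rank`, `lora_target`) as command-line overrides on top of the YAML, just like the overrides in the command above, a lighter-weight run could look like this:

llamafactory-cli train examples/hunyuan/hunyuan_full.yaml \
  --model_name_or_path tencent/Hunyuan-MT-7B \
  --dataset my_translate \
  --finetuning_type lora \
  --lora_rank 8 \
  --lora_target all \
  --output_dir ./hunyuan-ft-7b-lora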

8. Production deployment {#deployment}

| Framework | Pros | One-liner |
| --- | --- | --- |
| TensorRT-LLM | lowest latency | docker pull hunyuaninfer/hunyuan-7b:hunyuan-7b-trtllm |
| vLLM | high throughput | python -m vllm.entrypoints.openai.api_server --model tencent/Hunyuan-MT-7B |
| sglang | minimal code | python -m sglang.launch_server --model-path tencent/Hunyuan-MT-7B |

Example: vLLM server

python -m vllm.entrypoints.openai.api_server \
  --model tencent/Hunyuan-MT-7B \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype bfloat16

Then query:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tencent/Hunyuan-MT-7B",
    "messages": [
      {"role": "user", "content": "Translate into Tibetan:\nArtificial intelligence is changing our lives."}
    ]
  }'
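
The same endpoint can also be queried from Python with the openai client (`pip install openai`); the API key is a dummy value because the local vLLM server does not check it:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="tencent/Hunyuan-MT-7B",
    messages=[{"role": "user",
               "content": "Translate into Tibetan:\nArtificial intelligence is changing our lives."}],
    temperature=0.7,
    top_p=0.6,
)
print(response.choices[0].message.content)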

9. FAQ {#faq}

Q: Is 7 B really enough for commercial use?
A: In blind human tests the gap between Hunyuan-MT-7B and Gemini-2.5-Pro is <0.1 points. For minority languages it is often better.

Q: How much GPU memory do I need?
A: 16 GB for bf16, 8 GB for fp8, 6 GB for int4.

Q: What if I only have a few thousand sentence pairs?
A: Use the LLaMA-Factory recipe above (the LoRA variant works well for small datasets); even 2–3 k high-quality pairs can noticeably improve the model.

Q: Does Chimera work offline?
A: Yes. You provide six candidate translations and run Chimera locally; no Internet required.

Q: Does chain-of-thought help translation?
A: The paper tried it. Without a joint reward on both the chain-of-thought and the final translation, the model only produced boilerplate text, so CoT is disabled by default.


Wrap-up

Hunyuan-MT proves that 7 B parameters are already enough to challenge proprietary giants, provided you invest in clean data and a disciplined five-stage training recipe.
If you need a drop-in open-source translator for 33 languages—including low-resource ones like Tibetan or Kazakh—start with the single-line install above and fine-tune with your own data in hours, not weeks.

Happy translating!