ARM Model: Breaking Through the Efficiency Bottleneck in Large Model Reasoning
Introduction: Core Challenges in Large Model Reasoning
In recent years, large language models have demonstrated remarkable capabilities on complex reasoning tasks, yet they commonly exhibit “overthinking”: applying intricate reasoning chains even to simple problems, which wastes compute and delays responses. The ARM (Adaptive Reasoning Model), developed through a collaboration between Fudan University and Ohio State University, introduces an adaptive reasoning architecture that significantly improves computational efficiency while maintaining reasoning accuracy.
![ARM architecture overview](https://team-arm.github.io/arm/images/architecture.png)
Visual: ARM’s dynamic reasoning format selection balances efficiency and precision
Core Features: Three Reasoning Modes
Adaptive Mode (Default)
Dynamic Decision Mechanism: Automatically selects optimal reasoning formats based on task complexity
Four Reasoning Formats:
| Format | Token Usage | Application Scenarios |
| --- | --- | --- |
| Direct Answer | 10-15 tokens | Common-sense questions |
| Short-chain | 30-50 tokens | Medium-difficulty math problems |
| Code Reasoning | 100-150 tokens | Symbolic computation tasks |
| Long-chain | 300+ tokens | Competition-level math challenges |
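The routing itself happens inside the model, but as a rough sketch of the trade-off the table describes, the snippet below pairs each format with its approximate token budget and a toy difficulty threshold. The `FORMAT_BUDGETS` dict and `pick_format` heuristic are illustrative only, not ARM's actual selection mechanism.

```python
# Illustrative only: ARM makes this choice inside the model during decoding,
# not with an external heuristic. Budgets mirror the table above.
FORMAT_BUDGETS = {
    "direct_answer": 15,   # common-sense questions
    "short_cot": 50,       # medium-difficulty math
    "code": 150,           # symbolic computation
    "long_cot": 300,       # competition-level math
}

def pick_format(estimated_difficulty: float) -> str:
    """Map a rough difficulty score in [0, 1] to a reasoning format."""
    if estimated_difficulty < 0.25:
        return "direct_answer"
    if estimated_difficulty < 0.50:
        return "short_cot"
    if estimated_difficulty < 0.75:
        return "code"
    return "long_cot"

fmt = pick_format(0.4)
print(fmt, FORMAT_BUDGETS[fmt])  # short_cot 50
```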
Instruction-guided Mode
```python
# Explicit format specification example
input_text = "Question: Solve x²+2x+1=0…"
output = model.generate(input_text)
```
Supports forced reasoning format selection via special markers
Ideal for batch processing tasks with known optimal formats
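When the best format for a batch is known in advance, the marker can simply be appended to every prompt. A minimal sketch, reusing `model` from the snippet above; the `<short_cot>` tag is a placeholder, not necessarily the exact control token recognized by the released checkpoints.

```python
# Hypothetical control marker; check the ARM repo for the exact token names.
FORMAT_MARKER = "<short_cot>"

prompts = [
    "Question: What is 17 * 24?",
    "Question: Compute the sum of the first 50 odd numbers.",
]
# Force the same efficient format for the whole batch.
outputs = [model.generate(p + "\n" + FORMAT_MARKER) for p in prompts]
```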
Consensus-guided Mode
Parallel generation of answers in three efficient formats
Consensus validation mechanism
Automatic long-chain activation when discrepancies occur
Majority voting determines final output
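A minimal sketch of the consensus logic above, assuming a hypothetical `generate_with_format(question, fmt)` helper that is not part of the published API:

```python
from collections import Counter

def consensus_answer(question: str) -> str:
    # Run the three efficient formats (conceptually in parallel, sequential here).
    answers = [generate_with_format(question, fmt)
               for fmt in ("direct_answer", "short_cot", "code")]
    winner, votes = Counter(answers).most_common(1)[0]
    if votes >= 2:          # majority of efficient formats agree
        return winner
    # Discrepancy across the efficient formats: escalate to long-chain reasoning.
    return generate_with_format(question, "long_cot")
```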
Technical Breakthrough: Ada-GRPO Training Framework
Two-phase Training Process
| Phase | Method | Data Volume | Time Allocation |
| --- | --- | --- | --- |
| Supervised Fine-tuning | Multi-format training | 10.8K problems | 40% |
| Reinforcement Learning | Ada-GRPO optimization | 19.8K problems | 60% |
Algorithm Innovation
The Ada-GRPO algorithm introduces format diversity rewards to traditional GRPO:
```math
\begin{aligned}
r_i' &= \alpha_i(t) \cdot r_i \\
\alpha_i(t) &= \frac{G}{F(o_i)} \cdot \left[\frac{F(o_i)}{G} + 0.5\left(1-\frac{F(o_i)}{G}\right)\left(1+\cos\left(\pi \frac{t}{T}\right)\right)\right]
\end{aligned}
```
Here G is the GRPO group size, F(o_i) counts how often the format of response o_i appears in the group, and t/T is the training progress. The scaling dynamically adjusts reward weights to:
Prevent format collapse in long-chain dominance
Encourage format exploration in early training
Focus on precision optimization in later stages
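As a sanity check on this schedule, the sketch below evaluates α_i(t) at the start and end of training; variable names are my own, not taken from the ARM codebase.

```python
import math

def alpha(fmt_count: int, group_size: int, t: int, T: int) -> float:
    """Ada-GRPO reward scale: format-rarity bonus that decays over training."""
    ratio = fmt_count / group_size                       # F(o_i) / G
    decay = ratio + 0.5 * (1 - ratio) * (1 + math.cos(math.pi * t / T))
    return (group_size / fmt_count) * decay

# A format sampled 2 times in a group of 8 rollouts:
print(alpha(2, 8, t=0, T=1000))     # 4.0 -> rare formats strongly boosted early
print(alpha(2, 8, t=1000, T=1000))  # 1.0 -> reward reduces to plain GRPO late
```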
Performance Metrics: Benchmark Results
Cross-domain Performance
| Dataset | Accuracy Gain | Token Saving | Speed Improvement |
| --- | --- | --- | --- |
| CommonsenseQA | +1.2% | 73% | 2.1x |
| GSM8K | -0.3% | 55% | 1.8x |
| MATH | +2.7% | 42% | 1.5x |
Model Scale Comparison
![Model scale comparison](https://huggingface.co/arm-team/plots/comparison.png)
Data source: Qwen2.5 series model benchmarks
Implementation Guide: Setup & Usage
Environment Configuration
```bash
# SFT training environment
conda env create -f environment/llama_factory_env.yaml
# RL training environment
conda env create -f environment/verl_env.yaml
```
Training Example
```bash
# Supervised fine-tuning
CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train stage1_scripts/qwen2.5_7b/train.yaml
# Reinforcement learning
bash stage2_scripts/trainer/run.sh
```
Inference API
```python
from arm import AdaptiveReasoner

model = AdaptiveReasoner.load("arm-team/arm-7b")
response = model.generate(
    "Prove the Pythagorean theorem",
    mode="adaptive",  # options: "adaptive" (default), "instruction", "consensus"
    temperature=0.7,
)
```
Industry Applications
Education: Automatically adjusts explanation depth for math problems
Customer Service: Rapid responses for simple queries with deep analysis for complex issues
Scientific Computing: Automatic switching between code generation and symbolic computation
Financial Analysis: Consensus verification for critical decisions
Future Development
Extension to multimodal reasoning tasks
Development of low-precision quantized versions
Online learning framework construction
Optimization for extreme-scale models (100B+ parameters)
Conclusion: New Benchmark in Reasoning Efficiency
ARM establishes new industry standards through:
Up to 70% token reduction
2x training acceleration
Multi-format collaborative reasoning
The open-source implementation and pre-trained models are available at https://huggingface.co/arm-team, driving continued progress in efficient reasoning technologies.