Introducing Cogito v2 Preview: The Next Leap in Self-Improving AI Models

DeepCogito unveils groundbreaking open-source language models that evolve through autonomous reasoning refinement, setting new standards for AI efficiency and capability.


Key Highlights at a Glance

| Feature | Technical Advancement |
| --- | --- |
| Open Models | 4 hybrid reasoning models released under an open license |
| Model Scale | 70B dense, 109B MoE, 405B dense, 671B MoE |
| Core Innovation | Iterated Distillation & Amplification (IDA) for autonomous capability enhancement |
| Reasoning Efficiency | 60% shorter reasoning chains than DeepSeek R1 |
| Training Efficiency | All models trained for under $3.5M (including data generation) |
| Performance | 671B MoE matches DeepSeek's latest models, approaches closed frontier systems |

How to Access Cogito v2 Models

Experience these cutting-edge models through multiple channels:

1. **Direct Download**:  
   [Hugging Face Model Collection](https://huggingface.co/collections/deepcogito/cogito-v2-preview-6886b5450b897ea2a2389a6b)
   
2. **Cloud APIs**:  
   - [Together AI Playground](https://api.together.xyz/playground/deepcogito/cogito-v2-preview-deepseek-671b)  
   - [Baseten Deployment](https://www.baseten.co/library/cogito-v2-671b/)  
   - [RunPod Console](https://console.runpod.io/hub)  

3. **Local Execution**:  
   Use [Unsloth Framework](https://docs.unsloth.ai/basics/tutorials-how-to-fine-tune-and-run-llms/cogito-v2-how-to-run-locally) for optimized local deployment
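As a concrete illustration of the cloud-API route, hosted endpoints for open models generally speak an OpenAI-compatible chat API. The sketch below builds a request body for such an endpoint; the endpoint path, the field names, and the system-prompt toggle for reasoning mode are assumptions for illustration, not documented behavior (only the model id is taken from the Together AI playground URL above).

```python
import json

# Assumed OpenAI-compatible endpoint path (not confirmed by the source).
TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(prompt: str, reasoning: bool = True) -> str:
    """Serialize a chat-completion request for a Cogito v2 model.

    How the hybrid model switches between reasoning and standard mode
    (system prompt vs. a dedicated API flag) varies by provider; a
    system-prompt toggle is assumed here purely for illustration.
    """
    payload = {
        "model": "deepcogito/cogito-v2-preview-deepseek-671b",
        "messages": [
            {"role": "system",
             "content": "Enable deep thinking." if reasoning else "Answer directly."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
    }
    return json.dumps(payload)

request_body = build_chat_request("Explain IDA in one paragraph.")
```

The returned string can then be POSTed to the provider's endpoint with any HTTP client, with an API key in the `Authorization` header.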

The Technical Breakthrough: From Computation to Intuition

Why Cogito v2 Changes the Game

Traditional LLMs improve performance through extended reasoning chains—like solving math problems through brute-force calculation. Cogito v2 fundamentally transforms this approach through Iterated Distillation & Amplification (IDA):

  • Internalized Reasoning: Converts explicit computation into model intuition
  • Autonomous Refinement: Models evolve capabilities through parameter distillation
  • Efficiency Leap: 671B MoE achieves superior results with 60% shorter reasoning chains

Analogy:
Like chess novices calculating every move (traditional LLMs) versus grandmasters intuiting optimal strategies (Cogito models)

The Self-Improvement Engine

```mermaid
graph LR
A[Base Model] --> B(Reasoning Search)
B --> C[Successful Solution Paths]
C --> D[Parameter Distillation]
D --> E[Enhanced Model]
E --> B
```

This creates an autonomous improvement loop where models:

  1. Develop stronger “intuition” for optimal solution paths
  2. Continuously upgrade fundamental capabilities
  3. Reduce dependency on computational brute force
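The loop above can be sketched as a toy amplify-and-distill cycle. This is an illustrative caricature, not Cogito's actual training recipe: "reasoning search" becomes best-of-n sampling on a bit-string task, and "parameter distillation" becomes nudging a per-step policy toward the best path that search found, so later iterations need less search.

```python
import random

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 1, 0, 0]   # hidden "correct reasoning path" (toy task)
policy = [0.5] * len(TARGET)        # the base model's per-step intuition

def rollout(policy):
    """Sample one candidate solution path from the current policy."""
    return [1 if random.random() < p else 0 for p in policy]

def reward(path):
    """Score a path by how many steps match the target."""
    return sum(a == b for a, b in zip(path, TARGET))

def amplify(policy, samples=32):
    """Amplification: spend extra compute searching, keep the best path."""
    return max((rollout(policy) for _ in range(samples)), key=reward)

def distill(policy, best, lr=0.5):
    """Distillation: pull the policy parameters toward the successful path."""
    return [p + lr * (b - p) for p, b in zip(policy, best)]

def greedy(policy):
    """No-search answer: the model's raw intuition, one step at a time."""
    return [1 if p >= 0.5 else 0 for p in policy]

before = reward(greedy(policy))
for _ in range(10):                 # iterated amplification + distillation
    policy = distill(policy, amplify(policy))
after = reward(greedy(policy))
```

After a few iterations the search-free `greedy` answer scores higher than it did initially: the capability found by search has been internalized into the parameters, which is the intuition behind the 60% shorter reasoning chains.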

Model Specifications & Capabilities

Comparative Model Analysis

| Model | Architecture | Parameters | Core Advancement | Performance Benchmark |
| --- | --- | --- | --- | --- |
| 70B | Dense | 70B | Enhanced non-reasoning intuition | Competes with leading same-scale models |
| 109B | MoE | 109B | Hybrid expert optimization | Outperforms comparable Llama models |
| 405B | Dense | 405B | Advanced parametric reasoning | Nears frontier closed-model performance |
| 671B | MoE | 671B | Dual-mode enhancement (reasoning + standard) | Matches latest DeepSeek models |

Flagship Performance: 671B MoE

Demonstrated capabilities:

  • Reasoning Mode: Outperforms DeepSeek R1, matches DeepSeek R1 0528
  • Standard Mode: Surpasses DeepSeek v3, equals DeepSeek v3 0324
  • Overall Position: Among the world’s strongest open models, approaching closed models such as o3 and Claude 4 Opus

Benchmark Visualization:

[Figure: Cogito 671B MoE standard-mode performance on standard inference benchmarks]

[Figure: Cogito 671B MoE reasoning-mode performance on structured reasoning benchmarks]


The Science of Building Superintelligence

From Game AI to General Intelligence

Similar to AlphaGo’s evolution through Monte Carlo Tree Search (MCTS), Cogito implements:

  1. Solution Search: Models explore reasoning paths during inference
  2. Policy Refinement: Successful paths distilled into parameters
  3. Intuition Development: New iterations start closer to optimal solutions

Breaking Traditional Limitations

Conventional LLM Constraints:
- Dependence on extended reasoning chains
- Heuristic patches like backtracking

Cogito's Breakthrough Approach:
1. Supervises reasoning process (not just outputs)
2. Discourages inefficient exploration
3. Builds precise path anticipation capabilities
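A hypothetical scoring function makes the contrast concrete. Outcome-only supervision cannot distinguish a short, direct chain from a meandering one that reaches the same answer, while adding a small per-step cost (an assumed stand-in for process supervision; Cogito's actual objective is not published) rewards efficient paths and penalizes backtracking.

```python
def outcome_score(chain, correct):
    """Outcome-only supervision: reward depends solely on the final answer."""
    return 1.0 if correct else 0.0

def process_score(chain, correct, step_cost=0.02):
    """Process-level supervision (illustrative): every step pays a small cost,
    so shorter chains reaching the same answer score strictly higher."""
    return (1.0 if correct else 0.0) - step_cost * len(chain)

short_chain = ["factor", "substitute", "solve"]
long_chain = ["restate", "guess", "backtrack", "guess again",
              "factor", "substitute", "check", "re-check", "solve"]

# Outcome-only scoring cannot tell the two chains apart...
tie = outcome_score(short_chain, True) == outcome_score(long_chain, True)
# ...while the process-level score prefers the efficient chain.
short_wins = process_score(short_chain, True) > process_score(long_chain, True)
```

The step cost and chain representation here are invented for illustration; the point is only that supervising the process, not just the outcome, discourages inefficient exploration.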

Unexpected Capability: Cross-Modal Reasoning

Despite text-only training, models demonstrate emergent visual reasoning abilities:

Input Example:
[Image: duck and lion comparison]
“Compare these two images. What are the similarities and differences?”

Model Reasoning Process:

```
<think>
Analysis sequence:
1. Mallard duck: Aquatic environment, color attributes, reflection symmetry
2. African lion: Terrestrial setting, mane characteristics, direct gaze
3. Commonalities: Natural habitats, male specimens, subject-focused composition
4. Contrasts: Environment types, size dimensions, motion states...
</think>
```

Structured Output:

| Comparison Aspect | Mallard Duck | African Lion |
| --- | --- | --- |
| Environment | Aquatic (blue water) | Terrestrial (golden grassland) |
| Color Palette | Vibrant greens/yellows | Earth tones (browns/golds) |
| Motion State | Active swimming | Stationary position |
| Composition | Reflection symmetry | Open-space framing |
| Emotional Tone | Peaceful tranquility | Regal authority |

Research Significance: Pure transfer learning enables visual reasoning without multimodal training data, suggesting new bootstrap methods for multimodal systems.


Comprehensive Benchmark Results

70B Dense Model Performance

[Figure: Cogito 70B benchmark results]

109B MoE Model Performance

[Figure: Cogito 109B benchmark results]

405B Dense Model Performance

[Figure: Cogito 405B benchmark results]

Technical FAQ: Understanding Cogito v2

What distinguishes MoE architecture?

Answer: Mixture-of-Experts (MoE) architectures route each input to a small set of specialized sub-networks ("experts") rather than running the full network. For example, the 109B MoE model activates roughly 24B parameters per query, dramatically improving efficiency.
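A minimal top-k routing sketch shows the mechanism. The actual router design in Cogito's MoE models is not public; the expert count, gate scores, and toy scalar "experts" below are illustrative only.

```python
def route(gate_scores, k=2):
    """Return the indices of the top-k experts for one input."""
    return sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]

def moe_layer(x, experts, gate_scores, k=2):
    """Run only the selected experts and combine their outputs,
    weighted by their (renormalized) gate scores."""
    chosen = route(gate_scores, k)
    total = sum(gate_scores[i] for i in chosen)
    return sum(gate_scores[i] / total * experts[i](x) for i in chosen)

# 8 toy "experts" -- each is just a scalar function standing in for a sub-network
experts = [lambda x, w=w: w * x for w in range(1, 9)]
scores = [0.1, 0.05, 0.3, 0.02, 0.25, 0.08, 0.1, 0.1]   # router's gate output

y = moe_layer(2.0, experts, scores, k=2)
active = route(scores, k=2)   # only these 2 of 8 experts did any work
```

Because only `k` experts execute per input, the compute cost scales with the active parameters (here 2 of 8 experts; in the 109B model, ~24B of 109B), not the total parameter count.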

How can developers implement these models?

Answer: Three practical approaches:

  1. Cloud APIs via Together AI/Baseten
  2. Managed deployments through RunPod
  3. Local execution using Unsloth’s optimized framework

How does performance compare to DeepSeek models?

Answer: Through IDA methodology, Cogito achieves:

  • 60% shorter reasoning chains at comparable accuracy
  • Reduced computational requirements
  • Built-in autonomous improvement capability

How was such low training cost achieved?

Answer: The IDA framework enables unprecedented efficiency by:

  • Eliminating unproductive search paths
  • Precisely distilling critical reasoning patterns
  • Training 8 models (3B to 671B) for under $3.5M total

Future Development Roadmap

  1. Compute Scaling: Apply additional resources to current architecture
  2. Open Commitment: Maintain open licensing for all future models
  3. Cross-Modal Research: Explore autonomous visual reasoning training
  4. Community Integration: Expand Hugging Face collaboration

“Building superintelligence is fundamentally a tractable machine learning problem”
— DeepCogito Research Manifesto