DeepSeek-R1-Safe: Revolutionizing AI Safety with Bilingual Security Training & Ascend Chip Optimization

29 days ago 高效码农

As artificial intelligence continues to evolve at a rapid pace, the capabilities of large language models are expanding—but so are concerns around their safety and compliance. This is where DeepSeek-R1-Safe comes in: a pioneering solution designed to tackle these critical challenges head-on.

What Is DeepSeek-R1-Safe?

DeepSeek-R1-Safe is a safety-aligned large language model developed through a collaboration between Zhejiang University’s College of Cybersecurity and Huawei. Built upon the advanced DeepSeek architecture, this model has been specifically optimized to address security and compliance challenges in AI applications. The model runs on Huawei’s Ascend chips and leverages the MindSpeed-LLM framework for development and …
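To make the teaser concrete, here is a minimal sketch of querying a safety-aligned chat model through Hugging Face transformers. The repo id below is a hypothetical placeholder, and the article’s production path targets Ascend hardware via MindSpeed-LLM, which is not shown here.

```python
# Minimal sketch of chatting with a safety-aligned model via transformers.
# "deepseek-ai/DeepSeek-R1-Safe" is a hypothetical repo id for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/DeepSeek-R1-Safe"  # assumption: adjust to the real repo
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

msgs = [{"role": "user", "content": "Explain safe handling of user data."}]
text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
out = model.generate(**tok(text, return_tensors="pt").to(model.device),
                     max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```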

Grok 4 Fast Review: xAI’s Reasoning Powerhouse vs GPT-5 & Claude (Performance Deep Dive)

1 month ago 高效码农

Choosing the right large language model (LLM) is a critical decision for developers and businesses. With the market offering a vast array of models, each promising a different blend of intelligence, speed, and cost, making an informed choice requires clear, unbiased data. This analysis provides a comprehensive examination of xAI’s Grok 4 Fast, situating its performance within the broader landscape of contemporary models like GPT-5, Claude 4.1 Opus, Gemini 2.5, and various open-weight alternatives, using data from rigorous independent evaluations.

How Do We Measure “Intelligence” in AI Models?

To compare models objectively, we rely on standardized benchmarks that test a …
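As a rough illustration of how such comparisons are aggregated, the sketch below combines several benchmark scores into a single weighted index. All weights and scores are hypothetical stand-ins, not figures from the review.

```python
# Toy composite "intelligence index": a weighted average of benchmark scores.
# Every number below is a hypothetical placeholder, not real evaluation data.
weights = {"MMLU-Pro": 0.30, "GPQA": 0.25, "LiveCodeBench": 0.25, "AIME": 0.20}
scores  = {"MMLU-Pro": 0.81, "GPQA": 0.66, "LiveCodeBench": 0.58, "AIME": 0.72}

index = sum(weights[b] * scores[b] for b in weights)  # weighted mean in [0, 1]
print(f"Composite index: {index:.3f}")
```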

ParaThinker Revolutionizes LLM Reasoning: Native Parallel Thinking Breaks Test-Time Scaling Barriers

1 month ago 高效码农

ParaThinker: Native Parallel Thinking – A New Way to Unlock LLM Reasoning Potential

Introduction: How Can We Break the Test-Time Scaling Barrier in LLMs?

Large language models (LLMs) have made remarkable strides by scaling test-time compute—generating longer sequential reasoning paths to improve performance. However, this approach hits a ceiling where more computation yields minimal gains. ParaThinker addresses this by introducing native parallel thinking, allowing LLMs to generate multiple diverse reasoning paths simultaneously and synthesize them into better answers, overcoming the “Tunnel Vision” limitation of sequential reasoning. In recent years, the progress of LLMs has been driven by scaling—first in pretraining …
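The core idea can be approximated with off-the-shelf tooling: sample several diverse reasoning paths in one batched call, then fuse them. The sketch below uses a small stand-in model and a crude majority vote in place of ParaThinker’s learned synthesis step, so treat it as an analogy rather than the paper’s method.

```python
# Approximation of parallel thinking: one batched generate() call produces
# several independent reasoning paths; a majority vote stands in for the
# synthesis step that ParaThinker trains end-to-end.
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-1.5B-Instruct"  # small stand-in model (assumption)
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Q: A train covers 120 km in 1.5 hours. Average speed in km/h?\nA:"
inputs = tok(prompt, return_tensors="pt")

outs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.9,          # sampling noise keeps the paths diverse
    num_return_sequences=4,   # four parallel reasoning paths in one batch
    max_new_tokens=128,
    pad_token_id=tok.eos_token_id,
)
paths = [tok.decode(o, skip_special_tokens=True) for o in outs]

# Crude fusion: vote on the last token of each path.
final = Counter(p.strip().split()[-1] for p in paths).most_common(1)[0][0]
print(final)
```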

Set Block Decoding: Achieve 3-5x Faster LLM Inference Speeds Instantly

1 month ago 高效码农

Set Block Decoding: A New Method to Boost Large Language Model Inference Speed by 3-5x

1. The Problem: Why Do Language Models Need Faster Inference?

If you’ve ever used a large language model (LLM) for tasks like writing code or solving math problems, you might have experienced:

- Lagging responses when generating long code blocks
- Slowdowns halfway through complex calculations
- Increasing wait times as text generation progresses

These issues stem from fundamental challenges in LLM inference. Traditional autoregressive models face three core limitations:

Key Pain Points:

- Computational Intensity: Each new word (token) requires a full model computation
- Memory Pressure: Constant reloading …
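The first pain point is easy to see in code: standard decoding performs one full forward pass per generated token. The sketch below shows that baseline loop with a small stand-in model; it illustrates the bottleneck Set Block Decoding attacks, not the method itself.

```python
# Why autoregressive decoding is slow: one full forward pass per new token.
# Illustrative baseline only; this is not Set Block Decoding itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # tiny stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("def fibonacci(n):", return_tensors="pt").input_ids
past = None
with torch.no_grad():
    for _ in range(32):                                 # 32 tokens = 32 passes
        step = ids if past is None else ids[:, -1:]     # KV cache: feed only the new token
        out = model(step, past_key_values=past, use_cache=True)
        past = out.past_key_values                      # cached keys/values keep growing
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
print(tok.decode(ids[0]))
```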

Grok 2 Unleashed: Your Complete 5-Step Guide to Downloading, Deploying and Running the AI Powerhouse

1 month ago 高效码农

Grok 2 Model: A Complete Guide to Downloading, Deploying, and Running

Large-scale language models have quickly become critical infrastructure in today’s AI-driven world. Grok 2, developed and used by xAI in 2024, is one such model. With its released weights, Grok 2 provides researchers and developers with an opportunity to explore, experiment, and build applications using cutting-edge technology. This article walks you step by step through the entire process of downloading, setting up, and running Grok 2. The guide is based entirely on the official instructions and includes all technical details: downloading the weights, preparing the runtime environment, launching an inference …
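As a minimal sketch of the download step, the snippet below pulls the released weights with huggingface_hub. The repo id is an assumption; follow the official instructions for the exact source and the serving command.

```python
# Minimal download sketch using huggingface_hub; the repo id is an assumption,
# so verify it against the official release instructions before running.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="xai-org/grok-2",   # assumed Hugging Face repo id
    local_dir="./grok-2",       # multi-hundred-GB download: check disk space
)
print(f"Weights available at: {path}")
```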

Unlock OpenAI’s gpt-oss: Run & Fine-Tune Billion-Parameter Models on Consumer Hardware

2 months ago 高效码农

The Complete Guide to Running and Fine-Tuning OpenAI’s gpt-oss Models with Unsloth

You might wonder: how can I run billion-parameter open-source models efficiently? OpenAI’s newly released gpt-oss series, combined with Unsloth’s toolchain, enables high-performance inference and fine-tuning on consumer hardware.

What Are gpt-oss Models?

In August 2025, OpenAI open-sourced two breakthrough language models: gpt-oss-120b and gpt-oss-20b. Both models feature:

- Apache 2.0 license for commercial use
- 128k context window for long-form reasoning
- State-of-the-art performance in reasoning, tool use, and agentic tasks

Key Model Specifications

| Model | Parameters | Performance Benchmark | Core Strengths |
|---|---|---|---|
| gpt-oss-20b | 20 billion | Matches o3-mini | Tool calling, chain-of-thought reasoning |
| gpt-oss-120b | 120 … | | |
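A minimal loading sketch with Unsloth’s FastLanguageModel follows. The repo id "unsloth/gpt-oss-20b" is an assumption; check Unsloth’s documentation for the exact model name and recommended settings.

```python
# Minimal sketch: load gpt-oss-20b in 4-bit with Unsloth for inference.
# The repo id is an assumption; consult Unsloth's docs for the exact name.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed repo id
    max_seq_length=4096,               # well under the 128k maximum
    load_in_4bit=True,                 # quantized weights fit consumer GPUs
)
FastLanguageModel.for_inference(model)  # switch to the fast generation path
```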

Introducing Qwen3-4B-Thinking-2507: The Lightweight LLM That Outperforms Larger Models in Complex Reasoning

2 months ago 高效码农

Qwen3-4B-Thinking-2507: The Open-Source LLM That Thinks Deeper and Reasons Smarter

Core breakthrough: Alibaba Cloud’s newly upgraded Qwen3-4B-Thinking-2507 model delivers exceptional performance in complex tasks like logical reasoning and coding, featuring native 262K context understanding and outclassing larger models in specialized benchmarks.

Why This Model Matters

If you need an open-source LLM that excels at complex decision-making, Qwen3-4B-Thinking-2507 deserves attention. This lightweight 4B-parameter model outperforms 30B-class models in specialized tests. Its standout feature is an automated thinking mechanism: no manual activation is required. The model internally generates reasoning chains before delivering final outputs.

Three Major Upgrades

1. Quantum Leap in Reasoning …
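Because the thinking mechanism is automatic, client code only needs to separate the reasoning chain from the final answer. The sketch below assumes the standard transformers chat workflow and a dedicated `</think>` delimiter token; verify both against the model card.

```python
# Minimal chat sketch for Qwen3-4B-Thinking-2507 with transformers.
# Assumption: the reasoning chain ends with a "</think>" token.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen3-4B-Thinking-2507"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

msgs = [{"role": "user", "content": "Which is larger, 9.11 or 9.9? Explain briefly."}]
text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)

new_tokens = out[0][inputs.input_ids.shape[1]:].tolist()
end_think = tok.convert_tokens_to_ids("</think>")  # assumed delimiter token
cut = new_tokens.index(end_think) + 1 if end_think in new_tokens else 0
print(tok.decode(new_tokens[cut:], skip_special_tokens=True))  # final answer only
```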

Step3 Model: How a 321B-Parameter AI Beats 37B Models at 39% Lower Cost

2 months ago 高效码农

Step3: How a 321-Billion-Parameter Model Runs Cheaper Than a 37-Billion One

A Plain-English Guide for Developers, Students, and Curious Minds

Quick Takeaways

| What you get | Number |
|---|---|
| Cost per 1M tokens (32K context) | $0.13 (vs. $0.21 for DeepSeek-V3) |
| Tokens per second on one H800 GPU | 4,039 (vs. 2,324 for DeepSeek-V3) |
| GPUs to start serving | 32 (vs. 128–320 for similar models) |

If you only remember three things, remember those.

1. What Exactly Is Step3?

Step3 is a vision-language model with 321 billion total parameters, but only 38 billion are active for each token. Think of it like …
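The takeaway numbers can be sanity-checked with two lines of arithmetic, using only the figures from the table above.

```python
# Sanity-check the headline numbers from the Quick Takeaways table.
total_params, active_params = 321e9, 38e9
print(f"Active fraction per token: {active_params / total_params:.1%}")  # ~11.8%

cost_step3, cost_dsv3 = 0.13, 0.21  # USD per 1M tokens at 32K context
saving = 1 - cost_step3 / cost_dsv3
print(f"Cost saving vs DeepSeek-V3: {saving:.1%}")  # ~38%, close to the title's 39%
```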

Inside 2025’s LLM Revolution: From GPT-2 to Kimi K2 Architectures Explained

3 months ago 高效码农

From GPT-2 to Kimi K2: A Visual Guide to 2025’s Leading Large Language Model Architectures

If you already use large language models but still get lost in technical jargon, this post is for you. In one long read you’ll learn:

- Why DeepSeek-V3’s 671B parameters run cheaper than Llama 3’s 405B
- How sliding-window attention lets a 27B model run on a Mac Mini
- Which open-weight model to download for your next side project

Table of Contents

- Seven Years of the Same Backbone—What Actually Changed?
- DeepSeek-V3 / R1: MLA + MoE, the Memory-Saving Duo
- OLMo 2: Moving RMSNorm One …
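For the sliding-window point, a toy attention mask makes the memory saving visible: each query may only attend to the last W keys, so the KV cache stops growing with sequence length. This is an illustrative sketch, not any specific model’s implementation.

```python
# Toy sliding-window attention mask: each token attends to at most the last
# `window` positions (itself included), so KV memory is O(window), not O(n).
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (rows)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (columns)
    return (j <= i) & (i - j < window)      # causal AND within the window

print(sliding_window_mask(6, 3).int())      # banded lower-triangular pattern
```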

Kimi K2 Unleashed: How Moonshot AI’s Agentic Intelligence is Redefining AI Capabilities

3 months ago 高效码农

Kimi K2: Unleashing Agentic Intelligence with MoE and Muon Optimization

Driven by the rapid evolution of large language models, Kimi K2 emerges from Moonshot AI as a next-generation agentic intelligence powerhouse. Built on a trillion-parameter mixture-of-experts (MoE) architecture with roughly 32 billion active parameters per token, Kimi K2 was engineered to excel at natural language understanding, code generation, advanced reasoning, and seamless tool integration. This comprehensive guide presents a clear, practical overview, written for readers at a junior-college level and above, covering its design philosophy, architecture, performance benchmarks, deployment strategies, and hands-on examples.

Table of Contents

- Why Agentic Intelligence Matters
- Core Innovations in Kimi K2 …
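The “trillion total, ~32 billion active” pattern comes from top-k expert routing. The toy module below routes each token to k of E small experts; it is a didactic sketch, not Kimi K2’s production router.

```python
# Didactic top-k mixture-of-experts layer (not Kimi K2's real router):
# each token runs through only k of the n experts, so per-token compute
# scales with k while total parameters scale with n.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)   # scores every expert per token
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                             # x: (tokens, d)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)             # mix the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # only k experts fire per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(ToyMoE()(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```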