The A.X K1 Deep Dive: A 519B MoE Model with Think-Fusion Intelligence

3 days ago 高效码农

Deep Dive into A.X K1: Architecture Design and Think-Fusion Evolution of a 519B MoE Model. Snippet: A.X K1 is a 519B-parameter Mixture-of-Experts (MoE) model from SK Telecom that activates only 33B parameters for efficient inference. It introduces the Think-Fusion training recipe, which lets a single unified model switch between a high-speed “intuition” mode and a deep “reasoning” mode, and it sets new benchmarks in Korean and multilingual AI performance. In the pursuit of Artificial General Intelligence (AGI), the industry faces a constant tug-of-war: how to maintain massive model capacity without sending inference costs soaring. The newly released A.X K1 technical report offers one answer. By leveraging a …

NVIDIA Nemotron-3-Nano Architecture: How the 31B MoE Model with Mamba-2 Delivers 1M Context

25 days ago 高效码农

Nemotron-3-Nano Under the Hood: 31B Parameters, 3B Active, 1M Context, 3× Faster Inference. TL;DR: NVIDIA’s latest open-weight model keeps 128 experts on standby, wakes up only 6, and mixes Mamba-2 with Grouped-Query Attention. It combines 25T-token pre-training, multi-environment RL, and FP8 inference that outruns models twice its activated size, while supporting a 1M-token context. What Makes Nemotron-3-Nano Special in One Sentence? It achieves higher accuracy than Nemotron-2-Nano and competing models while activating less than half the parameters per forward pass and delivering up to 3.3× higher inference throughput on a single H200 GPU. …

Running an 8.3B-Parameter Neural Network on a Phone CPU: Inside LFM2-8B-A1B’s Sparse-Magic and On-Device Deployment Guide

3 months ago 高效码农

“Mixture-of-Experts only lives in the cloud?” Liquid AI just proved that idea wrong with a Samsung Galaxy S24 Ultra and a 2-second local reply. 1. Opening scene: why this model matters. It is 1 a.m. and you are still polishing a slide deck. A pop-up asks: “Summarise this 200-page English PDF into ten Chinese bullets, please.” Old routine: copy → cloud assistant → wait → pay. New routine: press “Run” on your phone; two seconds later the answer is there, with no Internet, no fee, and no data leakage. The engine behind the new routine is LFM2-8B-A1B, Liquid AI’s …