Exploring OpenPhone: How Lightweight Mobile Agentic Foundation Models Are Shaping the Future of AI Phones

Featured Snippet Summary

OpenPhone is an open-source 3B-parameter agentic foundation model designed for on-device smartphone interactions, addressing privacy, latency, and cost issues from cloud API reliance. Running entirely locally, it achieves performance comparable to 7B-9B models through advanced SFT+RL training, while a device-cloud collaboration framework reduces cloud calls by about 10%.

In today’s smartphone world, we often run into frustrations with AI assistants: they constantly ping the cloud, raising privacy concerns, slowing responses, and racking up API costs. What if your phone could handle most …
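The excerpt mentions a device-cloud collaboration framework without showing how it decides what stays on the phone. As a purely illustrative sketch (not OpenPhone's published design), confidence-based routing is one common way such a split can work; every name below, including `route_step` and the 0.8 threshold, is hypothetical:

```python
# Hypothetical sketch of a device-cloud routing policy for an on-device agent.
# None of these names come from OpenPhone; they only illustrate the idea that
# the local 3B model handles most steps and escalates to the cloud when its
# confidence in the predicted action is low.

from dataclasses import dataclass

@dataclass
class ActionCandidate:
    action: str          # e.g. "tap(package=settings, element=wifi_toggle)"
    confidence: float    # model's own probability estimate for this action

def route_step(local_candidate: ActionCandidate,
               confidence_threshold: float = 0.8) -> str:
    """Decide whether a single agent step stays on-device or goes to the cloud."""
    if local_candidate.confidence >= confidence_threshold:
        return f"LOCAL: {local_candidate.action}"
    # Only the hard steps leave the device, which is how a collaboration
    # framework can keep the share of cloud calls small.
    return "CLOUD: escalate this step to the larger hosted model"

if __name__ == "__main__":
    print(route_step(ActionCandidate("tap(wifi_toggle)", 0.93)))       # stays local
    print(route_step(ActionCandidate("fill(tax_form_field)", 0.41)))   # escalates
```

The point of the sketch is the shape of the decision, not the numbers: most routine UI actions clear the threshold and never leave the device, which is how this kind of framework keeps cloud calls rare.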
Google LiteRT NeuroPilot: Making Phone NPUs “First-Class Citizens” for On-Device LLMs

In the era of pursuing faster, more private AI experiences, running Large Language Models (LLMs) directly on devices is the critical next step. Yet fitting models with billions of parameters into smartphones and running them smoothly has remained a significant challenge for developers. The LiteRT NeuroPilot Accelerator stack, recently launched by Google and MediaTek, aims to turn the NPUs (Neural Processing Units) in MediaTek’s Dimensity series chips into the “preferred target” for on-device LLMs. This is not just another technical update; it seeks to fundamentally change how developers interact …
MediaTek NPU × LiteRT: Running LLMs on Phones Without Losing Your Sanity

A field-note style walkthrough of the new LiteRT NeuroPilot Accelerator—what it is, why it matters, and how to ship a 1B-parameter model in an Android APK in under 30 min.

0. One-Sentence Take-away

You can now compile a Gemma 3 1B model once and run it on millions of MediaTek phones at 1,600 tokens/s prefill—without writing a single line of SoC-specific C++—thanks to the LiteRT NeuroPilot Accelerator.

1. Why On-Device LLMs Keep Getting Stuck 1 cm from the Finish Line

Core question: “I already have an INT8 …
“Mixture-of-Experts only lives in the cloud?” Liquid AI just proved that idea wrong with a Samsung Galaxy S24 Ultra and a 2-second local reply.

1. Opening scene – why this model matters

It is 1 a.m. and you are still polishing a slide deck. A pop-up asks: “Summarise this 200-page English PDF into ten Chinese bullets, please.” Old routine: copy → cloud assistant → wait → pay. New routine: press “Run” on your phone; two seconds later the answer is there – no Internet, no fee, no data leakage. The engine behind the new routine is LFM2-8B-A1B, Liquid AI’s …
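A quick back-of-envelope calculation helps explain the 2-second local reply. Reading the "8B-A1B" name in the usual way (roughly 8B total parameters with about 1B active per token; this is an assumption from the naming convention, not a figure from Liquid AI's spec sheet), per-token compute looks like this:

```python
# Back-of-envelope: why a sparse MoE can answer quickly on a phone.
# Assumption (from the "8B-A1B" naming convention, not from Liquid AI's spec):
# roughly 8e9 total parameters, roughly 1e9 active per generated token.

total_params = 8e9      # parameters stored on device (dominates memory)
active_params = 1e9     # parameters actually used per token (dominates compute)

# A decoder forward pass costs roughly 2 FLOPs per active parameter per token.
flops_per_token_moe = 2 * active_params
flops_per_token_dense = 2 * total_params

print(f"MoE per-token compute:   {flops_per_token_moe:.1e} FLOPs")
print(f"Dense per-token compute: {flops_per_token_dense:.1e} FLOPs")
print(f"Compute ratio: ~{flops_per_token_dense / flops_per_token_moe:.0f}x less per token")
```

Memory still has to hold all the experts, but the arithmetic shows why the per-token compute of a sparse MoE can sit closer to a small dense model than to a dense 8B one.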
MiniCPM4 & MiniCPM4.1: A Pocket-Sized 8 B-Parameter Model That Thinks—and Runs—at the Edge

(The no-hype, no-code-dump guide for junior developers, product managers, and tinkerers)

“Can I really run a GPT-3-class model on a lunch-box computer?” If that question keeps you awake, this article is the sleeping pill. Everything below is copied straight from the official OpenBMB repositories (no extra facts, no fluff). I’ve only translated, re-ordered, and explained the bits that usually stay locked inside research papers.

1. Elevator summary

What | Number | Why it matters
Model size | 8 B parameters | Fits a 16 GB RTX 4070 at 16-bit, or a …
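To make the "fits at 16-bit" row concrete, weight memory is just parameter count times bytes per parameter. A quick sketch (weights only; activations and the KV cache need extra headroom on top):

```python
# Weight-memory arithmetic behind the "fits at 16-bit" row.
# This counts weights only; activations and KV cache need extra headroom.

params = 8e9  # 8 B parameters

for label, bytes_per_param in [("FP16/BF16 (16-bit)", 2.0),
                               ("INT8 (8-bit)", 1.0),
                               ("4-bit quantized", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{label:>20}: ~{gib:.1f} GiB of weights")

# Output:
#  FP16/BF16 (16-bit): ~14.9 GiB of weights
#        INT8 (8-bit): ~7.5 GiB of weights
#     4-bit quantized: ~3.7 GiB of weights
```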
What Is Kitten TTS and Why Does It Matter?

In the world of AI voice synthesis, the prevailing narrative has been “bigger is better.” Multi-billion-parameter models deliver life-like speech—but only if you have a GPU farm and an AWS budget to match. Kitten TTS flips that script. At just 15 million parameters and under 25 MB on disk, this open-source, Apache 2.0-licensed model delivers expressive, high-quality voices without a GPU—on everything from your laptop to a Raspberry Pi, or even a smartphone. Kitten TTS isn’t about chasing benchmarks; it’s about democratizing voice AI. By slashing resource requirements, it puts advanced text-to-speech …
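For readers who want to try it, here is a minimal CPU-only usage sketch in the style of the project's published examples. Treat the package name, model id, voice id, and 24 kHz sample rate as assumptions to verify against the repository, since they may change between releases:

```python
# Minimal CPU-only usage sketch. Package name, model id, voice id and the
# 24 kHz sample rate are taken from the project's published examples as
# commonly circulated; double-check them against the repository before use.

from kittentts import KittenTTS   # pip install kittentts soundfile
import soundfile as sf

model = KittenTTS("KittenML/kitten-tts-nano-0.1")   # ~15M params, <25 MB

audio = model.generate(
    "Kitten TTS runs without a GPU.",
    voice="expr-voice-2-f",        # one of the bundled expressive voices
)

sf.write("output.wav", audio, 24000)  # write 24 kHz mono audio to disk
```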
Mu: How Microsoft’s Tiny On-Device AI Transforms Windows Settings

“Processing 100+ tokens per second entirely on NPU hardware – Microsoft’s Mu language model delivers instant settings control without cloud dependency.”

The Dawn of On-Device Intelligence

When you type “dim screen at night” into Windows Settings, a 330-million-parameter AI springs into action on your device’s Neural Processing Unit (NPU). This is Mu – Microsoft’s purpose-built language model that translates natural language into precise system actions. Currently powering the Settings Agent in Copilot+ PCs for Windows Insiders, Mu represents a paradigm shift in local AI execution.

Why This Matters: 🚫 …
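What "translates natural language into precise system actions" can mean in practice is easiest to see with a toy example. The schema and function names below are invented for illustration and are not Microsoft's actual Settings Agent interface:

```python
# Purely illustrative: what mapping "dim screen at night" to a structured
# settings action might look like. The schema and function names here are
# invented for illustration; Microsoft has not published the Settings Agent's
# internal action format in this excerpt.

import json

def fake_mu_inference(utterance: str) -> dict:
    """Stand-in for the on-NPU model: returns a structured settings action."""
    if "dim screen at night" in utterance.lower():
        return {
            "function": "set_night_light",
            "arguments": {"enabled": True, "schedule": "sunset_to_sunrise"},
        }
    # Anything the model cannot map confidently falls back to settings search.
    return {"function": "open_settings_search", "arguments": {"query": utterance}}

action = fake_mu_inference("Dim screen at night")
print(json.dumps(action, indent=2))
```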
Cactus Framework: The Ultimate Solution for On-Device AI Development on Mobile

Why Do We Need Mobile-Optimized AI Frameworks?

With smartphone capabilities reaching new heights, running AI models locally has become an industry imperative. The Cactus framework addresses three critical technical challenges through innovative solutions:

- Memory Optimization – 1.2 GB memory footprint for 1.5B-parameter models
- Cross-Platform Consistency – Unified APIs for Flutter/React-Native
- Power Efficiency – 15% battery drain for 3 hr continuous inference

Technical Architecture Overview

[Architecture Diagram]

Application Layer → Binding Layer → C++ Core → GGML/GGUF Backend

- Supports React/Flutter/Native implementations
- Optimized via Llama.cpp computation

Core Feature Matrix …
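To picture the "Application Layer → Binding Layer → C++ Core" flow, here is a minimal sketch of how a binding layer typically calls into a native core, shown with Python's ctypes for brevity. The library name `cactus` and the `generate` entry point are hypothetical and are not Cactus's real API surface (its actual bindings target Flutter and React-Native):

```python
# Illustrative only: the general pattern of a thin binding layer calling a
# C++ core. "cactus" and the generate() entry point are hypothetical names.

import ctypes
import ctypes.util

def load_core(lib_name: str = "cactus"):
    """Locate and load the native core; return None if it is not installed."""
    path = ctypes.util.find_library(lib_name)
    if path is None:
        print(f"native library '{lib_name}' not found - skipping demo")
        return None
    core = ctypes.CDLL(path)
    # Declare the (hypothetical) C entry point:
    #   int generate(const char* prompt, char* out, int out_len)
    core.generate.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
    core.generate.restype = ctypes.c_int
    return core

if __name__ == "__main__":
    core = load_core()
    if core is not None:
        out = ctypes.create_string_buffer(4096)
        core.generate(b"Hello from the binding layer", out, len(out))
        print(out.value.decode())
```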
Google DeepMind Unveils Gemma 3n: Redefining Real-Time Multimodal AI for On-Device Use

Introduction: Why On-Device AI Is the Future of Intelligent Computing

As smartphones, tablets, and laptops evolve at breakneck speed, user expectations for AI have shifted dramatically. The demand is no longer limited to cloud-based solutions—people want AI to run locally on their devices. Whether it’s real-time language translation, context-aware content generation, or offline processing of sensitive data, the vision is clear. Yet two critical challenges remain: memory constraints and response latency. Traditional AI models rely on cloud servers, offering robust capabilities but introducing delays and privacy risks. Existing …
FastVLM: Revolutionizing Efficient Vision Encoding for Vision Language Models

Introduction: Redefining Efficiency in Multimodal AI

At the intersection of computer vision and natural language processing, Vision Language Models (VLMs) are driving breakthroughs in multimodal artificial intelligence. However, traditional models face critical challenges when processing high-resolution images: excessive encoding time and overproduction of visual tokens, which severely limit real-world responsiveness and hardware compatibility. FastVLM, a groundbreaking innovation from Apple’s research team, introduces the FastViTHD vision encoder architecture, achieving 85x faster encoding speeds and 7.9x faster Time-to-First-Token (TTFT), setting a new industry benchmark for efficiency.

Core Innovations: Three Technical Breakthroughs

1. FastViTHD …
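A toy model of Time-to-First-Token shows why a faster encoder that also emits fewer visual tokens compounds into a large TTFT win: TTFT is roughly vision-encoding time plus the prefill cost over all visual and text tokens. Every number below is made up purely to show the mechanism, not taken from Apple's benchmarks:

```python
# Toy model of Time-to-First-Token for a VLM. All numbers are hypothetical.
# TTFT ~ vision-encode time + prefill time, and prefill time grows with the
# number of visual tokens the encoder hands to the language model.

def ttft(encode_ms: float, visual_tokens: int, text_tokens: int,
         prefill_ms_per_token: float) -> float:
    return encode_ms + (visual_tokens + text_tokens) * prefill_ms_per_token

# Hypothetical baseline encoder: slow, and floods the LLM with visual tokens.
baseline = ttft(encode_ms=600, visual_tokens=2048, text_tokens=64,
                prefill_ms_per_token=0.5)

# Hypothetical faster encoder that also emits far fewer visual tokens.
improved = ttft(encode_ms=40, visual_tokens=256, text_tokens=64,
                prefill_ms_per_token=0.5)

print(f"baseline TTFT: {baseline:.0f} ms, improved TTFT: {improved:.0f} ms "
      f"(~{baseline / improved:.1f}x faster)")
```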