Scaling AI Agents: When More Models Hurt Performance & The Formula to Predict It

4 days ago 高效码农

Scaling AI Agents: When Adding More Models Hurts Performance

Core question: Does adding more AI agents always improve results?
Short answer: Only when the task is parallelizable, tool-light, and single-agent accuracy is below ~45%. Otherwise, coordination overhead eats all the gains.

What This Article Answers

How can you predict whether multi-agent coordination will help or hurt before you deploy?
What do 180 controlled configurations across finance, web browsing, planning, and office workflows reveal?
Which practical checklist can you copy-paste into your next design doc?

1. The Setup: 180 Experiments, One Variable (Coordination Structure)

Summary: Researchers locked prompts, tools, and token budgets, …
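To make that go/no-go criterion concrete, here is a minimal Python sketch of the checklist. The thresholds are taken or inferred from the summary above (a ~45% single-agent accuracy ceiling, and at most a few tool calls as a proxy for "tool-light"); the article's actual prediction formula is not reproduced here.

```python
def multi_agent_likely_helps(parallelizable: bool,
                             tool_calls_per_task: int,
                             single_agent_accuracy: float) -> bool:
    """Rough go/no-go check distilled from the excerpt's summary.

    Thresholds are illustrative, not the article's formula:
    "tool-light" is approximated as <= 3 tool calls per task,
    and the accuracy ceiling as ~45%.
    """
    TOOL_LIGHT_MAX_CALLS = 3   # assumed proxy for "tool-light"
    ACCURACY_CEILING = 0.45    # "~45%" from the summary

    return (parallelizable
            and tool_calls_per_task <= TOOL_LIGHT_MAX_CALLS
            and single_agent_accuracy < ACCURACY_CEILING)

# A parallelizable research task where one agent scores 38%:
print(multi_agent_likely_helps(True, 2, 0.38))   # True: multi-agent worth trying
# Same task, but the single agent already scores 60%:
print(multi_agent_likely_helps(True, 2, 0.60))   # False: coordination overhead likely dominates
```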

How Budget-Aware Search Agents Break Performance Ceilings (BATS Framework)

7 days ago 高效码农

Running on a Budget, Yet Smarter: How "Money-Wise" Search Agents Break the Performance Ceiling

Keywords: budget-aware tool use, test-time scaling, search agent, BATS, Budget Tracker, cost-performance Pareto frontier

Opening: Three Quick Questions

Hand an agent 100 free search calls: will it actually use them?
If it stops at 30 and calls it a day, will more budget move the accuracy needle?
Can we teach the machine to check its wallet before every click?

A new joint study by Google, UCSB, and NYU says YES. "Simply letting the model see the remaining balance pushes accuracy up while keeping the tab unchanged, or even smaller." …
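The core mechanism, as the excerpt describes it, is simply showing the agent its remaining balance before each tool call. Below is a minimal Python sketch of that idea; the names (BudgetTracker, render_prompt) are hypothetical illustrations, not the BATS API.

```python
class BudgetTracker:
    """Tracks a fixed allowance of search calls for one task."""

    def __init__(self, total_calls: int):
        self.total = total_calls
        self.used = 0

    def remaining(self) -> int:
        return self.total - self.used

    def charge(self) -> None:
        """Record one search call; refuse once the budget is spent."""
        if self.remaining() <= 0:
            raise RuntimeError("search budget exhausted")
        self.used += 1


def render_prompt(question: str, tracker: BudgetTracker) -> str:
    # The key trick per the excerpt: expose the wallet in-context,
    # so the model can weigh another search against its cost.
    return (f"Question: {question}\n"
            f"Remaining search calls: {tracker.remaining()} of {tracker.total}\n"
            f"Decide whether another search is worth the cost before answering.")


tracker = BudgetTracker(total_calls=100)
tracker.charge()  # one search performed
print(render_prompt("Who won the most recent Fields Medal?", tracker))
```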

Unlocking 3x Faster LLM Inference on MacBooks: The KVSplit Quantization Breakthrough

7 months ago 高效码农

Efficient LLM Inference on Apple Silicon: The KVSplit Breakthrough

Introduction: Redefining Memory Constraints with Smart Quantization

[Figure: KV Cache Memory Comparison]

Running large language models (LLMs) on consumer MacBooks has long faced two critical challenges: memory limitations for long contexts and sluggish inference speeds. Traditional solutions forced trade-offs between precision and performance, until KVSplit introduced differentiated key-value quantization. This groundbreaking approach achieves:

• 72% memory reduction
• 3x longer context handling
• 8% faster inference
• <1% quality loss

This deep dive explores the technical implementation, empirical results, and practical applications of this paradigm-shifting technology.

Core Innovation: Why Treat Keys …
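The headline numbers come from giving keys and values different bit-widths in the KV cache. Here is a back-of-the-envelope Python sketch of the memory math, assuming 7B-class model dimensions and an illustrative 8-bit-key / 4-bit-value split; quantization block overheads are ignored, so the reduction printed here will not exactly match the article's 72% figure, and this is not KVSplit's actual implementation.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, key_bits, val_bits):
    """Total bytes for the whole KV cache at a given sequence length."""
    elems_per_token = n_layers * n_kv_heads * head_dim  # per K, and again per V
    key_bytes = elems_per_token * seq_len * key_bits / 8
    val_bytes = elems_per_token * seq_len * val_bits / 8
    return key_bytes + val_bytes

# Assumed 7B-class dimensions, for illustration only.
cfg = dict(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)

fp16 = kv_cache_bytes(**cfg, key_bits=16, val_bits=16)
k8v4 = kv_cache_bytes(**cfg, key_bits=8, val_bits=4)  # keys kept at higher precision

print(f"FP16 cache : {fp16 / 2**30:.2f} GiB")
print(f"K8/V4 cache: {k8v4 / 2**30:.2f} GiB "
      f"({100 * (1 - k8v4 / fp16):.0f}% smaller)")
```

The asymmetry (more bits for keys than values) reflects the excerpt's framing that keys deserve different treatment; the freed memory is what allows the longer contexts the article advertises.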