LEGO: The Open-Source Framework That Turns AI Loops into Silicon—No RTL Templates Required

23 days ago 高效码农

Keywords: LEGO accelerator, automatic RTL generation, spatial accelerator, tensor applications, AI chip design, Gemmini comparison, data-flow fusion, MIT Han Lab

TL;DR: LEGO is an open-source toolchain released by MIT Han Lab in 2025. Feed it a plain tensor loop (GEMM, Conv2D, Attention, MTTKRP) and it returns production-grade Verilog, with no human-written templates and no HLS headaches. On a 28 nm test chip, LEGO beats the state-of-the-art Gemmini generator by 3.2× in speed and 2.4× in energy efficiency while using the same MAC count and on-chip memory.

What you will learn in 12 minutes: why even Google still hand-tunes TPU blocks, and where that hurts; how LEGO removes …

DeepSeek UE8M0 FP8 Optimization: Revolutionizing Domestic AI-Semiconductor Synergy

1 month ago 高效码农

DeepSeek UE8M0 FP8 Optimization: A Critical Breakthrough in the Synergy Between Domestic AI and Semiconductors

In the rapidly evolving field of artificial intelligence (AI), the efficiency of model training and the cost of deployment have become core concerns for the industry. Floating-point numbers, the fundamental way computers represent fractional values, play a direct role in determining an AI system's precision, speed, and resource consumption. In recent years, low-precision floating-point formats, particularly 8-bit floating point (FP8), have emerged as a key way to balance performance and efficiency. Among these innovations, the UE8M0 FP8 format developed by the Chinese team at DeepSeek stands out …