Long Context LLMsarchive | Efficient Coder

How to Adapt Full-Attention LLMs to Sliding Window Attention: The SWAA Practical Guide

1 months ago 高效码农

How to Adapt Full-Attention LLMs to Sliding Window Attention: A Practical Guide to SWAA Featured Snippet Summary Sliding Window Attention Adaptation (SWAA) is a practical toolkit for adapting full-attention pretrained large language models (LLMs) to sliding window attention (SWA) without expensive pretraining. It combines five methods—prefill-only SWA, sink token preservation, layer interleaving, chain-of-thought prompting, and fine-tuning—to reduce long-context inference costs to linear complexity while recovering most original performance on models like Qwen3 and Llama. Why Sliding Window Attention Matters for Long-Context LLMs If you’ve ever tried running a large language model on a really long prompt—say, analyzing a full book …