Dual Chunk Attention: The Training-Free Breakthrough for 100k+ Token LLMs

高效码农

What is Dual Chunk Attention?

By @karminski-dentist

(Image: dual-chunk-attention concept diagram. Source: the paper "Training-Free Long-Context Scaling of Large Language Models")

Dual Chunk Attention (DCA) is a technique published in 2024 by researchers at institutions including the University of Hong Kong. It is a training-free method for extending the context window of large language models: a model such as Llama 2 70B, which natively supports only a 4k-token context window, can handle more than 100k tokens without any additional training. In simple terms, think of a language model's context window as the "memory" it has while processing text. If you've ever tried …
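To make the core idea concrete, here is a toy sketch (my illustration, not the paper's actual implementation, which distinguishes intra-chunk, inter-chunk, and successive-chunk attention). The key trick DCA builds on is remapping relative position indices so that no query-key distance ever exceeds what the model saw during pretraining; in this simplified version, pairs within the same chunk keep their exact distance, while cross-chunk pairs are clamped to the maximum trained distance:

```python
def dca_relative_positions(seq_len, chunk_size):
    """Toy sketch of DCA-style position remapping (not the authors' code).

    Same-chunk query-key pairs keep their exact relative distance;
    cross-chunk pairs are clamped to chunk_size - 1, so no distance
    exceeds the pretrained window. None marks masked (future) pairs.
    """
    table = []
    for q in range(seq_len):          # query position
        row = []
        for k in range(seq_len):      # key position
            if k > q:                 # causal mask: cannot attend to the future
                row.append(None)
                continue
            rel = q - k               # ordinary relative distance
            if q // chunk_size != k // chunk_size:
                rel = min(rel, chunk_size - 1)  # clamp cross-chunk distance
            row.append(rel)
        table.append(row)
    return table

table = dca_relative_positions(seq_len=8, chunk_size=4)
max_rel = max(v for row in table for v in row if v is not None)
print(max_rel)  # 3: never exceeds chunk_size - 1, even for an 8-token sequence
```

This is why no retraining is needed: every relative position the model sees at inference time is one it already learned during pretraining, regardless of how long the input grows.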