Lightning Attention-2: Unlimited Sequence Lengths with Constant Compute Cost

Lightning Attention-2 is a novel linear attention mechanism that keeps the training and inference cost of long sequences on par with that of a 1K sequence. Limits on sequence length significantly constrain the applications of large language models, such as multi-turn dialogue, long-text understanding, and the processing and generation of multimodal data.
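To illustrate why linear attention admits a constant per-token cost, here is a minimal sketch of the standard causal linear-attention recurrence that schemes like Lightning Attention-2 build on. It is a hypothetical, unnormalized toy version, not the paper's tiled implementation: a running d-by-d state replaces the n-by-n attention matrix, so each step costs O(d^2) regardless of sequence length.

```python
import numpy as np

def linear_attention_recurrent(Q, K, V):
    """Causal linear attention via a recurrence (illustrative sketch).

    The running state `kv` accumulates k_t v_t^T, so memory and
    per-token compute are O(d^2), independent of sequence length n.
    """
    n, d = Q.shape
    kv = np.zeros((d, d))           # running sum of outer products k_s v_s^T
    out = np.zeros_like(V)
    for t in range(n):
        kv += np.outer(K[t], V[t])  # fold in the current token
        out[t] = Q[t] @ kv          # o_t = q_t^T * sum_{s<=t} k_s v_s^T
    return out

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.standard_normal((3, n, d))

# Reference: the equivalent quadratic form with a causal mask.
mask = np.tril(np.ones((n, n)))
ref = (mask * (Q @ K.T)) @ V
assert np.allclose(linear_attention_recurrent(Q, K, V), ref)
```

The recurrence matches the masked quadratic form exactly; the contribution of Lightning Attention-2 is computing this kind of recurrence efficiently on hardware by tiling intra-block and inter-block work, which the sketch above does not attempt to reproduce.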