Jan 20, 2026 We released a new sparse attention model for long-context modeling; see the
preprint.
Jul 17, 2025 The RAttention code is released
here in Axlearn.
Jun 20, 2025 The sliding window size of local attention can be reduced to 512; see the
preprint.
Jan 22, 2025 We have open-sourced JAX/Pallas implementations of Mamba/Mamba2 via
Axlearn.