Jan 20, 2026 We released a new sparse attention model for long-context modeling; see the
preprint.
Jul 17, 2025 The RAttention code is released
here in Axlearn.
Jun 20, 2025 The sliding window size of local attention can be reduced to 512; see the
preprint.
Jan 22, 2025 We have open-sourced JAX/Pallas implementations of Mamba/Mamba2 via
Axlearn.