2026

SPLA: Block Sparse Plus Linear Attention for Long Context Modeling
Bailin Wang, Dan Friedman, Tao Lei, and Chong Wang
Preprint, 2026. [arXiv]

2025

RAttention: Towards the Minimal Sliding Window Size in Local-Global Attention Models
Bailin Wang, Chang Lan, Chong Wang, and Ruoming Pang
Preprint, 2025. [arXiv] [Code]

2024

Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang*, Bailin Wang*, Yikang Shen, Rameswar Panda, and Yoon Kim
ICML, 2024. [arXiv] [Code]

In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek, Bailin Wang, Yoon Kim, and Jacob Andreas
ICML, 2024. [arXiv] [Code]

Parallelizing Linear Transformers with the Delta Rule over Sequence Length
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, and Yoon Kim
NeurIPS, 2024. [arXiv] [Code]

2021

Structured Reordering for Modeling Latent Alignments in Sequence Transduction
Bailin Wang, Mirella Lapata, and Ivan Titov
NeurIPS, 2021. [arXiv] [Code]