Bailin Wang


Researcher at Meta, working on multimodal and long-context modeling.

Previously: Apple, MIT, Microsoft, and the University of Edinburgh.

(Recurrent) Research Interests:

  • long-context modeling with memory,
  • test-time training at scale.

I’m currently based in NYC. Feel free to reach out : )

news

Jan 20, 2026 We released a new sparse attention model for long context; see the preprint.
Jul 17, 2025 RAttention code is released here in Axlearn.
Jun 20, 2025 The sliding window size of local attention can be reduced to 512; see the preprint.
Jan 22, 2025 We have open-sourced JAX/Pallas implementations of Mamba/Mamba2 via Axlearn ✨

selected publications

  1. SPLA: Block Sparse Plus Linear Attention for Long Context Modeling
    Bailin Wang, Dan Friedman, Tao Lei, and Chong Wang
    Preprint, 2026
  2. RAttention: Towards the Minimal Sliding Window Size in Local-Global Attention Models
    Bailin Wang, Chang Lan, Chong Wang, and Ruoming Pang
    Preprint, 2025
  3. Gated linear attention transformers with hardware-efficient training
    Songlin Yang*, Bailin Wang*, Yikang Shen, Rameswar Panda, and Yoon Kim
    ICML, 2024
  4. In-Context Language Learning: Architectures and Algorithms
    Ekin Akyürek, Bailin Wang, Yoon Kim, and Jacob Andreas
    ICML, 2024
  5. Parallelizing Linear Transformers with the Delta Rule over Sequence Length
    Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, and Yoon Kim
    NeurIPS, 2024
  6. Structured Reordering for Modeling Latent Alignments in Sequence Transduction
    Bailin Wang, Mirella Lapata, and Ivan Titov
    NeurIPS, 2021

latest posts