Bailin Wang
Researcher at Meta, working on multimodal and long-context modeling.
Previously: Apple, MIT, Microsoft, and the University of Edinburgh.
(Recurrent) Research Interests:
- long-context modeling with memory,
- test-time training at scale.
I’m currently based in NYC. Feel free to reach out : )
news
| Jan 20, 2026 | We released a new sparse attention model for long context; see the preprint. |
|---|---|
| Jul 17, 2025 | The RAttention code is released here in Axlearn. |
| Jun 20, 2025 | The sliding window size of local attention can be reduced to 512; see the preprint. |
| Jan 22, 2025 | We have open-sourced JAX/Pallas implementations of Mamba/Mamba2 via Axlearn. |
selected publications
latest posts
| Jun 14, 2024 | The End of Training Log |
|---|---|