Bailin Wang
Researcher at Meta, working on multimodal and long-context modeling.
Previously: Apple, MIT, Microsoft, and the University of Edinburgh.
(Recurrent) Research Interests:
- long-context modeling with memory,
- test-time training at scale.
I’m currently based in NYC. Feel free to reach out : )
news
| Jan 20, 2026 | We released a new sparse attention model for long context; see the preprint. |
|---|---|
| Jul 17, 2025 | The RAttention code is released here in Axlearn. |
| Jun 20, 2025 | The sliding window size of local attention can be reduced to 512; see the preprint. |
| Jan 22, 2025 | We have open-sourced JAX/Pallas implementations of Mamba/Mamba2 via Axlearn. |
selected publications
latest posts
| Jun 14, 2024 | The End of Training Log |
|---|---|