
About Me (Jiachen Zhu / 朱家晨)
I am a Research Scientist at Skild AI, where I work on building vision-language-action (VLA) and vision-action (VA) models with strong vision capabilities. If you are interested in an internship or full-time role at Skild AI, feel free to reach out!
Prior to joining Skild, I received my PhD from NYU in July 2025, advised by Yann LeCun. I also spent an extended period at FAIR, Meta AI, where I worked closely with Jing Li, Yubei Chen, and Zhuang Liu.
Education
- PhD, Computer Science, New York University, 2020 - 2025
- MSc, Computer Science, New York University, 2018 - 2020
- BSc, Computer Science, The Hong Kong Polytechnic University, 2010 - 2015
Research Interests
My current research focuses on leveraging in-the-wild videos to improve VLA models, particularly their vision capabilities. I am also interested in building an agentic layer on top of VLA models to accomplish long-horizon tasks that require memory and planning.
During my PhD, I worked on self-supervised learning for images and videos, as well as pretraining vision encoders for vision-language models (VLMs). I also maintain a broad interest in understanding the design principles behind various neural network architectures.
Papers
For the most up-to-date paper list, please see my Google Scholar.
- Transformers without Normalization
- MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
- Scaling Language-Free Visual Representation Learning
- Variance-Covariance Regularization Improves Representation Learning
- VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Contact
jiachen DOT zhu AT nyu DOT edu
Appendix
Two ideas that I find both shockingly simple and extremely clever: 1, 2