Jiachen Zhu

About Me (Jiachen Zhu / 朱家晨)

I am a Research Scientist at Skild AI, where I work on building VLA/VA models with strong vision capabilities. If you are interested in an internship or full-time role at Skild AI, feel free to reach out!
Prior to joining Skild, I received my PhD from NYU in July 2025, advised by Yann LeCun. I also spent an extended period at FAIR, Meta AI, where I worked closely with Li Jing, Yubei Chen, and Zhuang Liu.

Education

Ph.D., New York University, July 2025 (advised by Yann LeCun)

Research Interests

My current research focuses on leveraging in-the-wild videos to improve VLA models, particularly their vision capabilities. I am also interested in building an agentic layer on top of VLA models to accomplish long-horizon tasks that require memory and planning.

During my PhD, I worked on self-supervised learning for images and videos and on pretraining vision encoders for VLMs. I also maintain a broad interest in the design principles behind neural network architectures.

Papers

For the most up-to-date paper list, please see my Google Scholar.

Transformers without Normalization

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Scaling Language-Free Visual Representation Learning

Variance-Covariance Regularization Improves Representation Learning

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Masked Siamese ConvNets

TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning

Contact

jiachen DOT zhu AT nyu DOT edu

Appendix

My Favourite Illusion!

Two ideas that I find both shockingly simple and extremely clever: 1, 2