Wanpeng Zhang (张万鹏)

I am a Ph.D. candidate at Peking University, advised by Prof. Zongqing Lu. My research interests lie in Foundation Models, Reinforcement Learning, and Robotics. I am currently working on post-training and infrastructure at BeingBeyond, a startup focused on building foundation models for embodied intelligence. For more details, please see my CV or Chinese CV. I am also hiring self-motivated interns to work at BeingBeyond. Feel free to reach out by email if you are interested in collaborating.

/////

Selected Publication

(For the full publications, please see my Google Scholar.)

1. Robotics

arXiv'26.05

Being-H0.7: A Latent World-Action Model from Egocentric Videos

Wanpeng Zhang*, BeingBeyond Team. (*Co-first Author, Core Contributor.)

Being-H0.7 is a latent world-action model that scales egocentric video pretraining into future-aware robot control.

Blog/Paper/Bib/GitHub

arXiv'26.03

Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting

Wanpeng Zhang, Hao Luo, Sipeng Zheng, Yicheng Feng, Haiweng Xu, Ziheng Xi, Chaoyi Xu, Haoqi Yuan, Zongqing Lu

PTR performs reward-free offline policy improvement by conservatively reweighting offline data according to whether an action leads to an identifiable downstream outcome.

Blog/Paper/Bib/GitHub

CVPR'26

Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos

Yicheng Feng, Wanpeng Zhang, Ye Wang, Hao Luo, Haoqi Yuan, Sipeng Zheng, Zongqing Lu.

We introduce VIPA-VLA, which learns 2D-to-3D visual-physical grounding from human videos, enabling VLA with stronger spatial understanding and generalization.

Blog/Paper/Bib/GitHub

CVPR'26

Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild

Hao Luo, Ye Wang, Wanpeng Zhang, Haoqi Yuan, Yicheng Feng, Haiweng Xu, Sipeng Zheng, Zongqing Lu.

JALA scales VLA pretraining by aligning predictive embeddings with inverse dynamics to learn a unified latent action space from both labeled and unannotated human videos.

Blog/Paper/Bib/GitHub

arXiv'26.02

Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization

Ye Wang*, Sipeng Zheng*, Hao Luo*, Wanpeng Zhang*, Haoqi Yuan, Chaoyi Xu, Haiweng Xu, Yicheng Feng, Mingyang Yu, Zhiyu Kang, Zongqing Lu, Qin Jin. (*Co-first Author.)

A controlled study of VLA scaling shows EEF-relative alignment is the most robust action default, naive heterogeneous data pooling can cause destructive interference.

Blog/Paper/Bib/GitHub

arXiv'26.01

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Wanpeng Zhang*, BeingBeyond Team. (*Co-first Author, Core Contributor.)

Being-H0.5 is a foundational VLA model that scales human-centric learning with a unified action space to enable robust cross-embodiment robot control.

Blog/Paper/Bib/GitHub/Hugging Face

ICML'26

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

Wanpeng Zhang*, BeingBeyond Team. (*Co-first Author, Core Contributor.)

We introduce Being-H0, the first dexterous Vision-Language-Action model pretrained from large-scale human videos via explicit hand motion modeling.

Blog/Paper/Bib/GitHub/Hugging Face

arXiv'25.12

DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models

Wanpeng Zhang, Ye Wang, Hao Luo, Haoqi Yuan, Yicheng Feng, Sipeng Zheng, Qin Jin, Zongqing Lu.

DiG-Flow is a plug-and-play module for flow-matching based VLAs that rebalances control between the autoregressive foundation model and the flow expert.

Blog/Paper/Bib/GitHub

2. MLLM

NeurIPS'25

OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data

Hao Luo, Zihao Yue, Wanpeng Zhang, Yicheng Feng, Sipeng Zheng, Deheng Ye, Zongqing Lu.

OpenMMEgo enhances egocentric video understanding through a multi-level synthetic dataset, semantic-aware visual token compression to handle viewpoint shifts, and curriculum learning for stable training.

Paper/Bib/GitHub

ICCV'25, Highlight

Unified Multimodal Understanding via Byte-Pair Visual Encoding

Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu.

Building upon the visual BPE tokenizer proposed in the previous work, we further designed a complete training framework and our Being-VL-0.5 model.

Blog/Paper/Bib/GitHub/Link

ICCV'25

VideoOrion: Tokenizing Object Dynamics in Videos

Yicheng Feng, Yijiang Li, Wanpeng Zhang, Hao Luo, Zihao Yue, Sipeng Zheng, Zongqing Lu.

VideoOrion encodes videos with a two-branch design, using object tokens from a detect-segment-track pipeline to capture object dynamics alongside scene context.

Paper/Bib/Link

ICLR'25

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu.

We propose BPE Tokenizer for images, enabling Transformers to learn and align multi-modal information more effectively, providing a new learning paradigm for Unified MLLMs.

Blog/Paper/Bib/GitHub/Link

3. RL & Agent

NAACL'25

LLM-Based Explicit Models of Opponents for Multi-Agent Games

Xiaopeng Yu, Wanpeng Zhang, Zongqing Lu.

We propose EMO, a method that models each opponent individually using LLMs with iterative self- and global-refinement for better multi-agent reasoning.

Paper/Bib/Link

ICML'24

Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.

By adaptively learning the causal relationship joint graph in the environment and providing representations with causal relationships, RL algorithms can effectively tackle non-stationarities.

Paper/Bib/GitHub/Link

NAACL'24

AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback

Wanpeng Zhang, Zongqing Lu.

We propose AdaRefiner to achieve the co-learning of LLMs and RL agents by enabling them to provide feedback to each other, optimizing both perception and decision-making capabilities.

Paper/Bib/GitHub/Link

ICML'23

Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning

Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. (*Equal Contribution.)

We propose EnDi framework, achieving agent goal division and collaboration enhancement in multi-agent systems through language and entity binding.

Paper/Bib/GitHub/Link

NeurIPS'22

Model-Based Opponent Modeling

Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.

MBOM uses environment models to recursively simulate and mix imagined opponent policies for adaptive opponent modeling.

Paper/Bib/GitHub/Link

Education

Peking University. (Beijing, China. Sep 2022 — Jun 2026 (Expected))
- Ph.D. Candidate in Computer Science.
- Research Interest: Foundation Models / Reinforcement Learning / Robotics
Tsinghua University. (Beijing, China. Sep 2019 — Jun 2022)
- M.S. in Computer Science.
- Research Interest: Reinforcement Learning
Nankai University. (Tianjin, China. Sep 2015 — Jun 2019)
- B.S. in Mathematics.
- Research Interest: Mathematics / Machine Learning

Work Experience

BeingBeyond. (Beijing, China. Mar 2025 — Present)
- Startup.
- VLA / World Model / RL / Robotics
Beijing Academy of Artificial Intelligence. (Beijing, China. May 2024 — Mar 2025)
- Research Intern.
- VLM / RL / Robotics
Tencent AI Lab (Rhino-bird Program).
- Research Intern. (Shenzhen, China. Jun 2020 — Jul 2021)
- Reinforcement Learning

Patent

Multimodal data processing method, device, storage medium, and electronic equipment. (CN119226992B)
- Zongqing Lu, Wanpeng Zhang.
- Link / PDF / Certificate
Method, device and equipment for determining parameters and storage medium. (CN112527104A)
- Wanpeng Zhang, Dijun Luo, Xi Xiao.
- Link / PDF

Award

National Scholarship. (2025)
Top 10 Students at the National Engineering Research Center of Visual Technology. (2025)
Merit Student of Peking University. (2025)
Presidential Scholarship of Peking University. (2024)
Award for Scientific Research of Peking University. (2024)
Rhino-bird Talent Program of Tencent. (2021)
Mathematical Contest in Modeling (MCM/ICM), Meritorious Winner (First Prize). (2017)
China Undergraduate Mathematical Contest in Modeling (CUMCM), Second Prize. (2016)
National High School Mathematics Competition, Second Prize. (2014)

Service

Conference Reviewer
- ICML / NeurIPS / ICLR / CVPR / ICCV / ECCV / ICRA / AAAI / AISTATS / BMVC
Journal Reviewer
- TNNLS / TIST / RAL
Teaching Assistant
- Deep Reinforcement Learning, Peking University. (Spring, 2025)