Publications

80 papers on multi-agent reinforcement learning, LLM reasoning, and machine learning systems. Google Scholar →

2026

Learning to Reason in Structured In-context Environments with Reinforcement Learning

Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen

ICLR [arXiv]

MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety

Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Zhongtian Ma, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang

arXiv Preprint [arXiv]

MemRL: Self-evolving agents via runtime reinforcement learning on episodic memory

Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Zhuo Li, Yujie Zheng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, Muning Wen

arXiv Preprint [arXiv]

Offline Fictitious Self-Play for Competitive Games

J Chen, W Xie, W Zhang, Y Wen

AAAI 2026 [arXiv]

Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity

Yingxuan Yang, Chengrui Qu, Muning Wen, Laixi Shi, Ying Wen, Weinan Zhang, Adam Wierman, Shangding Gu

arXiv Preprint [arXiv]

2025

A survey of AI agent protocols

Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, Weiwen Liu, Ying Wen, Yong Yu, Weinan Zhang

arXiv Preprint [arXiv]

Agentic web: Weaving the next web with AI agents

Yingxuan Yang, Mulei Ma, Yuxuan Huang, Huacan Chai, Chenyu Gong, Haoran Geng, Yuanjian Zhou, Ying Wen, Meng Fang, Muhao Chen, Shangding Gu, Ming Jin, Costas Spanos, Yang Yang, Pieter Abbeel, Dawn Song, Weinan Zhang, Jun Wang

arXiv Preprint [arXiv]

Agent exchange: Shaping the future of AI agent economics

Yingxuan Yang, Ying Wen, Jun Wang, Weinan Zhang

arXiv Preprint [arXiv]

AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit

Y Li, J Chen, F Xue, J Qiu, W Li, Q Zhang, Y Wen, W Pan

CoRL 2025 [arXiv]

Embodied arena: A comprehensive, unified, and evolving evaluation platform for embodied AI

Fei Ni, Min Zhang, Pengyi Li, Yifu Yuan, Lingfeng Zhang, Yuecheng Liu, Peilong Han, Longxin Kou, Shaojin Ma, Jinbin Qiao, David Gamaliel Arcos Bravo, Yuening Wang, Xiao Hu, Zhanguang Zhang, Xianze Yao, Yutong Li, Zhao Zhang, Ying Wen, Ying-Cong Chen, Xiaodan Liang, Liang Lin, Bin He, Haitham Bou-Ammar, He Wang, Huazhe Xu, Jiankang Deng, Shan Luo, Shuqiang Jiang, Wei Pan, Yang Gao, Stefanos Zafeiriou, Jan Peters, Yuzheng Zhuang, Yingxue Zhang, Yan Zheng, Hongyao Tang, Jianye Hao

arXiv Preprint [arXiv]

Language Games as the Pathway to Artificial Superhuman Intelligence

Ying Wen, Ziyu Wan, Shao Zhang

arXiv Preprint [arXiv]

Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration

Shao Zhang, Xihuai Wang, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang, Xinbing Wang, Ying Wen

ACL [arXiv]

ML-Master: Towards AI-for-AI via integration of exploration and reasoning

Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, Siheng Chen

arXiv Preprint [arXiv]

PMAT: Optimizing Action Generation Order in Multi-Agent Reinforcement Learning

Kun Hu, Muning Wen, Xihuai Wang, Shao Zhang, Yiwei Shi, Minne Li, Minglong Li, Ying Wen

AAMAS [arXiv]

Progra: Progress-Aware Reinforcement Learning for Multi-Turn Function Calling

Haochen Chai, Zhicheng Cao, Mingxuan Ran, Yiming Yang, Jiawei Lin, Ruizhi Ding, Ziyu Wan, Muning Wen, Weinan Liu, Ying Wen

Preprint

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang

AAAI [arXiv]

ReMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen

NeurIPS [arXiv]

RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations

J Chen, X Li, J Cao, Z Zhu, W Dong, M Liu, Y Wen, Y Yu, L Zhang

ICC 2025 [arXiv]

Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning

J Zhu, C Zheng, J Lin, K Du, Y Wen, Y Yu, J Wang, W Zhang

ACL 2025 [arXiv]

Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse

Z Zhao, C Li, S Zhang, Y Wen

arXiv Preprint [arXiv]

Retrieval dexterity: Efficient object retrieval in clutters with dexterous hand

F Bai, Y Li, J Chu, T Chou, R Zhu, Y Wen, Y Yang, Y Chen

arXiv Preprint [arXiv]

STAR: Efficient Preference-based Reinforcement Learning via Dual Regularization

Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen

NeurIPS

Towards Monotonic Improvement in In-Context Reinforcement Learning

W Zhang, S Zhang, X Wang, Y Li, Y Wen

arXiv Preprint [arXiv]

ThinkBench: Dynamic out-of-distribution evaluation for robust LLM reasoning

S Huang, L Yang, Y Song, S Chen, L Cui, Z Wan, Q Zeng, Y Wen, K Shao

NeurIPS 2025 [arXiv]

ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

J Lin, Y Shi, X Peng, R Ding, H Wang, Y Peng, B Bai, W Song, F Bai

arXiv Preprint [arXiv]

Unlocking the potential of decentralized LLM-based MAS: Privacy preservation and monetization in collective intelligence

Y Yang, Q Peng, J Wang, Y Wen, W Zhang

Proceedings of the 24th International Conference on Autonomo [arXiv]

2024

Agent Exchange: An Auction Platform for AI Agent Marketplaces

Y Yang, Y Wen, J Wang, W Zhang

Preprint [arXiv]

Aligning Individual and Collective Objectives in Multi-Agent Cooperation

Yang Li, Wenhao Zhang, Jianhong Wang, Shao Zhang, Yali Du, Ying Wen, Wei Pan

NeurIPS [arXiv]

Cooperative Open-ended Learning Framework for Zero-Shot Coordination

Yang Li, Shao Zhang, Jichen Sun, Yali Du, Ying Wen, Xinbing Wang, Wei Pan

ICML [arXiv]

Conflux-PSRO: Effectively leveraging collective advantages in policy space response oracles

Yucong Huang, Jiesong Lian, Mingzhi Wang, Chengdong Ma, Ying Wen

arXiv Preprint [arXiv]

Cross-Utterance Conditioned VAE for Speech Generation

Y Li, C Yu, G Sun, W Zu, Z Tian, Y Wen, W Pan, C Zhang, J Wang, Y Yang

IEEE TASLP [arXiv]

Critic-Guided Decision Transformer for Offline Reinforcement Learning

Yuanfu Wang, Chao Yang, Ying Wen*, Yu Liu, Yu Qiao

AAAI [arXiv]

Controlling large language model-based agents for large-scale decision-making: An actor-critic approach

B Zhang, H Mao, J Ruan, Y Wen, Y Li, S Zhang, Z Xu, D Li, Z Li, R Zhao

ICLR 2024 [arXiv]

Efficient model-agnostic alignment via Bayesian persuasion

Fengshuo Bai, Mingzhi Wang, Zhaowei Zhang, Boyuan Chen, Yinda Xu, Ying Wen, Yaodong Yang

arXiv Preprint [arXiv]

Efficient preference-based reinforcement learning via aligned experience estimation

F Bai, R Zhao, H Zhang, S Cui, Y Wen, Y Yang, B Xu, L Han

NeurIPS 2025 [arXiv]

DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

S Guo, C Deng, Y Wen, H Chen, Y Chang, J Wang

ICML 2024 [arXiv]

Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement

Muning Wen, Junwei Liao, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen

arXiv Preprint [arXiv]

Fusion-PSRO: Nash policy fusion for policy space response oracles

Jiesong Lian, Yucong Huang, Chengdong Ma, Mingzhi Wang, Ying Wen, Long Hu, Yixue Hao

arXiv Preprint [arXiv]

HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit

Yang Li, Dengyu Zhang, Junfan Chen, Ying Wen, Qingrui Zhang, Shaoshuai Mou, Wei Pan

arXiv Preprint [arXiv]

KaLM: Knowledge-aligned autoregressive language modeling via dual-view knowledge graph contrastive learning

Peng Yu, Cheng Deng, Beiya Dai, Xinbing Wang, Ying Wen

arXiv Preprint [arXiv]

Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games

Naming Liu, Mingzhi Wang, Youzhi Zhang, Yaodong Yang, Bo An, Ying Wen

arXiv Preprint [arXiv]

Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task

Shao Zhang, Xihuai Wang, Wenhao Zhang, Yongshan Chen, Lian Gao, Dong Wang, Weinan Zhang, Xinbing Wang, Ying Wen

arXiv Preprint [arXiv]

Natural language reinforcement learning

X Feng, B Liu, Y Song, H Fu, Z Wan, GA Koushik, Z Hu, M Yang, Y Wen

arXiv Preprint [arXiv]

Open-Ended Learning in General-Sum Games: The Role of Diversity in Correlated Equilibrium

Z Zhao, M Wen, Y Wen, Y Yang

Preprint [arXiv]

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

OpenR Team, Ying Wen

Preprint [Code]

Reinforcing Language Agents via Policy Optimization with Action Decomposition

Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen

NeurIPS [arXiv]

Reinforcing LLM Agents via Policy Optimization with Action Decomposition

M Wen, Z Wan, J Wang, W Zhang, Y Wen

The Thirty-eighth Annual Conference on Neural Information Pr [arXiv]

Tackling cooperative incompatibility for zero-shot human-ai coordination

Y Li, S Zhang, J Sun, W Zhang, Y Du, Y Wen, X Wang, W Pan

JAIR 2024 [arXiv]

AlphaZero-like Tree-Search can Guide Large Language Model Decoding and Training

Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, Jun Wang

TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

R Zhou, Y Yang, M Wen, Y Wen, W Wang, C Xi, G Xu, Y Yu, W Zhang

SIGIR 2024 [arXiv]

ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang

NeurIPS [Code]

2023

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, J. Wang, Yaodong Yang, Luo Mai

Offline Pre-trained Multi-agent Decision Transformer

Linghui Meng, Muning Wen, Chenyang Le, Xiyun Li, Dengpeng Xing, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Yaodong Yang, Bo Xu

MIR [arXiv]

Order Matters: Agent-by-agent Policy Optimization

Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, J. Wang, Weinan Zhang

ICLR [arXiv]

2022

Greedy when sure and conservative when uncertain about the opponents

H Fu, Y Tian, H Yu, W Liu, S Wu, J Xiong, Y Wen, K Li, J Xing, Q Fu

International Conference on Machine Learning, 6829-6848, 202 [arXiv]

Multi-agent feedback enabled neural networks for intelligent communications

F Sun, Y Li, Y Wen, J Hu, J Wang, Y Yang, K Li

IEEE TWC [arXiv]

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Muning Wen, J. Kuba, Runji Lin, Weinan Zhang, Ying Wen, J. Wang, Yaodong Yang

NeurIPS [arXiv] [Code]

2021

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization

Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang

International Conference on Distributed Artificial Intelligence [arXiv]

🏆 Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, D. Graves, H. Ammar, Jun Wang, Matthew E. Taylor

Adaptive Agents and Multi-Agent Systems Best Blue-Sky Paper Award [arXiv]

Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games

Xidong Feng, Oliver Slumbers, Yaodong Yang, Ziyu Wan, Bo Liu (Benjamin Liu), S. McAleer, Ying Wen, Jun Wang

Preprint [arXiv]

Learning in Nonzero-Sum Stochastic Games with Potentials

D. Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, Minne Li, Ying Wen, Joel Jennings, Jun Wang

ICML [arXiv]

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang

Journal of Machine Learning Research [arXiv] [Code]

Modelling behavioural diversity for learning in open-ended games

N Perez-Nieves, Y Yang, O Slumbers, DH Mguni, Y Wen, J Wang

International conference on machine learning, 8514-8524, 202 [arXiv]

Neural Auto-Curricula in Two-Player Zero-Sum Games

Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu (Benjamin Liu), S. McAleer, Ying Wen, Jun Wang, Yaodong Yang

NeurIPS [arXiv]

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

NeurIPS [arXiv]

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

J. Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang

ICLR [arXiv]

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu

Preprint [arXiv]

2020

Multi-Agent Determinantal Q-Learning

Yaodong Yang, Ying Wen, Lihuan Chen, Jun Wang, Kun Shao, D. Mguni, Weinan Zhang

ICML [arXiv]

🏆 SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving

Ming Zhou, Jun Luo, Julian Villela, Yaodong Yang, David Rusu, J. Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Chong-ping Huang, Ying Wen, Kimia Hassanzadeh, D. Graves, Zhengbang Zhu, Yihan Ni, Nhat M. Nguyen, Mohamed Elsayed, H. Ammar, A. Cowen-Rivers, S. Ahilan, Zheng Tian, Daniel Palenicek, Kasra Rezaee, Peyman Yadmellat, Kun Shao, Dong Chen, Baokuan Zhang, Hongbo Zhang, Jianye Hao, Wulong Liu, Jun Wang

CoRL Best System Paper Award [Code]

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

Ming Zhou, Jun Luo, Julian Villela, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, D. Graves, Dong Chen, Zhengbang Zhu, Nhat M. Nguyen, M. ElSayed, Kun Shao, S. Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, H. Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang

Preprint [arXiv] [Code]

2019

A Regularized Opponent Model with Maximum Entropy Objective

Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang

IJCAI 2019 [arXiv]

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning

Ying Wen, Yaodong Yang, Jun Wang

IJCAI [arXiv]

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan

ICLR [arXiv]

2018

Factorized Q-learning for large-scale multi-agent systems

Yong Chen, M. Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Weinan Zhang, Dell Zhang, Jun Wang, Han Liu

International Conference on Distributed Artificial Intelligence [arXiv]

2017

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

Yaodong Yang, Lantao Yu, Yiwei Bai, Ying Wen, Weinan Zhang, Jun Wang

Adaptive Agents and Multi-Agent Systems [arXiv]

Learning to Design Games: Strategic Environments in Deep Reinforcement Learning

Haifeng Zhang, Jun Wang, Zhiming Zhou, Weinan Zhang, Ying Wen, Yong Yu, Wenxin Li

IJCAI [arXiv]

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games

Peng Peng, Quan Yuan, Ying Wen, Yaodong Yang, Zhenkun Tang, Haitao Long, Jun Wang

Preprint [arXiv]

Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games

Peng Peng, Ying Wen, Yaodong Yang, Quan Yuan, Zhenkun Tang, Haitao Long, Jun Wang

Preprint [arXiv]

2016

Learning text representation using recurrent convolutional neural network with highway layers

Ying Wen, Weinan Zhang, Rui Luo, Jun Wang

SIGIR 2016 [arXiv]

Product-based neural networks for user response prediction

Y Qu, H Cai, K Ren, W Zhang, Y Yu, Y Wen, J Wang

ICDM 2016 [arXiv]