# Ying Wen Full LLM Context

> Expanded context for LLMs, agents, and research assistants reading yingwen.io. Use /llms.txt for the shorter entry point.

## Identity

- Name: Ying Wen
- Chinese name: 温颖
- Current role: Tenure-Track Associate Professor, School of Artificial Intelligence, Shanghai Jiao Tong University
- Additional role: Mentor, Shanghai Innovation Institute
- Email: ying.wen@sjtu.edu.cn
- Website: https://yingwen.io/
- Scholar: https://scholar.google.com/citations?user=_A1CxG8AAAAJ
- GitHub: https://github.com/ying-wen

## Research Areas

- Reinforcement learning: RL theory, credit assignment, process rewards, test-time scaling, and scalable training systems.
- Multi-agent systems: recursive reasoning, game theory, population-based learning, diversity, zero-shot coordination, and platforms such as MALib and SMARTS.
- Foundation models and LLM agents: tree-search reasoning, action decomposition, RL alignment, agent environments, OpenR, and agentic evaluation.

## Academic Service

- DAI 2026 Program Committee Co-Chair. The DAI 2026 theme is "Agentic AI Goes Live — Science, Systems, and Societies."

## Site Index

- https://yingwen.io/en/
- https://yingwen.io/zh/
- https://yingwen.io/en/publications/
- https://yingwen.io/zh/publications/
- https://yingwen.io/en/publications.md
- https://yingwen.io/zh/publications.md
- https://yingwen.io/en/projects/
- https://yingwen.io/zh/projects/
- https://yingwen.io/en/blog/
- https://yingwen.io/zh/blog/
- https://yingwen.io/en/art/
- https://yingwen.io/zh/art/
- https://yingwen.io/sitemap.xml

## Projects

- OpenR: Open source framework for advanced reasoning with large language models. (https://github.com/openreasoner/openr)

## Blog Posts

- 2026-05-07 [en] What Is a World Model Modeling? From Predicting the Future to Reusing Experience: https://yingwen.io/en/blog/what-is-a-world-model-modeling/ ; markdown: https://yingwen.io/en/blog/what-is-a-world-model-modeling.md
- 2026-05-07 [zh] 世界模型到底在建模什么:从预测未来到复用经验: https://yingwen.io/zh/blog/what-is-a-world-model-modeling/ ; markdown: https://yingwen.io/zh/blog/what-is-a-world-model-modeling.md
- 2026-05-06 [en] What Environment Do LLM Agents Actually Learn In?: https://yingwen.io/en/blog/what-environment-do-llm-agents-learn-in/ ; markdown: https://yingwen.io/en/blog/what-environment-do-llm-agents-learn-in.md
- 2026-05-06 [zh] 大语言模型智能体到底在什么环境里学习?: https://yingwen.io/zh/blog/what-environment-do-llm-agents-learn-in/ ; markdown: https://yingwen.io/zh/blog/what-environment-do-llm-agents-learn-in.md
- 2026-03-15 [en] When Agents Learn from the World, Not from Us: https://yingwen.io/en/blog/when-agents-learn-from-world/ ; markdown: https://yingwen.io/en/blog/when-agents-learn-from-world.md
- 2026-03-15 [zh] 当智能体开始从世界中学习,而不是人类: https://yingwen.io/zh/blog/when-agents-learn-from-world/ ; markdown: https://yingwen.io/zh/blog/when-agents-learn-from-world.md

## Publications

- 2026. MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety. Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Zhongtian Ma, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang. arXiv Preprint. arXiv: https://arxiv.org/abs/2602.01539.
- 2026. Memrl: Self-evolving agents via runtime reinforcement learning on episodic memory. Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Zhuo Li, Yujie Zheng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, Muning Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2601.03192.
- 2026. Offline Fictitious Self-Play for Competitive Games. J Chen, W Xie, W Zhang, Y Wen. AAAI 2026. arXiv: https://arxiv.org/abs/2407.11088.
- 2026. Structured In-context Environment Scaling for Large Language Model Reasoning. Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen. ICLR. arXiv: https://arxiv.org/abs/2509.23330v3.
- 2026. Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity. Yingxuan Yang, Chengrui Qu, Muning Wen, Laixi Shi, Ying Wen, Weinan Zhang, Adam Wierman, Shangding Gu. arXiv Preprint. arXiv: https://arxiv.org/abs/2602.03794.
- 2025. A survey of ai agent protocols. Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, Weiwen Liu, Ying Wen, Yong Yu, Weinan Zhang. arXiv Preprint. arXiv: https://arxiv.org/abs/2504.16736.
- 2025. Agent exchange: Shaping the future of AI agent economics. Yingxuan Yang, Ying Wen, Jun Wang, Weinan Zhang. arXiv Preprint. arXiv: https://arxiv.org/abs/2507.03904.
- 2025. Agentic web: Weaving the next web with ai agents. Yingxuan Yang, Mulei Ma, Yuxuan Huang, Huacan Chai, Chenyu Gong, Haoran Geng, Yuanjian Zhou, Ying Wen, Meng Fang, Muhao Chen, Shangding Gu, Ming Jin, Costas Spanos, Yang Yang, Pieter Abbeel, Dawn Song, Weinan Zhang, Jun Wang. arXiv Preprint. arXiv: https://arxiv.org/abs/2507.21206.
- 2025. AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit. Y Li, J Chen, F Xue, J Qiu, W Li, Q Zhang, Y Wen, W Pan. CoRL 2025. arXiv: https://arxiv.org/abs/2503.10027.
- 2025. Embodied arena: A comprehensive, unified, and evolving evaluation platform for embodied ai. Fei Ni, Min Zhang, Pengyi Li, Yifu Yuan, Lingfeng Zhang, Yuecheng Liu, Peilong Han, Longxin Kou, Shaojin Ma, Jinbin Qiao, David Gamaliel Arcos Bravo, Yuening Wang, Xiao Hu, Zhanguang Zhang, Xianze Yao, Yutong Li, Zhao Zhang, Ying Wen, Ying-Cong Chen, Xiaodan Liang, Liang Lin, Bin He, Haitham Bou-Ammar, He Wang, Huazhe Xu, Jiankang Deng, Shan Luo, Shuqiang Jiang, Wei Pan, Yang Gao, Stefanos Zafeiriou, Jan Peters, Yuzheng Zhuang, Yingxue Zhang, Yan Zheng, Hongyao Tang, Jianye Hao. arXiv Preprint. arXiv: https://arxiv.org/abs/2509.15273.
- 2025. Language Games as the Pathway to Artificial Superhuman Intelligence. Ying Wen, Ziyu Wan, Shao Zhang. arXiv Preprint. arXiv: https://arxiv.org/abs/2501.18924.
- 2025. Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration. Shao Zhang, Xihuai Wang, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang, Xinbing Wang, Ying Wen. ACL. arXiv: https://arxiv.org/abs/2502.11882.
- 2025. Ml-master: Towards ai-for-ai via integration of exploration and reasoning. Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, Siheng Chen. arXiv Preprint. arXiv: https://arxiv.org/abs/2506.16499.
- 2025. PMAT: Optimizing Action Generation Order in Multi-Agent Reinforcement Learning. Kun Hu, Muning Wen, Xihuai Wang, Shao Zhang, Yiwei Shi, Minne Li, Minglong Li, Ying Wen. AAMAS. arXiv: https://arxiv.org/abs/2302.09646.
- 2025. Progra: Progress-Aware Reinforcement Learning for Multi-Turn Function Calling. Haochen Chai, Zhicheng Cao, Mingxuan Ran, Yiming Yang, Jiawei Lin, Ruizhi Ding, Ziyu Wan, Muning Wen, Weinan Liu, Ying Wen. Preprint.
- 2025. RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors. Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang. AAAI. arXiv: https://arxiv.org/abs/2406.02902.
- 2025. Rema: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning. Ziyu Wan, Yunxiang Li, Xiaoyu Wen, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen. NeurIPS. arXiv: https://arxiv.org/abs/2411.16986.
- 2025. Retrieval dexterity: Efficient object retrieval in clutters with dexterous hand. F Bai, Y Li, J Chu, T Chou, R Zhu, Y Wen, Y Yang, Y Chen. arXiv Preprint. arXiv: https://arxiv.org/abs/2502.18423.
- 2025. Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning. J Zhu, C Zheng, J Lin, K Du, Y Wen, Y Yu, J Wang, W Zhang. ACL 2025. arXiv: https://arxiv.org/abs/2501.14539.
- 2025. RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations. J Chen, X Li, J Cao, Z Zhu, W Dong, M Liu, Y Wen, Y Yu, L Zhang. ICC 2025. arXiv: https://arxiv.org/abs/2502.13134.
- 2025. Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse. Z Zhao, C Li, S Zhang, Y Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2509.23778.
- 2025. STAR: Efficient Preference-based Reinforcement Learning via Dual Regularization. Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen. NeurIPS.
- 2025. Thinkbench: Dynamic out-of-distribution evaluation for robust llm reasoning. S Huang, L Yang, Y Song, S Chen, L Cui, Z Wan, Q Zeng, Y Wen, K Shao. NeurIPS 2025. arXiv: https://arxiv.org/abs/2503.08532.
- 2025. ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling. J Lin, Y Shi, X Peng, R Ding, H Wang, Y Peng, B Bai, W Song, F Bai. arXiv Preprint. arXiv: https://arxiv.org/abs/2510.14703.
- 2025. Towards Monotonic Improvement in In-Context Reinforcement Learning. W Zhang, S Zhang, X Wang, Y Li, Y Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2509.23209.
- 2025. Unlocking the potential of decentralized llm-based mas: Privacy preservation and monetization in collective intelligence. Y Yang, Q Peng, J Wang, Y Wen, W Zhang. Proceedings of the 24th International Conference on Autonomo. arXiv: https://arxiv.org/abs/2501.13828.
- 2024. Agent Exchange: An Auction Platform for AI Agent Marketplaces. Y Yang, Y Wen, J Wang, W Zhang. Preprint. arXiv: https://arxiv.org/abs/2501.10448.
- 2024. Aligning Individual and Collective Objectives in Multi-Agent Cooperation. Yang Li, Wenhao Zhang, Jianhong Wang, Shao Zhang, Yali Du, Ying Wen, Wei Pan. NeurIPS. arXiv: https://arxiv.org/abs/2402.12416.
- 2024. AlphaZero-like Tree-Search can Guide Large Language Model Decoding and Training. Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, Jun Wang. ICML. arXiv: https://arxiv.org/abs/2309.17179; code: https://github.com/waterhorse1/LLM_Tree_Search.
- 2024. Conflux-PSRO: Effectively leveraging collective advantages in policy space response oracles. Yucong Huang, Jiesong Lian, Mingzhi Wang, Chengdong Ma, Ying Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2410.22776.
- 2024. Controlling large language model-based agents for large-scale decision-making: An actor-critic approach. B Zhang, H Mao, J Ruan, Y Wen, Y Li, S Zhang, Z Xu, D Li, Z Li, R Zhao. ICLR 2024. arXiv: https://arxiv.org/abs/2311.13884.
- 2024. Cooperative Open-ended Learning Framework for Zero-Shot Coordination. Yang Li, Shao Zhang, Jichen Sun, Yali Du, Ying Wen, Xinbing Wang, Wei Pan. ICML. arXiv: https://arxiv.org/abs/2302.04831.
- 2024. Critic-Guided Decision Transformer for Offline Reinforcement Learning. Yuanfu Wang, Chao Yang, Ying Wen*, Yu Liu, Yu Qiao. AAAI. arXiv: https://arxiv.org/abs/2312.13716.
- 2024. Cross-Utterance Conditioned VAE for Speech Generation. Y Li, C Yu, G Sun, W Zu, Z Tian, Y Wen, W Pan, C Zhang, J Wang, Y Yang. IEEE TASLP. arXiv: https://arxiv.org/abs/2309.04156.
- 2024. DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning. S Guo, C Deng, Y Wen, H Chen, Y Chang, J Wang. ICML 2024. arXiv: https://arxiv.org/abs/2402.17453.
- 2024. Efficient model-agnostic alignment via bayesian persuasion. Fengshuo Bai, Mingzhi Wang, Zhaowei Zhang, Boyuan Chen, Yinda Xu, Ying Wen, Yaodong Yang. arXiv Preprint. arXiv: https://arxiv.org/abs/2405.18718.
- 2024. Efficient preference-based reinforcement learning via aligned experience estimation. F Bai, R Zhao, H Zhang, S Cui, Y Wen, Y Yang, B Xu, L Han. NeurIPS 2025. arXiv: https://arxiv.org/abs/2405.18688.
- 2024. Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement. Muning Wen, Junwei Liao, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2402.06700.
- 2024. Fusion-psro: Nash policy fusion for policy space response oracles. Jiesong Lian, Yucong Huang, Chengdong Ma, Mingzhi Wang, Ying Wen, Long Hu, Yixue Hao. arXiv Preprint. arXiv: https://arxiv.org/abs/2405.21027.
- 2024. HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit. Yang Li, Dengyu Zhang, Junfan Chen, Ying Wen, Qingrui Zhang, Shaoshuai Mou, Wei Pan. arXiv Preprint. arXiv: https://arxiv.org/abs/2409.08767.
- 2024. KaLM: Knowledge-aligned autoregressive language modeling via dual-view knowledge graph contrastive learning. Peng Yu, Cheng Deng, Beiya Dai, Xinbing Wang, Ying Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2412.04948.
- 2024. Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games. Naming Liu, Mingzhi Wang, Youzhi Zhang, Yaodong Yang, Bo An, Ying Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2403.00255.
- 2024. Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task. Shao Zhang, Xihuai Wang, Wenhao Zhang, Yongshan Chen, Lian Gao, Dong Wang, Weinan Zhang, Xinbing Wang, Ying Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2409.08811.
- 2024. Natural language reinforcement learning. X Feng, B Liu, Y Song, H Fu, Z Wan, GA Koushik, Z Hu, M Yang, Y Wen. arXiv Preprint. arXiv: https://arxiv.org/abs/2411.14251.
- 2024. Open-Ended Learning in General-Sum Games: The Role of Diversity in Correlated Equilibrium. Z Zhao, M Wen, Y Wen, Y Yang. Preprint. arXiv: https://arxiv.org/abs/2501.08315.
- 2024. OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models. OpenR Team, Ying Wen. Preprint. code: https://github.com/openreasoner/openr.
- 2024. Reinforcing Language Agents via Policy Optimization with Action Decomposition. Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen. NeurIPS. arXiv: https://arxiv.org/abs/2410.21727.
- 2024. Reinforcing LLM Agents via Policy Optimization with Action Decomposition. M Wen, Z Wan, J Wang, W Zhang, Y Wen. The Thirty-eighth Annual Conference on Neural Information Pr. arXiv: https://arxiv.org/abs/2410.21727.
- 2024. Tackling cooperative incompatibility for zero-shot human-ai coordination. Y Li, S Zhang, J Sun, W Zhang, Y Du, Y Wen, X Wang, W Pan. JAIR 2024. arXiv: https://arxiv.org/abs/2312.07531.
- 2024. TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision. R Zhou, Y Yang, M Wen, Y Wen, W Wang, C Xi, G Xu, Y Yu, W Zhang. SIGIR 2024. arXiv: https://arxiv.org/abs/2403.09631.
- 2024. ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination. Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang. NeurIPS. code: https://github.com/sjtu-marl/ZSC-Eval.
- 2023. GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models. Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, J. Wang, Yaodong Yang, Luo Mai. ICML. arXiv: https://arxiv.org/abs/2310.05205; code: https://github.com/bigrl-team/gear.
- 2023. Offline Pre-trained Multi-agent Decision Transformer. Linghui Meng, Muning Wen, Chenyang Le, Xiyun Li, Dengpeng Xing, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Yaodong Yang, Bo Xu. MIR. arXiv: https://arxiv.org/abs/2112.10751.
- 2023. Order Matters: Agent-by-agent Policy Optimization. Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, J. Wang, Weinan Zhang. ICLR. arXiv: https://arxiv.org/abs/2302.06205.
- 2022. Greedy when sure and conservative when uncertain about the opponents. H Fu, Y Tian, H Yu, W Liu, S Wu, J Xiong, Y Wen, K Li, J Xing, Q Fu. International Conference on Machine Learning, 6829-6848, 202. arXiv: https://arxiv.org/abs/2206.01443.
- 2022. Multi-agent feedback enabled neural networks for intelligent communications. F Sun, Y Li, Y Wen, J Hu, J Wang, Y Yang, K Li. IEEE TWC. arXiv: https://arxiv.org/abs/2205.10750.
- 2022. Multi-Agent Reinforcement Learning is a Sequence Modeling Problem. Muning Wen, J. Kuba, Runji Lin, Weinan Zhang, Ying Wen, J. Wang, Yaodong Yang. NeurIPS. arXiv: https://arxiv.org/abs/2205.14953; code: https://github.com/PKU-MARL/Multi-Agent-Transformer.
- 2021. A Game-Theoretic Approach to Multi-Agent Trust Region Optimization. Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang. International Conference on Distributed Artificial Intelligence. arXiv: https://arxiv.org/abs/2106.06828.
- 2021. Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games. Xidong Feng, Oliver Slumbers, Yaodong Yang, Ziyu Wan, Bo Liu (Benjamin Liu), S. McAleer, Ying Wen, Jun Wang. Preprint. arXiv: https://arxiv.org/abs/2106.02745.
- 2021. Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems. Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, D. Graves, H. Ammar, Jun Wang, Matthew E. Taylor. Adaptive Agents and Multi-Agent Systems. Award: Best Blue-Sky Paper Award. arXiv: https://arxiv.org/abs/2102.07659.
- 2021. Learning in Nonzero-Sum Stochastic Games with Potentials. D. Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, Minne Li, Ying Wen, Joel Jennings, Jun Wang. ICML. arXiv: https://arxiv.org/abs/2103.09284.
- 2021. MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning. Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang. Journal of machine learning research. arXiv: https://arxiv.org/abs/2106.07551; code: https://github.com/sjtu-marl/malib.
- 2021. Modelling behavioural diversity for learning in open-ended games. N Perez-Nieves, Y Yang, O Slumbers, DH Mguni, Y Wen, J Wang. International conference on machine learning, 8514-8524, 202. arXiv: https://arxiv.org/abs/2103.07927.
- 2021. Neural Auto-Curricula in Two-Player Zero-Sum Games. Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu (Benjamin Liu), S. McAleer, Ying Wen, Jun Wang, Yaodong Yang. NeurIPS. arXiv: https://arxiv.org/abs/2106.02745.
- 2021. Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games. Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu. NeurIPS. arXiv: https://arxiv.org/abs/2106.04958.
- 2021. Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning. J. Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang. ICLR. arXiv: https://arxiv.org/abs/2109.11251.
- 2021. Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games. Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu. Preprint. arXiv: https://arxiv.org/abs/2106.04958.
- 2020. Multi-Agent Determinantal Q-Learning. Yaodong Yang, Ying Wen, Lihuan Chen, Jun Wang, Kun Shao, D. Mguni, Weinan Zhang. ICML. arXiv: https://arxiv.org/abs/2006.01482.
- 2020. SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving. Ming Zhou, Jun Luo, Julian Villela, Yaodong Yang, David Rusu, J. Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Chong-ping Huang, Ying Wen, Kimia Hassanzadeh, D. Graves, Zhengbang Zhu, Yihan Ni, Nhat M. Nguyen, Mohamed Elsayed, H. Ammar, A. Cowen-Rivers, S. Ahilan, Zheng Tian, Daniel Palenicek, Kasra Rezaee, Peyman Yadmellat, Kun Shao, Dong Chen, Baokuan Zhang, Hongbo Zhang, Jianye Hao, Wulong Liu, Jun Wang. CoRL. Award: Best System Paper Award. code: https://github.com/huawei-noah/SMARTS.
- 2020. SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving. Ming Zhou, Jun Luo, Julian Villela, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, D. Graves, Dong Chen, Zhengbang Zhu, Nhat M. Nguyen, M. ElSayed, Kun Shao, S. Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, H. Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang. Preprint. arXiv: https://arxiv.org/abs/2010.09776; code: https://github.com/huawei-noah/SMARTS.
- 2019. A Regularized Opponent Model with Maximum Entropy Objective. Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang. IJCAI 2019. arXiv: https://arxiv.org/abs/1905.01709.
- 2019. Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning. Ying Wen, Yaodong Yang, Jun Wang. IJCAI. arXiv: https://arxiv.org/abs/1901.09216.
- 2019. Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning. Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan. ICLR. arXiv: https://arxiv.org/abs/1901.09207.
- 2018. Factorized Q-learning for large-scale multi-agent systems. Yong Chen, M. Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Weinan Zhang, Dell Zhang, Jun Wang, Han Liu. International Conference on Distributed Artificial Intelligence. arXiv: https://arxiv.org/abs/1809.03738.
- 2017. A Study of AI Population Dynamics with Million-agent Reinforcement Learning. Yaodong Yang, Lantao Yu, Yiwei Bai, Ying Wen, Weinan Zhang, Jun Wang. Adaptive Agents and Multi-Agent Systems. arXiv: https://arxiv.org/abs/1709.04511.
- 2017. Learning to Design Games: Strategic Environments in Deep Reinforcement Learning. Haifeng Zhang, Jun Wang, Zhiming Zhou, Weinan Zhang, Ying Wen, Yong Yu, Wenxin Li. IJCAI. arXiv: https://arxiv.org/abs/1707.01310.
- 2017. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games. Peng Peng, Quan Yuan, Ying Wen, Yaodong Yang, Zhenkun Tang, Haitao Long, Jun Wang. Preprint. arXiv: https://arxiv.org/abs/1703.10069.
- 2017. Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games. Peng Peng, Ying Wen, Yaodong Yang, Quan Yuan, Zhenkun Tang, Haitao Long, Jun Wang. Preprint. arXiv: https://arxiv.org/abs/1703.10069.
- 2016. Learning text representation using recurrent convolutional neural network with highway layers. Ying Wen, Weinan Zhang, Rui Luo, Jun Wang. SIGIR 2016. arXiv: https://arxiv.org/abs/1606.06905.
- 2016. Product-based neural networks for user response prediction. Y Qu, H Cai, K Ren, W Zhang, Y Yu, Y Wen, J Wang. ICDM 2016. arXiv: https://arxiv.org/abs/1611.00144.