Publications
Preprint
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
Yijia Shao, Vinay Samuel, Yucheng Jiang, John Yang, Diyi Yang
2025
Peer Reviewed
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang*, Carlos E. Jimenez*, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press
2025 • ICLR
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study
Bowen Li*, Wenhan Wu*, Ziwei Tang*, Lin Shi*, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen
2025 • COLING • Oral
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang*, Carlos E. Jimenez*, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press
2024 • NeurIPS
Referral Augmentation for Zero-Shot Information Retrieval
Michael William Tang, Shunyu Yao, John Yang, Karthik Narasimhan
2024 • ACL (Findings)
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez*, John Yang*, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
2024 • ICLR • Oral
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao
2023 • NeurIPS (Datasets & Benchmarks)
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao*, Howard Chen*, John Yang, Karthik Narasimhan
2022 • NeurIPS
Workshop
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
John Yang, Akshara Prabhakar, Shunyu Yao, Kexin Pei, Karthik Narasimhan
2023 • Multi-Agent Security Workshop @ NeurIPS 2023 • Best Paper Award
Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
John Yang, Howard Chen, Karthik Narasimhan
2022 • Language & Reinforcement Learning Workshop @ NeurIPS 2022
Miscellaneous
Introducing SWE-bench Verified
Neil Chowdhury*, James Aung*, Chan Jun Shern*, Oliver Jaffe*, Dane Sherburn*, Giulio Starace*, Evan Mays, Rachel Dias, Marwan Aljubeh, Mia Glaese, Carlos E. Jimenez, John Yang, Kevin Liu, Aleksander Madry
2024 • OpenAI Technical Blog
Learning Language through Interactions with the Digital World
John Yang
2022 • M.S.E. Thesis • Princeton University
Quartz: A Framework for Engineering Secure Smart Contracts
John Kolb, John Yang, Randy H. Katz, David E. Culler
2020 • EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2020-178