Publications

💡 = Representative Work

Preprint
💡 SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang*, Carlos E. Jimenez*, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press
2024
Peer Reviewed
💡 SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang*, Carlos E. Jimenez*, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press
2024 • NeurIPS
Referral Augmentation for Zero-Shot Information Retrieval
Michael William Tang, Shunyu Yao, John Yang, Karthik Narasimhan
2024 • ACL (Findings)
💡 SWE-bench: Can Language Models Resolve Real-World Github Issues?
Carlos E. Jimenez*, John Yang*, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
2024 • ICLR • Oral
💡 InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao
2023 • NeurIPS (Datasets & Benchmarks)
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao*, Howard Chen*, John Yang, Karthik Narasimhan
2022 • NeurIPS
Workshop
💡 Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
John Yang, Akshara Prabhakar, Shunyu Yao, Kexin Pei, Karthik Narasimhan
2023 • Multi-Agent Security Workshop @ NeurIPS 2023 • Best Paper Award
Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
John Yang, Howard Chen, Karthik Narasimhan
2022 • Language & Reinforcement Learning Workshop @ NeurIPS 2022
Miscellaneous
Learning Language through Interactions with the Digital World
John Yang
2022 • M.S.E. Thesis | Princeton University
Quartz: A Framework for Engineering Secure Smart Contracts
John Kolb, John Yang, Randy H Katz, David E Culler
2020 • EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2020-178