Publications
💡 = Representative Work
Preprint
💡 SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang*, Carlos E. Jimenez*, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press
2024
EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges
Talor Abramovich, Meet Udeshi, Minghao Shao, Kilian Lieret, Haoran Xi, Kimberly Milner, Sofija Jancheska, John Yang, Carlos E. Jimenez, Farshad Khorrami, Prashanth Krishnamurthy, Brendan Dolan-Gavitt, Muhammad Shafique, Karthik Narasimhan, Ramesh Karri, Ofir Press
2024
Peer Reviewed
💡 SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang*, Carlos E. Jimenez*, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press
2024 • NeurIPS
Referral Augmentation for Zero-Shot Information Retrieval
Michael William Tang, Shunyu Yao, John Yang, Karthik Narasimhan
2024 • ACL (Findings)
💡 SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez*, John Yang*, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
2024 • ICLR • Oral
💡 InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao
2023 • NeurIPS (Datasets & Benchmarks)
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao*, Howard Chen*, John Yang, Karthik Narasimhan
2022 • NeurIPS
Workshop
💡 Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
John Yang, Akshara Prabhakar, Shunyu Yao, Kexin Pei, Karthik Narasimhan
2023 • Multi-Agent Security Workshop @ NeurIPS 2023 • Best Paper Award
Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
John Yang, Howard Chen, Karthik Narasimhan
2022 • Language & Reinforcement Learning Workshop @ NeurIPS 2022
Miscellaneous
Introducing SWE-bench Verified
Neil Chowdhury*, James Aung*, Chan Jun Shern*, Oliver Jaffe*, Dane Sherburn*, Giulio Starace*, Evan Mays, Rachel Dias, Marwan Aljubeh, Mia Glaese, Carlos E. Jimenez, John Yang, Kevin Liu, Aleksander Madry
2024 • OpenAI Technical Blog
Learning Language through Interactions with the Digital World
John Yang
2022 • M.S.E. Thesis | Princeton University
Quartz: A Framework for Engineering Secure Smart Contracts
John Kolb, John Yang, Randy H. Katz, David E. Culler
2020 • EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2020-178