Fixpad++: Automated Bug Fix Verification Using LLM Agents (AIware 2026 - Main Track)

Mon 6 - Tue 7 July 2026 Montreal, Canada

co-located with FSE 2026

Who

Mustafa Özkan İr, Mehmet Dedeler, Anil Koyuncu, Eray Tüzün

Track

AIware 2026 Main Track

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 6 Jul 2026 09:30 - 09:35 at MB 1.210 - Coding Agents, Software Testing, and Code Understanding

Abstract

Verifying bug fixes before patches are released to end users is a critical step in the software development lifecycle. However, this process is often manual, repetitive, and error-prone, especially for crash bugs triggered through Graphical User Interface (GUI) interactions in desktop applications. Despite recent advancements in LLM-driven software agents, existing work primarily targets bug reproduction without addressing fix verification, while approaches that do focus on verification rely on source code access, making them inapplicable to closed-source GUI-based desktop applications. This paper introduces Fixpad++, a framework designed to automatically verify bug fixes in the Notepad++ desktop application using LLM-powered agents. Fixpad++ employs a two-phase approach: first, a multi-modal multi-agent system interacts with the buggy version to reproduce the reported crash using visual parsing and LLM reasoning. Second, upon successful reproduction, a trajectory replay mechanism executes the recorded action sequence on the patched version to validate the fix. We evaluated Fixpad++ on FixPad-Bench, a new dataset of 105 evaluation instances derived from 22 real-world Notepad++ crash bugs, including valid and invalid patches. The system achieved a reproduction success rate of 72.73% with an average time of 174.07 seconds. Among the successfully reproduced cases, Fixpad++ correctly verified valid fixes with 87.50% accuracy and detected invalid fixes with 77.05% accuracy, outperforming OpenAI’s Computer-Using Agent (CUA). Fixpad++ demonstrates the effectiveness of specialized LLM agent architectures for automated bug fix verification in GUI-based desktop applications, offering a practical solution for automating verification workflows without requiring access to source code.
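The two-phase workflow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the Fixpad++ implementation: the `Action` type, the `Verdict` labels, `make_build`, and `toy_reproducer` are all hypothetical. The LLM-driven multi-agent reproduction loop (visual parsing plus reasoning) is replaced by a stand-in that simply tries candidate action sequences against a toy buggy build. What the sketch shows is the control flow: a patch is judged only once a crash-triggering trajectory has been recorded on the buggy version, and the verdict comes from replaying that exact trajectory on the patched version.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Sequence


class Verdict(Enum):
    NOT_REPRODUCED = "not_reproduced"  # phase 1 failed, so the patch is not judged
    FIX_VALID = "fix_valid"            # replayed trajectory no longer crashes
    FIX_INVALID = "fix_invalid"        # replayed trajectory still crashes


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action vocabulary, e.g. "menu", "type", "click"
    target: str  # hypothetical UI element label or typed text


# A build is modeled as: execute these GUI actions, return True if the app crashed.
RunActions = Callable[[Sequence[Action]], bool]
Reproducer = Callable[[], Optional[List[Action]]]


def verify_fix(reproduce: Reproducer, run_patched: RunActions) -> Verdict:
    # Phase 1: obtain a crash-triggering trajectory on the buggy build.
    trajectory = reproduce()
    if trajectory is None:
        return Verdict.NOT_REPRODUCED
    # Phase 2: replay the exact recorded actions on the patched build.
    crashed = run_patched(trajectory)
    return Verdict.FIX_INVALID if crashed else Verdict.FIX_VALID


# --- Toy stand-ins (hypothetical; no real GUI automation or LLM involved) ---
CRASH_TRIGGER = [
    Action("menu", "Search > Find"),
    Action("type", ""),
    Action("click", "Find Next"),
]


def make_build(fixed: bool) -> RunActions:
    def run(actions: Sequence[Action]) -> bool:
        # The toy unfixed build crashes only on the exact trigger sequence.
        return (not fixed) and list(actions) == CRASH_TRIGGER
    return run


def toy_reproducer(run_buggy: RunActions,
                   candidates: Sequence[List[Action]]) -> Reproducer:
    # Stand-in for the agent loop: try candidate action sequences in order
    # and record the first one that crashes the buggy build.
    def reproduce() -> Optional[List[Action]]:
        for trajectory in candidates:
            if run_buggy(trajectory):
                return trajectory
        return None
    return reproduce


buggy = make_build(fixed=False)
repro = toy_reproducer(buggy, [[Action("click", "File")], CRASH_TRIGGER])
# A valid patch: the recorded trajectory no longer crashes.
print(verify_fix(repro, make_build(fixed=True)).value)   # fix_valid
# An invalid patch: the build still behaves like the buggy one.
print(verify_fix(repro, make_build(fixed=False)).value)  # fix_invalid
```

One plausible reason for replaying the recorded trajectory instead of letting the agent explore the patched build afresh is determinism: under fresh exploration, the absence of a crash could mean either that the bug is fixed or that the agent simply failed to reach the crash.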

Mustafa Özkan İr

Bilkent University, TR

Mehmet Dedeler

Bilkent University, TR

Anil Koyuncu

Bilkent University

Turkey

Eray Tüzün

Bilkent University

Turkey

Session Program

Mon 6 Jul
Displayed time zone: Eastern Time (US & Canada)

	08:50 - 10:30	Coding Agents, Software Testing, and Code Understanding (ArXiv Track / Main Track) at MB 1.210

	08:50 5m Talk		Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles Main Track Young Jo Chung, Safwat Hassan University of Toronto
	08:55 5m Talk		When Code Authors Are Agents: A Large-Scale Study of Human–Agent Collaboration in Pull Requests Main Track Anthonia Oluchukwu Njoku École Polytechnique de Montréal, Université de Montréal, CA, Zohreh Sharafi Polytechnique Montréal, Foutse Khomh Polytechnique Montréal
	09:00 5m Talk		Understanding Conversational Patterns in Multi-Agent Programming: A Case Study On Fibonacci Game Development Main Track Srijita Basu Chalmers University of Technology and University of Gothenburg, Viktor Kjellberg Göteborg University, SE, Simin Sun, Bengt Haraldsson Chalmers University of Technology and University of Gothenburg, Scania CV AB, Md Abu Ahammed Babu Volvo Cars AB, Wilhelm Meding Ericsson, Farnaz Fotrousi Chalmers University of Technology and University of Gothenburg, Miroslaw Staron Chalmers University of Technology and University of Gothenburg
	09:05 5m Talk		Recovering from Misbehaviors in Coding Agents Main Track Rahul Nanda Facebook, US, Chandra Maddila Meta Platforms, Inc., Smriti Jha Facebook, US, Euna Mehnaz Khan, Satish Chandra Meta Platforms, Inc., Matteo Paltenghi University of Stuttgart
	09:10 5m Talk		Configuring Agentic AI Coding Tools: An Exploratory Study Main Track Matthias Galster University of Canterbury, Seyedmoein Mohsenimofidi Heidelberg University, Jai Lal Lulla Singapore Management University, Muhammad Auwal Abubakar Otto-Friedrich Universität Bamberg, DE, Christoph Treude Singapore Management University, Sebastian Baltes Heidelberg University Pre-print
	09:15 5m Talk		Execution Control Matters: Deterministic and Agentic Tool Orchestration for LLM-Based Code Translation Main Track Naing Oo Lwin Bucknell University, US, Rajesh Kumar Bucknell University, US
	09:20 5m Talk		Developer Experience with AI Coding Agents: HTTP Behavioral Signatures in Documentation Portals ArXiv Track Oleksii Borysenko Cisco DevNet
	09:25 5m Talk		VISOR: A Vision-Language Model-based Test Oracle for Testing Robots Main Track Prasun Saurabh Simula Research Laboratory, NO, Pablo Valle Mondragon University, Aitor Arrieta Mondragon University, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University, Paolo Arcaini National Institute of Informatics
	09:30 5m Talk		Fixpad++: Automated Bug Fix Verification Using LLM Agents Main Track Mustafa Özkan İr Bilkent University, TR, Mehmet Dedeler Bilkent University, TR, Anil Koyuncu Bilkent University, Eray Tüzün Bilkent University
	09:35 5m Talk		Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation Main Track Éric Jacopin Cosmic AI, FR
	09:40 5m Talk		Examining LLMs Ability to Summarize Code Through Mutation-Analysis Main Track Lara Khatib University of Waterloo, Michael Pu University of Waterloo, Bogdan Vasilescu Carnegie Mellon University, Mei Nagappan University of Waterloo
	09:45 5m Talk		Testing AIware Systems: A Software Engineering Survey Main Track Karla Gonzalez Royal Military College of Canada, Mariam El Mezouar Royal Military College
	09:50 5m Talk		TestMap: Evidence Infrastructure for Foundation-Model-Assisted Test Generation ArXiv Track Hunter Leary Virginia Tech, Luke Hanuska Virginia Tech, Chris Brown Virginia Tech
	09:55 5m Talk		Metamorphic Testing for Clinical ML Models: A Framework Proposal and Pilot Study ArXiv Track Jie JW Wu Michigan Technological University, USA, Feiyu E Michigan Technological University, USA, Bo Chen Michigan Technological University, USA
	10:00 5m Talk		An Empirical Study of Reasoning Steps in Thinking Code LLMs Main Track Haoran Xue York University, CA, Gias Uddin York University, Canada, Song Wang York University
	10:05 5m Talk		Can LLMs really reason about Code? Studying how well LLMs understand the relation between Input, Code, and Output Main Track Norman Becker CISPA Helmholtz Center for Information Security, DE, Tural Mammadov CISPA Helmholtz Center for Information Security, Andreas Zeller CISPA Helmholtz Center for Information Security
	10:10 5m Talk		How Robustly do LLMs Understand Execution Semantics? Main Track Claudio Spiess University of California, Davis, Premkumar Devanbu UC Davis, Earl T. Barr University College London
	10:15 5m Talk		Program-as-Weights: A Programming Paradigm for Fuzzy Functions ArXiv Track Wentao Zhang University of Waterloo, Liliana Hotsko University of Waterloo, Woojeong Kim Cornell University, Pengyu Nie University of Waterloo, Stuart Shieber Harvard University, Yuntian Deng University of Waterloo
	10:20 10m Live Q&A		Joint Q&A and Discussion Main Track