AIware 2026
Mon 6 - Tue 7 July 2026 Montreal, Canada
co-located with FSE 2026

Verifying bug fixes before patches are released to end users is a critical step in the software development lifecycle. However, this process is often manual, repetitive, and error-prone, especially for crash bugs triggered through Graphical User Interface (GUI) interactions in desktop applications. Despite recent advancements in LLM-driven software agents, existing work primarily targets bug reproduction without addressing fix verification, while approaches that do focus on verification rely on source code access, making them inapplicable to closed-source GUI-based desktop applications. This paper introduces Fixpad++, a framework designed to automatically verify bug fixes in the Notepad++ desktop application using LLM-powered agents. Fixpad++ employs a two-phase approach. First, a multi-modal multi-agent system interacts with the buggy version to reproduce the reported crash using visual parsing and LLM reasoning. Second, upon successful reproduction, a trajectory replay mechanism executes the recorded action sequence on the patched version to validate the fix. We evaluated Fixpad++ on FixPad-Bench, a new dataset of 105 evaluation instances derived from 22 real-world Notepad++ crash bugs, including both valid and invalid patches. The system achieved a reproduction success rate of 72.73% with an average time of 174.07 seconds. Among the successfully reproduced cases, Fixpad++ correctly verified valid fixes with 87.50% accuracy and detected invalid fixes with 77.05% accuracy, outperforming OpenAI’s Computer-Using Agent (CUA). These results demonstrate the effectiveness of specialized LLM agent architectures for automated bug fix verification in GUI-based desktop applications, offering a practical way to automate verification workflows without requiring access to source code.
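
The two-phase flow described in the abstract can be illustrated with a minimal sketch. This is not the Fixpad++ implementation: the names `Action`, `make_app`, `replay`, and `verify_fix` are hypothetical, the application is simulated by a plain function rather than a real Notepad++ process, and the trajectory is supplied directly instead of being discovered by LLM agents. The sketch only shows how a recorded action sequence that crashes the buggy build could be replayed on a patched build to reach a verdict.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    """One recorded GUI step (hypothetical schema, for illustration only)."""
    kind: str    # e.g. "click", "type", "hotkey"
    target: str  # e.g. a menu label or key combination


# An executor performs one action and reports whether the application crashed.
Executor = Callable[[Action], bool]


def make_app(crash_on: Optional[Action]) -> Executor:
    """Simulated application build: crashes only when `crash_on` is executed."""
    def execute(action: Action) -> bool:
        return crash_on is not None and action == crash_on
    return execute


def replay(trajectory: List[Action], execute: Executor) -> bool:
    """Run the actions in order; return True as soon as a crash is observed."""
    for action in trajectory:
        if execute(action):
            return True
    return False


def verify_fix(trajectory: List[Action],
               run_buggy: Executor,
               run_patched: Executor) -> str:
    """Phase 1: confirm the crash on the buggy build.
    Phase 2: replay the same trajectory on the patched build."""
    if not replay(trajectory, run_buggy):
        return "not_reproduced"  # nothing to replay, so no verdict on the patch
    if replay(trajectory, run_patched):
        return "fix_invalid"     # the crash still occurs after patching
    return "fix_valid"           # the crash no longer occurs


# Assumed example trajectory; the final step triggers the simulated crash.
crash_step = Action("hotkey", "Ctrl+Shift+X")
trajectory = [Action("click", "File"), Action("click", "Open"), crash_step]

print(verify_fix(trajectory, make_app(crash_step), make_app(None)))
```

In this sketch a patch is only judged when the crash was first reproduced, which mirrors the abstract's statement that verification accuracy is reported over the successfully reproduced cases.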