AIware 2026
Mon 6 - Tue 7 July 2026 Montreal, Canada
co-located with FSE 2026
Events (10 results)

Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

Main Track People: Young Jo Chung, Safwat Hassan

… a recorded review step. Across all tools, operational agency and governance …. Detailed analysis of all 54 automation‑authorized merges highlights …

VeriTrans: Fine-Tuned LLM-Assisted NL→PL Translation via a Deterministic Neuro-Symbolic Pipeline

Main Track People: Xuan Liu, Dheeraj Kodakandla, Kushagra Srivastva, Mahfuza Farooque

… PL→CNF compilation, all executed via fixed API configuration …. Validator overhead contributes <15% of end-to-end runtime, and all prompts …

AgentTelemetry: A Fault Detection Benchmark and Toolkit for LLM Agent Observability

Benchmark & Dataset Track People: Krishna Chaitanya Balusu

… to 0.429 for vanilla OpenTelemetry and OTel+GenAI. An ablation study proves all nine … by +8.3 pp over a matched control. All code, data, and benchmark configurations …

JunoBench: A Benchmark Dataset of Crashes in Python Machine Learning Jupyter Notebooks

Benchmark & Dataset Track People: Yiran Wang, José Antonio Hernández López, Ulf Nilsson, Daniel Varro

… a unified execution environment that reliably reproduces all crashes. In addition …

CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis

Benchmark & Dataset Track People: Arunabh Majumdar

… % across all 15 vulnerabilities—87% of chains are invisible to per-commit SAST …

How Robustly do LLMs Understand Execution Semantics?

Main Track People: Claudio Spiess, Premkumar Devanbu, Earl T. Barr

… in the way all models understand code, and establish the value of using perturbation …

Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

Main Track People: Éric Jacopin

… –100%) across all models; (2) separated tests expose stark model-tier gaps (0 …

Recovering from Misbehaviors in Coding Agents

Main Track People: Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Satish Chandra, Matteo Paltenghi

… Problems, and Tool Call Failures, which we find occur in about 30% of all agent …

Can LLMs really reason about Code? Studying how well LLMs understand the relation between Input, Code, and Output

Main Track People: Norman Becker, Tural Mammadov, Andreas Zeller

… all datasets, including perfect input recovery on deterministic string tasks …

Detecting Unsoundness in Neural Network Verifiers via Concrete–Abstract Consistency

Main Track People: Kaijie Liu, Yulei Sui

… by considering all possible specified behaviors, the soundness …