Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles
Main Track People: Young Jo Chung, Safwat Hassan
… a recorded review step. Across all tools, operational agency and governance …. Detailed analysis of all 54 automation‑authorized merges highlights …
VeriTrans: Fine-Tuned LLM-Assisted NL→PL Translation via a Deterministic Neuro-Symbolic Pipeline
Main Track People: Xuan Liu, Dheeraj Kodakandla, Kushagra Srivastva, Mahfuza Farooque
… PL→CNF compilation, all executed via fixed API configuration …. Validator overhead contributes <15% of end-to-end runtime, and all prompts …
AgentTelemetry: A Fault Detection Benchmark and Toolkit for LLM Agent Observability
Benchmark & Dataset Track People: Krishna Chaitanya Balusu
… to 0.429 for vanilla OpenTelemetry and OTel+GenAI. An ablation study proves all nine … by +8.3 pp over a matched control. All code, data, and benchmark configurations …
JunoBench: A Benchmark Dataset of Crashes in Python Machine Learning Jupyter Notebooks
Benchmark & Dataset Track People: Yiran Wang, José Antonio Hernández López, Ulf Nilsson, Daniel Varro
… a unified execution environment that reliably reproduces all crashes. In addition …
CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis
Benchmark & Dataset Track People: Arunabh Majumdar
… % across all 15 vulnerabilities—87% of chains are invisible to per-commit SAST …
How Robustly do LLMs Understand Execution Semantics?
Main Track People: Claudio Spiess, Premkumar Devanbu, Earl T. Barr
… in the way all models understand code, and establish the value of using perturbation …
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
Main Track People: Éric Jacopin
… –100%) across all models; (2) separated tests expose stark model-tier gaps (0 …
Recovering from Misbehaviors in Coding Agents
Main Track People: Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Satish Chandra, Matteo Paltenghi
… Problems, and Tool Call Failures, which we find occur in about 30% of all agent …
Can LLMs really reason about Code? Studying how well LLMs understand the relation between Input, Code, and Output
Main Track People: Norman Becker, Tural Mammadov, Andreas Zeller
… all datasets, including perfect input recovery on deterministic string tasks …
Detecting Unsoundness in Neural Network Verifiers via Concrete–Abstract Consistency
Main Track People: Kaijie Liu, Yulei Sui
… by considering all possible specified behaviors, the soundness …