SOSecure: The Wisdom of the Crowd for Safer AI-Generated Code
Large Language Models (LLMs) are widely used for automated code generation, but the code they produce can contain security vulnerabilities. Because they rely on pretraining data, they may not reflect newly discovered vulnerabilities or evolving security practices. In contrast, developer communities on Stack Overflow (SO) provide a continuously updated record of security issues and their resolutions, as developers discuss and address vulnerabilities in real-world code. However, this information is not directly available to LLMs during code generation. This paper presents SOSecure, a post-generation security review layer that operationalizes SO discussions as inference-time safety signals. SOSecure builds a security-focused knowledge base from SO answers and comments that explicitly identify vulnerabilities and security antipatterns. Given an LLM-generated snippet, it retrieves discussions involving similar code patterns and incorporates them as contextual guidance to revise potentially unsafe outputs. Unlike approaches that rely solely on curated vulnerability descriptions, SOSecure leverages community-authored critiques to provide targeted, framework-specific security nudges. We evaluate SOSecure on three datasets: SALLM, LLMSecEval, and LMSys. Across these datasets, SOSecure achieves fix rates of 71.7%, 91.3%, and 96.7%, respectively, compared to 49.1%, 56.5%, and 37.5% when prompting GPT-4 without retrieved discussions. SOSecure requires no retraining or fine-tuning and demonstrates how community knowledge can function as a lightweight inference-time safety layer for AI-generated code.
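The retrieve-then-revise flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similarity is computed or how the revision prompt is worded, so the token-overlap (Jaccard) scoring, the `Discussion` record, the function names, and the prompt text below are all assumptions made for illustration. The sketch stops at building the prompt and does not call any model.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Discussion:
    """Hypothetical knowledge-base entry: flagged code plus the community critique."""
    code: str
    critique: str


def tokens(code: str) -> set[str]:
    # Crude lexical tokenizer (an assumption): lowercased identifier-like words.
    return set(re.findall(r"[a-z_][a-z0-9_]*", code.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity of two token sets; 0.0 if either set is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(snippet: str, kb: list[Discussion], k: int = 2) -> list[Discussion]:
    # Rank entries by token overlap with the snippet; drop entries with no overlap.
    query = tokens(snippet)
    scored = [(jaccard(query, tokens(d.code)), d) for d in kb]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_revision_prompt(snippet: str, discussions: list[Discussion]) -> str:
    # Attach retrieved critiques as context for a revision request (wording assumed).
    notes = "\n".join(f"- {d.critique}" for d in discussions)
    return (
        "Review the code below for security issues and return a revised version.\n"
        f"Community notes on similar code:\n{notes}\n"
        f"Code:\n{snippet}\n"
    )


# Toy knowledge base; the entries are illustrative, not taken from the paper.
kb = [
    Discussion("app.run(debug=True)",
               "Do not enable the Flask debugger in production."),
    Discussion("hashlib.md5(password.encode()).hexdigest()",
               "MD5 is not suitable for hashing passwords."),
]
generated = "from flask import Flask\napp = Flask(__name__)\napp.run(debug=True)"
hits = retrieve(generated, kb)
print(build_revision_prompt(generated, hits))
```

In this toy run only the Flask entry shares tokens with the generated snippet, so only its critique reaches the prompt. A real system would presumably use a stronger retriever and send the prompt to a model; both steps are outside what the abstract specifies.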