LLMs Writing Code? Cool. LLMs Executing It? Dangerous.
Published 06/03/2025
Written by Olivia Rempe, Community Engagement Manager, Cloud Security Alliance.
There’s no denying it—Large Language Models (LLMs) have changed the game for software development.
They can autocomplete boilerplate, refactor legacy functions, and even generate entire microservices with a well-crafted prompt. But as tempting as it is to let that generated code run, here’s a word of caution:
- Letting an LLM write code is powerful.
- Letting it execute code? That’s dangerous.
CSA’s latest white paper, Securing LLM-Backed Systems: Essential Authorization Practices, spells out why combining non-deterministic models with execution rights is a major security risk—and what you should do about it.
The Risks: When Code Becomes a Threat Vector
LLMs don’t “understand” what they’re doing. They generate code probabilistically based on patterns—not verified logic. Combine that with runtime permissions, and you get a volatile mix:
Vulnerable Code Generation
LLMs can hallucinate insecure functions, use deprecated libraries, or output vulnerable patterns such as unsanitized SQL queries and unescaped shell commands (see the example below).
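To make this concrete, here is a minimal, hypothetical Python illustration of the kind of unsanitized query an LLM might emit, next to the parameterized version a reviewer should insist on. The table and values are invented for the example.

```python
import sqlite3

# Toy database purely for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "' OR '1'='1"  # attacker-controlled value

# Vulnerable pattern an LLM might generate: interpolating input into SQL
query = f"SELECT role FROM users WHERE name = '{user_input}'"
print(conn.execute(query).fetchall())   # returns every row -> injection

# Safer equivalent: a parameterized query treats the input as data only
print(conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall())                           # returns [] -> no match
```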
Prompt Injection Attacks
Malicious users can craft prompts that trick the model into generating dangerous or destructive code, and harmful code can also emerge unintentionally from benign prompts.
Excessive Privileges
The runtime environment may have broad access to internal systems, files, or the network, turning one bad line of code into a system-wide breach.
No Natural Boundaries
LLMs don’t distinguish between code and data, and they don’t enforce access controls. Left unchecked, they can pull in sensitive data, mutate state, or execute instructions outside their scope.
What Secure Execution Should Look Like
If you must allow runtime code execution from an LLM (say, for a developer tool or automation pipeline), CSA recommends a strict, multi-layered defense:
1. Sandbox the Execution Environment
- Use containerized or virtualized sandboxes with strict runtime limits
- Block outbound network access by default
- Whitelist libraries and functions that can be imported or called
- Prevent write access unless explicitly required (read-only by default)
- Limit CPU, memory, and execution time (e.g., 30 seconds max); see the sandbox sketch below
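As a rough illustration of these controls, the sketch below runs generated Python inside a Docker container with networking disabled, a read-only filesystem, and CPU, memory, and wall-clock limits. It assumes Docker is available on the host; the image name, the limit values, and the run_generated_code helper are illustrative choices, not requirements from the CSA paper.

```python
import os
import subprocess
import tempfile

def run_generated_code(code: str, timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Run LLM-generated Python inside a locked-down container (illustrative defaults)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snippet.py")
        with open(path, "w") as f:
            f.write(code)

        cmd = [
            "docker", "run", "--rm",
            "--network", "none",             # block outbound network by default
            "--read-only",                   # no writes to the container filesystem
            "--memory", "256m",              # cap memory
            "--cpus", "1",                   # cap CPU
            "--pids-limit", "64",            # limit process fan-out
            "-v", f"{path}:/snippet.py:ro",  # mount the generated code read-only
            "python:3.12-slim", "python", "/snippet.py",
        ]
        # Host-side wall-clock limit; raises TimeoutExpired if exceeded
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)

# Harmless snippets run; anything needing the network or disk writes fails
result = run_generated_code("print(sum(range(10)))")
print(result.stdout)
```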
2. Validate and Review the Code
- Use automated static code analysis and LLM-based reviewers to flag suspicious patterns
- Only allow specific types of code generation (e.g., string manipulation, math functions)
- Monitor for calls to file systems, subprocesses, or system-level operations
- Implement syntax checks, query validation, and function whitelisting; a minimal static check is sketched below
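One lightweight way to start, sketched below, is a shallow pass over the generated code's AST that flags imports outside an allow-list and calls to dangerous built-ins before any heavier analysis or execution. The ALLOWED_IMPORTS and FORBIDDEN_CALLS sets and the flag_suspicious helper are hypothetical; a real pipeline would pair this with a proper static analyzer and human review.

```python
import ast

# Illustrative policy: what the generated code may import or call
ALLOWED_IMPORTS = {"math", "statistics", "json"}
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__", "open"}

def flag_suspicious(code: str) -> list[str]:
    """Return findings for obviously risky constructs in generated code."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                findings.append(f"disallowed import: {module}")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                findings.append(f"forbidden call: {node.func.id}()")
    return findings

print(flag_suspicious("import subprocess\nsubprocess.run(['rm', '-rf', '/'])"))
# ['disallowed import: subprocess']
print(flag_suspicious("import math\nprint(math.sqrt(2))"))
# []
```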
3. Insert Human-in-the-Loop Checkpoints
- Require human approval before executing code that:
  - Modifies data
  - Calls external APIs
  - Has destructive potential
- Log all executions, including who approved and what inputs were used (see the approval sketch below)
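The sketch below shows one shape this gate could take: generated code tagged with risky actions is blocked until a named approver signs off, and every attempt is written to an audit log. The action labels, the RISKY_ACTIONS set, and the execute_with_approval helper are assumptions made for the example.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm-exec-audit")

# Hypothetical labels an upstream classifier attaches to generated code
RISKY_ACTIONS = {"modifies_data", "calls_external_api", "destructive"}

def execute_with_approval(code: str, actions: set[str], approver: str | None) -> bool:
    """Require explicit human sign-off before risky code runs; log every attempt."""
    needs_approval = bool(actions & RISKY_ACTIONS)
    approved = (not needs_approval) or (approver is not None)

    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actions": sorted(actions),
        "approver": approver,
        "executed": approved,
        "code_preview": code[:80],
    }))

    if not approved:
        return False
    # Hand the approved code to the sandboxed runner here
    return True

# A destructive snippet is blocked until someone signs off
execute_with_approval("db.delete_all()", {"modifies_data", "destructive"}, approver=None)
execute_with_approval("db.delete_all()", {"modifies_data", "destructive"}, approver="alice@example.com")
```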
4. Use the Orchestrator, Not the LLM, to Control Execution
- Treat the LLM as an untrusted advisor, not an autonomous actor
- The orchestrator (your trusted runtime) should handle:
  - User authentication
  - API calls
  - Identity and access checks
  - Execution permissions
- Don’t let the LLM pull the trigger: let it suggest, and have a human or secure service confirm (see the orchestrator sketch below)
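Here is a minimal sketch of that division of labor, assuming a simple role-based tool registry (TOOL_PERMISSIONS and handle_llm_suggestion are invented names): the model proposes a tool call, and the trusted orchestrator checks identity and permissions before anything runs.

```python
from dataclasses import dataclass

# Hypothetical registry: the orchestrator, not the LLM, knows who may run what
TOOL_PERMISSIONS = {
    "read_report": {"analyst", "admin"},
    "delete_records": {"admin"},
}

@dataclass
class User:
    name: str
    role: str

def handle_llm_suggestion(user: User, suggested_tool: str, args: dict) -> str:
    """Treat the LLM's output as an untrusted suggestion and gate it here."""
    if suggested_tool not in TOOL_PERMISSIONS:
        return f"rejected: unknown tool '{suggested_tool}'"
    if user.role not in TOOL_PERMISSIONS[suggested_tool]:
        return f"rejected: {user.name} lacks permission for '{suggested_tool}'"
    # Only the trusted runtime performs the action, under the user's identity
    return f"executed '{suggested_tool}' with {args} on behalf of {user.name}"

# The model can suggest anything; the orchestrator enforces access control
print(handle_llm_suggestion(User("bob", "analyst"), "delete_records", {"table": "users"}))
print(handle_llm_suggestion(User("ada", "admin"), "read_report", {"id": 7}))
```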
Build Safely or Not at All
In the rush to harness LLMs for productivity, too many teams skip the guardrails. But when the LLM moves from “suggesting code” to “executing code,” you enter a whole new threat landscape.
Just because the model can do it doesn’t mean it should—at least not without a secure environment, human oversight, and airtight permissions.
So the next time you build a tool that lets the LLM run scripts, ask yourself:
Would I let an intern with zero context and no code review access production?
If not, don’t let your LLM either.
CSA’s white paper breaks this all down—from sandboxing tips to orchestration patterns to execution audit trails.
Download “Securing LLM-Backed Systems: Essential Authorization Practices” today.