May 30, 2026 · AI security

How an AI agent turned a notebook vulnerability into a database breach in under one hour

The attacker did not write a script. Once they had a foothold inside a compromised server, they handed the keyboard to an AI agent and let it figure out the rest. The agent had never seen this network before. It had no map of the database, no list of credentials, no pre-written playbook. It reasoned its way through four pivots, from an exposed developer tool to an internal PostgreSQL database, in under sixty minutes.

On May 10, 2026, researchers at Sysdig observed what they describe as the first confirmed in-the-wild intrusion driven by a large language model (LLM) agent. The entry point was Marimo, a Python notebook platform used by data scientists and researchers. A flaw in Marimo's code gave any unauthenticated attacker a full command-line shell on the server. What happened next is what sets this incident apart from previously documented attacks: the attacker outsourced the entire post-exploitation operation to an AI.

The most alarming detail is not the speed, though the database was emptied in under two minutes. It is what the speed reveals about the new economics of attacking. Building a multi-pivot intrusion used to require an operator who understood cloud infrastructure, SSH key management, database schemas, and API rate-limiting well enough to write custom code for each target. The agent in this attack needed none of that preparation. It carried general knowledge about how cloud deployments are structured and composed the attack live. As Sysdig's Sr. Director Michael Clark put it: "We are not watching AI replace attackers. We are watching attackers replace their scripts with AI."

Does this apply to you?

If your organization runs Marimo notebooks on any version earlier than 0.23.0 and those notebooks are reachable from the internet, you may have an unauthenticated remote code execution exposure. If those notebooks run with cloud credentials, SSH keys, or environment variables in scope (a standard configuration in data science and AI development workflows), your blast radius may extend to every internal system those credentials can reach, including databases, secrets vaults, and downstream services. This pattern applies beyond Marimo: any developer tool or notebook platform exposed to the internet that holds production credentials is a single exploit away from a full internal pivot. Check whether your Marimo deployment is internet-accessible and whether it has been upgraded to version 0.23.0 or later. If it has not been upgraded, treat it as actively compromised until it is patched and isolated.

Narrative · 6 min read

The Context

Marimo is an open-source Python notebook platform used by data scientists, researchers, and developers to build and share interactive code workflows. It has roughly 20,000 GitHub stars and is common in AI development and academic research environments. Like most notebook platforms, Marimo is designed to run locally or inside a private network. When organizations expose it to the internet—a common shortcut in research and development settings—it becomes a direct entry point into whatever cloud environment it runs inside.

The vulnerability at the center of this attack, CVE-2026-39987, was rated 9.3 out of 10 on the CVSS severity scale. It affected all Marimo versions up to and including 0.20.4. The fix shipped in version 0.23.0.

Key terms

LLM agent: A large language model (an AI system trained on text) that is given tools, such as a command-line shell or an API, and instructed to complete a goal autonomously, making decisions and chaining actions without human direction at each step.
WebSocket: A communication protocol that keeps a persistent two-way connection open between a browser or client and a server, commonly used for real-time features like terminals and live updates.
PTY shell: A pseudo-terminal shell: a full interactive command-line session on a server, equivalent to sitting at the keyboard of that machine.
AWS Secrets Manager: An Amazon Web Services product that stores sensitive values such as passwords, API keys, and SSH private keys, and makes them available to authorized applications via an API call.
SSH bastion: A hardened server that acts as the single authorized gateway into a private network. Connecting through the bastion is the standard way to reach internal servers that are not directly internet-accessible.
Egress fan-out: Routing outbound network requests through many different IP addresses simultaneously, so that no single source address generates enough traffic to trigger an alert.

The Attack, Phase by Phase

Phase 1: Unauthenticated Shell via a Skipped Security Check

Marimo exposes several WebSocket endpoints. Every endpoint except one calls validate_auth() before accepting a connection. The terminal endpoint, /terminal/ws, skips that check entirely. It only verifies that the server is running in the right mode and on a supported platform, then hands the connecting client a full interactive shell running with the privileges of the Marimo process—often root inside a container.

An attacker who can reach that endpoint needs to send exactly one request. No password, no token, no prior foothold.
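One way to make this class of omission harder to ship is to enforce authentication where endpoints are registered, instead of relying on each handler to remember the call. The sketch below is a minimal, framework-free illustration of that idea. The `WebSocketRouter` class, the token check, and this version of `validate_auth` are all hypothetical and are not Marimo's actual code.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def validate_auth(request: dict) -> bool:
    """Hypothetical check: accept only requests carrying the expected token."""
    return request.get("token") == "expected-token"


class WebSocketRouter:
    """Toy router that wraps every handler in an auth check by default."""

    def __init__(self) -> None:
        self.routes: Dict[str, Handler] = {}

    def register(self, path: str, handler: Handler, public: bool = False) -> None:
        if public:
            # Opting out of auth must be explicit, so it is visible in review.
            self.routes[path] = handler
            return

        def guarded(request: dict) -> str:
            if not validate_auth(request):
                return "403 rejected"
            return handler(request)

        self.routes[path] = guarded

    def dispatch(self, path: str, request: dict) -> str:
        return self.routes[path](request)


router = WebSocketRouter()
# The handler itself contains no auth logic; the router adds it.
router.register("/terminal/ws", lambda request: "shell opened")

unauthenticated = router.dispatch("/terminal/ws", {})
authenticated = router.dispatch("/terminal/ws", {"token": "expected-token"})
```

With this design a forgotten check fails closed: a new endpoint is protected unless someone deliberately passes `public=True`.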

The advisory was published on April 8, 2026. Nine hours and forty-one minutes later, Sysdig's honeypot systems recorded the first exploitation attempt. No public proof-of-concept code existed at the time. Between April 11 and 14, Sysdig recorded 662 distinct exploitation events from 11 unique source IP addresses across 10 countries.

Phase 2: Cloud Credential Harvest and AWS Secrets Manager Pivot

With the shell established, the attacker handed control to an LLM agent. The agent enumerated the compromised host—reading environment files, checking standard configuration paths, and reasoning from general knowledge about cloud deployments. It extracted two sets of cloud credentials without any prior knowledge of this specific host.

The agent then called the AWS Secrets Manager API. To avoid triggering alerts based on unusual call volume from a single IP, it routed 12 API calls across 11 distinct Cloudflare Workers points of presence in a 22-second burst, so no single IP address made more than two calls. The per-source-IP anomaly detection that defenders rely on saw nothing unusual. The agent retrieved an SSH private key stored as a secret.

Phase 3: SSH Bastion Navigation and Database Exfiltration

Armed with the SSH private key, the agent opened eight parallel SSH sessions against a downstream bastion server from six separate IP addresses simultaneously, again defeating IP-based detection. Once on the bastion, it still had no map of the internal network and no prior knowledge of the database it was about to hit.

It enumerated the schema of an internal PostgreSQL database in real time, identified a table containing credentials, and exfiltrated the full contents. The entire bastion phase—from first SSH connection to completed database dump—took under two minutes. The full chain from initial Marimo compromise to exfiltration completed in under one hour.

Phase 4: How Sysdig Knew It Was an AI Agent

Sysdig identified four properties in the recorded command stream that distinguish agent-driven execution from scripted automation or a human operator.

First, the agent improvised schema enumeration with zero prior knowledge of the target database—a script would need that hardcoded. Second, a Chinese-language planning comment meaning "see what else we can do" leaked into the command stream, revealing the agent reasoning in real time. Third, every command was formatted for machine consumption: structured delimiters, bounded output caps, stderr discarded. Fourth, the agent fed its own prior output into each subsequent action without human direction—what Sysdig calls adaptive self-chaining. All four signatures appeared within a 113-second window.

What Made This Possible

The developer toolchain is treated as lower-security than production. Organizations that would never expose a production database to the internet routinely expose notebook servers for researcher convenience. Those notebooks run with the same cloud credentials, SSH keys, and internal network access as the production systems they feed.
A single missing function call opened a root shell. The rest of Marimo's WebSocket surface called validate_auth() correctly. The terminal endpoint did not. One omission in one file was sufficient for unauthenticated remote code execution rated 9.3 out of 10.
Distributed egress defeated the detection layer designed to catch it. The anomaly detection in place was built around per-source-IP thresholds. Routing 12 API calls across 11 addresses means no address made more than two calls. The detection layer saw normal traffic from many sources, not an attack from one.
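The arithmetic behind the last point can be sketched in a few lines. The example below is illustrative only: the addresses come from a documentation range and are not from the incident, and both thresholds are assumed values chosen to show the contrast between a per-IP view and an aggregate view.

```python
from collections import Counter

# Hypothetical stand-in for the burst described above: 12 calls, 11 source IPs.
# Addresses are from the 203.0.113.0/24 documentation range, not the incident.
source_ips = [f"203.0.113.{i}" for i in range(1, 12)] + ["203.0.113.1"]

PER_IP_THRESHOLD = 5       # assumed: alert when one IP makes more than 5 calls
DISTINCT_IP_THRESHOLD = 5  # assumed: alert when one API sees more than 5 source IPs

calls_per_ip = Counter(source_ips)
max_calls_from_one_ip = max(calls_per_ip.values())
distinct_ips = len(calls_per_ip)

# Per-IP view: the busiest address made only two calls, so nothing fires.
per_ip_alert = max_calls_from_one_ip > PER_IP_THRESHOLD
# Aggregate view: eleven sources hitting one API in one burst stands out.
aggregate_alert = distinct_ips > DISTINCT_IP_THRESHOLD
```

The same traffic is invisible to one rule and obvious to the other, which is why the choice of aggregation key matters more than the threshold value.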

What Should Have Stopped This

Every defense that would have reduced the blast radius shares one trait: it does not depend on the compromised tool's own integrity. Once Marimo was owned, any control that lived inside Marimo was also owned.

Network isolation for developer tools. Marimo should not have been reachable from the internet. Placing development infrastructure behind a VPN would have ended the attack before it began.
Credential scoping. The cloud credentials inside the Marimo environment should have had minimum permissions for notebook workflows—not the ability to call AWS Secrets Manager and retrieve SSH private keys.
Behavioral detection on API fan-out. Detection that looks at aggregate patterns—many calls to the same API endpoint from many sources in a short window—would have flagged the Secrets Manager access regardless of how many IPs the agent used.
Patch velocity. The fix was available in version 0.23.0. The attack happened on May 10, more than a month after the April 8 disclosure. Organizations running unpatched Marimo after the CISA Known Exploited Vulnerabilities listing on April 23 had documented warning and did not act in time.
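The credential scoping item above can be partly checked by inspecting the IAM policy attached to the notebook's role. The following is a rough sketch, assuming the policy is available as a parsed JSON document. It only looks for Allow statements that grant `secretsmanager:GetSecretValue` on a bare `*` resource, and it ignores `NotAction`, conditions, and partial resource wildcards. Both example policies and the bucket name are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Union

SENSITIVE_ACTION = "secretsmanager:GetSecretValue"


def as_list(value: Union[str, list, dict]) -> list:
    """IAM allows a single item or a list for Statement, Action and Resource."""
    return list(value) if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> List[dict]:
    """Return Allow statements granting GetSecretValue on a wildcard resource."""
    findings = []
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        resources = as_list(statement.get("Resource", []))
        # IAM action names are case-insensitive and may contain * wildcards.
        grants_action = any(
            fnmatchcase(SENSITIVE_ACTION.lower(), pattern.lower())
            for pattern in actions
        )
        if grants_action and "*" in resources:
            findings.append(statement)
    return findings


# Hypothetical policies for illustration only.
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "secretsmanager:*", "Resource": "*"}
    ],
}
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-notebook-data/*",
        }
    ],
}

broad_findings = overly_broad_statements(broad_policy)
scoped_findings = overly_broad_statements(scoped_policy)
```

A check like this is a starting point for review, not a substitute for a full policy analysis.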

The Takeaway

This attack is the same class of failure as the Stryker Intune wipe: a privileged management tool weaponized against the organization it was built to serve. The shared failure is treating internal tooling as a lower-security tier while giving it the same credential access as production systems.

What is new is the collapse of the attacker cost curve. Scripted attacks require an operator who understands the specific target well enough to write custom code. An LLM agent carries general priors about entire classes of infrastructure and composes the attack chain live against whatever it finds. The multi-pivot intrusion that used to require hours of preparation now requires an inference budget and a foothold.

Defenders built anomaly detection around the assumption that sophisticated attacks are slow. An agent that can enumerate an unknown database schema and exfiltrate its contents in under two minutes breaks that assumption. Detection needs to shift from "did this command look suspicious" to "did this sequence of outcomes—credential access, API fan-out, lateral movement—happen in an impossible timeframe."
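One way to express that shift is to score the elapsed time of a whole sequence instead of judging individual commands. The sketch below assumes that upstream tooling has already labeled events with a stage. The stage names, the `Event` type, and the 600-second limit are all illustrative assumptions, not values from Sysdig's research.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Stage names are illustrative labels, not a standard taxonomy.
CHAIN = ("credential_access", "api_fanout", "lateral_movement")


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary reference point
    stage: str


def chain_duration(events: Iterable[Event]) -> Optional[float]:
    """Seconds from the first CHAIN stage to the last, if they occur in order."""
    wanted = 0
    started_at = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.stage != CHAIN[wanted]:
            continue
        if started_at is None:
            started_at = event.timestamp
        wanted += 1
        if wanted == len(CHAIN):
            return event.timestamp - started_at
    return None  # the full chain never completed


def is_implausibly_fast(events: Iterable[Event], limit_seconds: float = 600.0) -> bool:
    """Flag a completed chain that finished within the assumed time limit."""
    duration = chain_duration(events)
    return duration is not None and duration <= limit_seconds


# Hypothetical timelines for illustration.
fast_chain = [
    Event(0.0, "credential_access"),
    Event(40.0, "api_fanout"),
    Event(95.0, "lateral_movement"),
]
slow_chain = [
    Event(0.0, "credential_access"),
    Event(3600.0, "api_fanout"),
    Event(90000.0, "lateral_movement"),
]

fast_flagged = is_implausibly_fast(fast_chain)
slow_flagged = is_implausibly_fast(slow_chain)
```

The rule deliberately ignores what each command looked like. It fires only when the outcomes arrive in order and faster than a human operator could plausibly manage.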

Pattern to remember: Developer tools that hold production credentials are production attack surface, regardless of how they are classified internally.

What changed: Defenders can no longer treat attack sophistication as a proxy for attacker preparation time: an AI agent can improvise the full post-exploitation chain from general infrastructure knowledge, collapsing hours of manual reconnaissance into a single inference session.

Technical Deep Dive · 3 min

The Technical Mechanism

CVE-2026-39987 is a pre-authentication remote code execution vulnerability in Marimo's /terminal/ws WebSocket endpoint. The flaw is a missing authentication check: all other Marimo WebSocket endpoints invoke validate_auth() before accepting a connection, but the terminal endpoint performs only a mode check and a platform support check before spawning a full pseudo-terminal (PTY) session. The PTY runs with the privileges of the Marimo process, which in containerized deployments is typically root.

The CWE classification is CWE-306 (Missing Authentication for Critical Function). The CVSS base score is 9.3 (Critical). The attack vector is network, attack complexity is low, no privileges are required, and no user interaction is required.

The post-exploitation chain proceeded as follows:

The LLM agent enumerated environment variables and standard cloud credential paths (~/.aws/credentials, instance metadata endpoints, .env files) using general priors about cloud deployment patterns.
The agent issued 12 GetSecretValue API calls to AWS Secrets Manager, spreading them across Cloudflare Workers egress nodes. With 11 distinct source IPs used in a 22-second window, each address stayed below per-IP anomaly thresholds.
The retrieved SSH private key was used to open 8 parallel SSH sessions to a bastion host from 6 distinct source IPs.
From the bastion, the agent issued the psql meta-commands \d and \dt to enumerate the PostgreSQL schema, identified a credential table, and issued a COPY or pg_dump-equivalent command to exfiltrate the full database contents.
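Defenders can approximate the first step of this chain on their own hosts to see what a shell inside the notebook process would find. The sketch below is a minimal audit helper under stated assumptions: it checks only two of the file locations named above, it matches environment variable names against an assumed list of hints, and it reports names only, never values. It does not query instance metadata endpoints. The paths and variables in the example call are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Mapping

# Name fragments that often mark secret-bearing variables (assumed list).
SECRET_HINTS = ("SECRET", "TOKEN", "PASSWORD", "ACCESS_KEY")


def exposed_credential_sources(
    home: Path, workdir: Path, env: Mapping[str, str]
) -> Dict[str, List[str]]:
    """List credential files and secret-like variable names visible to a process."""
    candidate_files = [home / ".aws" / "credentials", workdir / ".env"]
    return {
        # Only report files that actually exist on this host.
        "files": [str(path) for path in candidate_files if path.is_file()],
        # Report variable names only; values are never read or returned.
        "env_vars": sorted(
            name
            for name in env
            if any(hint in name.upper() for hint in SECRET_HINTS)
        ),
    }


# Hypothetical inputs; on a real host pass Path.home(), Path.cwd(), os.environ.
report = exposed_credential_sources(
    Path("/home/notebook"),
    Path("/srv/notebooks"),
    {"AWS_SECRET_ACCESS_KEY": "redacted", "PATH": "/usr/bin"},
)
```

Anything this report lists is within reach of whoever obtains a shell as the notebook user, which makes it a reasonable inventory for scoping decisions.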

The four behavioral signatures Sysdig identified as distinguishing agent-driven execution from scripted automation: improvised schema enumeration, a leaked Chinese-language planning comment (看看还能做什么, meaning "see what else we can do"), machine-optimized command formatting with structured delimiters and 2>/dev/null stderr suppression, and adaptive self-chaining of prior command output into subsequent prompts.
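Two of these signatures, the command formatting and the leaked comment, lend themselves to a simple log heuristic. The sketch below is an assumption-heavy illustration: each regular expression is one guess at how a signature might appear in a shell log, and the sample command is invented for the example rather than taken from Sysdig's capture. A match suggests machine-generated input but does not prove an agent, since ordinary automation scripts use the same idioms.

```python
import re

# Each pattern is one guess at how a signature might look in a shell log.
SIGNATURES = {
    "stderr_suppressed": re.compile(r"2>\s*/dev/null"),
    "output_capped": re.compile(r"\|\s*head\s+-(?:n\s*)?\d+"),
    "structured_delimiter": re.compile(r"echo\s+['\"]?(?:={3,}|-{3,}|#{3,})"),
    "cjk_comment": re.compile(r"#.*[\u4e00-\u9fff]"),
}


def matched_signatures(command: str) -> set:
    """Return the names of all signature patterns found in one command line."""
    return {name for name, pattern in SIGNATURES.items() if pattern.search(command)}


# Invented example command, not a line from the recorded session.
agent_like = 'echo "===ENV==="; cat .env 2>/dev/null | head -n 50  # 看看还能做什么'
human_like = "ls -la /tmp"

agent_hits = matched_signatures(agent_like)
human_hits = matched_signatures(human_like)
```

In practice a heuristic like this is more useful as a scoring input combined with session context, such as whether a known user started the shell, than as a standalone alert.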

Affected versions: Marimo 0.20.4 and all prior versions. Fixed in: Marimo 0.23.0.
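A quick way to apply these version boundaries is to compare the installed release against the fixed one. The sketch below is a minimal helper, assuming plain `X.Y.Z` version strings. It conservatively treats every release below 0.23.0 as needing an upgrade, and it does not handle pre-release suffixes. The function names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_VERSION = (0, 23, 0)  # first patched release according to this article


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string; suffixes like 'rc1' are not handled."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(installed: str) -> bool:
    """True when the given release is older than the fixed release."""
    return parse_version(installed) < FIXED_VERSION


def installed_marimo_needs_upgrade() -> Optional[bool]:
    """Check the local environment; None means marimo is not installed."""
    try:
        return needs_upgrade(version("marimo"))
    except PackageNotFoundError:
        return None
```

For example, `needs_upgrade("0.20.4")` returns `True` and `needs_upgrade("0.23.0")` returns `False`. A patched version number does not clear a host that was exposed while vulnerable, so this check complements the compromise assessment described earlier and does not replace it.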

CVE and Advisories

CVE-2026-39987: Marimo unauthenticated WebSocket terminal RCE. CVSS 9.3 (Critical). Disclosed April 8, 2026. Fixed in version 0.23.0.
CISA KEV entry, April 23, 2026: CISA added CVE-2026-39987 to the Known Exploited Vulnerabilities catalog. Federal Civilian Executive Branch agencies were required to remediate by May 7, 2026, under Binding Operational Directive 22-01.

MITRE ATT&CK Mapping

Technique ID	ATT&CK name	How it appeared
T1190	Exploit Public-Facing Application	Unauthenticated WebSocket request to /terminal/ws returned a full PTY shell without any credentials.
T1552.001	Credentials in Files	LLM agent enumerated environment files and configuration paths to extract cloud credentials.
T1555	Credentials from Password Stores	Agent called AWS Secrets Manager GetSecretValue to retrieve an SSH private key stored as a managed secret.
T1090.003	Multi-hop Proxy	12 AWS API calls routed across 11 Cloudflare Workers egress IPs in 22 seconds to defeat per-source-IP anomaly detection.
T1021.004	Remote Services: SSH	8 parallel SSH sessions opened from 6 distinct IPs against a downstream bastion server using the retrieved private key.
T1005	Data from Local System	Agent enumerated PostgreSQL schema in real time and exfiltrated full database contents including a credential table.

Indicators of Compromise

Detection is structurally difficult because the agent actively defeated the two most common detection approaches: per-IP rate limiting (defeated by egress fan-out) and command-signature matching (defeated by machine-optimized formatting with no fixed command strings).

Behavioral Indicators

Inbound WebSocket connections to /terminal/ws from any external IP address on an internet-exposed Marimo deployment
AWS GetSecretValue API calls from multiple distinct source IPs within a short time window targeting the same secret ARN
SSH sessions to a bastion host originating from multiple IPs within seconds of each other
PostgreSQL schema enumeration and dump commands (\d, \dt, pg_dump) issued in rapid succession with no preceding application-layer authentication
Commands formatted with structured delimiters and 2>/dev/null stderr suppression in a shell session that was not initiated by a known user
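The first indicator in this list is the simplest to check against access logs. The sketch below assumes the logs have already been parsed into (source IP, request path) pairs, a format chosen here for illustration. It flags any request to the terminal endpoint from an address outside private ranges. The `/health` path in the example is hypothetical.

```python
import ipaddress
from typing import Iterable, List, Tuple

TERMINAL_PATH = "/terminal/ws"


def external_terminal_hits(requests: Iterable[Tuple[str, str]]) -> List[str]:
    """Return source IPs outside private ranges that requested the terminal endpoint."""
    hits = []
    for source_ip, path in requests:
        # Drop any query string before comparing the path.
        if path.split("?", 1)[0] != TERMINAL_PATH:
            continue
        if not ipaddress.ip_address(source_ip).is_private:
            hits.append(source_ip)
    return hits


# Assumed log format; 157.66.54.26 is the initial access IP reported by Sysdig.
sample_requests = [
    ("10.0.0.5", "/terminal/ws"),
    ("157.66.54.26", "/terminal/ws"),
    ("157.66.54.26", "/health"),
]

flagged = external_terminal_hits(sample_requests)
```

If the deployment sits behind a reverse proxy, the source address must come from a trusted forwarding header, otherwise every request will appear to originate from the proxy's private IP.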

Network Indicators

The initial access IP recorded by Sysdig was 157.66.54.26 (AS141892, geolocated to Indonesia). This single indicator has limited ongoing detection value given the distributed egress pattern used in subsequent phases.

Attribution

Unattributed. The initial access IP (157.66.54.26, AS141892) geolocates to Indonesia. A Chinese-language planning comment leaked into the command stream during the credential search phase. Sysdig Sr. Director Michael Clark explicitly declined to attribute the activity to a known threat group or nation-state actor. The specific LLM and agent framework used by the attacker have not been identified, which limits defenders' ability to build detection signatures targeting that toolchain.

Primary Sources

01. AI agent at the wheel: How an attacker used LLMs to move from a CVE to an internal database in 4 pivots
Sysdig Threat Research Team · May 30, 2026
02. Marimo OSS Python Notebook RCE: From Disclosure to Exploitation in Under 10 Hours
Sysdig Threat Research Team · April 16, 2026
03. CISA Adds One Known Exploited Vulnerability to Catalog (CVE-2026-39987)
U.S. Cybersecurity and Infrastructure Security Agency (CISA) · April 23, 2026
04. Attackers Use LLM Agent for Post-Exploitation After Marimo CVE-2026-39987 Exploit
The Hacker News · May 30, 2026
05. Hackers Use LLM Agent to Move From Marimo RCE to Internal Database in Four Pivots
Cyber Security News · May 30, 2026
06. CVE-2026-39987 Marimo Exploit Used with LLM Agent for Post-Exploitation
Vulert · May 31, 2026