Web Page - Operational
Runner - Operational
API Service - Operational
Storage - Operational
Notice history
Jan 2026
- Postmortem
Postmortem: Claude Code Hanging in Sandbox
Date: 2026-01-19
Severity: P0
Duration: ~4 hours━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SUMMARY
Production agent runs were failing silently. Claude Code started but never produced output, timing out after 15+ minutes.
Root Cause: stdin was configured as "pipe" but never closed, causing Claude Code to hang waiting for EOF.
Why CI missed it: CI uses mock-claude which doesn't check stdin state. Real Claude Code does.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
THE BUG
Before (hangs - stdin pipe never closed):
spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] })
After (works - stdin is /dev/null, immediate EOF):
spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] })
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
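The before/after pair above can be sketched as two small wrappers. This is a minimal illustration and not the code from the actual fix: the helper names `stdioForNonInteractive`, `spawnNonInteractive`, and `spawnWithInput` are hypothetical. It shows the two ways to guarantee that a child process sees EOF on stdin: attach /dev/null with "ignore", or keep "pipe" and close the write end explicitly.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess, StdioOptions } from "node:child_process";

// Hypothetical helper: stdio layout for a child that takes no input.
// "ignore" makes Node attach /dev/null to the child's fd 0, so the
// child reads EOF immediately instead of waiting on an open pipe.
export function stdioForNonInteractive(): StdioOptions {
  return ["ignore", "pipe", "pipe"];
}

// Hypothetical wrapper for the "no input" case (the shape of the fix).
export function spawnNonInteractive(cmd: string, args: string[]): ChildProcess {
  return spawn(cmd, args, { stdio: stdioForNonInteractive() });
}

// Hypothetical wrapper for the case where the child does need input:
// keep stdin as "pipe", but always end() it so the child gets EOF.
export function spawnWithInput(
  cmd: string,
  args: string[],
  input: string,
): ChildProcess {
  const child = spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] });
  // end() writes the final chunk and closes the write end of the pipe.
  child.stdin?.end(input);
  return child;
}
```

Either wrapper avoids the hang. The bug was the third combination: "pipe" with no call to end().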
HOW WE FOUND IT
SSH into sandbox
cat /tmp/vm0-agent-*.log → empty (no Claude output)
ps aux | grep claude → process alive, using 23% memory
ps -p 510 -o wchan → ep_pol (waiting on I/O)
ls -la /proc/510/fd/0 → stdin connected to pipe
Manual "claude --print hello" → works (TTY mode)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHY NO ROLLBACK
The release included a database migration, so a forward fix was safer than rolling back.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INVESTIGATION NOISE
Runner npm publish failure (@vm0/core not built) was unrelated but consumed investigation time.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ACTION ITEMS
✅ Fix stdin → "ignore" (#1316)
✅ Add spawn unit tests (#1319)
✅ Fix CI publish jobs (#1306, #1318)
🔲 Add real Claude test in CI (TODO)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
KEY LESSON
Mock ≠ Real: CI must include at least one test with real Claude Code to catch behavior differences like stdin handling.
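Until a test with real Claude Code is in place, a mock can at least mimic the one behavior that mattered here. The sketch below is an assumption-laden illustration, not the project's test suite: `EOF_WAITING_CHILD` and `probeStdin` are hypothetical names, and the child is a plain Node one-liner that waits for EOF on stdin before exiting, standing in for a CLI that reads stdin to completion. It does not reproduce any other Claude Code behavior.

```typescript
import { spawn } from "node:child_process";

// Hypothetical stand-in for a CLI that reads stdin until EOF before
// doing any work. It exits only once stdin reports "end".
const EOF_WAITING_CHILD =
  'process.stdin.resume(); process.stdin.on("end", () => process.exit(0));';

export type Outcome = "exited" | "hung";

// Spawn the stand-in with the given stdin mode and report whether it
// exited on its own or had to be killed after timeoutMs.
export function probeStdin(
  stdinMode: "pipe" | "ignore",
  timeoutMs = 2000,
): Promise<Outcome> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", EOF_WAITING_CHILD], {
      stdio: [stdinMode, "ignore", "ignore"],
    });

    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill(); // stop the child that never saw EOF
    }, timeoutMs);

    child.once("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.once("exit", () => {
      clearTimeout(timer);
      resolve(timedOut ? "hung" : "exited");
    });
  });
}
```

With "pipe" and no call to end() on the child's stdin, the probe is expected to report "hung". With "ignore" it should report "exited". A spawn unit test built on this idea would fail on the original configuration, which a mock that never reads stdin cannot do.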
- Resolved
This incident has been resolved.
- Update
We have confirmed that the issue lies in how VM0 invokes Claude Code. A minimal fix has been finalized and is currently being deployed.
- Update
We have restored normal operation of both the database and the task dispatcher, and are now working to bring Claude Code in the sandbox back online.
- Update
We have traced the problem to a recent database change, and the team is working to fix the data that caused it.
- Update
This issue stems from a recent runner deployment, and the team is working on a fix.
- Identified
We are continuing to work on a fix for this incident.
- Investigating
We are currently investigating this incident.
Dec 2025
No notices reported this month
Nov 2025
No notices reported this month