Postmortem: Claude Code Hanging in Sandbox
Date: 2026-01-19
Severity: P0
Duration: ~4 hours
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
SUMMARY
Production agent runs were failing silently. Claude Code started but never produced output, timing out after 15+ minutes.
Root Cause: stdin was configured as "pipe" but never closed, causing Claude Code to hang waiting for EOF.
Why CI missed it: CI runs against mock-claude, which doesn't check stdin state. The real Claude Code does.
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
THE BUG
Before (hangs - stdin pipe never closed):
spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] })
After (works - stdin is /dev/null, immediate EOF):
spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] })
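The hang and both ways out of it can be reproduced without Claude Code. A minimal sketch, assuming a stand-in child that reads stdin until EOF before exiting (a simplification, not Claude Code's actual logic). The names exitsWithin and CHILD_SCRIPT are hypothetical.

```javascript
const { spawn } = require("node:child_process");

// Stand-in child (assumption): waits for EOF on stdin, then exits.
const CHILD_SCRIPT = `
  process.stdin.resume();
  process.stdin.on("end", () => console.log("done"));
`;

// Resolves true if the child exits cleanly before timeoutMs, false if it
// had to be killed because it was still waiting on stdin.
function exitsWithin(stdio, timeoutMs, closeStdin = false) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", CHILD_SCRIPT], { stdio });
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      resolve(false);
    }, timeoutMs);
    child.on("error", reject);
    child.on("exit", (code) => {
      clearTimeout(timer);
      resolve(code === 0);
    });
    // Alternative fix: keep the pipe but close our end so the child sees EOF.
    if (closeStdin && child.stdin) child.stdin.end();
  });
}

// Run the three variants side by side and print which ones exited.
const demo = Promise.all([
  exitsWithin(["pipe", "pipe", "pipe"], 3000),       // stdin pipe left open
  exitsWithin(["ignore", "pipe", "pipe"], 3000),     // stdin is /dev/null
  exitsWithin(["pipe", "pipe", "pipe"], 3000, true), // pipe closed right away
]).then(([pipeLeftOpen, ignored, pipeClosed]) => {
  const results = { pipeLeftOpen, ignored, pipeClosed };
  console.log(results);
  return results;
});
```

Closing the pipe with child.stdin.end() is an equivalent fix when the parent needs to write input first; "ignore" is simpler when it never does.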
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
HOW WE FOUND IT
SSH into sandbox
cat /tmp/vm0-agent-*.log → empty (no Claude output)
ps aux | grep claude → process alive, using 23% memory
ps -p 510 -o wchan → ep_pol (waiting on I/O)
ls -la /proc/510/fd/0 → stdin connected to pipe
Manual "claude --print hello" → works (TTY mode)
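The fd check from the steps above can also be scripted. A sketch for Linux only, assuming /proc is mounted. The helpers classifyFdTarget and describeStdin are hypothetical names, and the link formats matched here (pipe:[inode], socket:[inode], /dev/pts/N) are the ones procfs normally reports.

```javascript
const fs = require("node:fs");

// Map the symlink target of /proc/<pid>/fd/0 to a coarse category.
// "pipe" and "socket" both mean some writer must close its end before the
// reader sees EOF; "null" means reads hit EOF immediately.
function classifyFdTarget(target) {
  if (target.startsWith("pipe:")) return "pipe";
  if (target.startsWith("socket:")) return "socket";
  if (target === "/dev/null") return "null";
  if (target.startsWith("/dev/pts/") || target.startsWith("/dev/tty")) return "tty";
  return "other";
}

// Same information that `ls -la /proc/<pid>/fd/0` shows, as an object.
function describeStdin(pid) {
  const target = fs.readlinkSync(`/proc/${pid}/fd/0`);
  return { pid, target, kind: classifyFdTarget(target) };
}

// Inspect this process as a demo; on hosts without /proc this reports
// the error instead of throwing.
try {
  console.log(describeStdin(process.pid));
} catch (err) {
  console.log(`cannot inspect stdin: ${err.code || err.message}`);
}
```

In the incident, a call like describeStdin(510) would have replaced the manual ls step.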
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
WHY NO ROLLBACK
Release included database migration. Forward-fix was safer.
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
INVESTIGATION NOISE
Runner npm publish failure (@vm0/core not built) was unrelated but consumed investigation time.
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
ACTION ITEMS
✅ Fix stdin → "ignore" (#1316)
✅ Add spawn unit tests (#1319)
✅ Fix CI publish jobs (#1306, #1318)
🔲 Add real Claude test in CI (TODO)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
KEY LESSON
Mock ≠ Real: CI must include at least one test with real Claude Code to catch behavior differences like stdin handling.
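One possible shape for the real-binary CI check, as a sketch: run the command with the same stdio the runner uses, under a hard deadline, and fail if nothing comes back. runWithDeadline and the CLAUDE_SMOKE environment variable are hypothetical. The claude --print hello invocation is the one from the manual check during the investigation, and timeout and killSignal are standard child_process.spawn options.

```javascript
const { spawn } = require("node:child_process");

// Runs cmd with the given spawn options and kills it after deadlineMs.
// ok is true only if it exited 0 and wrote something to stdout.
function runWithDeadline(cmd, args, spawnOptions, deadlineMs) {
  return new Promise((resolve) => {
    const child = spawn(cmd, args, {
      ...spawnOptions,
      timeout: deadlineMs,
      killSignal: "SIGKILL",
    });
    let stdout = "";
    if (child.stdout) {
      child.stdout.setEncoding("utf8");
      child.stdout.on("data", (chunk) => { stdout += chunk; });
    }
    // Drain stderr so an unread pipe cannot delay the "close" event.
    if (child.stderr) child.stderr.resume();
    // Spawn failures (for example ENOENT) are reported through "error".
    child.on("error", (err) =>
      resolve({ ok: false, code: null, signal: null, stdout, error: err.code || err.message }));
    child.on("close", (code, signal) =>
      resolve({ ok: code === 0 && stdout.trim() !== "", code, signal, stdout }));
  });
}

// Opt-in so ordinary CI runs do not need the real binary (hypothetical gate).
if (process.env.CLAUDE_SMOKE === "1") {
  const stdio = ["ignore", "pipe", "pipe"]; // should mirror the runner's options
  runWithDeadline("claude", ["--print", "hello"], { stdio }, 60_000).then((r) => {
    if (!r.ok) {
      console.error("smoke test failed:", r);
      process.exitCode = 1;
    }
  });
}
```

The deadline turns a 15+ minute silent hang into a fast, explicit failure. The check is only meaningful if the stdio array passed in is the one the production runner actually uses.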