Web Page - Operational
Runner - Operational
API Service - Operational
Storage - Operational
Notice history
Mar 2026
No notices reported this month
Feb 2026
- Postmortem
This incident was caused by a code refactor that consolidated references to CLERK_PUBLISHABLE_KEY across several websites. The variable name was mistakenly omitted from the deployment script, so the platform could not find the legacy CLERK_PUBLISHABLE_KEY in the production environment. As a result, pages failed to load and platform.vm0.ai was unusable.
The related API services and container services were not affected.
The remediation plan has two main parts: validating required environment variables during the build phase so that problematic code cannot be deployed, and introducing e2e tests for platform.vm0.ai to confirm that the happy-path workflow works.
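The build-phase validation described above could look roughly like the following. This is a minimal sketch, not the check VM0 actually shipped: the function names and the contents of REQUIRED_ENV_VARS are hypothetical, and only CLERK_PUBLISHABLE_KEY is taken from the postmortem.

```javascript
// Hypothetical list: the postmortem names only CLERK_PUBLISHABLE_KEY.
const REQUIRED_ENV_VARS = ["CLERK_PUBLISHABLE_KEY"];

// Return the names of required variables that are unset or blank.
function findMissingEnvVars(required, env = process.env) {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throw if anything is missing, so a build script that calls this
// fails before any artifact is produced or deployed.
function assertRequiredEnv(required, env = process.env) {
  const missing = findMissingEnvVars(required, env);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }
}
```

A build script could call assertRequiredEnv(REQUIRED_ENV_VARS) before bundling, so that a missing key fails the build instead of the deployed page.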
- Resolved
This incident has been resolved.
- Identified
We determined that the issue originated from a recent deployment of the platform frontend code.
- Investigating
We are currently investigating this incident.
Jan 2026
- Postmortem
Postmortem: Claude Code Hanging in Sandbox
Date: 2026-01-19
Severity: P0
Duration: ~4 hours━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SUMMARY
Production agent runs were failing silently. Claude Code started but never produced output, timing out after 15+ minutes.
Root Cause: stdin was configured as "pipe" but never closed, causing Claude Code to hang waiting for EOF.
Why CI missed it: CI uses mock-claude which doesn't check stdin state. Real Claude Code does.
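The root cause can be reproduced without Claude Code. The sketch below assumes a stand-in child (a small Node script that waits for EOF on stdin and then prints "done"); runChild and CHILD_SCRIPT are hypothetical names, and the endStdin option illustrates an alternative fix (keep the pipe but close it), which is not the fix described in this postmortem.

```javascript
const { spawn } = require("node:child_process");

// Stand-in for a program that waits for EOF on stdin before doing its work.
const CHILD_SCRIPT =
  'process.stdin.resume(); process.stdin.on("end", () => console.log("done"));';

// Spawn the stand-in with the given stdin mode ("pipe" or "ignore") and
// report whether it finished before the timeout.
function runChild(stdinMode, timeoutMs, { endStdin = false } = {}) {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, ["-e", CHILD_SCRIPT], {
      stdio: [stdinMode, "pipe", "pipe"],
    });
    // Alternative fix: keep the pipe but close it so the child sees EOF.
    if (stdinMode === "pipe" && endStdin) child.stdin.end();

    let output = "";
    child.stdout.on("data", (chunk) => { output += chunk; });

    const timer = setTimeout(() => {
      child.kill(); // the child never saw EOF; give up
      resolve({ timedOut: true, output: output.trim() });
    }, timeoutMs);

    child.on("close", () => {
      clearTimeout(timer);
      resolve({ timedOut: false, output: output.trim() });
    });
  });
}
```

With "pipe" and no call to child.stdin.end(), the stand-in should still be waiting when the timeout fires. With "ignore", or with endStdin set, it should see EOF and finish promptly.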
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
THE BUG
Before (hangs - stdin pipe never closed):
spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] })
After (works - stdin is /dev/null, immediate EOF):
spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] })
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
HOW WE FOUND IT
SSH into sandbox
cat /tmp/vm0-agent-*.log → empty (no Claude output)
ps aux | grep claude → process alive, using 23% memory
ps -p 510 -o wchan → ep_pol (waiting on I/O)
ls -la /proc/510/fd/0 → stdin connected to pipe
Manual "claude --print hello" → works (TTY mode)
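The /proc/<pid>/fd/0 check from the steps above can also be scripted. This is a rough, Linux-only sketch; classifyStdinTarget and inspectStdin are hypothetical helpers, not part of VM0 tooling.

```javascript
const fs = require("node:fs");

// Classify the symlink target of /proc/<pid>/fd/0 as reported by Linux.
function classifyStdinTarget(target) {
  if (target.startsWith("pipe:")) return "pipe"; // e.g. "pipe:[12345]"
  if (target === "/dev/null") return "null";
  if (target.startsWith("/dev/pts/") || target.startsWith("/dev/tty")) {
    return "tty";
  }
  if (target.startsWith("socket:")) return "socket";
  return "other";
}

// Returns null when /proc is unavailable, the process has exited,
// or the caller lacks permission to read the link.
function inspectStdin(pid) {
  try {
    const target = fs.readlinkSync(`/proc/${pid}/fd/0`);
    return { target, kind: classifyStdinTarget(target) };
  } catch {
    return null;
  }
}
```

A result of kind "pipe" for a process that is expected to run non-interactively is a hint that something upstream may be holding its stdin open.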
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHY NO ROLLBACK
Release included database migration. Forward-fix was safer.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INVESTIGATION NOISE
Runner npm publish failure (@vm0/core not built) was unrelated but consumed investigation time.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ACTION ITEMS
✅ Fix stdin → "ignore" (#1316)
✅ Add spawn unit tests (#1319)
✅ Fix CI publish jobs (#1306, #1318)
🔲 Add real Claude test in CI (TODO)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
KEY LESSON
Mock ≠ Real: CI must include at least one test with real Claude Code to catch behavior differences like stdin handling.
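A check on the spawn options themselves does not depend on which binary is launched, so it can guard the stdin setting even where real Claude Code is unavailable. The sketch below is an assumption about what such a test might look like, not the tests added in #1319; spawnAgent and checkStdinIsNotPiped are hypothetical names. It relies on documented Node.js behaviour: child.stdin is null unless stdio[0] is "pipe". It does not replace a test against real Claude Code.

```javascript
const { spawn } = require("node:child_process");
const { strictEqual, notStrictEqual } = require("node:assert");

// Hypothetical wrapper standing in for the code path that launches the agent.
function spawnAgent(cmd, args) {
  return spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] });
}

// Regression check: with stdio[0] set to "ignore" there is no stdin stream
// on the child, so no pipe can be left open by the parent.
function checkStdinIsNotPiped() {
  const child = spawnAgent(process.execPath, ["-e", ""]);
  child.on("error", () => {}); // spawn failures are not what this check covers
  try {
    strictEqual(child.stdin, null);
    notStrictEqual(child.stdout, null); // stdout is still piped for output capture
  } finally {
    child.kill();
  }
  return true;
}
```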
- Resolved
This incident has been resolved.
- Update
We have confirmed that the issue lies in how VM0 invokes Claude Code. A minimal fix has been finalized and is now being deployed.
- Update
We have restored normal operation for both the database and the task dispatcher, and are now working on getting Claude Code in the sandbox back online.
- Update
We have traced the problem to a recent database change. The team is working to repair the data that caused it.
- Update
The issue stems from a recent runner deployment, and the team is working on a fix.
- Identified
We are continuing to work on a fix for this incident.
- Investigating
We are currently investigating this incident.