VM0 - Notice history

Web Page - Operational

100% - uptime
Nov 2025 · 100.0% | Dec 2025 · 100.0% | Jan 2026 · 100.0%

Runner - Operational

100% - uptime
Nov 2025 · 100.0% | Dec 2025 · 100.0% | Jan 2026 · 99.33%

API Service - Operational

100% - uptime
Nov 2025 · 100.0% | Dec 2025 · 100.0% | Jan 2026 · 100.0%

Storage - Operational

100% - uptime
Nov 2025 · 100.0% | Dec 2025 · 100.0% | Jan 2026 · 100.0%

Notice history

Jan 2026

Agent can't run
  • Postmortem

    Postmortem: Claude Code Hanging in Sandbox

    Date: 2026-01-19
    Severity: P0
    Duration: ~4 hours

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    SUMMARY

    Production agent runs were failing silently. Claude Code started but never produced output, timing out after 15+ minutes.

    Root Cause: stdin was configured as "pipe" but never closed, causing Claude Code to hang waiting for EOF.

    Why CI missed it: CI uses mock-claude, which doesn't check stdin state. Real Claude Code does.

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    THE BUG

    Before (hangs - stdin pipe never closed):
    spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] })

    After (works - stdin is /dev/null, immediate EOF):
    spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] })
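The difference between the two calls can be reproduced without Claude Code. The sketch below is an illustration, not the runner's code: `runChild` and `CHILD_SCRIPT` are hypothetical names, and the child is a small Node script standing in for any program that waits for EOF on stdin before producing output.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical stand-in for a CLI that reads stdin to EOF before printing.
const CHILD_SCRIPT =
  'process.stdin.resume(); process.stdin.on("end", () => console.log("done"));';

// Spawn the stand-in child with the given stdin mode ("pipe" or "ignore").
// The parent never writes to or closes stdin, mirroring the buggy call.
// The child is killed if it has not exited after timeoutMs.
function runChild(stdinMode, timeoutMs = 1000) {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, ["-e", CHILD_SCRIPT], {
      stdio: [stdinMode, "pipe", "pipe"],
    });
    let output = "";
    child.stdout.on("data", (chunk) => (output += chunk));
    // child.kill() sends SIGTERM by default.
    const timer = setTimeout(() => child.kill(), timeoutMs);
    child.on("close", (code, signal) => {
      clearTimeout(timer);
      resolve({ output: output.trim(), timedOut: signal === "SIGTERM" });
    });
  });
}
```

Under these assumptions, `runChild("pipe")` should end with `timedOut: true` and no output, while `runChild("ignore")` should print `done` right away. Keeping `"pipe"` and calling `child.stdin.end()` immediately after `spawn` would also deliver EOF, which is useful when the parent needs to send input first.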

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    HOW WE FOUND IT

    1. SSH into sandbox

    2. cat /tmp/vm0-agent-*.log → empty (no Claude output)

    3. ps aux | grep claude → process alive, using 23% memory

    4. ps -p 510 -o wchan → ep_pol (ps truncates ep_poll; the process is blocked in epoll, waiting on I/O)

    5. ls -la /proc/510/fd/0 → stdin connected to pipe

    6. Manual "claude --print hello" → works (TTY mode)
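Step 5 can also be scripted, which may help when the same check has to be repeated across sandboxes. This is a minimal sketch for Linux only; `classifyStdinTarget` and `inspectStdin` are hypothetical helper names, not part of VM0.

```javascript
const fs = require("node:fs");

// Classify the symlink target of /proc/<pid>/fd/0.
// On Linux, a pipe shows up as "pipe:[<inode>]".
function classifyStdinTarget(target) {
  if (target.startsWith("pipe:")) return "pipe";
  if (target === "/dev/null") return "null";
  if (target.startsWith("/dev/pts/") || target.startsWith("/dev/tty")) {
    return "tty";
  }
  return "other";
}

// Read where a process's stdin points. Requires permission to read
// /proc/<pid>/fd and throws if the process does not exist.
function inspectStdin(pid) {
  const target = fs.readlinkSync(`/proc/${pid}/fd/0`);
  return { target, kind: classifyStdinTarget(target) };
}
```

For the process in the steps above, `inspectStdin(510)` would be expected to report `kind: "pipe"`, whereas a process started from an interactive shell would report `"tty"`.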

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    WHY NO ROLLBACK

    The release included a database migration, so a forward fix was safer than a rollback.

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    INVESTIGATION NOISE

    Runner npm publish failure (@vm0/core not built) was unrelated but consumed investigation time.

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    ACTION ITEMS

    ✅ Fix stdin → "ignore" (#1316)
    ✅ Add spawn unit tests (#1319)
    ✅ Fix CI publish jobs (#1306, #1318)
    🔲 Add real Claude test in CI (TODO)
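As an illustration of what a spawn unit test could check (the contents of #1319 are not shown here, so this is an assumption), Node sets `child.stdin` to `null` when stdin is configured as `"ignore"`. The names `spawnAgent` and `hasNoStdinPipe` below are hypothetical; `spawnAgent` stands in for the runner's real spawn call site.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical wrapper standing in for the runner's spawn call site.
function spawnAgent(cmd, args) {
  return spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] });
}

// True when the child was spawned without a writable stdin pipe,
// so it cannot block waiting for the parent to close that pipe.
function hasNoStdinPipe(child) {
  return child.stdin === null;
}
```

A check like this is cheap and guards against the option regressing, but it only inspects configuration. It does not replace a test against real Claude Code.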

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    KEY LESSON

    Mock ≠ Real: CI must include at least one test with real Claude Code to catch behavior differences like stdin handling.

  • Resolved
    This incident has been resolved.
  • Update
    We have confirmed that the issue lies in how VM0 invokes Claude Code. A minimal fix has been finalized and is currently being deployed.

  • Update
    We have restored normal operation for both the database and the task dispatcher, and are now working on getting Claude Code in the sandbox back online.

  • Update
    We have traced the problem to a recent database change. The team is working to fix the data that caused it.

  • Update
    The issue stems from a recent runner deployment, and the team is working on a fix.

  • Identified
    We are continuing to work on a fix for this incident.
  • Investigating
    We are currently investigating this incident.

Dec 2025

No notices reported this month

Nov 2025

No notices reported this month
