VM0 - Notice history

Web Page - Operational

100% - uptime
Jan 2026 · 100.0%
Feb 2026 · 99.98%
Mar 2026 · 100.0%

Runner - Operational

100% - uptime
Jan 2026 · 99.33%
Feb 2026 · 100.0%
Mar 2026 · 100.0%

API Service - Operational

100% - uptime
Jan 2026 · 100.0%
Feb 2026 · 100.0%
Mar 2026 · 100.0%

Storage - Operational

100% - uptime
Jan 2026 · 100.0%
Feb 2026 · 100.0%
Mar 2026 · 100.0%

Notice history

Mar 2026

No notices reported this month

Feb 2026

platform.vm0.ai cannot be opened
  • Postmortem

    This incident was caused by a code refactoring that consolidated references to CLERK_PUBLISHABLE_KEY across several websites. The variable name was mistakenly omitted from the deployment script, so the platform could not locate the legacy CLERK_PUBLISHABLE_KEY in the production environment. As a result, pages failed to load and platform.vm0.ai was unusable.

    The related API services and container services were not affected.

    The follow-up remediation plan has two main parts: validating required environment variables during the build phase, so that problematic code is not deployed, and introducing e2e testing for platform.vm0.ai to ensure that the happy-path workflow functions normally.
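
    As an illustration of the build-phase check described above, here is a minimal sketch in Node.js. It is not the check VM0 actually shipped: the function names and the REQUIRED_ENV_VARS list are hypothetical, and only CLERK_PUBLISHABLE_KEY is taken from this postmortem.

```javascript
// Hypothetical list of variables the build must see.
// Only CLERK_PUBLISHABLE_KEY comes from the postmortem; extend as needed.
const REQUIRED_ENV_VARS = ["CLERK_PUBLISHABLE_KEY"];

// Return the names that are unset or contain only whitespace.
function findMissingEnvVars(required, env = process.env) {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throw one error naming every missing variable, so a build script
// can fail before anything is deployed.
function assertRequiredEnv(required, env = process.env) {
  const missing = findMissingEnvVars(required, env);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }
}
```

    A build script could call assertRequiredEnv(REQUIRED_ENV_VARS) as its first step and let the thrown error abort the build. Passing the environment as a parameter keeps the check easy to unit test.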

  • Resolved

    This incident has been resolved.

  • Identified

    We determined that the issue originated from a recent deployment of the platform frontend code.

  • Investigating
    We are currently investigating this incident.

Jan 2026

Agent can't run
  • Postmortem

    Postmortem: Claude Code Hanging in Sandbox

    Date: 2026-01-19
    Severity: P0
    Duration: ~4 hours

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    SUMMARY

    Production agent runs were failing silently. Claude Code started but never produced output, timing out after 15+ minutes.

    Root Cause: stdin was configured as "pipe" but never closed, causing Claude Code to hang waiting for EOF.

    Why CI missed it: CI uses mock-claude which doesn't check stdin state. Real Claude Code does.

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    THE BUG

    Before (hangs - stdin pipe never closed):
    spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] })

    After (works - stdin is /dev/null, immediate EOF):
    spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] })
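
    The difference between the two calls can be reproduced without Claude Code. The sketch below is an assumption-laden stand-in, not VM0's runner code: CHILD_SCRIPT is a hypothetical child that simply waits for EOF on stdin before printing, and runWithStdin is a hypothetical helper that kills the child if it has not finished within a timeout.

```javascript
const { spawn } = require("node:child_process");

// Stand-in for a CLI that waits for EOF on stdin before answering.
// This is NOT Claude Code; it only imitates the wait-for-EOF behaviour.
const CHILD_SCRIPT =
  "process.stdin.resume();" +
  "process.stdin.on('end', () => console.log('got EOF'));";

// Spawn the stand-in with the given stdin mode and report whether it
// finished on its own or had to be killed after timeoutMs.
function runWithStdin(stdinMode, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", CHILD_SCRIPT], {
      stdio: [stdinMode, "pipe", "pipe"],
    });

    let output = "";
    let timedOut = false;
    child.stdout.on("data", (chunk) => {
      output += chunk;
    });
    child.stderr.resume(); // drain stderr so the child never blocks on it

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill(); // the child is still waiting for EOF
    }, timeoutMs);

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", () => {
      clearTimeout(timer);
      resolve({ timedOut, output: output.trim() });
    });
  });
}

// Usage (not run automatically):
//   runWithStdin("pipe", 1000).then(console.log);   // parent never closes stdin
//   runWithStdin("ignore", 5000).then(console.log); // stdin is the null device
```

    With "pipe", the parent holds the write end of the pipe open and never calls child.stdin.end(), so the child never sees EOF and is killed by the timer. With "ignore", the child reads from the null device and sees EOF immediately.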

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    HOW WE FOUND IT

    1. SSH into sandbox

    2. cat /tmp/vm0-agent-*.log → empty (no Claude output)

    3. ps aux | grep claude → process alive, using 23% memory

    4. ps -p 510 -o wchan → ep_pol (waiting on I/O)

    5. ls -la /proc/510/fd/0 → stdin connected to pipe

    6. Manual "claude --print hello" → works (TTY mode)
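
    Step 5 could also be scripted. The following is a rough sketch for Linux only, assuming the usual /proc symlink targets (for example "pipe:[...]" for a pipe and "/dev/pts/..." for a terminal). The helper names classifyStdinTarget and inspectStdin are hypothetical and are not part of VM0's tooling.

```javascript
const fs = require("node:fs");

// Classify the target of a /proc/<pid>/fd/0 symlink (Linux conventions).
function classifyStdinTarget(target) {
  if (target.startsWith("pipe:")) return "pipe";
  if (target === "/dev/null") return "null";
  if (target.startsWith("/dev/pts/") || target.startsWith("/dev/tty")) {
    return "tty";
  }
  if (target.startsWith("socket:")) return "socket";
  return "other";
}

// Linux only: report where stdin (fd 0) of a running process points.
// Throws if the process does not exist or /proc is not readable.
function inspectStdin(pid) {
  const target = fs.readlinkSync(`/proc/${pid}/fd/0`);
  return { target, kind: classifyStdinTarget(target) };
}
```

    For the hung process in this incident (PID 510 in the steps above), inspectStdin(510) would be expected to report kind "pipe". A process started by hand from a terminal, as in step 6, would report "tty".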

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    WHY NO ROLLBACK

    Release included database migration. Forward-fix was safer.

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    INVESTIGATION NOISE

    Runner npm publish failure (@vm0/core not built) was unrelated but consumed investigation time.

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    ACTION ITEMS

    ✅ Fix stdin → "ignore" (#1316)
    ✅ Add spawn unit tests (#1319)
    ✅ Fix CI publish jobs (#1306, #1318)
    🔲 Add real Claude test in CI (TODO)

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    KEY LESSON

    Mock ≠ Real: CI must include at least one test with real Claude Code to catch behavior differences like stdin handling.

  • Resolved
    This incident has been resolved.
  • Update

    We have confirmed that the issue lies in how VM0 invokes Claude Code. The minimal fix has been finalized and is currently being deployed.

  • Update

    We have restored normal operation for both the database and the task dispatcher, and are now working on getting Claude Code in the sandbox back online.

  • Update

    We have traced the problem to a recent database change, and the team is working to fix the data that caused it.

  • Update

    This issue stems from a recent runner deployment, and the team is working on a fix.

  • Identified
    We are continuing to work on a fix for this incident.
  • Investigating
    We are currently investigating this incident.
