VM0 - Notice history

Web Page - Operational

100% - uptime
Feb 2026: 99.98% · Mar 2026: 99.83% · Apr 2026: 99.87%

Runner - Operational

97% - uptime
Feb 2026: 100.0% · Mar 2026: 100.0% · Apr 2026: 89.87%

API Service - Operational

100% - uptime
Feb 2026: 100.0% · Mar 2026: 99.92% · Apr 2026: 100.0%

Storage - Operational

100% - uptime
Feb 2026: 100.0% · Mar 2026: 100.0% · Apr 2026: 100.0%

Connector - Operational

100% - uptime
Feb 2026: 100.0% · Mar 2026: 100.0% · Apr 2026: 99.99%

Notice history

Apr 2026

Storage download failed
  • Postmortem

    Guest Download Failure Due to Parallel Race Condition on Overlapping Mount Paths

    What Happened

    On 2026-04-10, agent jobs with multiple storages failed during VM initialization. The guest-download binary runs inside the VM to download and extract storage archives. It crashed with ENOENT errors from canonicalize while processing skills storages.

    Root Cause

    guest-download extracts storage archives in parallel (up to 4 concurrent threads). A recent change added a remove_dir_all(target_path) call at the start of each thread to clean stale files on VM reuse (keep-alive).

    The storage mount paths have a guaranteed parent-child overlap:

    - Instructions mount at /home/user/.claude

    - Skills mount at /home/user/.claude/skills/{name}
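    The overlap between mount targets can be checked mechanically. Path::starts_with compares whole path components, so a skills path is always detected as a child of the instructions path. The Rust sketch below is illustrative only. nested_mounts is a hypothetical helper and is not part of guest-download, and my-skill stands in for a real skill name.

```rust
use std::path::Path;

/// Returns (parent_index, child_index) pairs where one mount path is an
/// ancestor of another. Path::starts_with compares whole components, so
/// "/home/user/.claude-old" is NOT treated as a child of "/home/user/.claude".
fn nested_mounts(paths: &[&str]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (i, parent) in paths.iter().enumerate() {
        for (j, child) in paths.iter().enumerate() {
            if i != j && Path::new(child).starts_with(Path::new(parent)) {
                pairs.push((i, j));
            }
        }
    }
    pairs
}

fn main() {
    // Mount layout from the postmortem; "my-skill" is a hypothetical skill name.
    let mounts = ["/home/user/.claude", "/home/user/.claude/skills/my-skill"];
    for (p, c) in nested_mounts(&mounts) {
        // Prints: /home/user/.claude contains /home/user/.claude/skills/my-skill
        println!("{} contains {}", mounts[p], mounts[c]);
    }
}
```

    Any pair this check reports is a pair of threads that must not each run remove_dir_all on its own target while the other thread is active.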

    When threads run concurrently, the parent path's remove_dir_all deletes child directories already created by sibling threads, causing those threads to fail with ENOENT.
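    The failing interleaving can be replayed deterministically on a single thread. This minimal sketch assumes a per-thread sequence of remove_dir_all on the target, then create_dir_all, with canonicalize called on the target afterwards. The postmortem does not give the exact call order, so this sequence is an assumption. replay_race, the scratch directory name, and my-skill are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Replays, on one thread, a failing interleaving of two extraction threads.
/// "my-skill" is a hypothetical skill name.
fn replay_race(root: &Path) -> io::Result<PathBuf> {
    let parent = root.join(".claude"); // instructions mount
    let child = parent.join("skills").join("my-skill"); // skills mount

    // Skills thread: creates its target directory (and any missing parents).
    fs::create_dir_all(&child)?;

    // Instructions thread: pre-cleanup wipes the whole parent subtree,
    // including the directory the skills thread just created, and then
    // recreates only its own target.
    fs::remove_dir_all(&parent)?;
    fs::create_dir_all(&parent)?;

    // Skills thread resumes: its target is gone, so this returns NotFound (ENOENT).
    fs::canonicalize(&child)
}

fn main() -> io::Result<()> {
    // Scratch directory under the system temp dir; the name is arbitrary.
    let root = std::env::temp_dir().join(format!("race-replay-{}", std::process::id()));
    fs::create_dir_all(&root)?;
    match replay_race(&root) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => println!("skills thread failed: {e}"),
        other => println!("unexpected result: {other:?}"),
    }
    fs::remove_dir_all(&root)
}
```

    With real threads, this ordering happens only some of the time. That is consistent with failures appearing on jobs that mount both instructions and skills storages.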

    Impact

    • Scope: All jobs

    • Duration: ~11 hours (2026-04-09 16:38 UTC to 2026-04-10 03:21 UTC)

    Timeline (UTC)

    • 2026-04-09 16:38: Code change merged, adding remove_dir_all pre-cleanup to the parallel download threads

    • 2026-04-10 ~02:00: Job failures reported on prod-3

    • 2026-04-10 ~02:45: Root cause identified via prod SSH log analysis

    • 2026-04-10 03:21: Fix merged and deployed

    Fix

    • Removed remove_dir_all from download_and_extract(). Threads now only call create_dir_all and then perform streaming tar extraction.

    • Disabled --keep-alive in CI and production to avoid VM reuse until proper stale file cleanup is implemented (#8757)
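    The post-fix pattern can be sketched with Rust's std::fs. The function names above suggest this library, but the postmortem does not confirm it. In the sketch, each thread only calls create_dir_all before writing. The std documentation treats a directory that is created concurrently by someone else as a success for create_dir_all, so threads with a parent and a child target no longer undo each other's work. extract_into and extract_all are hypothetical names. Writing one marker file stands in for streaming tar extraction, and splitting the mounts into batches of 4 is a simplified stand-in for the real concurrency limit.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for download_and_extract() after the fix:
/// no remove_dir_all, only create_dir_all followed by writing content.
fn extract_into(target: &Path, marker: &str) -> io::Result<()> {
    // Succeeds if the directory (or a parent) already exists or is being
    // created concurrently by a sibling thread.
    fs::create_dir_all(target)?;
    // Placeholder for streaming tar extraction into `target`.
    fs::write(target.join(marker), b"ok")
}

/// Runs one extraction per mount in parallel, at most 4 threads per batch.
fn extract_all(mounts: &[(PathBuf, String)]) -> io::Result<()> {
    for batch in mounts.chunks(4) {
        thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|(dir, marker)| s.spawn(move || extract_into(dir, marker)))
                .collect();
            // Propagate the first I/O error; a panicking thread aborts the run.
            handles
                .into_iter()
                .map(|h| h.join().expect("extraction thread panicked"))
                .collect::<io::Result<()>>()
        })?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join(format!("extract-fixed-{}", std::process::id()));
    let claude = root.join(".claude");
    // Overlapping targets as in the postmortem; "my-skill" is hypothetical.
    let mounts = vec![
        (claude.clone(), "instructions.md".to_string()),
        (claude.join("skills").join("my-skill"), "SKILL.md".to_string()),
    ];
    extract_all(&mounts)?;
    for (dir, marker) in &mounts {
        let file = dir.join(marker);
        println!("{} exists: {}", file.display(), file.exists());
    }
    fs::remove_dir_all(&root)
}
```

    This sketch leaves stale files from a reused VM in place. That matches the decision to disable --keep-alive until a separate cleanup step exists.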

  • Resolved
    This incident has been resolved.
  • Identified
    We are continuing to work on a fix for this incident.
  • Investigating
    We are currently investigating this incident.

Mar 2026

Feb 2026

platform.vm0.ai could not be opened.
  • Postmortem

    This incident was caused by a code refactoring that consolidated references to CLERK_PUBLISHABLE_KEY across several websites. The deployment script was not updated to include the variable name. As a result, the platform could not find the legacy CLERK_PUBLISHABLE_KEY in the production environment, pages failed to load, and platform.vm0.ai could not be used.

    The related API services and container services were not affected.

    The follow-up remediation plan has two main parts. First, we will attempt to validate required environment variables during the build phase, so that code with missing configuration cannot be deployed. Second, we will introduce e2e tests for platform.vm0.ai to verify that the happy-path workflow works as expected.
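    The build-phase check could take roughly the following shape. The idea is language-agnostic and is shown here in Rust; the real platform build may use a different toolchain. missing_vars, the required-variable list, and the exit-on-failure behavior are all assumptions. Only CLERK_PUBLISHABLE_KEY comes from this postmortem.

```rust
/// Returns the names in `required` whose value is unset or blank according
/// to `lookup`. Taking a lookup function instead of reading the process
/// environment directly keeps the check easy to test.
fn missing_vars<'a>(required: &[&'a str], lookup: impl Fn(&str) -> Option<String>) -> Vec<&'a str> {
    required
        .iter()
        .filter(|&&name| lookup(name).map_or(true, |v| v.trim().is_empty()))
        .copied()
        .collect()
}

fn main() {
    // Hypothetical list of variables the build requires.
    let required = ["CLERK_PUBLISHABLE_KEY"];
    let missing = missing_vars(&required, |name| std::env::var(name).ok());
    if !missing.is_empty() {
        // Failing the build here stops the broken artifact from being deployed.
        eprintln!("build aborted, missing environment variables: {}", missing.join(", "));
        std::process::exit(1);
    }
    println!("all required environment variables are set");
}
```

    Treating an empty value as missing is a deliberate choice. Otherwise, a variable that a deployment script declares but leaves blank would still pass the check.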

  • Resolved

    This incident has been resolved.

  • Identified

    We determined that the issue originated from a recent deployment of the platform frontend code.

  • Investigating
    We are currently investigating this incident.
