System safety and control-plane skill that prevents agent deadlocks and freezes. Provides non-LLM control commands to inspect task state, flush message queues, cancel long-running work, and recover safely without restarting the container. Use when implementing or operating long-running tasks, sub-agents, benchmarks, background monitors (e.g., Moltbook, PNR checks), or when the system becomes unresponsive and needs immediate recovery controls.
OpenClaw skills run inside an OpenClaw container. EasyClawd deploys and manages yours — no server setup needed.
Control‑plane safety skill for OpenClaw that prevents agent freezes and deadlocks using sub‑agent isolation, event‑driven execution, watchdogs, and crash‑safe recovery.
--- name: error-guard description: > System safety and control-plane skill that prevents agent deadlocks and freezes. Provides non-LLM control commands to inspect task state, flush message queues, cancel long-running work, and recover safely without restarting the container. Use when implementing or operating long-running tasks, sub-agents, benchmarks, background monitors (e.g., Moltbook, PNR checks), or when the system becomes unresponsive and needs immediate recovery controls. --- # error-guard ⚠️ **System‑level skill (Advanced users)** This skill defines the **control‑plane safety primitives** for OpenClaw. It is intentionally minimal, non‑blocking, and designed to prevent agent freezes, deadlocks, and unrecoverable states when running long‑lived or high‑risk workloads. ## Design Principles > **Warning:** This skill operates at the agent control‑plane level. > It should be installed only by users who understand OpenClaw’s execution model and are running workloads that can block, hang, or run for extended periods. - **Main agent never blocks**: no long exec, no external I/O, no LLM calls. - **Event-driven**: workers emit events; the control plane listens. - **Fail-safe first**: recovery commands must always respond. - **Minimal state**: track only task metadata (never payloads). ## Command Surface (Phase 1) ### /status Report current system health and task registry state. Returns: - Active tasks (taskId, type, state) - Start time and last heartbeat - Flags for stalled or overdue tasks Constraints: - Must run in constant time - Must not call any model or external API ### /flush Emergency stop. Immediately: - Cancel all active tasks - Kill active exec/process sessions - Clear pending message queue - Reset in-memory task registry Constraints: - Must always respond - No waiting on workers - No model calls ### /recover Safe recovery sequence. Steps: 1. Execute `/flush` 2. Reset control-plane state 3. Optionally reload skills/state (no container restart) ## Future Extensions (Not Implemented Yet) - Sub-agent runner helper (event-driven) - Task watchdogs with TTL and silence detection - Structured event protocol (task.started, task.heartbeat, task.completed, ...) - Back-pressure and task classes (interactive / batch / background) ## Security & Privacy - This skill **does not** store payloads, prompts, messages, or model outputs - Only minimal task metadata is persisted (taskId, timestamps, state) - No API keys, credentials, or user data are read or written - Safe to publish and share publicly ## Non-Goals - No business logic - No background polling loops - No user-facing features - No LLM reasoning paths This skill is the **last line of defense**. Keep it small, fast, and reliable.