Restart Recovery
Make OpenClaw agent workflows restart-safe using checkpoint files, idempotent step tracking, wake/resume handoff, and stale-checkpoint monitoring. Use when u...
Implement restart-safe execution with this sequence:
- checkpoint
- restart
- wake
- resume from file
Use bundled scripts
- Use `scripts/checkpoint_tool.py` for the deterministic checkpoint lifecycle: `start`, `update`, `resume`, `complete`, `list`.
- Use `scripts/checkpoint_selfcheck.py` for stale unfinished checkpoint alerts without LLM/tool-token usage.
Required operating rules
- Write checkpoints before any restart-prone operation (config patch/apply, update, service restart, long multi-step jobs).
- Use atomic file writes (`.tmp` then rename).
- Track completed and remaining steps explicitly.
- Include an idempotency key per workflow to avoid duplicate side effects after resume.
- Never write secrets/tokens to checkpoint files.
- Acquire a resume lock before continuing unfinished work.
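The atomic-write, step-tracking, and idempotency-key rules above can be sketched as follows. This is a minimal illustration, not the code in the bundled `checkpoint_tool.py`: the function names `write_checkpoint` and `new_checkpoint` and every JSON field name are assumptions made for the example.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path
from typing import List


def write_checkpoint(path: Path, state: dict) -> None:
    """Write `state` as JSON to `path` atomically (temp file, then rename)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old file or the new one
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def new_checkpoint(workflow: str, steps: List[str]) -> dict:
    """Build an initial checkpoint. Field names are hypothetical."""
    return {
        "workflow": workflow,
        "idempotency_key": uuid.uuid4().hex,  # one key per workflow run
        "completed_steps": [],
        "remaining_steps": list(steps),
        "status": "in_progress",
    }
```

Because the temp file ends in `.tmp`, a leftover from an interrupted write does not match the `*.json` pattern used to discover checkpoints. The checkpoint carries only step names and a random key, so nothing secret needs to be stored in it.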
Recommended checkpoint location
- Per agent: `memory/checkpoints/*.json`
- Shared/default workspace flows: `memory/checkpoints/*.json` at workspace root
Startup instruction to add in AGENTS.md
Add this exact section:
## Restart-safe workflow rule
On startup, check `memory/checkpoints/*.json` for unfinished workflows. If found, acquire resume lock, validate checkpoint schema/hash, and continue from the last completed idempotent step.
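The startup rule (acquire a resume lock, then validate schema and hash) could look roughly like the sketch below. The source does not define the lock mechanism, the schema, or the hash format, so everything here is an assumption: an exclusive `.lock` file beside the checkpoint, a hypothetical set of required fields, and a `sha256` field computed over the rest of the JSON.

```python
import hashlib
import json
import os
from pathlib import Path

# Assumed schema; the real checkpoint format may differ.
REQUIRED_FIELDS = {"workflow", "completed_steps", "remaining_steps", "status"}


def payload_hash(state: dict) -> str:
    """SHA-256 over the checkpoint with its own hash field excluded."""
    body = {k: v for k, v in state.items() if k != "sha256"}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def acquire_resume_lock(checkpoint: Path) -> bool:
    """Create `<checkpoint>.lock` exclusively; False means it already exists."""
    lock = checkpoint.with_name(checkpoint.name + ".lock")
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the holder for later inspection
    return True


def release_resume_lock(checkpoint: Path) -> None:
    """Remove the lock file if it is present."""
    checkpoint.with_name(checkpoint.name + ".lock").unlink(missing_ok=True)


def load_valid_checkpoint(checkpoint: Path) -> dict:
    """Load a checkpoint; raise ValueError if the schema or hash check fails."""
    state = json.loads(checkpoint.read_text(encoding="utf-8"))
    missing = REQUIRED_FIELDS - state.keys()
    if missing:
        raise ValueError(f"checkpoint missing fields: {sorted(missing)}")
    if state.get("sha256") != payload_hash(state):
        raise ValueError("checkpoint hash mismatch")
    return state
```

A lock left behind by a crashed process is not handled in this sketch; a real implementation would need some policy for that, such as checking the recorded PID or the lock's age.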
No-LLM stale checkpoint monitor
Use a host scheduler (launchd/systemd/cron), not LLM cron jobs.
- Run every 10 minutes.
- Alert only when unfinished checkpoints are older than threshold.
- Log to local file for audit.
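To illustrate the three monitor rules, here is a minimal stand-alone check that a host scheduler could run. It is not the bundled `checkpoint_selfcheck.py`: the function names, the `status` field, the use of file modification time as the checkpoint's age, and the log line format are all assumptions.

```python
import json
import time
from pathlib import Path
from typing import List, Optional


def stale_unfinished(
    root: Path, max_age_seconds: float, now: Optional[float] = None
) -> List[Path]:
    """Return unfinished checkpoints under `root` older than the threshold."""
    now = time.time() if now is None else now
    stale = []
    for path in sorted(root.glob("*.json")):
        try:
            state = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or corrupt file; ignored in this sketch
        if not isinstance(state, dict) or state.get("status") == "complete":
            continue
        # Age is measured from the last write to the checkpoint file.
        if now - path.stat().st_mtime > max_age_seconds:
            stale.append(path)
    return stale


def run_check(root: Path, log_path: Path, max_age_seconds: float) -> int:
    """Append one log line per stale checkpoint; return how many were found."""
    found = stale_unfinished(root, max_age_seconds)
    if found:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        with log_path.open("a", encoding="utf-8") as log:
            for path in found:
                log.write(f"{stamp} STALE {path}\n")
    return len(found)
```

A scheduler entry would call something like `run_check(Path("memory/checkpoints"), Path("selfcheck.log"), 1800)` every 10 minutes; the 30-minute threshold and the log file name are placeholders, since the source does not specify them.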
Suggested execution flow
- `checkpoint_tool.py start` before risky step.
- Perform step.
- `checkpoint_tool.py update --complete <step> --step <next>`.
- If restart happens, wake session/process.
- On startup/re-entry, `checkpoint_tool.py resume` and continue.
- `checkpoint_tool.py complete` when done.
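The same flow can be shown in-process as a small step runner that records each completed step and skips it on re-entry. This is a sketch of the pattern only; it does not call or reproduce `checkpoint_tool.py`, and `run_workflow`, `save`, and the JSON fields are hypothetical names.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, List


def save(path: Path, state: dict) -> None:
    """Atomic write: temp file in the same directory, then rename."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    os.replace(tmp, path)


def run_workflow(path: Path, steps: Dict[str, Callable[[], None]]) -> List[str]:
    """Run `steps` in order, skipping those the checkpoint marks as completed.

    Returns the names of the steps executed during this call.
    """
    if path.exists():
        # Resume: pick up the state left by an earlier, interrupted run.
        state = json.loads(path.read_text(encoding="utf-8"))
    else:
        # Start: write the checkpoint before the first risky step.
        state = {"completed_steps": [], "status": "in_progress"}
        save(path, state)

    executed = []
    for name, action in steps.items():
        if name in state["completed_steps"]:
            continue  # already done before the restart
        action()
        state["completed_steps"].append(name)
        save(path, state)  # update after every completed step
        executed.append(name)

    state["status"] = "complete"
    save(path, state)
    return executed
```

If the process dies after `action()` returns but before `save` runs, that step runs again on resume. This is why each step's side effect should itself be guarded by the workflow's idempotency key.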
Validation checklist
- Simulate mid-work restart and verify resume from last completed step.
- Confirm idempotency (no duplicate sends/writes/actions).
- Confirm stale-check script only alerts after threshold.
- Confirm old checkpoint cleanup policy (expiry).
Skill Info
- Creator: stanrails
- Downloads: 47
- Published: Mar 15, 2026
- Updated: Mar 16, 2026