I used OpenClaw as an AI operator cockpit around TeachClaw: routing work to agents, surfacing commands, running validation lanes, preserving memory, and turning failures into repeatable proving loops.
Building an AI product with agents creates a second system to manage: context, memory, routing, validation, live/runtime drift, and task handoff. Without an operator layer, the team repeats prompts, loses state, and promotes changes without enough evidence.
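To make that concrete, here is a minimal sketch of the kind of handoff record an operator layer can pass between agents; the class and field names are illustrative, not OpenClaw or TeachClaw APIs:

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """One unit of work the operator layer routes to an agent (illustrative only)."""
    task: str                                        # what the agent is asked to do
    context: dict = field(default_factory=dict)      # repo state, prior decisions
    memory_keys: list = field(default_factory=list)  # durable notes to reload
    route_to: str = "development"                    # which mode/agent handles it
    evidence: list = field(default_factory=list)     # validation artifacts gathered so far
```

Without something like this, the same prompt gets retyped, context gets re-explained, and "is this safe to promote?" has no paper trail.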
OpenClaw provides the runtime shell: agent routing, command surfacing, validation lanes, and persistent memory.
TeachClaw then adds repo-local operating modes:
- teachclaw-development: default monothread development flow
- teachclaw-operator: operator-over-workers escalation path
- teachclaw-test: proving runtime
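A rough sketch of how an operator might route work across those modes; the selection rule here is an assumption for illustration, not TeachClaw's actual policy:

```python
MODES = {
    "teachclaw-development": "single-threaded default: one agent, fast local iteration",
    "teachclaw-operator": "operator delegates to worker agents when scope outgrows one thread",
    "teachclaw-test": "proving runtime: run the gates, collect artifacts, no promotion",
}

def pick_mode(parallel_tasks: int, needs_proof: bool) -> str:
    # Assumed policy: prove first, and escalate to the operator mode
    # only when there is genuinely parallel work to coordinate.
    if needs_proof:
        return "teachclaw-test"
    return "teachclaw-operator" if parallel_tasks > 1 else "teachclaw-development"
```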
The eval stack evolved from simple tests into product-shaped gates. Each run reports trust_level, next_action, failing gates, and drift status (sketched after the gate list below):

- regression-no-debug-chatter: prevent internal dev text from leaking to teachers.
- worksheet-core-generation: verify core worksheet artifact generation.
- delivery-file-send-check: check that generated files are delivered as files, not raw paths/text.
- provisioning-cloud-init-contract: protect runtime provisioning assumptions.
- journey-local-change-to-promotion-candidate: model the path from local change to promotion discussion.
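A minimal sketch of that per-run report, assuming every gate run collapses into one small verdict object; the class and helper names beyond trust_level, next_action, failing gates, and drift status are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class NextAction(str, Enum):
    PROMOTE_CANDIDATE = "promote_candidate"
    FIX_AND_RERUN = "fix_and_rerun"
    ESCALATE_TO_HUMAN = "escalate_to_human"

@dataclass
class GateReport:
    trust_level: str          # e.g. "high" / "medium" / "low"
    failing_gates: list       # gate ids such as "delivery-file-send-check"
    drift_status: str         # does generated state still match the checkout?
    next_action: NextAction

def summarize(results: dict, drift_status: str) -> GateReport:
    """Collapse per-gate pass/fail results into a single promotion verdict."""
    failing = [gate for gate, passed in results.items() if not passed]
    if not failing and drift_status == "clean":
        return GateReport("high", failing, drift_status, NextAction.PROMOTE_CANDIDATE)
    if failing:
        return GateReport("low", failing, drift_status, NextAction.FIX_AND_RERUN)
    return GateReport("medium", failing, drift_status, NextAction.ESCALATE_TO_HUMAN)
```

The exact fields matter less than the invariant: every run answers whether the change moved closer to or further from a promotion candidate.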
Generated current-state files can lag the real checkout. The operator model treats docs as useful but always verifies branch state, recent commits, and validation artifacts before trusting them.
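A sketch of that verification step, assuming plain git and a generator that stamps each current-state doc with the commit and branch it was built from; the helper names are illustrative:

```python
import subprocess
from pathlib import Path

def _git(*args: str) -> str:
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout.strip()

def doc_is_stale(doc_path: Path) -> bool:
    """Treat a generated current-state doc as stale if the checkout has moved past it."""
    head = _git("rev-parse", "HEAD")
    branch = _git("rev-parse", "--abbrev-ref", "HEAD")
    text = doc_path.read_text() if doc_path.exists() else ""
    # Assumed convention: the generator embeds the source commit and branch.
    # If either is missing or mismatched, re-derive state from the repo
    # instead of trusting the doc.
    return head not in text or branch not in text
```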
Mechanical pass is not always product pass: .docx and .pptx outputs may still require human quality judgement even when the runtime is correct.
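One cheap way to keep that distinction explicit is to never let rendered teacher-facing documents auto-pass; a sketch, with hypothetical names:

```python
from pathlib import Path

HUMAN_REVIEW_SUFFIXES = {".docx", ".pptx"}

def classify_artifact(path: Path, mechanical_pass: bool) -> str:
    """Mechanical checks can green-light the runtime, but rendered documents
    still go to a human reviewer before promotion."""
    if not mechanical_pass:
        return "failed"
    if path.suffix.lower() in HUMAN_REVIEW_SUFFIXES:
        return "needs_human_review"
    return "passed"
```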
Long-running agent tests need checkpointing, resumability, artifact copying, and clear failure classification. Otherwise the harness itself becomes the failure source.
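A sketch of the shape such a harness can take, assuming each step is resumable from a JSON checkpoint; run_step and the failure labels are placeholders, not an existing API:

```python
import json
import shutil
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")
ARTIFACT_DIR = Path("artifacts")

def run_suite(steps, run_step) -> dict:
    """Resume from the last completed step; classify and record each failure."""
    state = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {"done": [], "failures": {}}
    for step in steps:
        if step in state["done"]:
            continue  # already proven in a previous (possibly interrupted) run
        try:
            artifact = run_step(step)  # assumed to return a Path to the step's output
            ARTIFACT_DIR.mkdir(exist_ok=True)
            shutil.copy(artifact, ARTIFACT_DIR / artifact.name)
            state["done"].append(step)
        except TimeoutError:
            state["failures"][step] = "harness_timeout"  # infrastructure, not product
        except Exception as exc:
            state["failures"][step] = f"step_failure: {exc}"
        CHECKPOINT.write_text(json.dumps(state, indent=2))  # checkpoint after every step
    return state
```

Separating harness failures from product failures keeps the proving loop trustworthy: a timeout in the test rig should never read as a regression in TeachClaw.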
This maps to FDE/AI deployment roles because enterprise AI deployments need more than model calls: they need context management, memory, routing, validation, drift handling, task handoff, and evidence before anything is promoted.
That is the actual work behind “AI adoption” once prototypes meet real users.