AI Deployment Engineer / FDE portfolio

AI systems that survive real users.

BSc Computer Science finalist in London, graduating July 2026 and on track for a First. I take agentic systems from messy workflow discovery through to a deployed runtime, with eval gates, rollback planning, and production-shaped iteration.

Forward Deployed Engineering | Applied AI | LLM Evals | Docker / FastAPI / TypeScript
[Figure: TeachClaw and OpenClaw deployment architecture diagram]
teachclaw-eval/latest
worksheet-core-generation
mechanical: pass
quality: needs_judgement
drift_status: pass

why it matters
Generated files can pass technically and still need teacher-quality review before promotion.

Evidence At A Glance

The story is not years of employment. It is dense founder/operator reps in AI workflow design, deployment, evals, runtime debugging, and direct user evidence.

95 real teacher messages reviewed across TeachClaw pilot usage.
100/100 teacher-style turns completed in a long-session routing comparison.
35 automated tests passed in the Story Trials full-stack project.
9 FastAPI gateway tests covering auth, evals, readiness, metrics, and audit events.
15/15 jeweller clienteling scenarios passed in deterministic local evals.

TeachClaw showcase

Not a prompt demo. A deployment loop.

TeachClaw turns teacher intent into finished school work: decks, worksheets, marking support, feedback, and memory-aware planning. The interesting engineering is the layer around the model: routing, deterministic artifact builders, trace checks, drift checks, and quality gates.

01

Teacher intent stays natural

A teacher can ask for a Year 11 Macbeth deck, a worksheet, a parent email, or marking support in normal language. The runtime classifies the task family before any artifact builder runs.

02

Routes become concrete tools

LLM output is not the finished product. For file-producing routes, the agent has to call deterministic builders like build-pptx.py or build-docx.py and return a real path, not a vague promise.

03

Evals check behavior, not vibes

Scenario packs assert route logs, expected output type, file extension, max tool calls, forbidden leakage, memory boundaries, and whether human quality judgement is still required.

04

Promotion is evidence-gated

The promotion summary captures commit SHA, built artifact hash, loaded runtime hash, scenario outcomes, risks, and whether live approval is safe. A mechanical pass can still be held back for teacher-quality review.
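
A minimal sketch of the drift part of that gate, assuming the built artifact hash and the loaded runtime hash are both SHA-256 digests that should match before promotion. The class, field, and function names are hypothetical, as is the "drift" status value; only the concepts (commit SHA, built artifact hash, loaded runtime hash) come from the text above.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a byte payload."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class PromotionEvidence:
    # Hypothetical field names for the evidence the summary records.
    commit_sha: str
    built_artifact_hash: str
    loaded_runtime_hash: str


def drift_status(evidence: PromotionEvidence) -> str:
    """Report "pass" only when the runtime loaded exactly what was built."""
    if evidence.built_artifact_hash == evidence.loaded_runtime_hash:
        return "pass"
    return "drift"  # assumed label for a mismatch


# Illustrative payloads only; a real check would hash the actual bundle files.
built = sha256_hex(b"gateway bundle v1")
evidence_ok = PromotionEvidence("abc1234", built, sha256_hex(b"gateway bundle v1"))
evidence_stale = PromotionEvidence("abc1234", built, sha256_hex(b"gateway bundle v0"))

print(drift_status(evidence_ok))     # pass
print(drift_status(evidence_stale))  # drift
```

Comparing digests rather than version strings means a stale runtime is caught even when the commit SHA on record looks right.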

scenario.json (routing)
{
  "id": "ppt-macbeth-ambition-analysis",
  "family": "powerpoint",
  "goal": "Prove the English exam-analysis PPT lane",
  "expected_output": {
    "kind": "artifact_path",
    "extension": ".pptx"
  }
}
A real scenario shape, stripped to the public-safe part: task family, goal, and expected artifact.
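
One way such an expected_output contract could be checked is sketched below. This assumes the agent's reply is a single path string; the function name and the sample replies are hypothetical, and the case-insensitive extension comparison is an assumption rather than documented TeachClaw behaviour.

```python
import json
from pathlib import PurePosixPath

# The scenario shown above, loaded as data.
SCENARIO = json.loads("""
{
  "id": "ppt-macbeth-ambition-analysis",
  "family": "powerpoint",
  "goal": "Prove the English exam-analysis PPT lane",
  "expected_output": {"kind": "artifact_path", "extension": ".pptx"}
}
""")


def output_matches(scenario: dict, returned_path: str) -> bool:
    """Check that the reply is a path with the extension the scenario expects."""
    expected = scenario["expected_output"]
    if expected["kind"] != "artifact_path":
        raise ValueError(f"unsupported kind: {expected['kind']}")
    # Assumption: extensions are compared case-insensitively.
    suffix = PurePosixPath(returned_path).suffix.lower()
    return suffix == expected["extension"].lower()


print(output_matches(SCENARIO, "/tmp/out/macbeth-ambition.pptx"))  # True
print(output_matches(SCENARIO, "/tmp/out/macbeth-ambition.docx"))  # False
print(output_matches(SCENARIO, "I will build the deck shortly"))   # False
```

A fuller check would also confirm the file exists and opens; this sketch covers only the extension contract.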
trace_checks (tool discipline)
{
  "required_route_log_substrings": [
    "route=hybrid_staged"
  ],
  "required_exec_patterns": [
    "build-pptx.py --file ... .pptx"
  ],
  "max_exec_calls": 4
}
The eval does not just ask whether an answer exists. It checks the route, the tool call, and bounded execution.
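
A hedged sketch of how these three checks might be evaluated against a recorded trace. It assumes the "..." in an exec pattern means "any text here", which the snippet itself does not confirm; the function names, the sample route log, and the sample command line are all invented for illustration.

```python
import re

# The trace_checks block shown above, as a Python dict.
TRACE_CHECKS = {
    "required_route_log_substrings": ["route=hybrid_staged"],
    "required_exec_patterns": ["build-pptx.py --file ... .pptx"],
    "max_exec_calls": 4,
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern with "..." gaps into a regex (assumed wildcard semantics)."""
    parts = [re.escape(part.strip()) for part in pattern.split("...")]
    return re.compile(".*".join(parts))


def run_trace_checks(checks: dict, route_log: str, exec_calls: list[str]) -> list[str]:
    """Return failure messages; an empty list means the trace passed."""
    failures = []
    for needle in checks["required_route_log_substrings"]:
        if needle not in route_log:
            failures.append(f"route log missing: {needle}")
    for pattern in checks["required_exec_patterns"]:
        regex = pattern_to_regex(pattern)
        if not any(regex.search(call) for call in exec_calls):
            failures.append(f"no exec call matched: {pattern}")
    if len(exec_calls) > checks["max_exec_calls"]:
        failures.append(f"too many exec calls: {len(exec_calls)}")
    return failures


# Invented trace data, for illustration only.
good = run_trace_checks(
    TRACE_CHECKS,
    "route=hybrid_staged family=powerpoint",
    ["python build-pptx.py --file /tmp/out/macbeth.pptx"],
)
bad = run_trace_checks(TRACE_CHECKS, "route=direct", [])
print(good)  # []
print(bad)
```

Returning a list of failures rather than a single boolean keeps the report useful when more than one check fails in the same run.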
promotion-summary.md (release gate)
marking-single-image:
  mechanical: pass
  quality: needs_judgement

worksheet-core-generation:
  mechanical: pass
  quality: needs_judgement

live checkpoint:
  explicit approval required
This is the deployment story employers care about: knowing when not to ship even after the code path works.
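
The gate logic in that summary can be sketched as follows. This is an illustration under stated assumptions, not the TeachClaw implementation: the type and function names, the "fail" value for mechanical, and the decision labels "blocked", "hold", and "promote" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioOutcome:
    name: str
    mechanical: str  # "pass" as in the summary; "fail" is an assumed counterpart
    quality: str     # "needs_judgement" as in the summary; "pass" is assumed


def review_queue(outcomes: list[ScenarioOutcome]) -> list[str]:
    """List the scenarios a human still has to judge for quality."""
    return [o.name for o in outcomes if o.quality == "needs_judgement"]


def promotion_decision(outcomes: list[ScenarioOutcome], explicit_approval: bool) -> str:
    """Decide whether a live checkpoint may proceed."""
    if any(o.mechanical != "pass" for o in outcomes):
        return "blocked"  # a mechanical failure stops promotion outright
    if not explicit_approval:
        return "hold"     # mechanical pass alone never ships
    return "promote"


outcomes = [
    ScenarioOutcome("marking-single-image", "pass", "needs_judgement"),
    ScenarioOutcome("worksheet-core-generation", "pass", "needs_judgement"),
]

print(review_queue(outcomes))               # both scenario names
print(promotion_decision(outcomes, False))  # hold
print(promotion_decision(outcomes, True))   # promote
```

Keeping approval as an explicit argument, with no default, makes it hard to promote by accident.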

Why This Fits

I am early-career by title, but the shape of the work is already close to AI deployment: user workflow, LLM system, runtime, evals, launch proof, and iteration.

Positioning

I am not claiming ten years of enterprise delivery. I am claiming unusually direct founder/operator reps in the exact loop FDE and AI deployment teams care about: understand the workflow, build the AI path, ship it, debug it, and turn failures into evidence.

Deployment Loop

Workflow discovery: Teacher, field-service, and clienteling workflows converted into concrete AI surfaces.
Production-shaped engineering: FastAPI, Docker, CI, health checks, smoke checks, readiness gates, and hosted proof.
Eval-led iteration: Scenario packs, source checks, memory boundaries, artifact checks, and promotion summaries.

Architecture Story

TeachClaw maps cleanly to AI deployment work: user workflow surfaces, a platform layer, an agent/runtime layer, deterministic builders, and a validation loop before live promotion.

Workflow first: Teachers ask in chat. The system routes intent into concrete classroom artifacts.
Contracts over vibes: LLM output becomes schema-shaped input for deterministic document builders.
Eval before promotion: Routing, memory, artifacts, and delivery have local proof lanes before live smoke.
Runtime truth matters: Source, gateway-loaded code, runtime mirrors, and live VPS payloads can drift.