Liu, Zhao, Shang, Shen — VILA Lab, MBZUAI + UCL, 2026

Dive into Claude Code

Reverse-engineering the architecture of a production agentic coding system: five values, thirteen principles, and the design space every AI agent must navigate.

Prerequisites: LLM basics + Tool-use / function calling + Software architecture intuition

Chapter 0: The Problem

You start typing a function name and your editor suggests the next line. That was 2021 — autocomplete. By 2024, you could ask a chat assistant to rewrite an entire file. But neither actually does anything. They suggest. You copy-paste. You run the tests. You fix what broke.

Now imagine something different: you describe a bug, and the tool reads the code, runs the failing test, edits three files, re-runs the test, sees it pass, and tells you what it did. The tool acts in your codebase, autonomously, in a loop.

This is the shift from suggestion to agency. And it introduces architectural requirements that have no counterpart in autocomplete tools.

The fundamental shift: An autocomplete tool maps input to output. An agent maps a goal to a sequence of actions, observing intermediate results and adjusting course. This means the system needs safety boundaries (what if it runs rm -rf /?), context management (what if the conversation exceeds the model's memory?), extensibility (what if the user needs custom tools?), and persistence (what if the user closes the terminal mid-task?).

This paper by Liu et al. reverse-engineers Claude Code — Anthropic's agentic coding tool — from its publicly available TypeScript source code. The goal is not to document one product, but to map the design space that every production AI agent must navigate: recurring questions about safety posture, context management, extensibility, delegation, and persistence.

The authors identify a remarkable ratio: only about 1.6% of Claude Code's codebase is AI decision logic. The remaining 98.4% is deterministic infrastructure — the operational harness that makes agency safe, reliable, and useful. The core agent loop is trivially simple. Everything interesting lives around it.

Running example throughout: The paper traces a single task — "Fix the failing test in auth.test.ts" — through every architectural layer: the agent loop, permission gates, tool dispatch, context assembly, subagent delegation, and session persistence. We will do the same.

Chapter 1: Five Values

Before looking at code, the paper asks a deeper question: what does the system believe matters? Every architectural decision in Claude Code traces back to five human values that its creators prioritize. These are not abstract philosophy — they produce concrete implementation choices.

1. Human Decision Authority

The human retains ultimate control. Not "the human can technically override," but "the architecture is designed so that humans can observe, approve, reject, interrupt, and audit." When Anthropic found that users approve 93% of permission prompts (approval fatigue), the response was not more warnings. It was to restructure the problem: defining sandboxed boundaries within which the agent works freely, which reduces the number of decisions humans must make rather than adding more.

2. Safety, Security, and Privacy

Distinct from authority. Authority is the human's power to choose; safety is the system's obligation to protect even when that power lapses. The auto-mode threat model targets four risk categories: overeager behavior, honest mistakes, prompt injection, and model misalignment.

3. Reliable Execution

The agent does what the human actually meant, stays coherent over time, and supports verification. This spans single-turn correctness and long-horizon dependability across context boundaries, session resumption, and multi-agent delegation.

4. Capability Amplification

Approximately 27% of Claude Code-assisted tasks (per Anthropic's internal survey of 132 engineers) were tasks that would not otherwise have been attempted. The system enables qualitatively new workflows, not just faster versions of existing ones. Architecturally, this value shows up as investment in deterministic infrastructure rather than decision scaffolding.

5. Contextual Adaptability

The system fits the user's specific project, tools, conventions, and skill level — and the relationship improves over time. Auto-approve rates increase from ~20% at fewer than 50 sessions to over 40% by 750 sessions. Trust is co-constructed, not fixed.

From values to principles: These five values are operationalized through thirteen design principles. For example, Human Decision Authority motivates "deny-first with human escalation" (unrecognized actions are blocked, not allowed). Capability Amplification motivates "minimal scaffolding, maximal operational harness" (don't constrain the model's choices — give it rich infrastructure to act within). Each principle answers a recurring design question that any production agent must face.
Value: Human Authority
Deny-first evaluation, graduated trust spectrum, append-only auditable state, externalized policy, values over rigid rules
Value: Safety
Defense in depth (layered mechanisms), deny-first defaults, reversibility-weighted risk assessment, isolated subagent boundaries
Value: Reliability
Context as scarce resource with progressive management, append-only durable state, graceful recovery
Value: Capability
Minimal scaffolding / maximal harness, composable extensibility, reversibility-weighted risk
Value: Adaptability
Transparent file-based memory, composable multi-mechanism extensibility, graduated trust, externalized programmable policy
What the architecture does NOT do: It does not impose explicit planning graphs on the model's reasoning. It does not provide a single unified extension mechanism. It does not restore session-scoped trust state across resume. These absences are consistent with the principles — they are deliberate design choices, not oversights.

Chapter 2: The Agent Loop

The core of Claude Code is a while loop. Seriously. The queryLoop() function in query.ts is an async generator that repeats: call the model, check if the response contains tool calls, execute the tools, feed results back, repeat. When the model produces only text (no tool calls), the turn is complete.

The deceptive simplicity: The loop itself is trivial. But it sits inside a massive operational harness: context assembly before every model call, a permission gate for every tool invocation, a compaction pipeline that fires pre-call, recovery mechanisms for token limits and API failures, and streaming tool execution with concurrency control. The loop is the kernel. Everything else is the operating system.

The Turn Pipeline

Each iteration of the loop follows a fixed sequence:

  1. Settings resolution. Immutable parameters: system prompt, user context, permission callback, model config.
  2. Mutable state. A single State object stores messages, tool context, compaction tracking, recovery counters. Updated via whole-object assignment at seven "continue sites."
  3. Context assembly. Retrieve messages from the last compact boundary forward. Compacted content is represented by its summary, not the original.
  4. Pre-model shapers. Five context shapers execute sequentially (Chapter 4).
  5. Model call. Stream the response with assembled messages, system prompt, tool schemas, thinking config.
  6. Tool-use dispatch. If the response contains tool_use blocks, route to the tool orchestration layer.
  7. Permission gate. Every tool request passes through the permission system (Chapter 3).
  8. Tool execution + result collection. Results are appended as tool_result messages; loop continues.
  9. Stop condition. No tool_use blocks in the response = turn complete.
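The turn pipeline can be sketched as an async generator. This is a simplified illustration, not the actual queryLoop() implementation. The `Message`, `ModelResponse`, and `Harness` types and the `callModel`, `checkPermission`, and `runTool` callbacks are hypothetical stand-ins. Context assembly, compaction, streaming, and recovery are omitted. The turn cap is also an assumption added here.

```typescript
// Hypothetical message and response shapes (not the real types from query.ts).
type ToolUse = { id: string; name: string; input: Record<string, unknown> };
type Message =
  | { role: "user" | "assistant"; text: string }
  | { role: "tool_result"; toolUseId: string; content: string; isError: boolean };
type ModelResponse = { text: string; toolUses: ToolUse[] };

// Dependencies injected by the harness (all hypothetical).
interface Harness {
  callModel(messages: readonly Message[]): Promise<ModelResponse>;
  checkPermission(call: ToolUse): Promise<{ allowed: boolean; reason?: string }>;
  runTool(call: ToolUse): Promise<string>;
}

// Minimal ReAct-style loop: call the model, gate and run tool calls, append results, repeat.
async function* queryLoopSketch(
  harness: Harness,
  initial: Message[],
  maxTurns = 20, // assumed safety cap, not taken from the source
): AsyncGenerator<Message, Message[]> {
  const messages: Message[] = [...initial];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await harness.callModel(messages);
    const assistant: Message = { role: "assistant", text: response.text };
    messages.push(assistant);
    yield assistant;
    // Stop condition: a response with no tool_use blocks completes the turn.
    if (response.toolUses.length === 0) return messages;
    for (const call of response.toolUses) {
      // Every tool request passes the permission gate before execution.
      const decision = await harness.checkPermission(call);
      const result: Message = decision.allowed
        ? { role: "tool_result", toolUseId: call.id, content: await harness.runTool(call), isError: false }
        : { role: "tool_result", toolUseId: call.id, content: `Denied: ${decision.reason ?? "no reason given"}`, isError: true };
      messages.push(result);
      yield result;
    }
  }
  return messages; // turn cap reached
}
```

The generator form lets a caller (for example, a terminal UI) render each message as it is produced, while the harness keeps ownership of permissions and tool execution.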

Tool Dispatch: Concurrent Reads, Serial Writes

When the model emits multiple tool calls, the StreamingToolExecutor begins executing them as they stream in, reducing latency. Read-only operations (file reads, searches) run in parallel. State-modifying operations (shell commands, file edits) are serialized. A sibling abort controller fires when any Bash tool errors, killing other in-flight subprocesses.
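One way to realize "concurrent reads, serial writes" is to partition the model's tool calls into ordered batches. This is a hedged sketch, not the StreamingToolExecutor's actual logic. The `isReadOnly` flag and the batching rule are assumptions: consecutive read-only calls share a batch, and each state-modifying call runs alone. The sibling abort controller is not modeled.

```typescript
type ToolCall = { id: string; name: string; isReadOnly: boolean };

// Group calls in model order: consecutive read-only calls form one parallel batch;
// every state-modifying call gets a batch of its own so writes never overlap.
function partitionCalls(calls: readonly ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.isReadOnly && last !== undefined && last.every((c) => c.isReadOnly)) {
      last.push(call);
    } else {
      batches.push([call]);
    }
  }
  return batches;
}

// Run batches one after another; calls inside a batch run concurrently.
// Results keep the model's original order because Promise.all preserves order.
async function runBatches(
  batches: ToolCall[][],
  run: (call: ToolCall) => Promise<string>,
): Promise<string[]> {
  const results: string[] = [];
  for (const batch of batches) {
    results.push(...(await Promise.all(batch.map(run))));
  }
  return results;
}
```

For example, the sequence Read, Grep, Bash, Read becomes three batches: the two reads run together, then the shell command runs alone, then the final read.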

Stop Conditions

Five conditions can terminate the loop:

ReAct pattern: This follows the ReAct pattern (Yao et al., 2022): the model generates reasoning and tool invocations, the harness executes actions, results feed the next iteration. Alternatives include explicit graph-based routing (LangGraph) and tree-search methods (LATS). Claude Code trades search completeness for simplicity and latency: each turn commits to one action sequence without backtracking.

Recovery Mechanisms

The loop includes several self-healing behaviors:


Chapter 3: Permission & Safety

When the model decides to run npm test to reproduce the auth test failure, the request enters a multi-layered permission pipeline. The default posture: deny or ask, never allow silently.

Seven Permission Modes

The system offers a graduated autonomy spectrum — from fully supervised to nearly autonomous:

Mode | Behavior | Autonomy
plan | Model creates a plan; execution only after user approval | Lowest
default | Standard interactive use; most operations need user approval | Low
acceptEdits | File edits + certain shell commands auto-approved; others need approval | Medium
auto | ML classifier evaluates safety; auto-approves or escalates | High
dontAsk | No prompting, but deny rules still enforced | Higher
bypassPermissions | Skips most prompts; safety-critical checks and bypass-immune rules remain | Highest
bubble | Internal-only: subagent permissions escalate to parent terminal | N/A

Seven Independent Safety Layers

A request must pass through all applicable layers. Any single layer can block it:

  1. Tool pre-filtering: Blanket-denied tools are removed from the model's view before it can even try to invoke them.
  2. Deny-first rule evaluation: Deny rules always beat allow rules, even when the allow rule is more specific. A broad "deny all shell commands" cannot be overridden by a narrow "allow npm test."
  3. Permission mode constraints: The active mode determines baseline handling for requests that match no explicit rule.
  4. Auto-mode classifier: An ML-based classifier evaluates tool safety — can deny requests the rule system would allow.
  5. Shell sandboxing: Even approved shell commands may execute inside a sandbox restricting filesystem and network access.
  6. Non-restoration on resume: Session-scoped permissions are not restored when resuming — users must re-grant.
  7. Hook-based interception: PreToolUse hooks can modify permission decisions; PermissionRequest hooks can resolve asynchronously.
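Deny-first evaluation (layer 2 above) can be sketched in a few lines. The rule shape and the prefix-wildcard matching are simplified assumptions, not Claude Code's real rule syntax or matcher. The point is the ordering: every deny rule is checked before any allow rule, so no allow rule can beat a deny, however specific it is.

```typescript
type Decision = "allow" | "deny" | "ask";
// Hypothetical rule: a tool name plus an optional command pattern ("*" suffix = prefix match).
type Rule = { tool: string; pattern?: string };
type ToolRequest = { tool: string; command?: string };

function matches(rule: Rule, req: ToolRequest): boolean {
  if (rule.tool !== req.tool) return false;
  if (rule.pattern === undefined) return true; // no pattern: matches every use of the tool
  const cmd = req.command ?? "";
  return rule.pattern.endsWith("*")
    ? cmd.startsWith(rule.pattern.slice(0, -1))
    : cmd === rule.pattern;
}

function evaluate(req: ToolRequest, deny: Rule[], allow: Rule[], fallback: Decision = "ask"): Decision {
  // Deny rules are checked first and win outright, regardless of allow-rule specificity.
  if (deny.some((r) => matches(r, req))) return "deny";
  if (allow.some((r) => matches(r, req))) return "allow";
  // No rule matched: the active permission mode decides (modeled here as a plain fallback).
  return fallback;
}
```

With a broad deny on Bash and a narrow allow for `npm test`, `evaluate` returns "deny" for `npm test`. This is the behavior layer 2 describes.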
Defense in depth, not defense in series: The layers are independent of one another, and the design assumes that if one layer fails, another catches the violation. But the paper notes a real tension: security researchers found that commands with 50+ subcommands fall back to a single generic prompt (because per-subcommand parsing caused UI freezes). Defense in depth fails when layers share failure modes.

The Auto-Mode Classifier

When enabled, the classifier loads a base system prompt, an external permissions template, and (for internal users) a separate internal template. It evaluates the proposed tool invocation against the conversation transcript and produces: allow, deny, or request manual approval.

Crucially, when a deny occurs, the system treats it as a routing signal, not a hard stop. The model receives the denial reason, revises its approach, and attempts a safer alternative in the next loop iteration.
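A denial can travel back to the model as an ordinary tool result flagged as an error. The next iteration then sees why the action was refused. In this hypothetical sketch, `ClassifierVerdict` and `denialToToolResult` are illustrative names. The result block follows the shape of a `tool_result` content block in the Anthropic Messages API (`tool_use_id`, `content`, `is_error`).

```typescript
type ClassifierVerdict =
  | { kind: "allow" }
  | { kind: "deny"; reason: string }
  | { kind: "ask" };

// Shape of a tool_result content block in the Anthropic Messages API.
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

// Returns the block to append when the call is denied;
// null means the call may proceed (allow) or must wait for the user (ask).
function denialToToolResult(toolUseId: string, verdict: ClassifierVerdict): ToolResultBlock | null {
  if (verdict.kind !== "deny") return null;
  return {
    type: "tool_result",
    tool_use_id: toolUseId,
    // Surfacing the reason lets the model choose a safer alternative on the next iteration.
    content: `Permission denied: ${verdict.reason}. Try a different approach.`,
    is_error: true,
  };
}
```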

Pre-trust initialization vulnerability: Two independently verified CVEs share a root cause: code executing during project initialization (hooks, MCP server connections, settings resolution) runs before the interactive trust dialog is presented. This reveals that the permission pipeline captures spatial ordering (which layers check what) but not temporal ordering (when each layer becomes active during startup).

Chapter 4: Context Management

By the time our "fix auth.test.ts" task has run a few iterations, the context window is filling up: the original request, npm test output, file reads, error messages, edit attempts, re-test outputs. The context window (200K–1M tokens) is the binding resource constraint — the one resource that, when exhausted, halts everything.

Claude Code does not use simple truncation. It uses a five-layer compaction pipeline that applies progressively more aggressive compression, escalating only when cheaper strategies prove insufficient.

The Five Layers

Layer 1: Budget Reduction
Always active. Enforces per-message size limits on tool results. Replaces oversized outputs with content references. Cheap, targeted, lossless for small outputs.
Layer 2: Snip
Feature-gated. Lightweight trim of older history segments. Returns {messages, tokensFreed, boundaryMessage}. Quick, removes temporal depth.
Layer 3: Microcompact
Feature-gated. Fine-grained compression with an optional cache-aware path. When enabled, boundary messages are deferred until after the API response so they can use actual cache deletion counts rather than estimates.
Layer 4: Context Collapse
Feature-gated. A read-time projection — replaces the message array with a virtual view. The full history remains available for reconstruction, but the model sees the collapsed version. Nothing is mutated.
Layer 5: Auto-compact
User-configurable. The nuclear option: a full model-generated summary via a separate compaction call. Fires only when all four previous layers are insufficient.
Lazy degradation principle: Apply the least disruptive compression first. Budget reduction costs nothing. Snip is fast. Microcompact is clever. Context collapse is virtual. Auto-compact is expensive (a separate model call). Each layer runs only if previous layers left context pressure unresolved. The graduated design contrasts with simpler systems that use single-pass truncation or a single summarization step.
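The lazy-degradation rule amounts to an ordered list of strategies, each tried only while the context is still over budget. This sketch makes several placeholder assumptions:

- The token estimate is roughly four characters per token.
- The budget is supplied by the caller.
- The layer bodies are stand-ins. Only budget reduction gets a (simplified) body.

Each layer returns a new array instead of mutating the input. This loosely mirrors how context collapse leaves the full history intact.

```typescript
type Msg = { role: string; content: string };
type Layer = { name: string; apply: (msgs: Msg[]) => Msg[] };

// Rough token estimate (assumption: about 4 characters per token).
const estimateTokens = (msgs: Msg[]): number =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

// Layer 1 sketch: cap each tool result, replacing oversized output with a reference.
const budgetReduction = (maxChars: number): Layer => ({
  name: "budget-reduction",
  apply: (msgs) =>
    msgs.map((m, i) =>
      m.role === "tool_result" && m.content.length > maxChars
        ? { ...m, content: `[output elided: ${m.content.length} chars, ref #${i}]` }
        : m,
    ),
});

// Run layers cheapest-first; stop as soon as the context fits the budget.
function compact(msgs: Msg[], layers: Layer[], budgetTokens: number): { msgs: Msg[]; applied: string[] } {
  let current = msgs;
  const applied: string[] = [];
  for (const layer of layers) {
    if (estimateTokens(current) <= budgetTokens) break;
    current = layer.apply(current);
    applied.push(layer.name);
  }
  return { msgs: current, applied };
}
```

With a 40,000-character test log and a generous budget, only budget reduction fires. The expensive summarizing layer runs only when the budget is so tight that the cheap layer cannot satisfy it.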

Beyond Compaction: Other Context-Saving Decisions

Context pressure shapes decisions across the entire system, not just the compaction pipeline:

Context Window Assembly Order

Six layers are assembled into the context window, each loaded at different times:

  1. System layer (startup): System prompt, environment info (git status), skill descriptions, MCP tool names, output styles.
  2. Project config (startup / lazy): CLAUDE.md hierarchy (5 levels: managed, user, project, local, directory-specific). Path-scoped rules load lazily.
  3. Memory (startup): Auto memory entries, prefetched asynchronously.
  4. Conversation (carry forward): History + subagent summaries, subject to compaction.
  5. Runtime (carry forward): File reads, command outputs, tool results.
  6. On-demand (lazy): Deferred tool definitions loaded via ToolSearch.
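The assembly order can be sketched as an ordered list of sources. Eager layers always load; lazy layers load only when something requests them. The `ContextSource` type, the single `lazy` flag, and the example contents are hypothetical simplifications. The real loader also handles path-scoped rules and asynchronous memory prefetch.

```typescript
type ContextSource = {
  name: string;
  lazy: boolean; // true: loaded only when explicitly requested (e.g., deferred tools via ToolSearch)
  load: () => string[];
};

// Assemble context in the fixed layer order, skipping lazy layers unless requested.
function assembleContext(sources: ContextSource[], requested: ReadonlySet<string> = new Set<string>()): string[] {
  const out: string[] = [];
  for (const src of sources) {
    if (src.lazy && !requested.has(src.name)) continue;
    out.push(...src.load());
  }
  return out;
}
```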
The transparency trade-off: Compression is largely invisible to the user. When budget reduction replaces a long tool output, when context collapse substitutes a summary, or when snip trims older history, the user has no easy way to inspect what was lost. The five-layer design achieves effective management at the cost of opacity.

Chapter 5: Extensibility

Once Claude is trying to repair auth.test.ts and the npm test command has passed through the permission system, the next question is: what tools are available for the repair? The model sees not just built-in tools like BashTool and FileReadTool, but also database queries from an MCP server, a custom lint skill, and tools from an installed plugin. These arrive through four distinct mechanisms.

Why Four Mechanisms?

A natural question. The answer lies in context cost. Different kinds of extensibility consume different amounts of the bounded context window, and a single mechanism cannot span the full range without forcing unnecessary trade-offs.

Mechanism | Unique Capability | Context Cost | Insertion Point
MCP Servers | External service integration (multi-transport: stdio, SSE, HTTP, WebSocket) | High (tool schemas) | Tool pool
Plugins | Multi-component packaging + distribution (10 component types) | Medium (varies) | All three points
Skills | Domain-specific instructions + meta-tool invocation | Low (descriptions only) | Context injection
Hooks | Lifecycle interception + event-driven automation (27 event types) | Zero by default | Pre/post tool execution
The graduated cost ordering: Hooks: zero context. Skills: low (only frontmatter descriptions stay in the prompt). Plugins: medium (varies by components). MCP servers: high (full tool schemas). This means cheap extensions can scale widely without exhausting the context window, while expensive ones are reserved for cases that genuinely require new tool surfaces.

Three Injection Points

Every agent loop iteration has three phases where extensions can plug in:

Tool Pool Assembly

The assembleToolPool() function is the single source of truth for combining built-in and external tools. It follows a five-step pipeline: base tool enumeration (up to 54 tools), mode filtering, deny rule pre-filtering, MCP tool integration, and deduplication (built-ins take precedence).
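The five steps map naturally onto a small pipeline. This is a hypothetical reconstruction from the description above, not the real assembleToolPool() signature. The `Tool` shape, the mode-filter predicate, and the deny set are assumptions. So is applying the blanket-deny filter to MCP tools, which is done here for consistency with tool pre-filtering.

```typescript
type Tool = { name: string; source: "builtin" | "mcp" };

function assembleToolPoolSketch(
  builtins: Tool[],
  mcpTools: Tool[],
  allowedInMode: (t: Tool) => boolean,
  blanketDenied: ReadonlySet<string>,
): Tool[] {
  // 1. Base enumeration: start from the built-in tools.
  // 2. Mode filtering: drop tools the active mode does not expose.
  // 3. Deny-rule pre-filtering: blanket-denied tools never reach the model's view.
  const base = builtins.filter(allowedInMode).filter((t) => !blanketDenied.has(t.name));
  // 4. MCP integration (assumption: external tools pass the same deny filter).
  const external = mcpTools.filter((t) => !blanketDenied.has(t.name));
  // 5. Deduplication by name: built-ins come first, so they win on conflicts.
  const seen = new Set<string>();
  return [...base, ...external].filter((t) => {
    if (seen.has(t.name)) return false;
    seen.add(t.name);
    return true;
  });
}
```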

27 hook events: Representative events span tool authorization (PreToolUse, PostToolUse, PermissionRequest, PermissionDenied), session lifecycle (SessionStart, SessionEnd, Setup, Stop), user interaction (UserPromptSubmit, Elicitation), subagent coordination (SubagentStart, SubagentStop, TeammateIdle), context management (PreCompact, PostCompact, InstructionsLoaded), and workspace events (CwdChanged, FileChanged, WorktreeCreate). Of the 27, 5 are safety-related; the remaining 22 serve lifecycle and orchestration.

Chapter 6: Subagent Delegation

When Claude determines that fixing the auth test requires first understanding the authentication module's structure, it can delegate this exploration to a subagent. The delegation mechanism is the Agent tool — a meta-tool that spawns an isolated child agent running its own instance of the same queryLoop().

Built-in Subagent Types

Up to six types, depending on feature flags:

Beyond built-ins, users define custom subagents via .claude/agents/*.md files. Each file's markdown body is the agent's system prompt, and YAML frontmatter specifies tools, model, permissions, hooks, memory scope, and isolation mode. A custom agent is a fully configured, isolated sub-system.
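A custom agent file can be read by splitting the YAML frontmatter from the markdown body. The sketch below is deliberately minimal and hypothetical. It handles only flat `key: value` lines; a real loader would use a YAML parser. The keys in the usage example are illustrative, not a verified list of supported fields.

```typescript
type AgentDefinition = { frontmatter: Record<string, string>; systemPrompt: string };

// Split "---\n<frontmatter>\n---\n<body>" into flat key/value pairs and the prompt body.
function parseAgentFile(text: string): AgentDefinition {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (match === null) return { frontmatter: {}, systemPrompt: text.trim() };
  const frontmatter: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    // Split on the first colon only, so values may themselves contain colons.
    if (idx > 0) frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  // The markdown body becomes the agent's system prompt.
  return { frontmatter, systemPrompt: match[2].trim() };
}
```

For example, parsing a file with `name: auth-explorer` and `tools: Read, Grep` in its frontmatter yields those values as strings. Everything after the closing `---` becomes the system prompt.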

Three Isolation Modes

Mode | How | Trade-off
Worktree | Creates a temporary git worktree; the subagent gets its own copy of the repository | Filesystem-level separation with zero external dependencies
Remote | Launches in a remote Claude Code environment (internal-only), always in the background | Full environment isolation but requires infrastructure
In-process | Shares the filesystem with the parent but has its own isolated conversation context | Lightest weight, but file conflicts are possible
Summary-only return: Subagents return only their final response text and metadata to the parent. The full subagent conversation history never enters the parent's context window. This is a critical context-conservation choice. Agent teams consume approximately 7x the tokens of a standard session — making summary-only return essential for preventing context explosion.
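The summary-only contract can be expressed as a function that runs the child loop on its own message list and returns only the final text plus metadata. `SubagentResult`, `runSubagent`, and the `runChild` callback are hypothetical names. The sidechain write is modeled as a callback rather than a real .jsonl file.

```typescript
type ChildMessage = { role: "user" | "assistant" | "tool_result"; content: string };
type SubagentResult = { text: string; metadata: { messageCount: number; agentId: string } };

function runSubagent(
  agentId: string,
  prompt: string, // must be self-contained: the child does not see the parent's history
  runChild: (messages: ChildMessage[]) => ChildMessage[],
  writeSidechain: (agentId: string, transcript: ChildMessage[]) => void,
): SubagentResult {
  const transcript = runChild([{ role: "user", content: prompt }]);
  // The full transcript goes to the sidechain for auditing, never into the parent context.
  writeSidechain(agentId, transcript);
  const lastAssistant = [...transcript].reverse().find((m) => m.role === "assistant");
  return { text: lastAssistant?.content ?? "", metadata: { messageCount: transcript.length, agentId } };
}
```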

Permission Override Logic

When a subagent defines a permissionMode, the override applies unless the parent is already in bypassPermissions, acceptEdits, or auto mode — those always take precedence because they represent explicit user decisions about autonomy.

Whether a subagent may show permission prompts is resolved in order: an explicit canShowPermissionPrompts setting first, then bubble mode (always show, since prompts escalate to the parent), then the default (sync agents show prompts, async agents do not).
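Both rules can be sketched as pure functions. The mode names come from the permission-mode table. The function names, the `explicit` and `isAsync` fields, and the choice to inherit the parent's mode when the subagent defines none are assumptions.

```typescript
type PermissionMode = "plan" | "default" | "acceptEdits" | "auto" | "dontAsk" | "bypassPermissions" | "bubble";

// Parent modes that represent an explicit user decision about autonomy always win.
const PARENT_PRECEDENCE: ReadonlySet<PermissionMode> = new Set<PermissionMode>(["bypassPermissions", "acceptEdits", "auto"]);

function effectiveSubagentMode(parent: PermissionMode, child?: PermissionMode): PermissionMode {
  // Assumption: a subagent without its own permissionMode inherits the parent's.
  if (child === undefined || PARENT_PRECEDENCE.has(parent)) return parent;
  return child;
}

// Resolution order: explicit setting, then bubble mode, then the sync/async default.
function resolveCanShowPrompts(opts: { explicit?: boolean; mode: PermissionMode; isAsync: boolean }): boolean {
  if (opts.explicit !== undefined) return opts.explicit;
  if (opts.mode === "bubble") return true; // prompts escalate to the parent terminal
  return !opts.isAsync;
}
```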

Sidechain Transcripts

Each subagent writes its own transcript as a separate .jsonl file with a .meta.json metadata file. This sidechain design means subagent histories are preserved for debugging and auditing but do not inflate the parent's session file.

The key difference from SkillTool: SkillTool injects instructions into the current context window. AgentTool spawns a new, isolated context window. The trade-off: most subagent invocations require a self-contained prompt because the default path does not inherit the parent's conversation history.

Chapter 7: Session Persistence

By now our auth-test task has accumulated a full transcript: the original prompt, tool invocations and results, compaction boundaries, and the subagent summary. The question: which artifacts are durably recorded, and what can be recovered later?

Append-Only JSONL Transcripts

Session transcripts are stored as mostly append-only JSONL files (with explicit cleanup rewrites as a rare exception). Every event is human-readable, version-controllable, and reconstructable without specialized tooling. Three channels operate independently:

  1. Session transcripts: Conversation records (user, assistant, attachment, system messages + compaction events + metadata). One file per session, project-scoped.
  2. Global prompt history: User prompts only, stored in history.jsonl. Supports Up-arrow and Ctrl+R navigation.
  3. Subagent sidechains: Separate .jsonl + .meta.json per subagent.
Append-only is a deliberate choice: It favors auditability and simplicity over query power. Every event is preserved, enabling resume, fork, and audit. The cost: richer queries like "show me all tool calls that modified file X across sessions" require post-hoc reconstruction rather than direct lookup. A database-backed alternative would enable richer queries but introduce deployment dependencies and reduce transparency.
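The append-only JSONL format is easy to illustrate. Each event is serialized as one JSON object per line, and lines are only ever appended. To stay self-contained, this sketch keeps the "file" in memory as a string. The event fields are hypothetical, not Claude Code's actual transcript schema.

```typescript
type TranscriptEvent = { uuid: string; type: "user" | "assistant" | "system"; text: string };

class JsonlTranscript {
  private data = "";

  // Append-only: existing lines are never rewritten.
  append(event: TranscriptEvent): void {
    this.data += JSON.stringify(event) + "\n";
  }

  // Reconstruct events by parsing each non-empty line independently.
  read(): TranscriptEvent[] {
    return this.data
      .split("\n")
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as TranscriptEvent);
  }

  get raw(): string {
    return this.data;
  }
}
```

Because every line is standalone JSON, a cross-session question such as "which tool calls touched file X" requires scanning and parsing lines. This is the query-power cost described above.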

Resume, Fork, and NOT Restoring Permissions

The --resume flag rebuilds the conversation by replaying the transcript. Fork creates a new session from an existing one. But neither restores session-scoped permissions. Users must grant them again.

This is a deliberate safety-conservative choice: sessions are treated as isolated trust domains. Restoring previously granted permissions on resume would risk carrying stale trust decisions into a changed context. The architecture accepts user friction as the cost of the safety invariant that trust is always established in the current session.
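The resume rule can be sketched as a loader that replays transcript lines into messages but always starts with an empty set of session-scoped grants. `resumeSession`, `LoggedEvent`, and `SessionState` are hypothetical. The point is that grants are never read back from the transcript, even if a permission event had been recorded there.

```typescript
type LoggedEvent =
  | { type: "message"; role: "user" | "assistant"; text: string }
  | { type: "permission_granted"; rule: string };

type SessionState = { messages: { role: string; text: string }[]; sessionGrants: Set<string> };

function resumeSession(jsonl: string): SessionState {
  const events = jsonl
    .split("\n")
    .filter((l) => l.length > 0)
    .map((l) => JSON.parse(l) as LoggedEvent);
  const messages: { role: string; text: string }[] = [];
  for (const e of events) {
    if (e.type === "message") messages.push({ role: e.role, text: e.text });
  }
  // Deliberately ignore recorded grants: trust must be re-established in the new session.
  return { messages, sessionGrants: new Set<string>() };
}
```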

Compaction + Persistence Integration

The compact_boundary marker records headUuid, anchorUuid, and tailUuid. These UUIDs enable the session loader to patch the message chain at read time. The mostly-append design means compaction never modifies or deletes previously written transcript lines; it only appends new boundary and summary events.
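Read-time patching can be illustrated with a loader that starts at the last boundary marker and substitutes its summary for everything before it. This is a simplification. The entry shapes are hypothetical, and only a summary field is used. The real marker's headUuid, anchorUuid, and tailUuid chain-patching is not modeled.

```typescript
type TranscriptEntry =
  | { kind: "message"; uuid: string; text: string }
  | { kind: "compact_boundary"; uuid: string; summary: string };

// Build the model-visible view: the last boundary's summary, then the messages appended
// after it. The stored entries are only read, never modified or deleted.
function loadView(entries: readonly TranscriptEntry[]): string[] {
  let lastBoundary = -1;
  for (let i = 0; i < entries.length; i++) {
    if (entries[i].kind === "compact_boundary") lastBoundary = i;
  }
  const view: string[] = [];
  for (let i = Math.max(lastBoundary, 0); i < entries.length; i++) {
    const e = entries[i];
    view.push(e.kind === "compact_boundary" ? `[summary] ${e.summary}` : e.text);
  }
  return view;
}
```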

File-history checkpoints: The "checkpoints" in Claude Code are file-level snapshots stored at ~/.claude/filehistory/<sessionId>/. They support --rewind-files for reverting filesystem changes — these are file snapshots, not a generic checkpoint store.

Chapter 8: OpenClaw Contrast

The paper does not just analyze Claude Code. It compares it with OpenClaw, an independent open-source AI agent system that answers the same design questions from a completely different deployment context. OpenClaw is a local-first WebSocket gateway connecting ~24 messaging surfaces (WhatsApp, Telegram, Slack, Discord, Signal) to an embedded agent runtime.

The comparison reveals that the design questions are stable — every agent must answer them. But the answers vary with context.

Six Dimensions of Contrast

Dimension | Claude Code | OpenClaw
System scope | Ephemeral CLI process, single repository | Persistent WebSocket gateway daemon, multi-channel control plane
Trust model | Deny-first per-action evaluation + ML classifier; 7 modes | Single trusted operator per gateway; DM pairing + allowlists; opt-in sandboxing
Agent runtime | queryLoop() async generator IS the system center | Agent runner is embedded INSIDE a larger gateway dispatch
Extension architecture | 4 mechanisms at graduated context costs | Manifest-first plugin system with 12 capability types + central registry
Memory/context | CLAUDE.md 4-level hierarchy; 5-layer compaction | Workspace bootstrap files (AGENTS.md, SOUL.md, etc.); dreaming for long-term memory; hybrid vector+keyword search
Multi-agent | Task-delegating subagents; worktree isolation; summary-only return | Multi-agent routing with isolated agents + sub-agent delegation with depth limits
Opposite bets: Claude Code invests in graduated per-action safety evaluation. OpenClaw invests in perimeter-level identity and access control. Claude Code treats the agent loop as the architectural center. OpenClaw treats the gateway control plane as the center, embedding the agent loop as one component. Claude Code's extensions modify a single context window. OpenClaw's plugins extend a shared gateway surface. These inversions follow from different trust models and deployment topologies.

Three Observations from the Contrast

  1. The questions are universal. Where reasoning lives, what safety posture to adopt, how to manage context, how to structure extensibility — OpenClaw answers all of them, but from the starting point of a multi-channel personal assistant.
  2. The systems make opposite bets on several dimensions. Per-action vs. perimeter safety. Agent loop as center vs. as component. Single-window vs. gateway-wide extensions.
  3. They compose. OpenClaw can host Claude Code as an external coding harness via ACP (Agent Client Protocol). The design space is layered, not flat: gateway-level and task-level systems can stack.
The most fundamental divergence: Claude Code is an ephemeral CLI process that starts and ends with the terminal. OpenClaw is a persistent daemon that owns all messaging connections and coordinates clients, tools, and device nodes. This difference in system scope determines how every other design question is framed.

Chapter 9: Connections

This paper maps a design space. Let's connect it to the broader landscape and surface the open questions that remain.

Six Open Directions

  1. The Observability-Evaluation Gap. Industry surveys estimate 78% of AI failures are invisible. The architecture gives operators visibility into tool calls, hooks, and transcripts — but nearly 89% of teams adopt observability while only 52% adopt offline evaluation. Closing this gap likely requires generator-evaluator separation inside the harness, not just model improvements.
  2. Cross-Session Persistence. What belongs between static instructions (CLAUDE.md) and a single session's transcript? Durable state that accumulates across sessions — learned strategies, reusable procedures, relationship evolution. The experiential tier is the natural next step.
  3. Harness Boundary Evolution. The space of interesting harness combinations doesn't shrink as models improve — it moves. Four axes: where (local vs. cloud), when (reactive vs. proactive), what (text vs. multimodal vs. physical), with whom (single agent vs. role-differentiated teams).
  4. Horizon Scaling. Current architecture units are turn, session, and subagent. What happens when autonomous work extends to days or weeks? Multi-session research programs test whether compaction, summary-only return, and append-only persistence remain sufficient.
  5. Governance at Scale. The EU AI Act (fully applicable August 2026) and evolving copyright jurisprudence may impose external constraints on logging, transparency, and human oversight. The deny-first evaluation is internally auditable but not yet externally auditable in the forms emerging frameworks contemplate.
  6. Long-Term Human Capability. The most provocative open question: while the architecture amplifies short-term capabilities, it offers limited mechanisms that explicitly preserve long-term human understanding, codebase coherence, or the developer pipeline. Future systems could treat this sustainability gap as a first-class design problem.
The provocative finding: A randomized controlled trial found AI tools made experienced developers 19% slower despite a perceived 20% improvement. A causal analysis of 807 repositories found code complexity increased by 40.7% after AI adoption. An EEG study found weakened neural connectivity that persisted after AI was removed. Whether architecture can respond to these signals — through comprehension-preserving surfaces, generator-evaluator separation for the human loop, or mechanisms not yet named — is the deepest open question the paper raises.

Relation to Other Architectural Patterns

Cheat Sheet

Aspect | Claude Code
Core loop | while-true: model call → tool dispatch → result append → repeat
Design philosophy | 1.6% decision logic, 98.4% deterministic infrastructure
Permission system | 7 modes, deny-first, ML classifier, 7 independent safety layers
Context management | 5-layer graduated compaction pipeline
Extensibility | 4 mechanisms at graduated context costs (hooks → skills → plugins → MCP)
Delegation | Subagents with isolated context, summary-only return, worktree isolation
Persistence | Append-only JSONL; no permission restoration on resume
Safety posture | Deny-first with human escalation; defense in depth
Tool pool | Up to 54 built-in + MCP tools via assembleToolPool()
Key insight | The design questions are universal; the answers vary with deployment context
The broader lesson: Production coding agents are converging toward operating-system-like abstractions. The core loop is the kernel. The permission system, context management, tool routing, extensibility, and persistence are the OS. As frontier models converge in capability, the quality of this surrounding harness becomes the principal differentiator — validating an architecture that invests in infrastructure over decision scaffolding.