C
Claude Mastery Hubground → extreme
mastered 0%
0 / 109
a single-file system for getting the most out of Claude

Master Claude from the ground up to the frontier.

Every capability, prompting technique, agentic pattern, and production skill — ladder-ordered, animated, and interactive. From your first prompt to systems that improve themselves.

22 domains 109 cards bird's-eye system map cheat codes + L0–L5 animated, interactive flows progress saved locally build-your-own workflow OS
iterate until it is rightYou — intent + contextYouintent + contextPrompt — shaped askPromptshaped askClaude — reasonClaudereasonTools — search·code·skillsToolssearch·code·skillsArtifact — verifiable outputArtifactverifiable outputYou — inspect·refineYouinspect·refine
Ground

Foundations

Right surface, right model, a clean prompt.

Core

Craft

Steer with examples, structure, context + memory.

Advanced

Systems

Tools, agents, RAG, evals, the API.

Extreme

Frontier

Systems that measure and improve themselves.

00
Orientation

The Map

How to read this hub, and the ladder you are climbing. Master each rung before you lean on the next — prompting craft compounds; agentic patterns sit on top of it.

Start here

How this hub works

Twelve domains, low to high. Each card opens to four moves: Concept (the idea), Trade-offs (where it bites), On the job (how it shows up in real work), and a Mastery move (the thing that separates fluent from competent).

Tap a card to expand; tap the square to mark it mastered — progress persists in your browser.
Search filters every card live. The left rail shows per-domain completion.
Animated diagrams are the spine: watch the data move, not just the boxes.
Mastery moveDon't read it front-to-back like a manual. Treat domain 11 (Your Operating System) as the destination, and pull the earlier domains in as you need them.
The ladder

Ground → Extreme: the climb

ConceptCapability with Claude is layered. Skipping rungs is why people plateau: they reach for multi-agent orchestration while still writing vague single prompts.
TierYou can…Failure if skipped
Groundpick the right surface + model, write a clean promptwasted tokens, wrong tool
Coresteer with examples, structure, context + memorybrittle, one-shot results
Advancedcompose tools, agents, RAG, evals, the APIdemos that never reach production
Extremeorchestrate systems that improve themselvesscale without control or trust
Mastery moveYour edge is that you already live at Advanced/Extreme in code. The gap most engineers have is Core craft — prompting discipline. Tightening that multiplies everything above it.
01
Orientation

Bird's-eye View

The whole system in one frame before you dive in: you, the Claude core, the tools it reaches for, the surfaces it ships through, and the model tiers above — up to the Fable 5 horizon.

The map

Everything, at a glance

ConceptStrip away the detail and every interaction is the same shape: your intent + context + data flow into the context window, Claude reasons and optionally reaches for tools, and the result ships through whichever surface you chose — all while you climb the mastery tiers underneath.
Intent — what you wantIntentwhat you wantContext — schema · rulesContextschema · rulesData — the messy inputDatathe messy inputWeb search — pull current factsWeb searchSkills — procedural expertiseSkillsCode exec — run code, build filesCode execMCP / tools — act on your systemsMCP / toolsClaude core — reasons over the context windowClaudereason · plan · actthe context windowChat — think + draftChatProjects — durable workspaceProjectsClaude Code — agentic reposClaude CodeCowork — multi-step workCoworkAPI / Platform — embed in productsAPI / PlatformChrome·Excel·PPT·Design — act in appsChrome·Excel·PPT·Designsurfaces → deliverablesyou → inputsGroundCoreAdvancedExtremeyour climb
you → core ↔ tools → surfaces · over the four mastery tiers
Mastery moveCome back to this frame whenever a task feels tangled. Ask: which input is missing, which tool is the answer, and which surface should this even happen in? Three questions, most problems dissolved.
Leveling

The six usage levels (L0 → L5)

ConceptMatch the level to the task. Most people get stuck running L1 prompts at L4 problems — or burning L4 machinery on an L1 question.
L0 — askL0askL1 — steerL1steerL2 — structureL2structureL3 — toolsL3toolsL4 — agentsL4agentsL5 — systemsL5systems
ask → steer → structure → tools → agents → systems
LevelYou're doing…Looks like
L0 · Askraw questions“explain X”
L1 · Steeradding role, format, constraints“as a reviewer, table only”
L2 · Structureexamples, tags, chained promptsmultishot + sections
L3 · Toolssearch, code, files, connectorsgrounded + computed answers
L4 · AgentsReAct, orchestrator, RAGautonomous multi-step work
L5 · Systemsevals, optimization, product OSself-improving pipelines
Mastery moveName the level a task needs before you start. It tells you immediately how much prompt, how much tooling, and how much verification to bring.
Model tiers

The tier ladder & the Fable 5 horizon

ConceptCapability rises as you climb the model ladder; access tightens at the top. The three public tiers cover everything you do today. Above them sits the frontier Mythos tier.
Haiku 4.5 — fast · high-volumeHaiku 4.5fast · high-volumeSonnet 4.6 — the daily workhorseSonnet 4.6the daily workhorseOpus 4.8 — hard reasoning · frontier publicOpus 4.8hard reasoning · frontier publicMythos 5 — Mythos tier · restrictedMythos 5Mythos tier · restricted◉ lockedFable 5 — safety-hardened sibling · horizonFable 5safety-hardened sibling · horizon◉ lockedcapability ↑ · access ↓
capability up · access down

Mythos 5 and Fable 5 share one underlying frontier model — Fable is the safety-hardened sibling, with extra measures around sensitive domains. Plus Mythos Preview, used by a small set of trusted orgs under Project Glasswing. Access to this tier is currently restricted, so treat it as horizon, not daily tooling.

Trade-offs & gotchas
Don't architect today's pipeline around a tier you can't reliably call.
Frontier capability is a moving target — verify availability before you depend on it.
Mastery moveBuild on the stable rungs (Haiku / Sonnet / Opus) and design so a model swap is one config line. When the frontier opens up, you inherit the gains for free — no rewrite.
02
Ground Level

Foundations

What Claude is, the surfaces it lives in, which model to reach for, and the anatomy of a prompt that doesn't waste a turn.

Mental model

What Claude actually is (and isn't)

ConceptA large language model that predicts useful continuations of text, wrapped in tools, memory, and surfaces. It reasons over what's in its context window — it has no live view of your machine, your databases, or today's web unless a tool fetches it in.
Trade-offs & gotchas
No persistent state between separate chats except what memory/projects carry over.
Confident phrasing is not evidence of correctness — it will assert wrong facts fluently.
Knowledge has a training cutoff; anything current must come through search or you.
In practice
When you ask about a library you 'know', its current version may differ from training — paste the docs or let search fetch them rather than trusting recall.
Mastery moveStop asking 'does it know X'. Ask 'is X in the context window'. If not, put it there or give it a tool to get it. That single reframe fixes most bad outputs.
Selection

Which model to reach for

ConceptThree publicly available tiers, plus a frontier Mythos tier. Match model to the job's difficulty and your latency/cost budget — not 'biggest is best'.
ModelReach for it whenTrade
Opus 4.8hard reasoning, architecture, gnarly debugging, eval designslower, priciest
Sonnet 4.6the daily workhorse — most coding, drafting, analysisbalanced
Haiku 4.5high-volume, latency-sensitive, simple classification/extractionleast depth

Above these sits the Mythos tier (Mythos 5 / Fable 5, and Mythos Preview under Project Glasswing) — not generally available, and access is currently restricted. Treat it as horizon, not daily tooling.

Mastery movePrototype on Opus to find the ceiling of what's possible, then drop to Sonnet/Haiku for the production path once the prompt is proven. Pin the API string (e.g. claude-sonnet-4-6) so a model swap is a one-line change.
Surfaces

The surfaces — pick the right room

ConceptSame intelligence, different ergonomics. The surface decides how much context, autonomy, and tooling Claude gets.
SurfaceBest for
Chat (web/desktop/mobile)thinking, drafting, one-off analysis, learning
Projectsa durable workspace with shared knowledge + chat history
Claude Codeagentic coding from terminal / IDE / desktop — repos, multi-file
Coworkmulti-step knowledge work that spans files + tools
Chrome / Excel / PowerPoint / Designacting inside a browser, sheet, deck, or canvas
API / Platformembedding Claude into your own products + pipelines
In practice
A multi-file refactor belongs in Claude Code (it reads the repo); a polished deck belongs in the pptx skill; a one-off SQL question belongs in chat.
Mastery moveThe surface is a context decision. Don't paste 2,000 lines into chat when Claude Code can read the file itself — you're paying tokens to recreate state the tool already has.
Craft

Anatomy of a prompt that lands

ConceptA strong prompt usually carries five things: role/context, the task, the inputs, the constraints, and the output shape. Missing any one is where vague answers come from.
iterate until it is rightYou — intent + contextYouintent + contextPrompt — shaped askPromptshaped askClaude — reasonClaudereasonTools — search·code·skillsToolssearch·code·skillsArtifact — verifiable outputArtifactverifiable outputYou — inspect·refineYouinspect·refine
intent → shaped prompt → reason → tools → artifact → refine
Role: You are a senior data engineer reviewing a SQL Server ingestion script.
Task: Find correctness bugs and race conditions; ignore style.
Input: <the script below>
Constraints: Windows/pyodbc; preserve ? placeholders; pure-ASCII output.
Output: a table of {line, issue, severity, fix} — nothing else.
Mastery moveLead with the output shape when you know it. 'Give me a table of {line, issue, fix}' constrains the whole response and kills rambling preambles before they start.
Mental model

The context window is your RAM

ConceptEverything Claude can use this turn — system prompt, project knowledge, memory, your message, tool results — lives in one finite window. It's working memory, not a hard drive.
Trade-offs & gotchas
Stuffing irrelevant history dilutes attention on what matters — more is not better.
Very long contexts can bury the key instruction; put critical constraints near the end.
Each tool result consumes window space too.
In practice
For a million-row reasoning task you don't paste a million rows — you paste the schema, a representative sample, and the rule, and let code or retrieval handle the volume.
Mastery moveCurate context like you'd curate a function's arguments: minimal, relevant, ordered. The skill isn't filling the window — it's deciding what earns a place in it.
03
The Lineage

Model Evolution

Every Claude from the first release to today's frontier — the lineage, the tier system, the capability milestones, and the Mythos tier above Opus (including why Fable 5 access stopped).

History

Claude 1 → Fable 5 — the timeline

ConceptClaude began in March 2023 and has shipped across four major generations, with the Opus / Sonnet / Haiku tier system arriving with Claude 3 and a new Mythos tier landing in 2026.
Claude 1 — 2023Claude 12023Claude 2 — 2023Claude 22023Claude 3 — 2024Claude 320243.5 / 3.7 — 2024–253.5 / 3.72024–25Claude 4 — 2025Claude 42025Opus 4.8 — 2026Opus 4.82026Fable 5 — 2026Fable 52026
first release → today's frontier
EraModelYearWhat it brought
Pre-tierClaude 1 / InstantMar 2023first public release (selected users)
Claude 2 / 2.1Jul–Nov 2023public access, 200K context, system prompts
Claude 3Opus / Sonnet / Haiku 3Mar 2024the three-tier system + vision
Claude 3.5/3.73.5 Sonnet → 3.7 Sonnet2024–25Artifacts, Computer Use, extended thinking
Claude 4Sonnet 4 / Opus 4 → 4.82025–26agents, MCP, 1M context, agent teams
MythosFable 5 / Mythos 5Jun 2026the frontier tier above Opus

Sources: Anthropic models overview (platform.claude.com/docs) and the public release history. Dates are approximate launch dates.

Mastery moveThe recurring pattern: each new Sonnet tends to match or beat the previous generation's Opus at a fraction of the cost. 'Newer mid-tier' often beats 'older flagship' — re-evaluate your default model every generation.
Structure

The tier system + version naming

ConceptSince Claude 3, each generation ships in named size tiers — Haiku (fast, cheap), Sonnet (balanced), Opus (max capability) — with a new Mythos class now sitting above Opus.
TierOptimized for
Haikulatency + high volume + low cost
Sonnetthe balanced daily workhorse
Opushard reasoning, long-horizon agentic work
Mythos (Fable / Mythos 5)frontier capability — restricted access
Naming is generation.version — the '4' is the architecture generation, '.8' a capability increment within it.
Model IDs are pinned snapshots (e.g. claude-opus-4-8) — not moving pointers.
Older models get deprecated and retired on a published schedule; pin a string you control.
Mastery moveDon't hard-code a tier into your mental model of a task — hard-code the requirement (speed vs depth vs cost) and let the current best model in that lane fill it.
Capabilities

Capability milestones

ConceptThe lineage isn't just bigger models — each step unlocked a new capability that changed what you can build.
WhenMilestone
Mar 2024Vision (read images, charts, documents)
Jun 2024Artifacts — the side-panel rendering canvas
Oct 2024Computer Use — control a desktop / browser
Feb 2025Extended thinking — the first public reasoning mode
May 2025Code execution, MCP connectors, Files API
Feb 20261M-token context + agent teams (multi-agent)
Apr 2026Effort levels (high / xhigh), stronger vision + self-verification
Mastery moveWhen something feels impossible, check whether a newer capability quietly made it possible — the ceiling moves every few months, and yesterday's workaround is often today's one-liner.
Frontier

The Mythos tier — Fable 5 & Mythos 5

ConceptIn June 2026 Anthropic introduced a Mythos-class tier above Opus. Fable 5 and Mythos 5 share the same underlying frontier model; they differ mainly in safeguards.
ModelWhat it is
Fable 5the generally-available Mythos-class model, with classifiers that block high-risk (e.g. cyber) outputs
Mythos 5the same model with some of those constraints removed, for separately vetted orgs
Mythos Previewa research preview (Project Glasswing) for defensive cybersecurity, invite-only
Trade-offs & gotchas
This tier is the frontier — most capable, but with the tightest access controls.
Capability and availability move in opposite directions as you climb.
Mastery moveBuild on the stable public tiers and keep model selection a one-line config change. If frontier access opens to you, you inherit the gains without a rewrite.
Status

Why Fable 5 / Mythos 5 access stopped

ConceptAccess to both models was suspended on 12 June 2026 after a US government export-control directive — it is a regulatory action, not a model failure.
The directive, citing national-security authorities, barred access by any foreign national — inside or outside the US, including Anthropic's foreign-national staff.
Filtering by nationality in real time wasn't feasible, so Anthropic disabled both models for all customers to comply.
All other Claude models (Opus 4.8, Sonnet 4.6, Haiku 4.5) were unaffected.
Anthropic says the order appears to stem from a narrow, non-universal jailbreak, that equivalent capability exists in other public models, and that it believes this is a misunderstanding.

Anthropic's full statement: anthropic.com/news/fable-mythos-access. This situation may have changed — check Anthropic's announcements for the current status.

Mastery moveThe real lesson for builders: a model you depend on can vanish for reasons outside your control. Design a fallback path (e.g. auto-route to Opus 4.8) so a single model's removal degrades gracefully instead of breaking you.
04
Core

Prompting Craft

The highest-leverage skill on this page. Most quality gains come from steering better, not from a bigger model. Eight techniques you combine fluidly.

Technique

Specificity beats politeness

ConceptReplace adjectives with criteria. 'Make it better' carries no signal; 'cut it to 120 words, active voice, no adverbs, keep the three figures' is executable.
Trade-offs & gotchas
Over-specifying creativity kills it — leave open what should be open.
Contradictory constraints make Claude silently pick one; check for conflicts.
Rewrite this status update.
Keep: the three metrics, the blocker.
Drop: hedging, throat-clearing.
Tone: direct, peer-to-peer.
Length: <= 90 words.
Mastery moveBefore sending, ask: 'could a competent stranger execute this exactly one way?' If two readings exist, you'll get the one you didn't want.
Technique

Multishot — show, don't tell

ConceptOne or two worked examples (input → desired output) steer format and judgment far better than paragraphs of description. Include a negative example when a failure mode keeps recurring.
In practice
Classification shines here: give two solved cases — one clear match, one tricky edge that should stay 'uncertain' — and the rule generalizes without re-explaining every criterion.
Trade-offs & gotchas
Examples anchor hard — a biased example biases every output.
Too many examples crowd the window; 2–3 sharp ones beat ten loose ones.
Mastery moveWhen output format matters, your examples ARE the spec. Make them exactly the shape you want, down to the punctuation — Claude mirrors them.
Technique

Let it think before it answers

ConceptFor multi-step reasoning, ask for the reasoning first, answer last — or enable extended thinking. Quality on logic, math, and architecture jumps when the model isn't forced to commit instantly.
Work through this step by step, then give the final answer last.
1) restate the constraint
2) enumerate options
3) trade-offs
4) pick + justify
Trade-offs & gotchas
You pay latency/tokens for thinking — skip it on trivial asks.
For pure recall ('what's the flag for X'), thinking adds nothing.
Mastery moveSeparate the thinking from the deliverable: 'reason in a scratchpad, then give me ONLY the final JSON.' You get the benefit of deliberation without the mess in your output.
Technique

Structure with tags and sections

ConceptDelimiting parts of a prompt — with XML-ish tags or clear headers — removes ambiguity about what's instruction, what's data, and what's an example.
<context>schema + rules here</context>
<data>the messy rows</data>
<task>classify each row; output one JSON per line</task>
In practice
Tagging is the same instinct as labelling sections in a well-structured document: explicit boundaries so instruction, data, and example never blur together.
Mastery moveWhen you reference a section later ('using the rules in …'), tags let you point precisely instead of re-pasting. Cleaner prompts, fewer tokens.
Technique

Role and framing set the lens

ConceptA role ('staff SRE', 'skeptical reviewer', 'patent examiner') shifts vocabulary, depth, and what the model treats as important. Framing the stakes ('this ships to prod tonight') raises rigor.
Trade-offs & gotchas
Role-play is a lens, not a fact source — it won't invent expertise it lacks.
Over-theatrical personas can drift from the task; keep the role functional.
Mastery movePair the role with an adversary: 'You're the reviewer whose job is to find why this fails.' Critical framing surfaces issues that a helpful framing glosses over.
Technique

Control the output shape

ConceptState the format explicitly — JSON schema, markdown table, bullet list, word count, 'code only, no prose'. For machine consumption, demand strict JSON and nothing else.
Return ONLY valid JSON, no markdown fences, matching:
{"verdict":"matched|pending","reason":string,"confidence":0-1}
Trade-offs & gotchas
'JSON only' still occasionally wraps in fences — strip ```json defensively when parsing.
Rigid formats can truncate nuance; allow a free-text field if you need it.
Mastery moveFor pipelines, give the schema AND one example instance. Schema defines the contract; the example removes the last 5% of formatting drift.
Technique

Positive instructions + guardrails

ConceptTell Claude what to do, not only what to avoid — 'use plain ASCII' beats 'don't use unicode'. Reserve negatives for hard failure modes you've actually hit.
In practice
A good model case is encoding constraints positively: 'output plain ASCII' beats 'avoid weird characters' because it is concrete and testable.
Mastery moveEncode your recurring guardrails once and reuse them as a preamble block — an explicit, pasteable rule set.
Loop

The refine loop is the real skill

ConceptFirst output is a draft to react to, not a verdict on the prompt. Diagnose WHY it missed (ambiguous task? missing context? wrong format?) and fix that specific cause.
iterate until it is rightYou — intent + contextYouintent + contextPrompt — shaped askPromptshaped askClaude — reasonClaudereasonTools — search·code·skillsToolssearch·code·skillsArtifact — verifiable outputArtifactverifiable outputYou — inspect·refineYouinspect·refine
each pass tightens intent → output
Trade-offs & gotchas
Re-rolling the same prompt hoping for better is gambling, not iterating.
Long back-and-forth can accumulate contradictions — sometimes restart clean is faster.
Mastery moveKeep a running 'prompt diff' habit: when an output is wrong, change ONE variable and observe. That's how you learn what each lever actually does — the same discipline as your eval loops.
05
Core+

Power Techniques

The advanced moves that separate fluent operators from prompt-typists. Chaining, meta-prompting, verification loops — the techniques you reach for when one good prompt isn't enough.

Technique

Prompt chaining

ConceptSplit a complex task into a sequence of linked prompts — each does one transformation, and its output feeds the next. Easier to steer, debug, and verify than one mega-prompt.
Prompt A — extractPrompt AextractOutput A — clean dataOutput Aclean dataPrompt B — transformPrompt BtransformOutput B — resultOutput Bresult
each step does one thing, cleanly
In practice
A scrape → structure → summarize → match pipeline is a prompt chain at the macro level: never one prompt trying to do all four.
Trade-offs & gotchas
Errors propagate — validate each link's output before passing it on.
More calls = more latency/cost; chain only where a single step would be unreliable.
Mastery moveWhen a prompt keeps half-failing, don't tune it harder — split it. Two reliable steps beat one clever step that works 70% of the time.
Technique

Meta-prompting — let Claude write the prompt

ConceptHand Claude the goal and have it draft (or critique and improve) the prompt itself. It knows what makes prompts work, so this often beats writing from scratch.
meta-prompting loopGoal — fuzzy askGoalfuzzy askDraft prompt — Claude writes itDraft promptClaude writes itRun — try itRuntry itCritique — improve promptCritiqueimprove prompt
draft → run → critique → improve
I want to <goal>. Write the best prompt to get this from you.
Then critique your own prompt and give me the improved version.
Mastery moveUse it to bootstrap your reusable templates. Generate a strong first draft, then refine it against real outputs — you're meta-prompting your way to a permanent asset.
Technique

Chain-of-verification

ConceptHave Claude draft an answer, then generate the questions that would expose errors in it, answer those, and revise. A structured self-check that cuts hallucinations sharply.
chain-of-verificationDraft — first answerDraftfirst answerGen checks — what's wrong?Gen checkswhat's wrong?Run checks — verify eachRun checksverify eachRevise — correctedRevisecorrected
draft → generate checks → verify → revise
Answer the question. Then list 3 ways your answer could be wrong,
check each against the facts, and give a corrected final answer.
Mastery moveFor anything load-bearing — a figure, a claim, a regex — make verification a required step, not an optional one. It's the cheapest reliability you can buy.
Technique

Decomposition (least-to-most)

ConceptSolve the simplest sub-problem first, then build on it toward the full answer. Forces a clean reasoning path instead of one intuitive leap that hides errors.
Trade-offs & gotchas
Over-decomposing trivial tasks adds friction — reserve it for genuinely hard ones.
The split itself can be wrong; sanity-check the decomposition before solving.
Mastery moveThis is the task-segregation funnel applied inside a single answer. Same instinct, smaller scale — reach for it whenever a problem has dependent parts.
Technique

Prefilling & steering the start

ConceptOn the API you can prefill the beginning of Claude's reply to force a shape — start it with { to lock JSON, or with Step 1: to force structured reasoning. In chat, you steer the same way with a worked example.
messages=[{'role':'user','content': ask},
          {'role':'assistant','content':'{'}]  # forces JSON output
Trade-offs & gotchas
Prefill is API-only; in chat, examples do the steering.
A too-rigid prefill can cut off needed preamble — use it where format matters most.
Mastery movePrefill + 'JSON only' is the most reliable way to get clean machine-parseable output. Belt and braces for any pipeline that consumes Claude's response.
Technique

Reverse prompting — ask what it needs

ConceptBefore a big task, ask Claude what information would let it do the job well. It surfaces the gaps you didn't think to fill — turning a vague request into a precise one.
Before you start: what do you need from me to do this excellently?
List the questions, then wait for my answers.
Mastery moveGreat for kicking off any real project (a spec, an architecture, a tricky rewrite). Five minutes of its questions saves an hour of wrong-direction output.
Technique

Self-consistency (sample & vote)

ConceptFor hard reasoning, generate several independent attempts and take the answer they converge on. Trades cost for reliability on problems where a single pass is shaky.
Trade-offs & gotchas
Multiplies cost by the sample count — reserve for high-stakes, error-prone questions.
Needs a way to compare/aggregate answers — exact-match, or a judge.
Mastery movePair it with evals: if three samples disagree, that's a signal the task is under-specified or genuinely hard — useful information, not just a tie to break.
Technique

Persona & constraint stacking

ConceptLayer multiple lenses and limits at once — 'staff SRE, hostile reviewer, output a table, ASCII only, under 150 words'. Each constraint narrows the output toward exactly what you want.
Trade-offs & gotchas
Too many constraints can conflict — Claude silently drops one; check for contradictions.
Order matters: put the non-negotiables (format, hard rules) last so they stick.
Mastery moveBuild a standard 'stack' for each recurring job and reuse it. Your review-stack, your build-artifact-stack — debugged once, deployed forever.
06
Core

Context & Memory

Where durable knowledge lives so you stop re-explaining yourself. System prompts, projects, styles, preferences, memory, and reference-past-chats — layered.

Mental model

The four layers of context

ConceptContext arrives from four places, longest-lived to most immediate. Knowing which layer to put something in is the difference between repeating yourself forever and never again.
System — role + rulesSystemrole + rulesProject — durable knowledgeProjectdurable knowledgeMemory — across chatsMemoryacross chatsTurn — this askTurnthis ask
system → project → memory → this turn
LayerHoldsLifespan
System / custom instructionsrole, hard rules, toneevery chat
Projectsshared docs, schemas, standing contextwithin a project
Memoryfacts learned across chatsongoing, evolving
This turnthe immediate ask + tool resultsone message
Mastery movePromote anything you've said twice. A rule you re-type each chat belongs in preferences/system; a schema you keep pasting belongs in a project.
Feature

Projects — a workspace with a brain

ConceptA project bundles chats with shared knowledge files and instructions. Every chat in it inherits that context, and memory/search are scoped to the project.
In practice
One project per real thing — a product, a client, a study subject. Drop the schema, conventions, and key facts in as knowledge, and every chat starts already briefed.
Trade-offs & gotchas
Knowledge that's wrong or stale silently poisons every chat — curate it.
Cross-project context doesn't leak; that's a feature, but mind which project you're in.
Mastery moveTreat project knowledge like a README for Claude. The hour you spend writing it back pays itself off across every future conversation in that project.
Feature

Styles, preferences, and tone

ConceptPreferences capture standing format/tone wants ('concise, no fluff, prose over bullets'). Styles customize writing voice. Both apply silently so you stop restating them.
In practice
A 'concise, minimal formatting' preference is the classic case — set once, honoured everywhere, instead of restating it each chat.
Mastery moveKeep tone preferences and task rules separate. Tone goes in preferences; testable constraints (ASCII, schema) go in the prompt or project where you can verify them.
Feature

Memory and reference-past-chats

ConceptMemory carries facts about you and your work across conversations; reference-past-chats lets Claude search prior threads on demand. Together they create continuity without you re-briefing.
Trade-offs & gotchas
Memory updates in the background — the very latest chat may not be reflected yet.
It's a convenience, not a database — verify anything load-bearing.
You can edit/forget specific memories when something changes (new role, ended project).
Mastery moveAudit it occasionally. When a fact changes — you switch teams, a figure gets corrected — update memory explicitly so stale context doesn't quietly steer future answers.
Technique

Managing long / overflowing context

ConceptWhen the material exceeds what's useful to hold at once: chunk it, summarize rolling state, or offload to retrieval. Don't fight the window — architect around it.
Summarize-and-carry: compress finished sections into a running state block.
Map-reduce: process chunks independently, then merge.
Retrieve-on-demand: store everything, pull only the relevant slice per turn (→ RAG).
In practice
This is the retrieval pattern: you never reason over an entire corpus at once; you pull the relevant slice and reason over that.
Mastery moveThe moment you think 'I'll just paste more', stop and ask whether retrieval or summarization is the real answer. Paste-more scales linearly into a wall; architecture doesn't.
07
Core

Modes & Controls

Same model, different gears. Thinking depth, effort, speed, research, and computer control — the knobs that trade latency and cost for capability.

Mode

Standard vs extended thinking

ConceptExtended thinking (introduced with Claude 3.7 Sonnet) lets the model reason at length before answering. Standard mode replies fast; extended mode spends compute on harder problems first.
Standard — fast replyStandardfast replyExtended — reason firstExtendedreason firstEffort ↑ — high · xhighEffort ↑high · xhighAnswerAnswer
standard → extended → more effort → answer
Trade-offs & gotchas
Extended thinking costs more latency and tokens — skip it on trivial asks.
For pure recall it adds nothing; for multi-step logic it can change the answer.
Mastery moveReach for it on architecture, debugging, math, and anything with dependent steps. For a quick fact or a format tweak, standard is faster and just as good.
Mode

Effort levels & fast mode

ConceptNewer models expose an effort control (e.g. high / xhigh) that dials how much reasoning the model spends, and some offer a Fast mode that trades a little quality for speed and price.
Higher effort → better on hard tasks, more cost/latency.
Fast mode → quicker, cheaper, for routine work.
Match the knob to the stakes: xhigh for a gnarly bug, fast for a file read.
Mastery moveTreat effort like a throttle, not a default. Most work runs fine at normal effort; save the expensive settings for the few tasks that actually reward deeper thinking.
Mode

Research & deep research

ConceptResearch mode runs an extended, multi-source investigation — searching, reading, and cross-checking across many pages — then synthesizes a cited report, rather than answering from a single search.
Trade-offs & gotchas
Slower and heavier than a quick search — use it when breadth and depth both matter.
Still verify load-bearing claims; synthesis can paper over conflicting sources.
Mastery moveScope it tightly. A precise question ('compare X, Y, Z on these four axes') yields a sharp report; a vague one yields a sprawling one.
Mode

Computer & browser use

ConceptComputer Use (since Claude 3.5 Sonnet v2) lets Claude operate a desktop or browser — moving the cursor, clicking, typing, navigating UIs — to carry out multi-step tasks across apps.
Trade-offs & gotchas
Acts on a real interface — scope and supervise it, especially for irreversible actions.
Slower and more error-prone than an API call; prefer a direct tool when one exists.
Mastery moveUse it when there is no API — legacy systems, web flows, GUI-only tools. When a clean API or MCP connector exists, that's almost always the more reliable path.
08
Core

Multimodal & Inputs

Claude isn't text-only. Feed it images, PDFs, screenshots, and data files — and it reads, analyzes, and reasons over them. Most people under-use this entirely.

Input

Images & vision

ConceptUpload or paste images and Claude can describe them, transcribe text, read charts and diagrams, and analyze layouts. It sees — it doesn't just guess from a filename.
Image / PDF — raw inputImage / PDFraw inputVision / parse — read itVision / parseread itStructure — schemaStructureschemaReason — over contentReasonover contentOutput — answerOutputanswer
image → read → structure → reason → answer
In practice
Paste a screenshot of a failing stack trace, a hand-drawn architecture, or a UI mockup — and get analysis, not a request to describe it first.
Trade-offs & gotchas
Tiny text or low-res images degrade accuracy — crop and zoom to what matters.
Don't ask it to identify real people from photos — it won't, and shouldn't.
Mastery moveFor debugging, a screenshot often beats a paragraph. Show the error, the UI, the diagram — visual context carries information that's tedious to type.
Input

PDFs & documents

ConceptHand Claude a PDF or document and it extracts text, summarizes, answers questions, and pulls structured data out. For producing or filling PDFs, the pdf skill handles it cleanly.
In practice
Messy source documents — reports, regulatory filings, conference PDFs — are exactly this: Claude reads them, then you layer structure and extraction on top.
Trade-offs & gotchas
Scanned/image PDFs need OCR — quality depends on the scan.
Very long PDFs hit context limits — chunk or target the relevant section.
Mastery movePair document input with a strict output schema: 'extract these 17 fields as JSON per record.' That turns a wall of PDF into a clean dataset in one pass.
Input

Data files — CSV, Excel, JSON

ConceptUpload tabular data and Claude analyzes it with real code execution — cleaning, computing, charting, validating. The numbers are computed, not estimated.
In practice
Drop a CSV and ask for the breakdown, the duplicates, the chart — it runs the analysis instead of eyeballing it.
Mastery moveAlways prefer 'run the analysis on this file' over 'here's the data, what do you think'. Execution removes the one thing models are weak at — arithmetic at volume.
Guardrail

Input hygiene & privacy

ConceptWhat you paste enters the context window. Be deliberate with secrets, PII, and other people's data — redact what doesn't need to be there.
Strip credentials, tokens, and SSNs before pasting — they're not needed for the task.
Aggregate/anonymize personal data where you can.
Use Incognito chats for sensitive one-offs you don't want retained in memory.
Mastery moveTreat the context window like a log line: assume anything in it could be retained or surfaced later. Minimal, relevant, de-identified — the same discipline you'd apply to any sensitive dataset.
09
Core

Skills & Artifacts

How Claude produces real deliverables — documents, decks, sheets, apps — and the Skills system that encodes hard-won procedure into reusable expertise.

Concept

Skills — procedural expertise on tap

ConceptA Skill is a folder of instructions (a SKILL.md) that teaches Claude a procedure: how to build a clean .docx, fill a PDF form, or follow your own conventions. Claude reads the relevant skill before doing the work.
SkillProduces
docxWord documents — reports, letters, templated docs
pptxslide decks and presentations
xlsxspreadsheets, models, data cleanup
pdfcreating, filling, merging, splitting PDFs
In practice
A formal Word report, a slide deck, a multi-sheet workbook — each maps to a skill, which is why those outputs come out polished instead of raw markdown.
Mastery moveYou can author your OWN skills (skill-creator). Encode your conventions — your house style, file structure, fixed figures — once, and every run inherits them.
Concept

Artifacts — when work becomes a thing

ConceptAn artifact is standalone content rendered beside the chat: code, HTML, React, diagrams, long-form docs — anything you'll keep, reuse, or iterate on rather than read once.
Trade-offs & gotchas
Not for short answers or quick lists — those stay inline.
Browser storage (localStorage) is blocked in the live preview; use in-memory state, or it works once you download and open the file yourself.
In practice
A single-file HTML tool you iterate across many turns is the artifact loop at its best — exactly how a page like this one gets built.
Mastery moveIterate on the same artifact instead of regenerating. Ask for targeted edits ('add a domain', 'fix the flow') — it's faster, cheaper, and preserves everything that already works.
Advanced

AI-powered artifacts (Claude-in-Claude)

ConceptAn artifact can call the Anthropic API itself — so you can build apps where Claude is the engine: a study-quizzer that generates questions, a tool that summarizes pasted text, a grader.
const r = await fetch('https://api.anthropic.com/v1/messages',{
  method:'POST', headers:{'Content-Type':'application/json'},
  body: JSON.stringify({ model:'claude-sonnet-4-6', max_tokens:1000,
    messages:[{role:'user', content: prompt}] })
});
Trade-offs & gotchas
For structured output, instruct 'JSON only, no preamble' and parse defensively.
No memory between calls — pass full state each request.
Avoid HTML
tags in React artifacts; wire onClick handlers instead.
Mastery moveThis turns a static study page into an adaptive one: generate fresh questions on demand, explain wrong answers, scale difficulty — the same engine, now live inside the page.
Feature

Persistent storage in artifacts

ConceptArtifacts can persist data across sessions via a key-value store (window.storage) — enabling trackers, journals, leaderboards, saved progress — with personal or shared scope.
Hierarchical keys: table:record_id; batch related data into one key.
Wrap every call in try/catch; non-existent keys throw, not return null.
Shared scope is visible to all users of the artifact — say so when you use it.
Mastery moveFor a study hub, store the user's mastery state and mock-test history. Progress that survives a refresh is what turns a page into a tool people return to.
10
Core

Tools & Retrieval

How Claude reaches beyond its own knowledge — web search, code execution, connectors — and the judgment of when to use a tool versus just answer.

Tool

Web search and fetch

ConceptSearch pulls current information; fetch reads a specific page in full. Use them for anything past the training cutoff, fast-moving, or that you need verified rather than recalled.
Trade-offs & gotchas
Snippets are short — fetch the page when you need the real content.
Results vary in quality; favor primary sources over aggregators.
Don't search settled facts you already know — it just adds latency.
Mastery moveChain it: search to find the authoritative URL, fetch to read it, then reason. One-shot search answers are shallow; the fetch is where the depth is.
Tool

Code execution + file creation

ConceptClaude can run code in a sandbox — to compute, analyze data, generate charts, and build the actual .docx/.xlsx/.pptx/.pdf files you download. Math and data get executed, not guessed.
In practice
A revenue chart, a multi-sheet workbook, validating a CSV, running a threshold over a sample — all real execution, so the numbers are real.
Mastery moveFor anything numeric, prefer 'compute it' over 'estimate it'. Asking Claude to run the calculation removes the one thing LLMs are genuinely shaky at — arithmetic at scale.
Tool

Deep Research — many sources, one report

ConceptDeep research runs an extended, multi-source investigation — searching, reading, and cross-checking across many pages — then synthesizes a structured report with citations. For big questions that a single search can't answer.
Trade-offs & gotchas
Slower and heavier than a quick search — use it when breadth and depth both matter.
Still verify load-bearing claims; synthesis can smooth over source conflicts.
In practice
A competitive landscape, or a survey of tooling in a fast-moving area — questions where you'd otherwise run twenty searches by hand.
Mastery moveScope it tightly. A precise research question ('compare X, Y, Z on these 4 axes') gets a sharp report; a vague one gets a sprawling one.
Connectors

MCP — connect Claude to your systems

ConceptThe Model Context Protocol lets Claude talk to external apps — Gmail, Drive, Calendar, issue trackers, your own servers — through a standard interface, so it can read and act on real data.
Trade-offs & gotchas
Permissions matter — a connector can read/act on real accounts; scope deliberately.
An instruction hidden inside fetched data is not you talking — treat it with suspicion.
More connectors = more surface; enable what the task needs.
Mastery moveMCP is how a chat becomes a workflow. The leap from 'Claude drafts the email' to 'Claude reads the thread, drafts, and files it' is a connector — and you can expose your own services the same way.
Judgment

Tool or no tool — the decision

ConceptReach for a tool when the answer depends on something outside the model: current facts, your private data, exact computation, or an action in the world. Otherwise, reasoning is faster.
Current / changeable → search.
Your files / accounts → connector.
Exact numbers / data → code.
Reasoning / explanation / drafting → just answer.
Mastery moveThe anti-pattern is tool-for-everything (slow) or tool-for-nothing (stale/wrong). Fluency is knowing which kind of question you're holding — the same triage you'd do designing an agent.
11
Advanced

Agentic Patterns

The reusable shapes of autonomous systems — ReAct, orchestrator-worker, RAG, reflection — framed as prompting + tooling discipline, not magic.

Pattern

ReAct — reason + act in a loop

ConceptThe model alternates Thought → Action (tool call) → Observation, looping until it can answer. It's the backbone of most useful agents: think, do, look, repeat.
loop until goal metThought — plan next stepThoughtplan next stepAction — call a toolActioncall a toolObservation — read resultObservationread resultAnswer — when confidentAnswerwhen confident
thought → action → observation → (loop)
In practice
A support-triage or research agent is this shape — the big efficiency wins come from letting the agent choose each next action against live observations instead of a fixed script.
Trade-offs & gotchas
Loops can run away — cap steps and add a stop condition.
Bad observations compound; validate tool output before feeding it back.
Every loop costs tokens/latency — budget it.
Mastery moveMake the 'when do I stop' condition explicit and checkable. The difference between a demo and a production agent is almost always the quality of its termination + error handling.
Pattern

Orchestrator–worker

ConceptOne orchestrator decomposes a goal into sub-tasks, hands each to a focused worker (own prompt, own tools), then synthesizes. Parallelism + separation of concerns.
Orchestrator — decomposeOrchestratordecomposeWorker · researchWorker · researchWorker · draftWorker · draftWorker · codeWorker · codeWorker · checkWorker · checkSynthesize — merge worker outputs and verifySynthesizemerge + verify
decompose → workers → synthesize + verify
In practice
A pipeline that fans work to specialized workers over a shared control point, then merges results centrally, is orchestrator–worker in production.
Trade-offs & gotchas
Synthesis is the hard part — conflicting worker outputs need a real merge policy.
Over-decomposition adds coordination cost; only split where it buys parallelism or clarity.
Mastery moveGive each worker the narrowest possible context. A worker that only sees its slice is cheaper, faster, and easier to evaluate than one that sees everything.
Pattern

RAG — retrieval-augmented generation

ConceptInstead of relying on model memory, retrieve relevant chunks (vector or keyword/SQL) and generate grounded, citable answers over them. Knowledge stays fresh and traceable.
Query — user askQueryuser askRetrieve — vector / SQLRetrievevector / SQLContext — top-k chunksContexttop-k chunksGenerate — grounded answerGenerategrounded answerCite — traceableCitetraceable
query → retrieve → context → grounded answer
In practice
A production assistant answering over a large, changing knowledge base is the canonical example — retrieval keeps it current and auditable without retraining.
Trade-offs & gotchas
Garbage retrieval → garbage answer; retrieval quality is the ceiling.
Chunking strategy (size, overlap, boundaries) quietly decides everything.
Always carry source IDs through so answers stay traceable.
Mastery moveEvaluate retrieval and generation separately. Most 'the LLM is wrong' bugs are actually 'we retrieved the wrong context' bugs — fix the right layer.
Pattern

Tool use / function calling

ConceptExpose typed functions; the model picks which to call and with what arguments, you execute, and feed results back. This is the primitive under every agent.
tools=[{ 'name':'lookup_record',
  'description':'Fetch a record by id',
  'input_schema':{'type':'object',
    'properties':{'id':{'type':'string'}}, 'required':['id'] }}]
Trade-offs & gotchas
Vague tool descriptions = wrong tool chosen; write them like good docstrings.
Validate arguments before executing — never trust them blindly.
Handle the 'no tool needed' case so it doesn't force a call.
Mastery moveDesign tools at the right granularity: one search(filters) beats ten narrow endpoints. The model reasons better over a small set of expressive tools.
Pattern

Reflection / self-critique

ConceptThe agent reviews its own draft against criteria and revises before returning. A cheap, powerful quality lever — 'generate, then critique, then fix'.
regression-guardedGolden set — real casesGolden setreal casesRun — RAGAS·DeepEvalRunRAGAS·DeepEvalScore — faithful·relevantScorefaithful·relevantRefine — prompt·retrievalRefineprompt·retrieval
draft → critique → refine
Trade-offs & gotchas
Self-critique can rubber-stamp — give it sharp, specific criteria to check against.
Diminishing returns after 1–2 passes; don't loop forever.
Mastery moveCritique with a different framing than you generated with ('now you're the reviewer who must find the bug'). The shift in lens is what catches what the first pass missed.
Pattern

Multi-agent & sub-agents

ConceptSeveral specialized agents collaborate — a researcher, a coder, a critic — each with its own context and tools, coordinated by an orchestrator or a shared protocol.
Trade-offs & gotchas
Coordination overhead is real; a single good agent often beats a messy swarm.
Error propagation across agents is hard to debug — log every hop.
Cost multiplies with agent count.
Mastery moveReach for multi-agent only when sub-tasks are genuinely different in skill or context, not just to parallelize. The bar: would two human specialists do this better than one generalist?
12
Advanced

Claude Code & Dev

Agentic engineering for real repos. Where Claude reads your codebase, plans, edits across files, runs commands, and you stay the architect.

Surface

Claude Code — the agent in your repo

ConceptAn agentic coding tool that works from terminal, IDE, or desktop. It reads files itself, navigates the project, makes multi-file edits, runs commands, and explains its plan.
In practice
For any real codebase this beats pasting: it reads the actual files, imports, and existing patterns, so edits fit the project instead of a hallucinated one.
Trade-offs & gotchas
Autonomy needs guardrails — review diffs, especially around migrations and prod branches.
It acts on real files; work on a branch, not production directly.
Mastery moveGive it the goal and the constraints, then let it plan first and edit second. You're moving from typist to reviewer — your leverage is in the spec and the review, not the keystrokes.
Workflow

Plan, then execute

ConceptFor non-trivial changes, have it produce a plan — files to touch, approach, risks — before writing code. You catch wrong assumptions when they're cheap to fix.
Before editing: list the files you'll change, the approach, and what could break.
Wait for my go-ahead, then implement on a feature branch.
Mastery movePlan-mode is task-segregation applied to code. The plan IS the spec; approving it is where you exercise judgment, and the implementation becomes mechanical.
Surface

Cowork — agentic knowledge work

ConceptBeyond code: Cowork hands off multi-step work that spans files and tools — research, analysis, long documents, anything with many steps and artifacts.
In practice
Assembling a multi-part deliverable — research plus draft plus formatted docs — across many steps is the Cowork shape: many artifacts, one coherent result.
Mastery moveHand off the assembly, keep the judgment. Let it gather, draft, and format; you decide what's true, what ships, and what the figures must be (your canonical figures never get altered).
Discipline

Repo hygiene with an agent

ConceptAgentic edits make good Git discipline non-optional: branches, small reviewable diffs, clear messages, and a way to roll back.
In practice
Working on a feature branch and resolving conflicts there — never straight on the main branch — is exactly why agentic edits demand good Git discipline.
Mastery moveMake 'feature branch + reviewed diff' the default the agent operates under. Speed without reviewability isn't speed — it's debt you take on at 3x the rate.
13
Advanced

Claude Code Superpowers

The extensibility stack most people never touch. CLAUDE.md, skills, subagents, hooks, plugins, and MCP turn Claude Code from a smart terminal into a programmable engineering platform.

Overview

The extensibility stack

ConceptClaude Code gives you several ways to steer behavior, each with different authority and lifetime. Knowing which to reach for is what separates reactive use from building a system.
Claude Code — the harnessClaude Codethe harnessCLAUDE.mdCLAUDE.mdSkillsSkillsSubagentsSubagentsHooksHooksMCP · PluginsMCP · PluginsSynthesize — merge worker outputs and verifySynthesizemerge + verify
one harness, many ways to steer it
MethodUse it for
CLAUDE.mdalways-true project conventions, loaded every session
Skills (+ slash commands)reusable procedures, auto- or /-invoked
Subagentsisolated, parallel work in a separate context
Hooksdeterministic actions on lifecycle events
Pluginsbundle + share all of the above
MCPconnect external services + your own tools
Mastery moveLayer them: CLAUDE.md for standards, skills for procedures, subagents for isolation, hooks for non-negotiables. The combination is the platform — not any single feature.
Feature

CLAUDE.md — the project constitution

ConceptA markdown file at the repo root that loads into context at session start and stays there. Build commands, directory layout, conventions, and team norms live here.
Trade-offs & gotchas
Keep it lean — a 500-line CLAUDE.md hurts more than it helps; aim well under 200.
Put procedural workflows (deploy, release) in skills, not here.
Code snippets go stale — point to file:line instead of pasting.
Mastery moveTreat it like a README written for Claude. The few always-true facts about your repo go here so you never re-explain them; everything situational lives in skills it loads on demand.
Feature

Skills & slash commands

ConceptA skill is a folder with a SKILL.md (plus optional helper files) defining a reusable procedure. Slash commands and skills are now unified — every skill gets a /name interface.
Auto-invocable: Claude reads the description and applies it when the task matches.
User-invocable: you type /skill-name to run it explicitly.
Only name + description load at session start; the body loads when invoked, keeping context lean.
Mastery movePut procedural knowledge — review checklists, deploy steps, house style — in skills, not in your head. A skill is a workflow you teach Claude once and call by name forever.
Feature

Subagents — isolated parallel workers

ConceptA subagent is a separate Claude session with its own context window, tools, and even model. Its verbose work stays isolated; only the summary returns to your main thread.
Trade-offs & gotchas
Best when a side task (deep search, log analysis, a test run) would clutter your main context.
Restrict subagents to read-only tools and let the parent handle edits/approvals.
Route grunt work to a cheaper model (e.g. Haiku) and reserve the capable model for reasoning.
Mastery moveReach for a subagent when you want the result but not the mess. It's the orchestrator-worker pattern built into the tool — keep the main conversation clean and on-task.
Feature

Hooks — deterministic lifecycle control

ConceptHooks are commands, HTTP endpoints, or prompts that fire automatically on events in Claude's lifecycle — a file edit, a tool call, session start. Unlike instructions, they always run.
SessionStartSessionStartUserPromptSubmit — promptUserPromptSubmitpromptPreToolUse — before toolPreToolUsebefore toolTool runsTool runsPostToolUse — after toolPostToolUseafter toolStop — doneStopdone
events where a hook can fire
Key events: SessionStart/End, UserPromptSubmit, PreToolUse, PostToolUse, Stop / SubagentStop, PreCompact — ~two dozen in total.
PreToolUse is the main guardrail: it can veto a dangerous command before it runs.
Types include command, HTTP, mcp_tool, prompt, and agent hooks.
Defined in settings.json, or in skill/subagent frontmatter scoped to that component.
Trade-offs & gotchas
Deterministic beats probabilistic: a 'never run rm -rf' rule in CLAUDE.md is followed most of the time; a PreToolUse hook blocks it every time.
Hooks run real code with your environment — keep them fast and safe.
Mastery moveUse hooks for the non-negotiables you can't trust to a prompt — secret scanning, blocking destructive commands, running the linter on every edit. They're rules that enforce themselves.
Feature

Plugins, MCP & sharing a setup

ConceptA plugin is a versioned bundle that ships skills, subagents, hooks, output styles, and MCP server definitions together — a one-command install that gives a whole team the same setup.
MCP connects Claude Code to external services (GitHub, databases, browsers) and your own internal tools through one standard protocol.
Plugin skills are name-spaced (e.g. /security:scan) to avoid collisions.
Output styles and a custom status line let you tune how Claude communicates and what it shows.
Mastery moveOnce a few skills + hooks earn their keep, package them as a plugin. Reusable setup that travels across repos and teammates is how individual tricks become institutional capability.
14
Advanced

Evaluation & Quality

The discipline that turns demos into systems you can trust. If you can't measure it, you can't improve it or defend it.

Principle

Why eval-driven, not vibe-driven

ConceptLLM outputs are non-deterministic and fail in subtle ways. A fixed set of real test cases with scored criteria is the only way to know if a change helped or quietly broke something.
regression-guardedGolden set — real casesGolden setreal casesRun — RAGAS·DeepEvalRunRAGAS·DeepEvalScore — faithful·relevantScorefaithful·relevantRefine — prompt·retrievalRefineprompt·retrieval
golden set → run → score → refine
Mastery moveBuild the golden set from real failures, not invented ones. The cases that bit you in production are worth ten synthetic examples — they encode the actual distribution.
Tooling

RAGAS — evaluating RAG

ConceptRAGAS scores RAG systems on dimensions like faithfulness (answer grounded in retrieved context), answer relevance, and context precision/recall — separating retrieval quality from generation quality.
In practice
It tells you whether a wrong answer came from bad retrieval or bad generation — the single most useful split when debugging a production RAG pipeline.
Mastery moveTrack faithfulness as your anti-hallucination metric. A confident answer that isn't grounded in the retrieved context is the failure mode that erodes trust fastest.
Tooling

DeepEval, pytest, LLM-as-judge

ConceptDeepEval brings LLM evaluation into a pytest-style harness — assertions on relevancy, hallucination, toxicity, custom metrics — so evals run in CI like ordinary tests.
LLM-as-judge: a model scores outputs against a rubric — scalable but needs a calibrated rubric.
Wrap evals as pytest cases so regressions fail the build, not the user.
Combine deterministic checks (schema, exact-match) with judged checks (quality).
Mastery moveMake eval a gate, not a report. If a prompt change drops faithfulness below threshold, the pipeline should refuse to ship — same rigor you'd put on a failing unit test.
Discipline

Regression sets and golden data

ConceptEvery fixed bug becomes a permanent test case. Over time your golden set becomes an institutional memory of every way the system has ever failed.
Trade-offs & gotchas
Golden sets rot — prune cases that no longer reflect reality.
Over-fitting prompts to the eval set is real; hold out a fresh slice.
Mastery moveWhen a stakeholder reports a wrong answer, your first move isn't to fix the prompt — it's to add the case to the eval set. Then fix until it passes. The fix is proven, not hoped.
15
Advanced

API & Production

Embedding Claude into your own products and pipelines — the Messages API, streaming, batching, structured output, caching, and the retry discipline that survives real load.

API

The Messages API — the core call

ConceptOne endpoint: you send a model, a token budget, and a list of role-tagged messages; you get back content blocks (text and/or tool calls). Everything else — tools, search, MCP — layers on.
resp = client.messages.create(
  model='claude-sonnet-4-6', max_tokens=1024,
  system='You are a precise data extractor.',
  messages=[{'role':'user','content': payload}])
Mastery movePin the model string in config, not in code. When you graduate a prototype from Opus to Sonnet or swap to a newer snapshot, it's a one-line change — not a hunt-and-replace.
API

Streaming and structured output

ConceptStream tokens for responsive UIs; for machine consumption, instruct strict JSON (no prose, no fences) and parse defensively. The API returns mixed content blocks — read by type, not index.
Extract blocks by their type field (text / tool_use / tool_result), never by position.
Strip stray ```json fences before parsing — belt and braces.
Validate against your schema and fail loudly on drift.
Mastery moveTreat the model's JSON like untrusted input from any external service: schema-validate at the boundary. A parse that 'usually works' will page you at the worst time.
API

Batching for volume

ConceptFor large, latency-tolerant jobs — classifying thousands of records, backfilling summaries — batch processing trades immediacy for throughput and cost efficiency.
In practice
A large backfill or bulk classification job is batch-shaped: thousands of items, no human waiting, cost matters.
Mastery moveDecide per workload: interactive → streaming; bulk → batch. Mixing them (real-time calls for a backfill) is how token bills and rate limits surprise you.
Optimization

Prompt caching + cost discipline

ConceptReuse a large stable prefix (schema, instructions, examples) across many calls via caching so you don't re-pay for the same context every request. Big lever on repetitive pipelines.
Trade-offs & gotchas
Cache the stable part; keep the variable part (the actual record) outside it.
Token cost = (input + output) × volume — trim both; verbose prompts scale badly.
Mastery moveProfile your token spend like you'd profile latency. The 90% that's a fixed instruction block is the 90% to cache — and the verbose preamble nobody reads is the first cut.
Resilience

Retries with exponential backoff

ConceptNetworks and rate limits fail transiently. Retry with increasing delays + jitter so you recover gracefully instead of hammering a busy endpoint.
for i in range(5):
    try: return call()
    except RateLimit: time.sleep((2**i) + random.random())
raise
In practice
Routing around transient infrastructure failure is the same instinct everywhere — a flaky network or a busy endpoint shouldn't fail the whole job.
Mastery moveCap retries and surface the final failure — infinite retry hides outages. Jitter matters: without it, every client retries in lockstep and you build your own thundering herd.
16
Quick Reference

Cheat Codes

The dense quick-reference layer. Feature toggles, one-liner prompt openers, output-control hacks, and the highest-leverage moves — the stuff you'll keep coming back to mid-task.

Cheat sheet

Feature toggles — what to turn on, when

ConceptMost capability is one toggle away. Reach for the right one instead of fighting a limitation.
ToggleTurns onReach for when…
Web searchcurrent info from the webfacts past the cutoff, anything live
Extended thinkingdeeper reasoning before answeringhard logic, math, architecture
Deep researchmulti-source investigation + reportbig questions needing many sources
Code execution + filesruns code, builds docx/xlsx/pptx/pdfcomputation, data, real deliverables
Artifactsstandalone rendered output beside chatcode, HTML, long docs you'll keep
Projectsshared knowledge + scoped historyan ongoing thing (a product, a subject)
Memoryfacts carried across chatsyou're tired of re-briefing
Reference past chatssearch prior threads on demand“continue what we did last week”
Styles / preferencesstanding tone + formatyou keep restating the same wants
Incognitono memory, not retainedsensitive one-offs
Mastery moveWrong tool, not wrong model, is the usual cause of a bad session. When stuck, the first question is 'which toggle would make this trivial?'
Cheat sheet

One-liner prompt openers

ConceptPasteable first lines that instantly raise output quality. Keep them in muscle memory.
• “Reason step by step, then give ONLY the final answer.”
• “Before you start, ask me what you need to do this well.”
• “Write the best prompt for this, then critique and improve it.”
• “You're the reviewer whose job is to find why this fails.”
• “Return ONLY valid JSON matching this schema — no prose, no fences.”
• “List 3 ways this could be wrong, check each, then give the corrected version.”
• “Give me the thinnest end-to-end slice that runs, nothing more.”
• “Be concise. Prose, not bullets. No preamble.”
Mastery moveAn opener is a debugged instruction. Collecting the ones that work for you is the cheapest prompt-engineering you'll ever do.
Cheat sheet

Output-control hacks

Want…Say
Strict JSON“ONLY valid JSON, no markdown fences” (+ schema + 1 example)
A table“output a table of {col, col, col} — nothing else”
Brevity“<= N words” / “no preamble” / “prose, not bullets”
Just code“code only, no explanation”
Reasoning hidden“think in a scratchpad, then give only the result”
Consistencygive 2 worked examples — they become the format spec
No hedging“state it directly; flag uncertainty only where it's real”
Mastery moveLead with the output shape when you know it. Constraining the container constrains everything that goes in it — the single highest-leverage habit on this page.
Cheat sheet

Faster & cheaper

Drop to Sonnet/Haiku once a prompt is proven — reserve Opus for finding the ceiling.
Cache the stable prefix (schema, rules, examples); keep only the record variable.
Batch high-volume, latency-tolerant jobs instead of real-time calls.
Trim verbose preambles nobody reads — tokens in AND out cost you.
Let Claude Code read files itself instead of pasting them (don't pay to recreate state).
Mastery moveProfile token spend like latency. The fixed instruction block is what to cache; the rambling context is what to cut. Two moves, most of the savings.
Cheat sheet

Interaction & flow moves

Edit an earlier message to branch the conversation instead of fighting a derailed thread.
Start a fresh chat to reset a context that's accumulated contradictions — often faster than untangling.
Iterate on an artifact ('add a domain', 'fix the flow') rather than regenerating it.
Use a Project so every chat starts already briefed on the thing you're building.
Thumbs-down sends feedback to Anthropic when something's off — use it.
Mastery moveConversation shape is a tool. Branching, resetting, and iterating in place are how you keep momentum instead of wrestling a single thread into submission.
Cheat sheet

The 10 highest-leverage moves

1. Put the context in the window — don't rely on recall.
2. Lead with the output shape.
3. Show, don't tell — 2 sharp examples beat a paragraph.
4. Let it think, then give only the result.
5. Split the task before tuning the prompt.
6. Verify anything load-bearing (figures, claims, regex).
7. Reach for the right toggle (search / code / files).
8. Promote anything you've said twice into a project or preference.
9. Iterate on artifacts; don't regenerate.
10. Turn every solved problem into a reusable asset.
Mastery moveIf you internalize only this card, you're already operating above most users. The other 70 cards are how each line becomes second nature.
17
Reliability

Anti-patterns & Recovery

What goes wrong, why, and the exact move to fix it. Knowing the failure modes is half of mastery — the other half is recovering fast instead of re-rolling and hoping.

Diagnosis

Common failure modes → fixes

ConceptMost bad outputs trace to a handful of causes. Diagnose the cause, apply the specific fix — don't just resend the same prompt.
SymptomReal causeFix
Vague / genericunder-specifiedadd criteria + output shape
Confidently wrongrecall, not retrievaladd context or turn on search
Ignored an instructionburied or contradictedmove it to the end, emphasize, exemplify
Too long / ramblingno length or format capset ‘<= N words’ / table-only
Over-engineeredunscoped‘simplest thing that works’ + constraints
Drifting / loopingaccumulated contradictionsrestart clean, re-state the goal
Mastery moveBuild the diagnosis reflex: 'why did this miss?' before 'try again'. One targeted change beats ten hopeful re-rolls — same as debugging code by hypothesis, not by shuffle.
Recovery

When it hallucinates

ConceptFluent and wrong is the dangerous failure. Counter it by grounding, verifying, and inviting uncertainty rather than trusting confident phrasing.
Ground it: turn on search, or paste the source — don't let it answer from memory.
Verify numbers with code execution; verify claims with chain-of-verification.
Ask explicitly: 'flag anything you're not sure of and what would confirm it.'
Mastery moveConfidence is not a signal of correctness — ever. Treat any unsourced specific (a stat, a date, an API name) as a hypothesis until grounded.
Recovery

When it ignores your instruction

ConceptUsually a placement or contradiction problem, not defiance. The instruction got buried mid-prompt, conflicted with another, or wasn't concrete enough to act on.
Move the non-negotiable to the END of the prompt — recency helps it stick.
Make it concrete and testable ('pure ASCII', not 'avoid weird characters').
Show a tiny example of compliance — examples beat declarations.
Check for a conflicting instruction it had to choose between.
Mastery moveA rule like 'output pure ASCII' is the template: positive, concrete, testable. Load-bearing instructions go last and get an example — that's why they hold where vague rules slip.
Recovery

Too verbose, too terse, over-built

Verbose → cap it: '<= 120 words', 'no preamble', 'prose not bullets'.
Too terse → ask for the reasoning or the trade-offs explicitly.
Over-engineered → 'give me the thinnest version that works; I'll ask for more.'
Mastery moveCalibrate once, then save it as a preference. A saved 'concise, minimal formatting' preference exists precisely so you stop re-fighting verbosity every chat.
Recovery

Context rot — know when to restart

ConceptLong threads accumulate contradictions and stale assumptions. Past a point, a clean restart with a tight summary beats trying to steer a tangled conversation back.
Trade-offs & gotchas
Restarting loses in-thread context — carry a short state summary across.
Branching (editing an earlier message) is the middle option between steer and restart.
Mastery moveRecognize the rot early: when corrections stop landing and it keeps reverting, that's the signal. Fresh chat, one-paragraph state, move on — don't sink ten turns into untangling.
18
Extreme

Frontier Mastery

Systems that compound. Where evals become products, prompts get optimized in loops, and a single engineer ships what used to take a team.

Frontier

Eval-as-product, self-improving loops

ConceptAt the top end, the eval set and the optimization loop are the product. You build a harness that generates candidates, scores them, and keeps what wins — the system tightens itself.
regression-guardedGolden set — real casesGolden setreal casesRun — RAGAS·DeepEvalRunRAGAS·DeepEvalScore — faithful·relevantScorefaithful·relevantRefine — prompt·retrievalRefineprompt·retrieval
candidates → score → keep winners → repeat
Mastery moveWhoever owns the eval owns the system. Make the rubric the artifact you defend — a senior engineer is judged on the quality of the measuring stick, not the cleverness of one prompt.
Frontier

Systematic prompt optimization

ConceptTreat prompts like models: version them, A/B them against the golden set, and let data pick the winner instead of intuition. Optimization becomes a search, not a guess.
Trade-offs & gotchas
Optimizing to a small eval set over-fits — keep a held-out slice honest.
A prompt that wins today can lose on a model upgrade — re-run evals on every model change.
Mastery moveKeep a prompt changelog with the eval delta each change produced. That record is how you turn 'prompting' from craft into engineering — reproducible, defensible, teachable.
Frontier

Agentic product OS

ConceptThe endgame: AI isn't a feature bolted on, it's the operating layer — ingestion, structuring, reasoning, and action all agentic, with humans setting goals and guarding quality.
In practice
An AI-native platform where the GenAI layer is the spine — ingestion, structuring, reasoning, action — rather than a feature bolted on the side.
Mastery moveDesign for the human-in-the-loop boundary first. The mark of a mature agentic system is a clean answer to 'where exactly does a person approve, override, or get paged?'
Frontier

Compounding leverage

ConceptThe extreme tier isn't bigger prompts — it's reusable assets that compound: skills you authored, eval sets you grew, templates you refined, agents you can recombine.
Every solved problem becomes a skill, a template, or a test — never just a one-off.
A signature pattern you repeat (scrape → structure → GenAI, say) is a reusable asset; name it, template it, redeploy it.
Leverage = (assets you've built) × (speed you can recombine them).
Mastery moveAudit your last ten tasks: how many produced a reusable asset versus a throwaway answer? Raising that ratio is the whole game at the senior level.
19
Make It Yours

Build Your Workflow OS

The payoff of everything above: turning techniques into a personal operating system. How to prompt consistently, break work down, move from raw idea to shipped project, and compound what you build.

Method

A house style for prompting

ConceptStop re-deriving your standards every chat. Settle on a standard preamble — context, hard rules, the task, the output shape — and reuse it. Consistency is what turns instinct into craft.
[CONTEXT] stack + schema + key facts here.
[RULES] environment + hard constraints (concrete, testable).
[TASK] one sentence, one outcome.
[OUTPUT] exact shape; nothing else.
Mastery moveSave that block as a preference or project so it is ambient. The best operators don't re-type their standards — they encode them once and let the environment enforce them.
Method

Task segregation — the funnel

ConceptBig asks fail because they are fuzzy. Run every one through a funnel: decompose into atomic units, define 'done' for each, execute one at a time, gate each on verification before moving on.
Big ask — fuzzyBig askfuzzySegment — atomic unitsSegmentatomic unitsSpec — done = ?Specdone = ?Execute — one at a timeExecuteone at a timeVerify — gateVerifygate
big ask → segment → spec → execute → verify
Atomic = one unit you could hand to one worker (or one Claude turn) and verify alone.
Define done BEFORE you start — a testable acceptance line per segment.
Verify-gate: don't start segment N+1 until N passes, so errors never compound.
Mastery moveThis is the orchestrator-worker pattern applied to your own work. The discipline that makes a pipeline reliable is the same one that makes your day reliable — use it on yourself.
Method

Velocity — thought to project, fast

ConceptSpeed comes from a fixed path, not from rushing: raw idea → one-screen spec → thinnest POC → harden → ship. Each stage has a clear exit, so you never stall in ambiguity.
Idea — raw thoughtIdearaw thoughtSpec — 1-screen briefSpec1-screen briefPOC — thinnest slicePOCthinnest sliceHarden — tests·edgeHardentests·edgeShip — real usersShipreal users
idea → spec → POC → harden → ship
StageExit when…Time-box
Ideayou can name the one job in a sentenceminutes
Specdone-criteria fit on one screen< 1 hr
POCthinnest end-to-end slice runsa sitting
Hardenedge cases + tests passas needed
Shipa real user can use itcommit
Mastery moveMost speed loss is at the Idea→Spec gate — building before the job is named. Forcing the one-sentence job is what unlocks the rest.
Method

Name your signature pattern

ConceptAcross projects you tend to repeat one move — a sequence of steps that keeps working. A common one in data work is: scrape messy input → structure with a schema + stable IDs → layer GenAI summarization and matching → ship as RAG or a dashboard.
Scrape — messy sourceScrapemessy sourceStructure — schema + IDsStructureschema + IDsGenAI — summarize·matchGenAIsummarize·matchShip — RAG / dashboardShipRAG / dashboard
scrape → structure → GenAI → ship
Mastery moveName it, template it, reuse it deliberately. It isn't five projects — it's one pattern applied five times. Packaging it as a reusable asset is the senior story: 'I built the method, not just the instance.'
Method

Protect your canonical facts

ConceptSome facts are load-bearing — headline metrics, key figures, names that must stay exact across every document. Treat them as a single source of truth that drafts inherit, never re-type.
Trade-offs & gotchas
Never let a draft round, embellish, or drift these — consistency is the credibility.
When Claude drafts from them, the figures are inputs, not variables to improvise.
Mastery moveTreat your canonical numbers like canonical IDs in a pipeline: immutable, single-source. Put them in project knowledge so every draft inherits the exact values automatically.
Toolkit

Reusable prompt templates

ConceptA handful of templates cover most of your day. Keep them pasteable.
# REVIEW
You are a skeptical staff engineer. Find correctness + race issues only.
Context: <stack/schema>. Output: table {file:line, issue, severity, fix}. Nothing else.
# BUILD ARTIFACT
Build a single-file, self-contained <html/py>. Constraints: pure-ASCII,
validate tag + bracket balance before output. Iterate on edits, don't regenerate.
# DECOMPOSE
Break this goal into atomic, independently-verifiable segments.
For each: {name, done-criteria, risk}. Don't start work — just the plan, then wait.
Mastery moveA template is a prompt you've already debugged. Every time you tune one, you upgrade every future task that uses it — compounding, like a well-factored function.
20
Zoom Out

The Wider Landscape

Claude isn't the only tool, and pretending otherwise wouldn't serve you. Here's an honest map of the field, where each assistant leads, what's free, the open/local options, and how power users combine them. Prices and models change fast — treat figures as approximate and re-check.

Comparison

The field at a glance

ConceptAs of mid-2026 there's no single 'best' assistant — specialization is the defining feature. Each major player has a clear identity. Prices are approximate USD/month and vary by region and plan.
Route by job — not by hypeRoute by jobnot by hypewrite / docswrite / docscode / agentscode / agentsresearchresearchGoogle stackGoogle stacklive weblive webself-hostself-hostSynthesize — merge worker outputs and verifySynthesizemerge + verify
route by the job, not by the hype
AssistantMakerBest known forPlan (approx USD/mo)
ClaudeAnthropicwriting, careful docs, coding, agentic + Projects~$20 (Pro), free tier
ChatGPTOpenAImost versatile all-rounder, image gen, ecosystemFree / ~$8 / ~$20
GeminiGoogleGoogle Workspace, huge context (1M+), multimodalFree / ~$20
GrokxAIreal-time X / social contextwith X Premium ~$8–22
CopilotMicrosoftnative Microsoft 365 (Word/Excel/Outlook)Free / ~$20
PerplexityPerplexitysearch-native, source-cited answersFree / ~$20
DeepSeekDeepSeekcheap API, strong math/coding (open-weight)app free / low API
LlamaMetaopen weights, big context, self-hostingfree (self-host)
MistralMistralopen multimodal/multilingual, EU / on-premfree (self-host)
QwenAlibabastrong open models, multilingualfree (self-host)

Sources: 2026 comparison roundups (tminusai.com, fieldguidetoai.com, gurusup.com) and maker docs. Figures move — verify current pricing on each provider's site.

Mastery moveDon't shop for 'the smartest model' — shop for the one whose identity matches your task. The row that fits your work matters more than any benchmark leaderboard.
Judgment

When to reach for something else

ConceptBeing good with Claude includes knowing when a different tool is simply the right call. An honest rundown:
If you need…Strong choice
image / video generationChatGPT (image models, Sora)
work inside Gmail / Docs / DriveGemini
what the web/X is saying right nowGrok or Perplexity
answers with inline citationsPerplexity
native Word / Excel / OutlookMicrosoft Copilot
lowest API cost at high volumeDeepSeek (mind data residency)
self-hosting / data never leavesopen models (Llama, Mistral, Qwen)
Claude tends to lead on long-form writing, careful document work, coding, and agentic workflows.
ChatGPT is the broadest default; Gemini wins inside Google; Grok/Perplexity win on freshness.
Mastery movePick the tool by the shape of the job. The mark of fluency isn't loyalty to one assistant — it's reaching for the right one without ego.
Free

Free tiers & free access

ConceptYou can do a great deal without paying. Every major assistant has a capable free tier, and there are free routes to model APIs too.
Free optionWhat you get
Claude (free)daily access to chat, good for writing + analysis
ChatGPT (free)broad general-purpose use; a low-cost 'Go' tier sits above
Gemini (free)the Flash model with daily limits; Google-native
Perplexity (free)cited, search-backed answers
Grok (free, on X)limited free use within X
Google AI Studiofree experimentation with the Gemini API
Provider API creditstrial credits to test the paid APIs
Trade-offs & gotchas
Free tiers cap usage and often the best models — fine for casual + learning, limiting for heavy work.
Free consumer apps may use your chats to improve models unless you opt out / use a no-train mode.
Mastery moveStart free, on purpose. Run the same real task through two or three free tiers before paying — you'll learn which one actually fits your work, not which one markets best.
Open / Local

Open-weight & local models

ConceptOpen-weight models you can download and run yourself have reached frontier-competitive quality in 2026. They're free to use (you supply the compute) and keep data fully under your control.
Model familyMakerNotable for
LlamaMetabest-known open weights, big context, strong community
DeepSeekDeepSeekstrong reasoning/coding, very low cost
MistralMistralopen multimodal/multilingual, EU data-residency / on-prem
QwenAlibabastrong open models, multilingual
Run them locally with Ollama or LM Studio; browse models on Hugging Face.
Upsides: free at scale, private, customizable, no vendor lock-in.
Trade-offs: you manage hardware/ops, and polish often trails the hosted consumer apps.
Note: a model's hosted API may store data in specific jurisdictions — check residency for regulated work.
Mastery moveOpen + local is the answer when cost, privacy, or data-sovereignty dominate. For everyday convenience the hosted apps still win — know which constraint you're optimizing for.
Power move

Multi-model workflows

ConceptThe power-user pattern in 2026 isn't picking one winner — it's routing each task to the model that fits, and even using one model to drive another.
Have Claude design the architecture + prompts, then hand bulk/boilerplate generation to a cheaper model.
Use a search-native tool for fresh facts, then bring the findings into Claude to reason + write.
For builders: the orchestration around the models (routing, RAG, evals) matters more than which single model you pick.
Mastery moveThink in stacks, not loyalties. A simple router — 'this kind of task → this model' — plus good evals will out-perform any single frontier model used for everything.
Cheat sheet

Useful resources & tools

ConceptA short, durable set of places worth bookmarking — for learning, comparing, and building.
Learn Claude — docs.claude.com + the prompt engineering guide and the Anthropic cookbook.
Compare models head-to-head — LMArena (community leaderboard / blind voting).
Open models + datasets — Hugging Face; run locally with Ollama.
Evaluate your system — RAGAS, DeepEval, promptfoo (covered in the Evaluation domain).
Stay current — each provider's news/blog; arXiv for papers; the model's own changelog.
Mastery moveCurate a tiny, trusted set of sources and check them on a cadence. In a field that moves monthly, a short list you actually read beats a huge list you never open.
21
Horizon

Future Scope

Where this is heading, and how to keep your edge as it moves. Framed as direction to lean into, not predictions to bank on.

Direction

Where agentic AI is heading

ConceptThe trajectory is clear in shape even if dates aren't: longer-horizon autonomy, more reliable tool use, agents that plan and self-correct over many steps, and deeper integration into the tools you already use.
→ 01

Longer autonomy

Agents holding goals across many steps with less hand-holding.

→ 02

Tighter tool use

More reliable function calling, richer connectors, fewer dead-ends.

→ 03

Eval-native dev

Measurement built into the loop, not bolted on after.

→ 04

Ambient integration

Claude inside the repo, sheet, deck, browser — not a separate tab.

Mastery moveBet on the durable skills, not the feature of the month: decomposition, evaluation, and clean tool design age well across every model generation.
Strategy

How to keep your edge

ConceptAs raw capability rises, the differentiator moves up the stack: from 'can you make it work' to 'can you make it trustworthy, fast, and reusable' — judgment, taste, and systems thinking.
Own the evals — the measuring stick is the moat.
Build reusable assets (skills, templates, patterns), not one-offs.
Stay close to messy real-world data — that's where models still need a human who understands the domain.
Mastery moveDeep domain knowledge plus agentic fluency is a rare pair. Lean into that intersection — it is where the most durable senior roles sit.
Maintenance

Keep this hub alive

ConceptThis page is an artifact — iterate on it. Add a domain when a new capability lands; add a card when you learn something worth keeping; mark mastery as you go.
New model or feature drops → add or update a card, don't start over.
Turn each real lesson into a card — the hub becomes your second memory.
Ask for targeted edits ('add domain 13: X') so everything that works stays intact.
Mastery moveTreat your own learning like your pipelines: every fixed gap becomes a permanent entry. A hub that grows with you beats any course that ends.
Provenance

Sources & last updated

ConceptThe time-sensitive facts in this hub — the model lineage, capability dates, Claude Code features, and the Fable 5 / Mythos 5 status — were verified against Anthropic's official docs and announcements. They change; treat them as perishable and re-check the live sources.
Models, tiers & history — platform.claude.com (models overview)
Claude Code, hooks & subagents — docs.claude.com (Claude Code)
Fable 5 / Mythos 5 access status — anthropic.com/news/fable-mythos-access
Prompt engineering — docs.claude.com (prompting)
Product news — anthropic.com/news

Last verified: 28 June 2026. Model names, pricing, availability, and the Fable 5 status may have changed since — the linked pages are the source of truth.

Mastery moveTreat any model or status fact as perishable: re-verify before you depend on it. A reference is only as trustworthy as its last-checked date — which is exactly why this card exists.