Section 03
Context Engineering
Prompt engineering tells an AI what to do. Context engineering determines everything it knows while doing it. This is where the real leverage is.
If prompt engineering is about how you ask, context engineering is about everything you put in the room before you ask. It's the practice of deliberately constructing, managing, and optimizing the information that flows into an LLM's context window, so that by the time it starts generating, it has exactly what it needs and nothing it doesn't.
Most people spend 90% of their time tweaking prompts and 0% thinking about context. The professionals who get consistently extraordinary results from AI flip that ratio.
The key distinction
Prompt Engineering
Crafting the specific instruction or question you give the model.
What you say to the AI.
Context Engineering
Designing the entire information environment the model operates in: what it knows, what it remembers, what it can see.
Everything the AI knows when you say it.
Explain context engineering to me as if I'm completely new to AI. How is it different from prompt engineering? Why do experts say it's where the real leverage in AI systems is? Give me a clear mental model and a concrete real-world example showing the difference between a poorly-engineered context and a well-engineered one.
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
The Context Window: The AI's Entire World
An LLM doesn't have memory the way you do. Every time it generates a response, it can only work with what's currently in its context window, a fixed-size buffer measured in tokens. Think of it as a whiteboard. The model can see everything written on the whiteboard right now, but nothing that was erased.
The context window is the AI's entire working memory
Whatever is in the context window is what the model"knows" for that response. Your conversation history, any documents you pasted in, the system prompt, tool results. All of it competes for space on that whiteboard. When the window fills up, something has to go.
Explain how the context window works in LLMs at a deeper level. What happens when the context fills up? How do models like Claude handle very long contexts? What is "context compression" and "context summarization"? Are there performance differences between something at the beginning vs end of the context?
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
The βLost in the Middleβ Problem
Here's something most people don't know: LLMs don't pay equal attention to everything in the context window. Research has shown they pay significantly more attention to content at the beginning and end of the context, and less attention to content buried in the middle. This is called the lost in the middle phenomenon.
Attention distribution across the context window
What this means for you
- Put your most critical instructions at the beginning (system prompt) or end (just before asking)
- Don't bury key facts in the middle of a massive document paste
- If you paste a long document, summarize the key points and repeat them at the end
- For agents doing long tasks, periodically re-state the goal
Explain the "lost in the middle" phenomenon in LLMs. What research showed this? How significant is the effect across different models? What are the practical strategies developers use to work around it, things like positional emphasis, re-stating instructions, and structured context ordering?
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
The 6 Types of Context
Context isn't just your prompt. A well-engineered AI system typically assembles context from several distinct sources before the model ever sees your question.
1. System Prompt (Instructions)
The system prompt is the foundational layer of context, set before the conversation begins. It defines the model's role, constraints, personality, output format, and any standing rules. In agent systems like Claude Code, this is where tools, capabilities, and behavioral guidelines live.
You are a senior software engineer at Quick2Bid specializing in Salesforce CPQ integrations.
ROLE:
- Answer questions about CPQ configuration, pricing rules, and quote generation
- Always recommend solutions compatible with Salesforce Spring '24 or later
- Flag anything that could break existing quote templates
OUTPUT FORMAT:
- Lead with the direct answer
- Follow with a code example if relevant
- End with any caveats or gotchas
CONSTRAINTS:
- Never recommend third-party packages without noting they require additional licensing
- If unsure, say so β do not hallucinate Salesforce API names2. Conversation History (Memory)
Every message you've exchanged in the conversation gets appended to the context. This is why AI can βrememberβ what you said earlier in a chat: it's literally still in the window. As conversations grow long, older turns get pushed toward the middle (lower attention) or eventually truncated entirely.
Conversation history management
Expert systems handle this by:
- Summarizing old turns and replacing them with a compressed summary
- Pruning irrelevant exchanges that don't affect the current task
- Pinning critical facts from earlier in the conversation
3. Retrieved Context (RAG)
RAG (Retrieval Augmented Generation) is the technique of dynamically fetching relevant information from an external knowledge base and injecting it into the context at query time. Instead of trying to put your entire company wiki into the context (impossible), you retrieve only the 3-5 most relevant chunks and inject those.
How RAG works
- 1Embed your knowledge base: Convert all your documents into vector embeddings (numerical representations of meaning) and store them in a vector database (Pinecone, Weaviate, pgvector).
- 2Embed the user's query: When a user asks a question, convert that question into the same vector space.
- 3Semantic search: Find the document chunks whose embeddings are most similar to the query embedding (cosine similarity).
- 4Inject and generate: Prepend the retrieved chunks to the context: "Here is relevant context: [chunks]. Now answer: [question]".
Explain Retrieval Augmented Generation (RAG) from first principles. What problem does it solve that fine-tuning doesn't? What are vector embeddings and how does semantic search work? What is a vector database? Walk me through building a simple RAG system step by step. What are the main failure modes of RAG systems and how do you fix them?
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
4. Tool / Function Call Results
When an AI agent uses a tool (searches the web, runs code, reads a file), the result gets appended back into the context. The model then βreadsβ that result and decides what to do next. This is the core mechanism that makes agents work: the context grows with each action-observation pair.
[System]: You have access to the following tools: read_file, run_bash, search_web
[User]: What does the pricing function in our codebase return?
[Assistant β Tool call]: read_file("src/pricing/calculate.ts")
[Tool result]:
export function calculatePrice(qty: number, unitPrice: number): number {
const subtotal = qty * unitPrice;
const discount = qty > 100 ? 0.15 : qty > 50 ? 0.08 : 0;
return subtotal * (1 - discount);
}
[Assistant]: The pricing function applies tiered discounts β 8% for 51-100 units,
15% for 100+ units β and returns the discounted total.5. Structured Data & Documents
PDFs, spreadsheets, JSON payloads, database query results, API responses: all of these can be injected into context directly. The key skill here is pre-processing: don't dump 40 pages of raw data. Extract, summarize, and structure it so the model can navigate it efficiently.
Context formatting matters enormously
The same information structured differently produces dramatically different results:
β Raw dump
Q1 revenue 142000 Q2 revenue 168000 Q3 revenue 201000 expenses Q1 98000 Q2 112000...
β Structured
Revenue: Q1=$142k, Q2=$168k, Q3=$201k Expenses: Q1=$98k, Q2=$112k Margin trending up 12% QoQ
6. Long-Term Memory
The context window is wiped between sessions. But what if the AI needs to remember things across days, weeks, or projects? That's where long-term memory systems come in. These store information externally and retrieve it back into context when relevant. Claude Code's memory system (the MEMORY.md file) is exactly this: a persistent knowledge store that gets loaded into context at the start of each session.
Specific past events and interactions. "Last Tuesday you told me the deadline moved to Friday."
General facts and knowledge. "This user prefers TypeScript over JavaScript."
How to do things. "When this user asks for code, always include tests."
Explain the different types of memory in AI agent systems. What is the difference between in-context memory, external memory, and parametric memory? How do production AI applications implement persistent memory across sessions? What are the tradeoffs between storing information in context vs. in a vector database vs. fine-tuning it into the model?
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
Context Engineering Patterns That Actually Work
Pattern 1: The Context Sandwich
Use the attention distribution to your advantage. Put the most important instructions at the very beginning (system prompt), put background/reference material in the middle, and re-state the key objective right before the question. The model gets a strong signal at both ends.
[SYSTEM β Beginning, high attention]
You are a CPQ specialist. Your job is to find pricing errors.
Always cite the specific line item when flagging an issue.
[MIDDLE β Reference material, lower attention]
Here is the quote data:
Line 1: Widget A, qty 50, unit price $200, total $9,500...
[...hundreds of lines of data...]
[END β Re-stated objective, high attention]
Review the quote data above. List every line item where the
calculated total does not match qty Γ unit price Γ applicable
discount. Format as a table.Pattern 2: Progressive Context Loading
Don't dump everything at once. Start with high-level context, let the model respond, then load in the detailed context for the next step. This mirrors how a human expert would brief a colleague: overview first, details as needed.
Pattern 3: Context Distillation
Use the AI itself to compress context before passing it to another AI call. Long conversation? Ask the model to summarize the key decisions and facts, then use that summary as the context for the next call instead of the full history. This is how production agent systems handle very long tasks.
// Step 1: Compress the long conversation history
PROMPT: "Summarize this conversation into bullet points covering:
(1) the original goal, (2) decisions made so far,
(3) open questions, (4) constraints discovered.
Be extremely concise β this summary will replace the full history."
// Step 2: Use the summary as context for the next phase
CONTEXT: [summary from step 1]
PROMPT: "Given the above context, continue with the next phase..."Pattern 4: Explicit Context Framing
Label your context clearly so the model knows what kind of information it's reading and how much to trust it. Unlabeled context blurs together; labeled context gets processed with appropriate weight.
[VERIFIED FACTS β treat as ground truth]
- Product SKU: Q2B-ENT-001
- Current price: $4,200/year
- Contract end date: 2026-09-30
[DRAFT DATA β may contain errors, verify before using]
- Proposed renewal price: $4,800/year
- Discount applied: 10%
[USER'S STATED GOAL]
Generate a renewal quote and flag any pricing inconsistencies.What are the most effective context engineering patterns used in production AI systems? I want to learn about: context sandwiching, progressive context loading, context distillation, explicit context framing, and any other advanced patterns. For each one, give me a concrete example of when and why you'd use it.
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
Context Engineering for Agents (Claude Code)
Agents are where context engineering gets both most powerful and most complex. An agent runs a loop (observe, think, act) and the context window grows with every step. Managing that growth is the difference between an agent that completes complex tasks and one that loses the plot halfway through.
CLAUDE.md: Persistent Agent Context
Claude Code reads a file called CLAUDE.md from your project root at the start of every session and loads it directly into context. This is the most powerful context engineering lever available in Claude Code: persistent, version-controlled briefing document for the agent.
# Project: Quick2Bid CPQ Integration
## What this codebase does
Salesforce LWC components for PandaDoc e-signature integration within the CPQ quote flow.
## Architecture
- /force-app/main/default/lwc/ β all Lightning Web Components
- pandaDocPanel β the main signing panel, loads inside the quote record page
- Always check existing components before creating new ones
## Coding standards
- TypeScript-style JSDoc on all public methods
- Never use @wire without a fallback for null data
- All API callouts go through QuoteApiService β never call Salesforce directly from LWC
## Current context (update as you go)
- Sprint goal: PandaDoc webhook status sync
- Known issue: pandaDocPanel throws when quote has no line items β ticket #204
- Do NOT touch anything in /legacy β it's being deprecated
## How I like to work
- Show me a plan before writing code on anything > 50 lines
- Prefer editing existing files over creating new ones
- Run the linter after every changeCLAUDE.md is context engineering in practice
Every line in CLAUDE.md is context you don't have to re-explain in every session. A well-maintained CLAUDE.md means the agent starts each session already knowing your architecture, standards, current sprint goals, and known issues, without you having to repeat any of it.
Think of it as the briefing document you'd give a new contractor before their first day.
The Agent Context Stack
When Claude Code starts working on a task, its context is assembled in layers:
How do production AI agent systems manage context across long, multi-step tasks? I want to understand: how does Claude Code's CLAUDE.md work as a context mechanism, how do agents handle context window overflow during long tasks, what strategies do companies like Anthropic and OpenAI recommend for agentic context management, and what does a well-engineered CLAUDE.md look like for a complex software project?
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
Context Engineering Anti-Patterns
These are the mistakes that silently kill the quality of your AI outputs.
The problem
Dumping everything you can think of into the context on the theory that more information = better answers.
The fix
Be surgical. Every token in context costs attention. Include only what's relevant to the current task. More signal, less noise.
The problem
Leaving outdated information in the context that contradicts current reality. The model can't tell what's current.
The fix
Update or remove stale facts. Explicitly mark temporal context: "As of March 2026, the price is X" rather than just "the price is X".
The problem
Providing conflicting instructions or facts in different parts of the context. The model will average them or pick one arbitrarily.
The fix
Audit your context for contradictions before sending. Later instructions generally override earlier ones, but don't rely on this.
The problem
Assuming the model knows things it doesn't: internal acronyms, project-specific terms, unwritten rules.
The fix
Define every domain-specific term. If you wouldn't expect a smart contractor on day one to know it, put it in context.
The problem
Providing great context but not specifying the exact format, length, or structure you need. The model guesses.
The fix
Always specify output format explicitly: "Respond as a JSON array", "Write exactly 3 bullet points", "Keep the answer under 200 words".
The problem
In systems that accept user input, malicious users can inject instructions into the context that override your system prompt.
The fix
Sanitize user inputs. Clearly separate user-provided content from trusted instructions. Consider using delimiters and instructing the model to treat user content as data, not instructions.
What are the most common context engineering mistakes that degrade AI performance in production systems? I want detailed explanations of: context stuffing, stale context problems, contradictory instructions, prompt injection attacks, and implicit assumptions. For each one, give me a real-world example and the specific fix.
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
Advanced: Context in Multi-Agent Systems
When multiple AI agents work together (one planning, one coding, one reviewing), context engineering becomes a system design problem. Each agent has its own context window, and you need to deliberately decide what information flows between them and in what form.
The multi-agent context problem
Agent A finishes a task and produces 10,000 tokens of output. Agent B needs to use that output. Do you pass all 10,000 tokens? A compressed summary? Structured data extracted from it? The wrong choice wastes tokens, loses information, or overwhelms Agent B's context.
The rule: pass structured summaries between agents, not raw outputs.
// β Bad: passing raw agent output between agents
Agent B context: [10,000 tokens of Agent A's raw reasoning and output]
// β
Good: structured handoff document
Agent B context:
HANDOFF FROM: Planning Agent
TASK COMPLETED: Architecture design for auth module
DECISIONS MADE:
- Use JWT with 15min expiry + refresh tokens
- Store refresh tokens in httpOnly cookies
- Rate limit: 5 attempts per IP per 15min
ARTIFACTS PRODUCED:
- /docs/auth-spec.md (full spec, read if needed)
FILES TO CREATE:
- src/auth/jwt.ts
- src/auth/middleware.ts
- src/auth/routes.ts
CONSTRAINTS:
- Must be compatible with existing User model schema
- No new npm packages without approval
YOUR TASK: Implement the above spec.How do you design context passing in multi-agent AI systems? I want to understand: how do you structure handoffs between specialized agents, what information should flow between agents vs. be stored in shared memory, how do orchestrator agents manage subagent contexts, and what are real-world patterns companies use when building systems with multiple LLM agents coordinating on complex tasks?
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
The Context Engineering Checklist
Run through this before any important AI interaction or before building an AI-powered feature:
Content
- Is this information actually relevant to the current task?
- Are there any stale or outdated facts in the context?
- Are there any contradictions between different parts of the context?
- Have I defined all domain-specific terms and acronyms?
Structure
- Are the most critical instructions at the beginning or end?
- Is the context labeled and organized so the model knows what it's reading?
- Have I pre-processed any raw data into a cleaner format?
- Is my output format explicitly specified?
Size
- Is there anything in the context I can safely remove?
- For long tasks, is there a summarization/compression step?
- Am I within a comfortable margin of the context window limit?
Security (for user-facing systems)
- Can user input inject instructions that override my system prompt?
- Is user-provided content clearly separated from trusted instructions?
I want to build a strong context engineering practice for my AI work. Can you synthesize the most important principles into a framework I can actually use day-to-day? Cover: how to audit existing context for quality, how to design context for different use cases (chatbots vs. agents vs. RAG systems), and what metrics or signals tell me my context engineering is working well vs. needs improvement.
π Copy this prompt and paste it into Claude, ChatGPT, or Gemini β it will explain everything in as much detail as you want.
Go Even Deeper
Paste any of these prompts into Claude, ChatGPT, or Gemini for a full tutorial on each topic:
Teach me how to build a production RAG system from scratch using LangChain and Pinecone. Walk me through every step including chunking strategy, embedding model choice, and retrieval tuning.
Explain vector embeddings and semantic search from first principles. How do word embeddings work? How does cosine similarity find relevant documents? What's the difference between sparse and dense retrieval?
Walk me through designing a CLAUDE.md file for a complex software project. What sections should it have? How detailed should it be? How do I keep it updated without it becoming stale?
What is constitutional AI and RLHF? How do these training techniques affect what ends up in a model's 'parametric memory' vs. what needs to be in context?