Using GenAI and Claude Code: Prompt and Context Engineering
I have spent the past few months learning and using Claude Code extensively for my daily work, averaging 200M tokens per day (excluding media files).
This experience completely transformed me from a GenAI skeptic into a heavy user (with some brain fry. Yes, the AI brain fry is real).
I would like to share a collection of what I learned technically in this post. This post is organized in two parts: first a vocabulary of key concepts, then practical tips.
This is for anyone with a basic understanding of GenAI who is curious about my perspective on using it. I hope these notes are helpful.
(No hype, no marketing, no clickbait. I hate them. All written by hand, proofread by GenAI)
Task-Oriented Workflow
For our purposes, we focus on the task-oriented workflow:
- a well-defined task
- clear success criteria
Example: create a website
- the task is generating code and/or instructions to deploy a website
- one success criterion is that a user can open and view the content successfully in their browser
Example that is beyond our scope: ask me a random question at a random time while I’m sleeping
Automation vs Augmentation
For automation, no human is involved. The whole process doesn’t require human presence.
- narrow tasks
- success criteria can be automatically verified deterministically
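As a minimal sketch of what "automatically verified deterministically" can look like for the website example: the check below only inspects a generated file, so it gives the same answer every time for the same input. The function name `site_is_viewable` and the criteria (a non-empty title plus some visible body text) are assumptions chosen for illustration, and they are only a proxy for "a user can open and view the content in a browser".

```python
from html.parser import HTMLParser
from pathlib import Path


class _PageSummary(HTMLParser):
    """Records the <title> text and whether <body> contains visible text."""

    def __init__(self) -> None:
        super().__init__()
        self._open: set[str] = set()  # names of tags currently open
        self.title = ""
        self.has_body_text = False

    def handle_starttag(self, tag, attrs):
        self._open.add(tag)

    def handle_endtag(self, tag):
        self._open.discard(tag)

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._open:
            self.title += text
        elif "body" in self._open and not ({"script", "style"} & self._open):
            self.has_body_text = True


def site_is_viewable(index_path: Path) -> bool:
    """Deterministic check: file exists, has a title, and has visible body text."""
    if not index_path.is_file():
        return False
    parser = _PageSummary()
    parser.feed(index_path.read_text(encoding="utf-8"))
    return bool(parser.title) and parser.has_body_text
```

A check like this can run in a script or CI job without a human present, which is what makes the task a candidate for automation.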
For augmentation, the human drives. GenAI, guided by the human’s judgment, improves the human’s work or how the human works.
- most modern GenAI uses fall in this category
My Perspective of Using GenAI: Prompt and Context Engineering Is All
For GenAI use, essentially we are only doing two things:
- prompt engineering
- context engineering
We use prompt engineering, plus other means such as config files, to manage the context.
When we install and enable a Skill, the description part of the Skill is added to the system prompt at startup.
When the model executes the skill, it first finds the Skill file from its context, then progressively loads the complete SKILL.md file, which is again a pre-written prompt: instructions on how to execute the subtask described by the Skill.
The model reads the skill instructions into its context, then produces output or calls tools accordingly.
There are definitely other perspectives. But I will use this “prompt + context” engineering view.
Primary Problems When Using GenAI
There are many issues as of today when we use GenAI to augment our work. Below are three main problems that I have to always keep in mind when using GenAI:
- Hallucination
- Non-determinism
- Limited human energy
Hallucination
GenAI, as of today, can present and operate on wrong information confidently and convincingly.
When certain questions do not have well-defined answers, or do not have an answer at all, the model may guess instead of admitting it doesn’t know the answer. And the guess is almost never what the user expects.
While doing prompt and context engineering, our top priority is to design the prompt and the context in a way that minimizes the room for hallucination.
One quick mitigation is to explicitly tell the model:
Say you don't know if you don't know. Say you can't reach a conclusion if you don't have sufficient evidence.
Non-Determinism
GenAI is probability based and not deterministic.
Usually this isn’t a problem with well-defined tasks.
However, this means that the process can never be fully “verified”.
The same model can run one billion times without issue. But for the next run, it can still generate something that’s completely wrong.
The most we can claim is high confidence. The only thing we can truly verify is the deterministic code or other artifacts it produces.
(But, this is also a beauty of GenAI. This is some real “randomness”)
When using the GenAI tools, we need to always stay aware of the probability.
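To put a number on "high confidence": if we assume the runs are independent and equally likely to fail (a simplifying assumption, since real prompts and contexts vary), then n successful runs in a row only bound the failure probability p from above. Solving (1 - p)^n = 1 - confidence gives the bound, which for 95% confidence is roughly 3/n (the "rule of three"). The function name below is hypothetical.

```python
import math


def failure_rate_upper_bound(successful_runs: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-run failure probability after zero observed failures.

    Assumes independent, identically distributed runs.
    """
    if successful_runs <= 0:
        raise ValueError("successful_runs must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    # 1 - alpha**(1/n), written with expm1 to stay accurate for very large n.
    return -math.expm1(math.log(alpha) / successful_runs)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000_000):
        print(n, failure_rate_upper_bound(n))
```

Even after one billion clean runs the bound is small but not zero, which matches the point above: we get confidence, never verification.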
Example: Even when the user said “No”, the model still treated it as a yes: Shall i implement it?
More to discuss later.
Limited Human Energy
Whoever uses GenAI to augment their work is responsible for the final output.
Unfortunately, the human only has limited energy.
We can’t review every single conversation with the model. We can’t review every single line generated by the model.
No matter how confident the model sounds, no matter how thorough the verification script is, GenAI can still make extremely obvious errors. They are sometimes so obvious that humans won’t think the models can make those mistakes.
For example, I asked the GenAI to simplify the code. The agent then divided one function into smaller functions. Then it realized that some variables were lost, and decided to recompute those variables in each function. I don’t know how that can be called “simplification”.
See Vibe Coding Failures for well-known GenAI incidents with real-world impacts.
Several mitigation tips:
- explore and let the model interview you to get as many edge cases as possible before starting work
- break the task into the smallest subtasks
- aggressively and proactively verify each substep in each subtask
- ask the model to review the output with you together: “Now let’s walk through the code line by line together. Starting from the first function…”
This one can only be solved by you, the user.
Brief Glossary Overview
Token
The basic unit of text that a model can process.
- not necessarily one word
Can be used to measure model usage to bill the users.
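A minimal sketch of how token counts typically turn into a bill, assuming input and output tokens are priced separately per million tokens. The prices used in the example are placeholders for illustration, not any provider's real rates.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost = tokens / 1M * price-per-1M, summed over input and output."""
    input_cost = (input_tokens / 1_000_000) * usd_per_million_input
    output_cost = (output_tokens / 1_000_000) * usd_per_million_output
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices: 1.0 USD per 1M input tokens, 4.0 USD per 1M output tokens.
    print(estimate_cost_usd(2_000_000, 500_000, 1.0, 4.0))
```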
LLM - Large Language Model
A math computation to autocomplete a sentence.
We have a plethora of models to choose from, with different
- speed
- performance (for different areas)
- cost
- context size
- non-text input support
Harness vs Models
The term “harness” is unfortunately very new and there is no clear definition. I will avoid using this word later on.
For now, let’s define the “harness” as all the programs and architectures that connect users to a model so that the user can use the LLM to complete certain tasks. The user can customize these to use specific “persona” prompts, skills, save some “memories”…
Every “harness” eventually sends data to a model to compute, and presents the output to the user in some way.
When I say I use Claude Code, I mean I use the “harness” called Claude Code developed by Anthropic to send data to some model, which may or may not be Anthropic’s models.
And “harness engineering” means setting up those programs, configs, customizations, architectures… so that users can get work done with models.
Read more: Harness engineering for coding agent users
Context
A context window is the model’s working memory: the maximum number of tokens an LLM can process in one interaction.
Claude has a very nice visualization of their context window: Explore the context window - Claude Code Docs
For Claude Code, the context consists of
- Claude Code System prompt
- CLAUDE.md and rules
- previous conversation history, including user’s prompt and model’s output
- what tools are installed and how/when to use
- data that the prompts asked the model to read
- …
LLMs remember the beginning and end of the context better than the middle (Lost in the Middle: How Language Models Use Long Contexts)
Prompt
The input for an LLM to generate a response or to behave in a certain way.
I always try to ensure the prompt has:
- Clear and specific task instructions
- Just enough background information and data for this task. No more, no less
- Constraints, examples, and edge cases if applicable
Also output format if needed.
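One way to make sure none of these parts is forgotten is to assemble the prompt from named pieces. This is only a sketch: the function name `build_prompt` and the section labels ("Task", "Background", and so on) are assumptions for illustration, not a format any model requires.

```python
from typing import Sequence


def build_prompt(
    task: str,
    background: str,
    constraints: Sequence[str] = (),
    output_format: str = "",
) -> str:
    """Join the prompt parts in a fixed order, skipping any that are empty."""
    if not task.strip():
        raise ValueError("a prompt needs a task")
    sections = [
        ("Task", task.strip()),
        ("Background", background.strip()),
        ("Constraints", "\n".join(f"- {c}" for c in constraints)),
        ("Output format", output_format.strip()),
    ]
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections if body)


if __name__ == "__main__":
    print(
        build_prompt(
            task="Locate the root cause of the bug and fix it.",
            background="The log is in ./log; the bug appears at startup.",
            constraints=["Do NOT modify vendor code"],
            output_format="A short summary followed by a diff",
        )
    )
```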
Example Prompt with Above 4 Parts
Use GenAI to fix a bug.
Locate the root cause, fix, and verify the change for the bug where ...
Spawn parallel subagents to explore the C++ classes in @example/source/code/ and any other code that
is potentially related to the bug.
Read the log in @example/log/path, using filters of ..., with timestamp of the bug at 11:22:33.
The bug is triggered when ..., and the expected behavior is...
Enter plan mode after you explored the codebase for the fix and verification.
Use /example-skill-run-test to run the tests, and /example-skill-run-on-actual-device to
run the code on an actual device to verify it works.
Do NOT modify code in @example/vendor/code. Only use ... for the fix. Your fix shouldn't have any downstream side effects for unrelated modules.
Do NOT stop until the full test suites pass and the bug is confirmed to be fixed.
Use GenAI to sort files.
You are in a directory of documents.
1. Understand the current structure but do NOT read the files since they are large and some are binary;
2. Collect the files with meaningful names into a new directory called ...
3. For the files with meaningless names, such as hash or "no title" or "1", read the text files, sort the non-text files based on types
...
Extensions
MCP
Model Context Protocol: an open protocol designed for models to use external tools and services.
Models can still use existing APIs (such as RESTful ones), which are often preferred over MCP for well-documented services.
Subagent
Another model session that has its own independent context window.
- isolate context
- parallel execution
- background processing
Subagents can be used without an AGENT.md file. Just ask in the prompt.
Skill
A set of instructions for specific tasks or workflows, such as MCP use.
Most useful when
- repeated workflow
- domain knowledge
A skill can invoke other skills, teach the model to use MCP, spawn subagents with specific prompts to use specific skills.
Read more: The Complete Guide to Building Skill for Claude
Hook
Commands that Claude Code runs at lifecycle events, such as before a tool call or at session start.
Example: show some notification when the model finishes working.
Which One to Use
- External API: MCP
- Parallel processing: sub-agents
- Domain knowledge, specific workflow: Skill
- Explicit shortcuts: slash command (system level “skill”)
- Event triggers: Hooks
The “slash command” overlaps with “skills”. You can ask Claude Code to explain the difference.
Creating a Subagent or Skill
I personally no longer create subagents since I can create skills that hold the prompts of the subagents.
To create a skill,
- directly write the skill file
- ask the model to write the skill
- use a skill to ask the model to write the skill
- …
There are many skill-creating skills. Example: Anthropic Skill Creator Skill
To create a skill from repeated workflows:
- execute the workflow once with manual prompting
- at the end of the workflow, explicitly prompt the model to summarize and create a skill from the conversation
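For the "directly write the skill file" route, here is a minimal sketch that writes a SKILL.md skeleton. It assumes the layout I understand from the Skills docs: one directory per skill containing a SKILL.md whose frontmatter carries a `name` and a `description`, followed by free-form instructions. The helper name, the "Steps" heading, and the example skill are hypothetical; check the current docs before relying on the exact fields or the install location.

```python
from pathlib import Path
from typing import Sequence


def write_skill_skeleton(
    skills_root: Path, name: str, description: str, steps: Sequence[str]
) -> Path:
    """Create <skills_root>/<name>/SKILL.md with frontmatter and numbered steps."""
    skill_dir = skills_root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    lines = [
        "---",
        f"name: {name}",
        # Keep the description on one line; it is the part loaded at startup.
        f"description: {description}",
        "---",
        "",
        f"# {name}",
        "",
        "## Steps",
        "",
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return skill_file
```

The description deserves the most care, since it is what the model sees when deciding whether the skill applies.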
Prompt and Context Engineering Tips
Here are the tips I have found most useful:
Context Principle
Keep the context window as small and focused as possible.
Only add something to persistent context if the model can’t infer it on its own.
Avoid Over-Installing Extensions
Imagine you are Claude Code. If you have hundreds of skills installed, and some have overlapping functions, how would you know which one to use?
Even though extensions are loaded into the context progressively (progressive disclosure), with hundreds of skills installed the context will still be bloated at startup.
Avoid Over-Using Extensions
We don’t need a skill for every single thing.
- Yes, skills can automate and the model can run all the commands.
- However, for some commands it’s simply faster, safer, and more efficient to run by hand.
Example: I always run git commands myself.
- First, I have already developed muscle memory for those commands, with command-line aliases that I have used for decades.
- Second, I can always use git as the source of truth for the changes made by the model.
- No matter what mess the model creates, I can always git restore to discard all the changes, and use git commit to save code checkpoints.
Decompose
Break tasks into the smallest possible steps.
Keep CLAUDE.md or Rule Files Minimal
Only keep the generic instructions that will actually change model behavior in CLAUDE.md or rule files.
- do not include proofs, full lectures, lengthy quotes from textbooks…
- the model doesn’t need to be “persuaded” or “lectured”. It needs clear instructions
I only use rule files. And each of my rule files is mostly a few lines of bullet points.
Pre-Filter Input to the Model
The model doesn’t need verbose sentences or human-readable command outputs.
Example plugins to save token usage:
For specific large command outputs, such as compiler output, I either ask the model to write the output to a temporary text file on disk and then grep " error:" to find only the errors, or ask the model to write Python scripts to parse the output.
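A minimal sketch of such a parsing script, mirroring the grep " error:" filter described above. The function name and the de-duplication step are assumptions for illustration; real build logs may need a stricter pattern.

```python
def extract_errors(build_log: str, marker: str = " error:") -> list[str]:
    """Keep only the lines containing the marker, dropping exact duplicates."""
    seen: set[str] = set()
    errors: list[str] = []
    for raw_line in build_log.splitlines():
        line = raw_line.strip()
        if marker in line and line not in seen:
            seen.add(line)
            errors.append(line)
    return errors


if __name__ == "__main__":
    import sys

    # Usage: pipe a saved build log into this script and pass on only the result.
    for error_line in extract_errors(sys.stdin.read()):
        print(error_line)
```

Feeding the model a handful of error lines instead of the full log keeps the context small and focused.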
Only Include Relevant Info
Do not feed everything.
“Meta Prompting”
Ask the model to write prompts for you.
- “I want to …, please help me write a short and specific prompt to …”
- “Give me a good starting point for meta prompting for my task …”
Iteration
Go through the steps iteratively:
- explore
- plan
- clarify
- iterate above until a well-defined plan
- execute
- verify
- iterate above until everything verified
Discuss alternative options.
Ask the model to interview you.
Establish a self-feedback loop for the model to iterate on its own.
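The execute and verify half of this loop can be sketched as plain control flow. Everything here is hypothetical: `execute` stands in for whatever sends the plan to the model, and `verify` stands in for a deterministic check (tests, a linter, a script) that returns None on success or a description of the problem.

```python
from typing import Callable, Optional, Tuple


def run_until_verified(
    execute: Callable[[str], str],
    verify: Callable[[str], Optional[str]],
    plan: str,
    max_rounds: int = 3,
) -> Tuple[bool, str]:
    """Execute the plan, verify the result, and feed failures back until it passes."""
    feedback = ""
    result = ""
    for _ in range(max_rounds):
        result = execute(plan + feedback)
        problem = verify(result)
        if problem is None:
            return True, result
        # Give the next attempt the concrete failure instead of a blind retry.
        feedback = f"\n\nPrevious attempt failed verification: {problem}"
    return False, result
```

The round limit matters: without it, the loop can turn into the brute-force retry behavior that wastes tokens without making progress.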
Let the Model Teach Itself
Ask theoretical questions with answers that you want the model to apply in the next steps.
Example:
- User: “How do I implement a round button using HTML and CSS? What are the pros and cons of different approaches?”
- Model: “Here are X ways to do it…”
- User: “Use the third way to implement a button for …”
Here I don’t know which approach is best, so I let the model explain before committing. With this extra step, we can control how the model behaves more precisely.
Aggressively Clean the Context
Use /clear as soon as a new task is started.
Aggressively Trim Persistent Context Files
For Claude Code, persistent context files may include CLAUDE.md and the memory files it creates.
- Claude can create very specific memory files that will never be useful again.
- New models will be smarter, and old instructions are usually no longer needed.
Aggressively Save “Checkpoints” and Update TODO Lists
- use /compact when you see a “milestone” in the conversation; don’t wait until the context window gets full.
- explicitly ask the model to update TODO lists and save the progress to a local text file
Aggressively Use Subagents
For each step, I always ask the model to figure out how to parallelize processing using subagents.
- “spawn subagents if possible to review the code for correctness, performance, style, consistency, side effects, security”
- “spawn subagents if possible to come up with 3 different answers and consolidate them into 1 before presenting to me”
Aggressively Specify
Never let the model guess anything.
Download the whole document for the model to read.
Copy the whole website for the model to learn from.
Example: there is a “copy page” button in https://code.claude.com/docs/en/skills
Proactively Address Model “Laziness”
Models frequently get lazy:
Passive waiting
- “Should I continue?”
- “Does that sound good?”
- “…” Doesn’t say anything. Just stops in the middle of working
Brute-force retry
- keeps retrying one command that keeps failing
- doesn’t think about or try to fix the error
Blame others
- “The MCP server is down and I can’t continue”
Idle tool use
- runs a tool that doesn’t do anything
Busywork
- keeps trying random things but never fixes the actual root cause or invokes the correct command
- runs super fancy scripts and keeps “working” on them until they get super complicated and still fail
I haven’t found concrete solutions, but I’ve updated my rule files to explicitly instruct the model to avoid them.
Read more:
Claude Code Specific
Keeping this part short.
Claude Code Directory
Useful Claude Code Features
Use /effort to change the current model token budget for extended thinking
- this is an inference API parameter
- use the ultrathink keyword in the prompt to trigger high effort
- higher effort means deeper reasoning
We can pre-approve “harmless” read-only bash commands like this in .claude/settings.json:
{
  "permissions": {
    "allow": [
      "Bash(ls:*)",
      "Bash(cat:*)",
      "Bash(grep:*)",
      "Bash(find:*)",
      "Bash(git log:*)",
      "Bash(git diff:*)",
      "Bash(git status:*)"
    ]
  }
}
We can also use a pre-tool-use hook to approve commands programmatically.
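A minimal sketch of the decision logic such a hook could apply, reusing the command prefixes from the allow list above. It covers only the classification step: how a hook receives the command and reports its decision is defined by the Claude Code hooks documentation and is deliberately not shown here. The function name and the rejection rules are assumptions for illustration, and the checks are conservative rather than complete.

```python
import shlex

# Command prefixes treated as read-only, mirroring the allow list above.
READ_ONLY_PREFIXES = (
    ("ls",), ("cat",), ("grep",), ("find",),
    ("git", "log"), ("git", "diff"), ("git", "status"),
)
# Shell syntax that can chain, redirect, or substitute commands.
UNSAFE_CHARACTERS = ";|&<>`$\n"
# find actions that modify files, run other programs, or write to a file.
UNSAFE_FIND_ACTIONS = {"-delete", "-exec", "-execdir", "-ok", "-okdir", "-fprint", "-fprintf", "-fls"}


def is_read_only_command(command: str) -> bool:
    """Return True only for a single, simple command from the read-only list."""
    if any(ch in command for ch in UNSAFE_CHARACTERS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return False
    if not tokens:
        return False
    if tokens[0] == "find" and UNSAFE_FIND_ACTIONS.intersection(tokens):
        return False
    return any(tuple(tokens[: len(p)]) == p for p in READ_ONLY_PREFIXES)
```

The find case shows why a prefix match alone can be too generous: a command that starts with an allowed name is not necessarily read-only.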
Danger zone: claude --dangerously-skip-permissions
- use only in sandboxes
Claude sessions are persisted on disk
- /rename to name your current Claude session
- claude -c to continue the last conversation
- claude -r to resume a specific session
Prefix a command with ! to run it and share the output with Claude.
Links
Anthropic Claude Code
- https://www.anthropic.com/engineering
- https://resources.anthropic.com/hubfs/The-Complete-Guide-to-Building-Skill-for-Claude.pdf
- https://www.anthropic.com/engineering/claude-code-best-practices
- https://code.claude.com/docs/en/claude-directory
- https://code.claude.com/docs/en/context-window
- https://www.anthropic.com/engineering/building-effective-agents
- https://www.anthropic.com/engineering/writing-tools-for-agents
- https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
OpenAI
Others
- What happens when sending a message to Claude Code https://ccunpacked.dev/
Conclusion
Prompt and context engineering is all I’m working on when I use GenAI models.
- context is king
- prompt is instructing the model to build its context and execute tasks based on the built context
Hallucination and non-determinism are unavoidable as of today.
I always review all the GenAI output, with the help of GenAI.
“There is no one correct way to use Claude Code”
- Boris Cherny, creator of Claude Code