Using GenAI and Claude Code: Prompt and Context Engineering

I have spent the past few months extensively learning and using Claude Code for my daily work, averaging 200M tokens per day (excluding media files).

This experience completely transformed me from a GenAI skeptic into a heavy user (with some brain fry. Yes, the AI brain fry is real).

I would like to share a collection of what I learned technically in this post. It is organized in two parts: first a vocabulary of key concepts, then practical tips.

This is for anyone with basic understanding of GenAI and curious about my perspective on using it. I hope these notes are helpful.

(No hype, no marketing, no clickbait. I hate them. All written by hand, proofread by GenAI.)

Task-Oriented Workflow

For our purposes, we focus on the task-oriented workflow:

a well-defined task
clear success criteria

Example: create a website

the task is generating code and/or instructions to deploy a website
one success criterion is that a user can open and view the content successfully in their browser

Example that is beyond our scope: ask me a random question at a random time while I’m sleeping

Automation vs Augmentation

For automation, no human is involved. The whole process doesn’t require human presence.

narrow tasks
success criteria can be automatically verified deterministically

For augmentation, the human drives. GenAI, guided by the human’s judgment, improves the human’s work or how the human works.

most modern GenAI uses fall in this category

My Perspective of Using GenAI: Prompt and Context Engineering Is All

For GenAI use, essentially we are only doing two things:

prompt engineering
context engineering

We use prompt engineering, along with other means such as config files, to manage the context.

Take Skills as an example. When we install and enable a Skill, the description part of the Skill is added to the system prompt at startup.

When the model executes the skill, it first finds the Skill file from its context, then progressively loads the complete SKILL.md file, which is again a pre-written prompt: instructions on how to execute the subtask described by the Skill.

The model reads the skill instructions into its context, then produces output or calls tools accordingly.
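The two-stage loading described above can be sketched as follows. This is a minimal illustration, assuming a skill is a SKILL.md file whose front matter carries `name` and `description` fields. The sample skill and the helper names (`parse_skill`, `startup_prompt`, `load_skill_body`) are hypothetical, and this is not how Claude Code is actually implemented.

```python
# Sketch of progressive disclosure: only descriptions are loaded at startup,
# the full instructions are loaded when the skill is actually used.

SAMPLE_SKILL = """---
name: run-tests
description: Run the project test suite and summarize failures.
---
1. Run the test command for this repository.
2. Report only failing tests and their error messages.
"""


def parse_skill(text: str) -> dict:
    """Split a SKILL.md-style file into front matter fields and body."""
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return {"meta": meta, "body": body.strip()}


def startup_prompt(skills: list[dict]) -> str:
    """Stage 1: only name and description go into the system prompt."""
    lines = [f"- {s['meta']['name']}: {s['meta']['description']}" for s in skills]
    return "Available skills:\n" + "\n".join(lines)


def load_skill_body(skills: list[dict], name: str) -> str:
    """Stage 2: the full instructions enter the context on demand."""
    for s in skills:
        if s["meta"]["name"] == name:
            return s["body"]
    raise KeyError(name)


skills = [parse_skill(SAMPLE_SKILL)]
print(startup_prompt(skills))
print(load_skill_body(skills, "run-tests"))
```

Both stages are just text being placed into the context, which is why I treat Skills as prompt and context engineering.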

There are definitely other perspectives. But I will use this “prompt + context” engineering view.

Primary Problems When Using GenAI

There are many issues as of today when we use GenAI to augment our work. Below are the three main problems that I always have to keep in mind when using GenAI:

Hallucination
Non-determinism
Limited human energy

Hallucination

GenAI, as of today, can present and operate on wrong information confidently and convincingly.

When certain questions do not have well-defined answers, or do not have an answer at all, the model may make a guess instead of admitting it doesn’t know the answer. And the guess is almost never what the user expected.

While doing prompt and context engineering, our top priority is to design the prompt and the context in a way that minimizes the room for hallucination.

One quick mitigation is to explicitly tell the model:

Say you don't know if you don't know. Say you can't reach a conclusion if you don't have sufficient evidence.

Non-Determinism

GenAI is probabilistic, not deterministic.

Usually this isn’t a problem with well-defined tasks.

However, this means that the process can never be fully “verified”.

The same model can run one billion times without issue. But for the next run, it can still generate something that’s completely wrong.

The most we can claim is high confidence. The only thing we can truly verify is the deterministic code or other artifacts it produces.
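A small piece of arithmetic shows why a long clean record is not verification. The sketch below assumes each run fails independently with some tiny probability p; both the independence and the example values of p are illustrative assumptions, not measurements of any model.

```python
import math


def prob_any_failure(p: float, n: int) -> float:
    """Probability of at least one failure in n independent runs: 1 - (1 - p)^n."""
    # log1p/expm1 keep precision when p is tiny and n is huge.
    return -math.expm1(n * math.log1p(-p))


# Illustrative (made-up) failure rates and run counts.
for p, n in [(1e-3, 100), (1e-6, 1_000_000), (1e-9, 1_000_000_000)]:
    print(f"p={p:g}, n={n:,}: {prob_any_failure(p, n):.3f}")
```

Under these assumptions, a one-in-a-billion failure rate still gives a substantial chance of at least one failure somewhere in a billion runs, and a billion clean runs cannot prove the rate is zero.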

(But this is also a beauty of GenAI: some real “randomness”.)

When using the GenAI tools, we need to always stay aware of the probability.

Example: even when the user said “No”, the model still treated it as a yes: Shall i implement it?

More to discuss later.

Limited Human Energy

Whoever uses GenAI to augment their work is responsible for the final output.

Unfortunately, the human only has limited energy.

We can’t review every single conversation with the model. We can’t review every single line generated by the model.

No matter how confident the model sounds, no matter how thorough the verification script is, GenAI can still make extremely obvious errors. They are sometimes so obvious that humans won’t think the models can make those mistakes.

For example, I asked the GenAI to simplify the code. The agent then divided one function into smaller functions. Then it realized that some variables were lost, and decided to recompute those variables in each function. I don’t know how that can be called “simplification”.

See Vibe Coding Failures for well-known GenAI incidents with real-world impacts.

Several mitigation tips:

explore and let the model interview you to get as many edge cases as possible before starting work
break the task into the smallest possible subtasks
aggressively and proactively verify each substep in each subtask
ask the model to review the output together with you: “Now let’s walk through the code line by line together. Starting from the first function…”

This one can only be solved by you, the user.

Brief Glossary Overview

Token

The basic unit of text that a model can process.

not necessarily one word

Tokens are also used to measure model usage for billing.
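To make "not necessarily one word" concrete, here is a toy tokenizer. It is only a sketch: the vocabulary is hypothetical and tiny, and greedy longest-match is a stand-in for the learned sub-word schemes that real tokenizers use.

```python
# Toy vocabulary (hypothetical); real tokenizers learn tens of thousands of pieces.
VOCAB = {"token", "iz", "ation", "un", "believ", "able", " ", "is"}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


pieces = tokenize("tokenization is unbelievable")
print(len(pieces), pieces)
```

Three words become nine tokens here, which is why token counts, not word counts, are what the context window and the bill are measured in.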

LLM - Large Language Model

A math computation to autocomplete a sentence.

We have a plethora of models to choose from, with different

speed
performance (for different areas)
cost
context size
non-text input support

Harness vs Models

The term “harness” is unfortunately very new and there is no clear definition. I will avoid using this word later on.

For now, let’s define the “harness” as all the programs and architectures that connect users to a model so that the user can use the LLM to complete certain tasks. The user can customize these to use specific “persona” prompts, skills, save some “memories”…

Every “harness” eventually sends data to a model to compute, and presents the output to the user in some way.

When I say I use Claude Code, I mean I use the “harness” called Claude Code developed by Anthropic to send data to some model, which may or may not be Anthropic’s models.

And “harness engineering” means setting up those programs, configs, customizations, architectures… to serve the users using models.

Context

A context window is the model’s working memory: the maximum number of tokens an LLM can process in one interaction.

Claude has a very nice visualization of their context window: Explore the context window - Claude Code Docs

For Claude Code, the context consists of

Claude Code System prompt
CLAUDE.md and rules
previous conversation history, including user’s prompt and model’s output
what tools are installed and how/when to use
data that the prompts asked the model to read
…
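Since all of the items above share one window, it can help to think of the context as a budget. The sketch below is only an illustration: the window size and every per-component token count are made-up numbers, not measurements from Claude Code.

```python
CONTEXT_WINDOW = 200_000  # assumed window size in tokens

# Made-up token counts for the components listed above.
components = {
    "system prompt": 3_000,
    "CLAUDE.md and rules": 1_500,
    "tool definitions": 12_000,
    "conversation history": 90_000,
    "files read": 60_000,
}


def budget_report(window: int, parts: dict[str, int]) -> list[str]:
    """List each component's share of the window, plus what is left."""
    used = sum(parts.values())
    lines = [
        f"{name}: {tokens:,} tokens ({tokens / window:.1%})"
        for name, tokens in parts.items()
    ]
    lines.append(f"free: {window - used:,} tokens ({(window - used) / window:.1%})")
    return lines


print("\n".join(budget_report(CONTEXT_WINDOW, components)))
```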

LLMs remember the beginning and end of the context better than the middle (Lost in the Middle: How Language Models Use Long Contexts)

Prompt

The input for an LLM to generate a response or to behave in a certain way.

I always try to ensure the prompt has:

Clear and specific task instructions
Just enough background information and data for this task. No more, no less
Constraints, examples, and edge cases if applicable

Also output format if needed.
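The parts listed above can be kept as a small template so that none of them is forgotten. This is a minimal sketch: the `Prompt` class, its section labels, and the sample values are my own illustration, not a format that any model requires.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    task: str                      # clear and specific instructions
    background: str                # just enough context for this task
    constraints: list[str] = field(default_factory=list)
    output_format: str = ""        # optional

    def render(self) -> str:
        """Join the non-empty parts into one prompt string."""
        parts = [f"Task:\n{self.task}", f"Background:\n{self.background}"]
        if self.constraints:
            bullets = "\n".join(f"- {c}" for c in self.constraints)
            parts.append(f"Constraints:\n{bullets}")
        if self.output_format:
            parts.append(f"Output format:\n{self.output_format}")
        return "\n\n".join(parts)


# Hypothetical sample values.
prompt = Prompt(
    task="Fix the crash",
    background="Crash happens on startup",
    constraints=["Do not touch vendor code"],
    output_format="A unified diff",
)
print(prompt.render())
```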

Example Prompt with Above 4 Parts

Use GenAI to fix a bug.

Locate the root cause, fix, and verify the change for the bug where ...

Spawn parallel subagents to explore the C++ classes in @example/source/code/ and any other code that
is potentially related to the bug.

Read the log in @example/log/path, using filters of ..., with timestamp of the bug at 11:22:33.

The bug is triggered when ..., and the expected behavior is...

Enter plan mode after you explored the codebase for the fix and verification.

Use /example-skill-run-test to run the tests, and /example-skill-run-on-actual-device to
run the code on an actual device to verify it works.

Do NOT modify code in @example/vendor/code. Only use ... for the fix. Your fix shouldn't have any downstream side effects for unrelated modules.

Do NOT stop until the full test suites pass and the bug is confirmed to be fixed.

Use GenAI to sort files.

You are in a directory of documents.

1. Understand the current structure but do NOT read the files since they are large and some are binary;
2. Collect the files with meaningful names into a new directory called ...
3. For the files with meaningless names, such as hash or "no title" or "1", read the text files, sort the non-text files based on types
...

Extensions

MCP

Model Context Protocol: an open protocol designed for models to use external tools and services.

Models can still use existing APIs (such as RESTful ones), and these are often preferred over MCP for well-documented services.
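For a feel of what travels over MCP: its messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below only builds such a request; the tool name `search_issues` and its arguments are hypothetical, and a real client would also handle initialization and transport.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool and arguments, for illustration only.
message = mcp_tool_call(1, "search_issues", {"query": "login bug"})
print(message)
```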

Subagent

Another model session that has its own independent context window.

isolate context
parallel execution
background processing

Subagents can be used without an AGENT.md file. Just ask in the prompt.
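The isolation and parallelism can be pictured with plain threads. This is a sketch only: `run_subagent` is a stand-in that fakes the work, where a real subagent would be a separate model session, and the task names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Stand-in for a model session; a real one would call an LLM."""
    context = [f"task: {task}"]          # private context, never shared
    context.append(f"notes while working on {task}")
    # Only a short summary leaves the subagent, not its whole context.
    return f"summary of {task} ({len(context)} context items)"


tasks = ["review correctness", "review performance", "review security"]

# Run the subagents in parallel; map() returns results in task order.
with ThreadPoolExecutor(max_workers=3) as pool:
    summaries = list(pool.map(run_subagent, tasks))

# The main session only pays context for the summaries.
main_context = ["user prompt"] + summaries
print(main_context)
```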

Skill

A set of instructions for specific tasks or workflows, such as MCP use.

Most useful when

repeated workflow
domain knowledge

A skill can invoke other skills, teach the model to use MCP, or spawn subagents with specific prompts to use specific skills.

Hook

Commands that Claude Code runs at lifecycle events, such as before a tool call or at session start.

Example: show some notification when the model finishes working.

Which One to Use

External API: MCP
Parallel processing: subagents
Domain knowledge, specific workflow: Skill
Explicit shortcuts: slash command (system level “skill”)
Event triggers: Hooks

The “slash command” overlaps with “skills”. You can ask Claude Code to explain the difference.

Creating a Subagent or Skill

I personally no longer create subagents, since I can create skills that hold the prompts of the subagents.

To create a skill,

directly write the skill file
ask the model to write the skill
use a skill to ask the model to write the skill
…

There are many skill-creating skills. Example: Anthropic Skill Creator Skill

To create a skill from repeated workflows:

execute the workflow once with manual prompting
at the end of the workflow, explicitly prompt the model to summarize and create a skill from the conversation

Prompt and Context Engineering Tips

Here are the tips I have found most useful:

Context Principle

Keep the context window as small and focused as possible.

Only add something to persistent context if the model can’t infer it on its own.

Avoid Over-Installing Extensions

Imagine you are Claude Code. If you have hundreds of skills installed, and some have overlapping functions, how would you know which one to use?

Even though extensions are loaded into the context progressively (progressive disclosure), with hundreds of skills installed the context will still be bloated.

Avoid Over-Using Extensions

We don’t need a skill for every single thing.

Yes, skills can automate and the model can run all the commands.
However, for some commands it’s simply faster, safer, and more efficient to run by hand.

Example: I always run git commands myself.

First, I have already developed the muscle memory for those commands, with command line aliases that I have used for decades.
Second, I can always use git as the source of truth for the changes made by the model.
No matter what mess the model creates, I can always use git restore to discard all the changes, and git commit to save code checkpoints.

Decompose

Break tasks into the smallest possible steps.

Keep CLAUDE.md or Rule Files Minimal

Only keep the generic instructions that will actually change model behavior in CLAUDE.md or rule files.

do not include proofs, full lectures, lengthy quotes from textbooks…
the model doesn’t need to be “persuaded” or “lectured”. It needs clear instructions

I only use rule files. And each of my rule files is mostly a few lines of bullet points.

Pre-Filter Input to the Model

The model doesn’t need verbose sentences or human-readable command outputs.

Example plugins to save token usage:

For specific large command outputs, such as compiler output, I either ask the model to write the output to a tmp text file on disk and then grep " error:" to find only the errors, or ask the model to write Python scripts to parse the output.
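A parsing script of that kind can be very small. The sketch below keeps only the lines containing " error:", mirroring the grep above; the sample compiler output is made up for illustration.

```python
def extract_errors(output: str, marker: str = " error:") -> list[str]:
    """Keep only the lines that contain the error marker."""
    return [line for line in output.splitlines() if marker in line]


# Made-up compiler output, for illustration only.
SAMPLE_OUTPUT = """\
main.cpp:10:5: warning: unused variable 'x' [-Wunused-variable]
main.cpp:14:12: error: use of undeclared identifier 'foo'
In file included from main.cpp:1:
util.h:3:1: error: unknown type name 'strng'
2 errors generated.
"""

errors = extract_errors(SAMPLE_OUTPUT)
print("\n".join(errors))
```

The model then reads two lines instead of the whole build log.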

Only Include Relevant Info

Do not feed everything.

“Meta Prompting”

Ask the model to write prompts for you.

“I want to …, please help me write a short and specific prompt to …”
“Give me a good starting point for meta prompting for my task …”

Iteration

Go through the steps iteratively:

explore
plan
clarify
iterate above until a well-defined plan
execute
verify
iterate above until everything verified

Discuss alternative options.

Ask the model to interview you.

Establish a self-feedback loop for the model to iterate on its own.
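A self-feedback loop boils down to: attempt, verify deterministically, feed the failure back, repeat. Here is a minimal sketch under that reading; `fake_model` and `check` are stand-ins I made up, where in practice the attempt is a model call and the check is a test suite or script.

```python
from typing import Callable


def feedback_loop(
    attempt: Callable[[str], str],
    verify: Callable[[str], tuple[bool, str]],
    task: str,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Retry until verify passes, feeding each failure message back in."""
    prompt = task
    for round_no in range(1, max_rounds + 1):
        result = attempt(prompt)
        ok, message = verify(result)
        if ok:
            return result, round_no
        prompt = f"{task}\nPrevious attempt failed verification: {message}"
    # Bound the loop so a stuck model cannot retry forever.
    raise RuntimeError(f"not verified after {max_rounds} rounds")


def fake_model(prompt: str) -> str:
    """Stand-in for a model: only gets it right after seeing feedback."""
    return "x = 2" if "failed verification" in prompt else "x = 1"


def check(result: str) -> tuple[bool, str]:
    """Stand-in for a deterministic check such as a test run."""
    return result == "x = 2", "expected x = 2"


result, rounds = feedback_loop(fake_model, check, "set x")
print(result, rounds)
```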

Let the Model Teach Itself

Ask theoretical questions with answers that you want the model to apply in the next steps.

Example:

User: “How do I implement a round button using HTML and CSS? What are pros and cons of different approaches?”
Model: “Here are X ways to do it…”
User: “Use the third way to implement a button for …”

Here I don’t know which approach is best, so I let the model explain before committing. With this extra step, we can control how the model behaves more precisely.

Aggressively Clean the Context

Use /clear as soon as a new task is started.

Aggressively Trim Persistent Context Files

For Claude Code, persistent context files may include CLAUDE.md and the memory files it creates.

Claude can create very specific memory files that will never be useful again.
New models will be smarter, and old instructions are usually no longer needed.

Aggressively Save “Checkpoints” and Update TODO Lists

use /compact when you reach a “milestone” in the conversation. Don’t wait until the context window gets full
explicitly ask the model to update TODO lists and save the progress to a local text file

Aggressively Use Subagents

For each step, I always ask the model to figure out how to parallelize processing using subagents.

“spawn subagents if possible to review the code for correctness, performance, style, consistency, side effects, security”
“spawn subagents if possible to come up with 3 different answers and consolidate them into 1 before presenting to me”

Aggressively Specify

Never let the model guess anything.

Download the whole document for the model to read.

Copy the whole website for the model to learn from.

Example: there is a “copy page” button in https://code.claude.com/docs/en/skills

Proactively Address Model “Laziness”

Models frequently get lazy:

Passive waiting

“Should I continue?”
“Does that sound good?”
“…” Doesn’t say anything. Just stops in the middle of working

Brute-force retry

keep trying one command that keeps failing
doesn’t think or try to fix the error

Blame others

“The MCP server is down and I can’t continue”

Idle tool use

Run a tool that doesn’t do anything

Busywork

keep trying random things but never fixing the actual root cause or invoking the correct command
runs super fancy scripts and keeps “working” on them until they get super complicated and still fail

I haven’t found concrete solutions, but I’ve updated my rule files to explicitly instruct the model to avoid them.

Claude Code Specific

Keeping this part short.

Claude Code Directory

Useful Claude Code Features

Use /effort to change the current model token budget for extended thinking

this is an inference API parameter
ultrathink keyword in prompt to trigger high effort
higher effort means deeper reasoning

We can pre-approve “harmless” read-only Bash commands in .claude/settings.json like this:

{
  "permissions": {
    "allow": [
      "Bash(ls:*)",
      "Bash(cat:*)",
      "Bash(grep:*)",
      "Bash(find:*)",
      "Bash(git log:*)",
      "Bash(git diff:*)",
      "Bash(git status:*)"
    ]
  }
}

We can also use a PreToolUse hook to approve commands programmatically.
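Such a hook could look roughly like the sketch below. It assumes the hook receives a JSON event with `tool_name` and `tool_input` and answers with a `hookSpecificOutput` object carrying a `permissionDecision`; these field names follow my reading of the hooks documentation and should be checked against the current docs. The allow-list and helper names are hypothetical.

```python
import json
import shlex
from typing import Optional

# Hypothetical allow-list. find is left out: -exec and -delete can modify files.
SAFE_PREFIXES = [["ls"], ["cat"], ["grep"], ["git", "log"], ["git", "diff"], ["git", "status"]]
SHELL_META = set(";&|<>`$()\n")


def is_read_only(command: str) -> bool:
    """True only for a single allow-listed command with no shell tricks."""
    if any(ch in SHELL_META for ch in command):
        return False  # chaining, redirection or substitution
    try:
        words = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and the like
    return any(words[: len(prefix)] == prefix for prefix in SAFE_PREFIXES)


def decide(event: dict) -> Optional[dict]:
    """Return an allow decision, or None to use the normal permission flow."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    if not is_read_only(command):
        return None
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",  # assumed field names, see lead-in
            "permissionDecision": "allow",
            "permissionDecisionReason": "read-only command",
        }
    }


def handle(raw: str) -> str:
    """Turn the raw event JSON into the hook's output (empty if no decision)."""
    decision = decide(json.loads(raw))
    return json.dumps(decision) if decision else ""


if __name__ == "__main__":
    sample = json.dumps({"tool_name": "Bash", "tool_input": {"command": "git status"}})
    print(handle(sample))
    # As a real hook, the event would come from stdin: handle(sys.stdin.read())
```

Returning no decision for anything unrecognized keeps the hook conservative: it can only skip a prompt, never block or approve something it does not understand.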

Danger zone: claude --dangerously-skip-permissions

use only in sandboxes

Claude sessions are persisted on disk

/rename to name your current claude session
claude -c to continue the last conversation
claude -r to resume specific session

Prefix a command with ! to run it and share its output with Claude.

Conclusion

Prompt and context engineering is all I’m working on when I use GenAI models.

context is king
prompt is instructing the model to build its context and execute tasks based on the built context

Hallucination and non-determinism are unavoidable as of today.

I always review all the GenAI output, with the help of GenAI.

“There is no one correct way to use Claude Code”

Boris Cherny, creator of Claude Code