Prompting Principles
These principles apply across all major LLMs -- GPT, Gemini, Llama, and others. Master them once and they transfer to any model. Platform-specific techniques are covered in each tool track.
Why Prompting Matters
The same model can produce wildly different outputs depending on how you prompt it. Prompting is not just "asking questions" -- it is the primary programming interface for LLMs. A well-crafted prompt can turn a mediocre response into an exceptional one without changing the model, fine-tuning, or writing any code. This makes prompt engineering one of the highest-leverage skills in applied AI.
Anatomy of a Strong Prompt
System Prompts vs User Prompts
Most LLM APIs distinguish between two types of messages. Understanding the difference is critical for building reliable applications.
What it is: A hidden instruction set that the end user typically does not see. Set once per conversation or application.
Purpose: Defines the model's persona, rules, constraints, and output format. Acts as the "operating system" for the conversation.
Example: "You are ShopMate, a customer service assistant for ThreadCo. You can only discuss ThreadCo products and orders. Never discuss competitors. Always respond in the customer's language."
Persistence: Sent with every API call. The model reads it first, before any user messages.
What it is: The actual message from the human user (or from your application code acting on behalf of a user).
Purpose: Contains the specific request, question, or data for this turn of the conversation.
Example: "Where is my order #12345? I placed it three days ago and haven't received a shipping confirmation."
Persistence: Changes every turn. In multi-turn conversations, the full history of user and assistant messages is typically included.
Put stable instructions in the system prompt (persona, rules, format). Put variable data in the user prompt (the specific question, the document to analyse, the code to review). This separation makes your prompts easier to maintain and test -- you can swap user inputs without touching the system instructions.
Prompting Strategies
Different tasks require different prompting strategies. Here are the most important ones, ordered from simplest to most sophisticated.
Zero-Shot Prompting
What: Give the model a task with no examples. Rely entirely on its pre-trained knowledge.
When to use: Simple, well-defined tasks that the model has seen extensively in training (translation, summarisation, classification).
Example: "Classify this customer email as one of: billing, shipping, returns, product_question, other."
Limitation: The model may interpret the task differently than you intend if the instructions are ambiguous.
Few-Shot Prompting
What: Provide 2-5 input-output examples before the actual task. The model learns the pattern from examples.
When to use: When you need a specific output format, style, or classification scheme that the model would not infer from instructions alone.
Example: "Email: 'My shirt arrived ripped' -> Category: returns. Email: 'Do you ship to Canada?' -> Category: shipping. Email: 'I was charged twice' -> Category: billing. Now classify: 'Can I get this in size XL?'"
Tip: Choose diverse, representative examples. Poor examples teach poor patterns.
Chain-of-Thought (CoT)
What: Ask the model to show its reasoning step by step before giving a final answer.
When to use: Math, logic, multi-step reasoning, complex analysis -- any task where the answer depends on intermediate steps.
Example: "Think through this step by step: If ThreadCo sells 200 shirts/day at $25 each, and the return rate is 8%, what is the net monthly revenue?"
Why it works: Forcing the model to generate intermediate reasoning tokens gives it "working memory" to solve problems it would otherwise get wrong.
Structured Output Prompting
What: Explicitly define the output format using JSON schemas, XML tags, or markdown templates.
When to use: Any time your application needs to parse the model's output programmatically.
Example: "Respond with valid JSON matching this schema: {\"category\": string, \"confidence\": number, \"reasoning\": string}"
Tip: Many APIs now support forced JSON output mode. Use it -- it eliminates parsing failures.
Weak vs Strong Prompts: Worked Examples
Fix my code.
You are a Python expert. The following function raises a KeyError when the input dict is empty. Identify the bug, explain why it occurs, and return a corrected version with an inline comment explaining the fix.
Write me a product description.
You are a copywriter for ThreadCo, a trendy online T-shirt brand targeting 25-35 year olds. Write a product description for a vintage-wash cotton tee in 60-80 words. Tone: playful but not juvenile. Include the fabric (100% organic cotton, 180gsm) and available sizes (S-XXL). End with a call to action.
Summarise this document.
Summarise the following quarterly report in exactly 5 bullet points. Each bullet should be one sentence. Focus on: revenue change, customer growth, key risks, and strategic priorities. Use plain language suitable for a non-financial audience.
Common Prompting Mistakes
| Mistake | Why It Hurts | Fix |
|---|---|---|
| Vague instructions | The model guesses your intent and often guesses wrong | Be specific about task, format, length, and audience |
| Asking for too much at once | Quality degrades when the model juggles many objectives | Break complex tasks into sequential steps |
| No output format specified | Output varies unpredictably between runs, breaking parsers | Define exact format: JSON schema, markdown template, etc. |
| Contradictory instructions | The model cannot satisfy conflicting requirements | Review your prompt for logical consistency before deploying |
| Ignoring the system prompt | Rules, persona, and constraints reset every conversation | Always set a system prompt in production applications |
| No negative constraints | The model does not know what you do NOT want | Add explicit guardrails: "Do not include...", "Never..." |
| Prompt too long | Critical instructions get "lost in the middle" of long contexts | Put the most important instructions at the start and end of the prompt |
Prompt Templates for Common Tasks
These templates follow the five-layer structure. Copy and adapt them for your use cases.
System: You are a customer service agent for [COMPANY]. You are friendly, empathetic, and concise. You can only reference information from the order database provided. If you do not have the information to answer, say "Let me connect you with a team member who can help." Never make promises about refunds or delivery dates that are not confirmed in the data.
User: Customer email: [EMAIL_TEXT]. Order data: [ORDER_JSON]. Write a reply in under 100 words.
System: You are a senior software engineer conducting a code review. Focus on: correctness, security vulnerabilities, performance issues, and readability. Be direct but constructive. Format your review as a numbered list of findings, each with severity (critical/warning/suggestion) and a recommended fix.
User: Review the following [LANGUAGE] code: [CODE_BLOCK]
System: You are a data analyst. When given data, your job is to: (1) identify the key trends, (2) call out any anomalies or outliers, (3) suggest 2-3 actionable next steps. Always show your reasoning. Use plain language -- the audience is business stakeholders, not data scientists.
User: Analyse this data: [DATA_TABLE]. The business question is: [QUESTION]
Advanced Techniques
Self-Consistency
Run the same prompt multiple times (with temperature > 0) and take the majority answer. This is especially effective for reasoning and math tasks. If the model gives the same answer 4 out of 5 times, confidence is high. If answers vary widely, the task may need a better prompt or a more capable model.
Prompt Chaining
Break complex tasks into a pipeline of simpler prompts, where each step's output feeds into the next. Example: Step 1 extracts key facts from a document. Step 2 organises them into categories. Step 3 writes a summary. Each step is easier for the model, and you can inspect intermediate outputs for quality.
Meta-Prompting
Ask the model to write or improve prompts. "I want to classify customer emails into 5 categories. Write me an optimal system prompt for this task, including 3 few-shot examples." Models are often better at writing prompts than humans because they understand their own input format deeply.
XML/Tag Delimiters
Use XML-style tags to clearly separate sections of your prompt. Claude is specifically trained to respect these: <document>...</document>, <instructions>...</instructions>. This prevents the model from confusing your instructions with the data it should process -- critical for preventing prompt injection.
Iterating and Testing Prompts
Prompt engineering is an iterative process. Your first draft is almost never your best prompt. Here is a systematic approach to improvement.
| Step | Action | What to Look For |
|---|---|---|
| 1. Draft | Write your initial prompt using the five-layer structure | Does it cover role, instructions, context, examples, and format? |
| 2. Test (5+ inputs) | Run the prompt with at least 5 diverse test inputs | Are outputs consistently good, or do some inputs produce poor results? |
| 3. Identify failures | Catalogue specific failure modes: wrong format, missing info, hallucinations | Is the failure in the prompt, the model, or the task definition? |
| 4. Refine | Add constraints, examples, or clarifications that address the failure modes | Did the fix solve the problem without introducing new problems? |
| 5. Regression test | Re-run all previous test inputs to verify the change did not break what worked | Are previously good outputs still good? |
| 6. Document | Record the final prompt, test cases, and known limitations | Could a colleague use this prompt effectively without your help? |
A prompt that works perfectly on one example may fail on the next. Always test with diverse inputs: different lengths, different topics, edge cases, adversarial inputs. Production prompts should be tested on at least 20-50 representative inputs before deployment. Treat prompt testing like software testing -- your test suite is only as good as its coverage.
Hands-On Exercises
Choose a task relevant to your work. Write a complete prompt using all five layers (Role, Instructions, Context, Examples, Output Format). Test it against an LLM. Then remove each layer one at a time and observe how the output changes. Which layers have the biggest impact on quality for your task?
Task: Classify 10 customer emails into categories (billing, shipping, returns, product_question, other). First, try zero-shot (just describe the categories). Then try few-shot (add 3 examples). Compare accuracy. How many examples did you need before few-shot consistently outperformed zero-shot?
Ask an LLM this question without CoT: "A store has 125 shirts. They sell 40% on Monday, then receive a shipment of 30 shirts on Tuesday, then sell 25% of their current stock on Wednesday. How many shirts remain?" Record the answer. Now add "Think step by step" and ask again. Did CoT improve accuracy? Write down the intermediate steps the model produced.
Here is a bad prompt: "Write something about our product for social media. Make it good and viral. Use emojis but not too many. Keep it short but include all the details." Identify every problem with this prompt (there are at least five). Rewrite it as a strong prompt using the five-layer structure. Test both versions and compare outputs.
Create three production-quality prompt templates for tasks in your team. For each template: (a) Write the system prompt and user prompt template with clear [PLACEHOLDER] variables. (b) Include 2-3 few-shot examples. (c) Define the expected output format. (d) Test each template with at least 5 different inputs and document the results. Share your library with your team for feedback.