Large Language Models (LLMs) as Agent Brains
LLMs are the reasoning engine behind modern AI agents. Understanding how they work is essential for building effective agents.
What Are LLMs?
Large Language Models are neural networks trained on massive text datasets to predict the next token in a sequence.
- Scale: Billions to (reportedly) trillions of parameters; OpenAI and Anthropic have not disclosed exact counts for GPT-4 or Claude
- Training: Self-supervised pretraining on large text corpora (much of it web text), then fine-tuning on instructions and human feedback (e.g., RLHF)
- Capability: Generate text, reason, follow instructions, use tools
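To make "predict the next token" concrete, here is a toy sketch. It is not how production LLMs are built: it uses a bigram count table over whitespace-split words, where a real LLM uses a transformer network over subword tokens. The function names (`train_bigram`, `next_token_probs`, `generate`) and the tiny corpus are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training data."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def next_token_probs(counts, token):
    """Turn raw follow-counts into a probability distribution."""
    followers = counts.get(token, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

def generate(counts, start, max_new_tokens):
    """Greedy generation: pick the most likely next token, append, repeat."""
    tokens = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:  # no known continuation, so stop early
            break
        tokens.append(followers.most_common(1)[0][0])
    return tokens

# Toy corpus (illustrative only)
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")   # "cat" follows "the" twice, "mat" once
continuation = generate(model, "on", 2)  # "on" -> "the" -> "cat"
```

The same idea scales up: training adjusts the model so that likely continuations get high probability, and generation repeatedly samples from that distribution.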
How LLMs Work (High-Level)
Input text → Tokenize → Embeddings → Transformer layers → Next token prediction
(Inside the transformer layers, the self-attention mechanism relates each token to the tokens that precede it in the context.)
- Tokenization: Text is split into subword tokens (roughly 4 characters per token for English text)
- Embedding: Tokens converted to vectors
- Attention: Each token attends to itself and the tokens before it (this is how the model uses context)
- Generation: Predict next token, append, repeat (autoregressive)
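The attention step can be sketched in plain Python. This is a deliberately simplified, single-head version that uses the token vectors directly as queries, keys, and values; real transformers apply learned projection matrices first and run many heads in parallel. The function names are illustrative assumptions.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(vectors):
    """Mix each token vector with the vectors at or before its position.

    `vectors` is a list of equal-length lists of floats, one per token.
    """
    dim = len(vectors[0])
    outputs = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # causal mask: no looking ahead
        # Scaled dot-product score between this token and each visible token
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        # Weighted average of the visible vectors
        mixed = [
            sum(w * vec[j] for w, vec in zip(weights, visible))
            for j in range(dim)
        ]
        outputs.append(mixed)
    return outputs

# Two toy 2-dimensional "embeddings"
out = causal_self_attention([[1.0, 0.0], [0.0, 1.0]])
```

The first token can only see itself, so its output equals its input; the second token's output is a blend of both vectors, weighted toward the one it matches best.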
Prompt Engineering Fundamentals
How you prompt an LLM largely determines how the agent behaves. A system prompt and a user message in chat format look like this:
system_prompt = """You are a helpful customer support agent for TechCorp.
Rules:
- Be polite and professional
- If you don't know the answer, say so
- Never make up information
- Escalate billing issues to human agents"""
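# Chat-style message list: the system message sets the rules,
# the user message carries the actual request.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "I can't log into my account"},
]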
Key techniques:
- System prompts: Define agent persona and rules
- Few-shot examples: Show desired behavior
- Chain-of-thought: Ask the model to reason step by step
- Output formatting: Request structured responses (JSON, lists)
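The techniques above can be combined in one prompt. The following is a minimal sketch, assuming the same role/content message format used earlier. The helper names (`build_messages`, `parse_reply`), the JSON keys, and the example tickets are hypothetical, and no model is actually called here.

```python
import json

def build_messages(system_prompt, examples, user_input):
    """Assemble a system prompt, few-shot examples, and the new user input."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:
        # Each example is shown as a past user turn plus the ideal reply
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_input})
    return messages

def parse_reply(reply_text, required_keys=("reasoning", "category", "escalate")):
    """Return the parsed JSON object, or None if the reply is malformed."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not all(k in data for k in required_keys):
        return None
    return data

# System prompt: persona, step-by-step reasoning, and output format
system_prompt = (
    "You are a support triage agent for TechCorp. "
    "Reason step by step in the 'reasoning' field, then fill in the rest. "
    "Reply with a single JSON object with keys: reasoning, category, escalate."
)

# Few-shot examples (hypothetical tickets showing the desired output)
examples = [
    ("I was charged twice this month",
     '{"reasoning": "Duplicate charge is a billing issue.", '
     '"category": "billing", "escalate": true}'),
    ("How do I change my avatar?",
     '{"reasoning": "Profile setting, self-service.", '
     '"category": "account", "escalate": false}'),
]

messages = build_messages(system_prompt, examples, "I can't log into my account")
```

Validating the reply with something like `parse_reply` matters for agents, because downstream code should not act on output that fails to match the requested structure.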
Token Limits and Context Windows
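- Context window: Maximum number of tokens (prompt plus generated output) the model can process at once; limits vary by model version
  - GPT-4 Turbo: 128K tokens (~300 pages)
  - Claude 3: 200K tokens (~500 pages)
- Implication for agents: The system prompt, conversation history, tool outputs, and retrieved documents all count against the window, so long-running agents must trim or summarize older context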
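One simple way to manage context is to keep the system prompt and drop the oldest turns once a token budget is exceeded. The sketch below assumes the rough 4-characters-per-token heuristic rather than a real tokenizer, so the counts are only estimates; `estimate_tokens` and `trim_history` are hypothetical helper names.

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Keep all system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    # Walk backwards so the newest messages are kept first
    for message in reversed(rest):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "a" * 400},       # long, old message
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, max_tokens=30)    # the long old message is dropped
```

Dropping old turns is the bluntest option; summarizing them into a shorter message is a common alternative when earlier context still matters.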
API-Based vs. Local Models
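| Aspect | API-Based (OpenAI, Anthropic) | Local (Llama, Mistral) |
|---|---|---|
| Quality | Generally strongest (frontier models) | Good, improving fast |
| Cost | Per-token pricing | Hardware and operating costs, no per-token fee |
| Privacy | Data sent to provider | Stays on your machine |
| Latency | Network dependent | No network round trip; depends on your hardware |
| Control | Limited to what the provider exposes | Full control over weights and deployment |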
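Per-token pricing makes API cost straightforward to estimate from expected traffic. The sketch below shows the arithmetic only. The prices and traffic figures are placeholders chosen for illustration, not any provider's actual rates, and `api_cost_usd` is a hypothetical helper.

```python
def api_cost_usd(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost in USD given token counts and prices per million tokens."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000

# Placeholder workload: 10,000 requests per month,
# each with 2,000 input tokens and 500 output tokens.
requests_per_month = 10_000
monthly_cost = api_cost_usd(
    input_tokens=2_000 * requests_per_month,
    output_tokens=500 * requests_per_month,
    input_price_per_million=3.0,    # placeholder price, not a real rate
    output_price_per_million=15.0,  # placeholder price, not a real rate
)
```

Comparing a figure like this with the purchase and running cost of local hardware gives a rough break-even point for the API-versus-local decision.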
Key Takeaways
- LLMs predict the next token using the transformer architecture
- Prompt engineering is how you program agent behavior
- Context windows limit how much information an agent can process at once
- Choose between API models (typically higher quality) and local models (privacy and control)
- Understanding these tradeoffs is critical for agent design