Lesson 2.4 (~15 min)

Large Language Models (LLMs)

Module 2: Fundamentals of AI and Machine Learning

Large Language Models (LLMs) as Agent Brains

LLMs are the reasoning engine behind modern AI agents. Understanding how they work is essential for building effective agents.

What Are LLMs?

Large Language Models are neural networks trained on massive text datasets to predict the next token in a sequence.

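  • Scale: Billions to trillions of parameters (neither GPT-4 nor Claude has a published size; the ~1.7 trillion figure often quoted for GPT-4 is an unconfirmed estimate)
  • Training: Self-supervised pretraining on internet text, then fine-tuning, typically including RLHF (reinforcement learning from human feedback)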
  • Capability: Generate text, reason, follow instructions, use tools

How LLMs Work (High-Level)

Input text → Tokenize → Embeddings → Transformer layers → Next token prediction

The transformer layers rely on a self-attention mechanism, which relates every token to every other token.
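To make the self-attention step more concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the token embeddings are used directly as queries, keys, and values (real transformers apply learned projection matrices and many attention heads), and the function names and toy vectors are illustrative only. The optional `causal` flag mirrors how text-generation models usually hide future tokens from each position.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors, causal=False):
    """Scaled dot-product attention using the inputs as Q, K and V.

    Simplifying assumptions: single head, no learned projections.
    With causal=True, position i only sees positions 0..i.
    """
    dim = len(vectors[0])
    outputs = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1] if causal else vectors
        # Similarity of this token to every visible token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        # Weighted average of the visible token vectors.
        mixed = [
            sum(w * v[d] for w, v in zip(weights, visible))
            for d in range(dim)
        ]
        outputs.append(mixed)
    return outputs

# Three toy 2-dimensional token embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(embeddings, causal=True)
```

Each output vector is a blend of the input vectors, weighted by how similar they are to the token at that position. That blending is what lets a token's representation depend on its context.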

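  1. Tokenization: Text is split into subword tokens (roughly 4 characters per token for English text)
  2. Embedding: Each token is converted to a numeric vector
  3. Attention: Each token attends to the other tokens in the context (during generation, only to the tokens before it), which is how the model captures context
  4. Generation: The model predicts the next token, appends it to the input, and repeats (autoregressive)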
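The predict, append, repeat loop in step 4 can be sketched with a toy model. The example below is not a transformer: it assumes a whitespace tokenizer instead of subword tokens and a bigram count table instead of a neural network, and all names and the tiny corpus are made up for illustration. Only the shape of the autoregressive loop carries over to real LLMs.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Stand-in tokenizer: lowercase whitespace split (real LLMs use subword units)."""
    return text.lower().split()

def train_bigram(corpus):
    """Count which token follows which; a toy substitute for a trained model."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows

def generate(model, prompt, max_new_tokens=5):
    """Autoregressive loop: predict the next token, append it, repeat."""
    tokens = tokenize(prompt)  # assumes a non-empty prompt
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # no known continuation for the last token
        next_token = candidates.most_common(1)[0][0]  # greedy choice
        tokens.append(next_token)
    return " ".join(tokens)

corpus = [
    "the agent calls the tool",
    "the agent reads the result",
    "the agent calls the api",
]
model = train_bigram(corpus)
print(generate(model, "The agent", max_new_tokens=2))
```

A real LLM conditions on the whole context rather than only the last token, and usually samples from the predicted distribution instead of always taking the most likely token.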

Prompt Engineering Fundamentals

How you communicate with an LLM determines agent behavior:

system_prompt = """You are a helpful customer support agent for TechCorp.

Rules:
- Be polite and professional
- If you don't know the answer, say so
- Never make up information
- Escalate billing issues to human agents"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "I can't log into my account"},
]

Key techniques:

  • System prompts: Define agent persona and rules
  • Few-shot examples: Show desired behavior
  • Chain-of-thought: Ask the model to reason step by step
  • Output formatting: Request structured responses (JSON, lists)
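The techniques listed above can be combined in a single message list. The following is a sketch only: `build_messages`, `parse_reply`, and the JSON keys `reasoning`, `category`, and `escalate` are hypothetical names chosen for illustration, and no model API is called. It assumes the same role-based message format used in the example earlier in this lesson.

```python
import json

def build_messages(system_prompt, examples, question):
    """Assemble a chat-style message list with few-shot examples.

    `examples` is a list of (user_text, assistant_text) pairs.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages

# System prompt: persona, a chain-of-thought request, and an output format.
system_prompt = (
    "You are a helpful customer support agent for TechCorp.\n"
    "Think through the problem step by step, then reply with a single JSON "
    'object with the keys "reasoning", "category" and "escalate".'
)

# Few-shot example: one demonstration of the desired behavior.
few_shot = [
    (
        "I was charged twice this month",
        json.dumps({
            "reasoning": "A duplicate charge is a billing issue, so escalate.",
            "category": "billing",
            "escalate": True,
        }),
    ),
]

messages = build_messages(system_prompt, few_shot, "I can't log into my account")

def parse_reply(raw_reply):
    """Parse the model's JSON reply; return None if it is not valid JSON."""
    try:
        return json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
```

Requesting structured output only helps if the agent also validates it. `parse_reply` returns `None` on malformed replies so the caller can retry or fall back.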

Token Limits and Context Windows

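  • Context window: Maximum number of tokens (prompt plus generated output) the model can process at once
  • GPT-4 Turbo: 128K tokens (~300 pages)
  • Claude: 200K tokens (~500 pages)
  • Implication for agents: The system prompt, conversation history, tool outputs, and retrieved documents all share this window, so older content must be trimmed or summarized as it fills up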
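One simple way to manage context is to drop the oldest messages once a token budget is exceeded. The sketch below assumes the rough rule of thumb of about 4 characters per token rather than a real tokenizer, so its counts are estimates only. The helper names `estimate_tokens` and `trim_history` are hypothetical.

```python
def estimate_tokens(text):
    """Rough estimate using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Keep the system message plus the most recent messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "a" * 400},      # old, long message
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},       # newest message
]
trimmed = trim_history(history, max_tokens=30)
```

Production agents often summarize the dropped messages instead of discarding them, and they count tokens with the tokenizer of the model they are calling.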

API-Based vs. Local Models

Aspect  | API-Based (OpenAI, Anthropic) | Local (Llama, Mistral)
Quality | Highest                       | Good, improving fast
Cost    | Per-token pricing             | Hardware cost only
Privacy | Data sent to provider         | Stays on your machine
Latency | Network dependent             | Can be faster
Control | Limited                       | Full control
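The cost row can be turned into a rough break-even calculation. The sketch below uses placeholder numbers that are not real prices, and it assumes that local hardware is amortized evenly over its lifetime and that electricity is the only running cost. Both are simplifications added here for illustration.

```python
def api_cost(tokens_per_month, price_per_million_tokens):
    """Monthly spend under per-token pricing."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens

def local_cost(hardware_price, lifetime_months, monthly_power_cost):
    """Monthly cost of owned hardware: amortized purchase plus electricity."""
    return hardware_price / lifetime_months + monthly_power_cost

def break_even_tokens(price_per_million_tokens, hardware_price,
                      lifetime_months, monthly_power_cost):
    """Monthly token volume at which local and API costs are equal."""
    monthly_local = local_cost(hardware_price, lifetime_months, monthly_power_cost)
    return monthly_local / price_per_million_tokens * 1_000_000

# Placeholder numbers for illustration only, not real prices.
PRICE_PER_MILLION = 10.0   # currency units per million tokens
HARDWARE_PRICE = 3600.0    # one-time purchase
LIFETIME_MONTHS = 36       # amortization period
POWER_PER_MONTH = 20.0     # electricity

threshold = break_even_tokens(
    PRICE_PER_MILLION, HARDWARE_PRICE, LIFETIME_MONTHS, POWER_PER_MONTH
)
print(f"Local hardware pays off above {threshold:,.0f} tokens per month")
```

Below the break-even volume, per-token pricing is cheaper under these assumptions; above it, owned hardware is. Quality, privacy, and maintenance effort are not captured by this arithmetic.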

Key Takeaways

  • LLMs predict the next token using the transformer architecture
  • Prompt engineering is how you program agent behavior
  • Context windows limit how much information an agent can process at once
  • Choose between API models (quality) and local models (privacy/cost)
  • Understanding these tradeoffs is critical for agent design
