Section 01

How AI Actually Works

LLMs, tokens, weights, and all that stuff, explained without a computer science degree.

Before you start building with AI tools, it helps to understand what's actually happening under the hood. Not because you need to be an expert, but because a mental model makes you a better prompter and a better debugger when things go sideways.

Let's start at the very beginning and build up from there.

🤖 Set the stage
Orientation prompt

I'm just getting started learning about AI and large language models. Can you give me a simple, friendly overview of how a modern LLM like you actually works? Skip the math and focus on intuition and analogies. What is it actually doing when I send you a message?

👆 Copy this prompt and paste it into Claude, ChatGPT, or Gemini — it will explain everything in as much detail as you want.

What Is an LLM?

LLM stands for Large Language Model. At the most basic level, it's a system trained to predict: “given everything I've seen so far, what word (token) comes next?”

That sounds almost too simple, right? But it turns out that if you train a model on enough text (a large share of the public internet) and make it large enough (billions to hundreds of billions of parameters), it develops an uncanny ability to reason, summarize, translate, write code, and hold conversations. All from next-token prediction.

🎲

The core idea: next-token prediction

Imagine autocomplete on your phone, but trained on trillions of words. The model has learned the statistical patterns of human language so deeply that it can generate coherent paragraphs, solve math problems, and write working code, all by repeatedly predicting a plausible next token.
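
To make the autocomplete idea concrete, here is a minimal sketch of next-word prediction using simple word-pair counts. This is a toy and nothing like a real LLM, which uses a neural network instead of a lookup table. The function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny made-up corpus; a real model sees vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often here
print(predict_next(model, "sat"))  # "on" is the only word seen after "sat"
```

A real LLM does the same kind of "what usually comes next?" guessing, but it conditions on the whole conversation rather than just the previous word.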

🤖 Go deeper on LLMs
Deep-dive prompt

Can you explain how a large language model is trained? What is a neural network, what are "parameters" or "weights", and how does a model learn from a massive dataset? Use simple analogies. I don't have a math background.


What Is a Token?

Models don't read text the way humans do. They break everything into chunks called tokens. A token is roughly 3–4 characters of English text, or about ¾ of a word on average. The exact split depends on the model's tokenizer: the word “hamburger” might be one or two tokens, and “unhappiness” might be three.
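
Here is a rough sketch of how a word can end up as several tokens. It uses a greedy "longest matching piece" rule and a tiny hand-written vocabulary, both of which are assumptions for illustration. Real tokenizers learn their pieces from data (for example with byte-pair encoding) and will split words differently.

```python
# Hypothetical vocabulary, for illustration only. Real tokenizers learn
# tens of thousands of pieces from training data.
TOY_VOCAB = {"un", "happi", "ness", "ham", "burger"}


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text greedily into the longest pieces found in `vocab`."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("unhappiness", TOY_VOCAB))  # ['un', 'happi', 'ness']
print(toy_tokenize("hamburger", TOY_VOCAB))    # ['ham', 'burger']
print(toy_tokenize("cat", TOY_VOCAB))          # ['c', 'a', 't']
```

Notice that a word the vocabulary has never seen ("cat" here) falls apart into many small pieces. That is one reason rare words and unusual spellings cost more tokens.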

🪙

Why tokens matter

Every AI model has a context window, the maximum number of tokens it can see and “remember” in a single conversation. Claude's context window is 200,000 tokens (about 150,000 words). GPT-4o's is around 128,000.

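Tokens also determine cost. Most AI APIs charge per token, anywhere from a few cents to tens of dollars per million tokens depending on the model, and output tokens usually cost more than input tokens. Longer prompts = more tokens = higher cost.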
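
A quick back-of-the-envelope calculation shows how per-token pricing adds up. The prices below are placeholders, not any provider's real rates, so check the current pricing page for the model you use.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request, given prices in US dollars per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices (assumed): $3 per million input tokens,
# $15 per million output tokens.
cost = estimate_cost_usd(
    input_tokens=2_000,
    output_tokens=500,
    input_price_per_million=3.00,
    output_price_per_million=15.00,
)
print(f"${cost:.4f} per request")            # $0.0135 per request
print(f"${cost * 10_000:.2f} per 10,000")    # $135.00 per 10,000
```

One request is cheap, but the same request repeated thousands of times a day is where the bill comes from.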

Token approximations:

1 token: ~4 characters
100 tokens: ~75 words
1,000 tokens: ~750 words
1M tokens: ~750,000 words

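
The rules of thumb above can be turned into a rough estimator. This sketch assumes about 4 characters per token, which only holds loosely for English text. The helper names and the 1,000-token reply reserve are made up for illustration. For exact counts, use the tokenizer that ships with your model.

```python
import math


def estimate_tokens(text: str) -> int:
    """Very rough token count, assuming ~4 characters per token."""
    return math.ceil(len(text) / 4)


def fits_in_context(text: str, context_window: int, reserved_for_reply: int = 1_000) -> bool:
    """Check whether the text plus room for a reply fits in the window."""
    return estimate_tokens(text) + reserved_for_reply <= context_window


document = "word " * 50_000  # about 250,000 characters of filler text
print(estimate_tokens(document))           # 62500
print(fits_in_context(document, 128_000))  # True
print(fits_in_context(document, 32_000))   # False
```
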
🤖 Understand tokens
Deep-dive prompt

Can you explain what tokens are in the context of LLMs? How does tokenization work? Why do some words take more tokens than others? And how does the context window limit affect what a model can "remember" during a conversation?


How Does It Actually Generate a Response?

When you send a message, here's (roughly) what happens:

  1. Tokenize: Your message is broken into tokens.
  2. Attend: The model looks at all tokens (including the entire conversation history) and weighs how they relate to each other. This is the famous "attention" mechanism.
  3. Predict: The model computes a probability distribution over every possible next token in its vocabulary (often 100,000 or more entries).
  4. Sample: It picks one token based on that distribution. Then it repeats, feeding the new token back in as input, until it emits a special end-of-sequence token or hits a length limit.
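
The loop above can be sketched in a few lines. Everything here is a stand-in: the four-word vocabulary, the `toy_logits` function that plays the role of the neural network, and the whitespace "tokenizer" are all assumptions made for illustration. The attention step is hidden inside the fake model, since a real one is far too large to show here.

```python
import math
import random

# Hypothetical 4-entry vocabulary; "<end>" plays the end-of-sequence token.
VOCAB = ["hello", "world", "!", "<end>"]


def toy_logits(tokens: list[str]) -> list[float]:
    """Stand-in for the model: one score per vocabulary entry.

    A real model computes these scores with attention over all tokens.
    This fake one just favors a fixed successor of the last token.
    """
    last = tokens[-1] if tokens else None
    favored = {None: "hello", "hello": "world", "world": "!"}.get(last, "<end>")
    return [5.0 if tok == favored else 0.0 for tok in VOCAB]


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    biggest = max(logits)
    exps = [math.exp(x - biggest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt: str, max_new_tokens: int = 10, rng=None) -> list[str]:
    rng = rng or random.Random()
    tokens = prompt.split()                    # 1. Tokenize (crudely)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))    # 2-3. Attend and predict
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]  # 4. Sample
        if next_token == "<end>":              # the model "decides it's done"
            break
        tokens.append(next_token)              # feed the new token back in
    return tokens


print(generate("hello"))
```

Because the last step samples from a distribution, running this twice can give different outputs, which is exactly why the same prompt does not always produce the same answer.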

Temperature: How Random Is the Model?

Remember that probability distribution over the next token? Temperature controls how “peaked” or “flat” that distribution is.

🎯 Precise: Low (0.1–0.3)
Almost always picks the highest-probability token. Near-deterministic and consistent. Good for code, facts, JSON.

⚖️ Balanced: Medium (0.7)
Some randomness. Reads naturally. Good for writing and conversation. A common default in many tools.

🎨 Creative: High (1.0+)
Lots of randomness. More surprising, but can go off the rails. Good for brainstorming.

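
To see what "peaked" versus "flat" means, here is a small sketch of the usual way temperature is applied: each raw score is divided by the temperature before being turned into a probability. The three scores are made-up numbers for illustration.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    biggest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature {t}: {[round(p, 3) for p in probs]}")
```

At the low setting nearly all of the probability lands on the top-scoring token. At the high setting the other candidates get a real chance of being picked.
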
🤖 Temperature & sampling
Deep-dive prompt

Explain temperature in LLMs to me like I'm a curious beginner. What does it actually control mathematically? What's the difference between temperature, top-p, and top-k sampling? When would I want to change these settings?


The Big Models You'll Encounter

🤖 Claude (Anthropic)
Models: Claude 4 Sonnet / Opus
Strengths: Long context, reasoning, code, safe

🤖 GPT (OpenAI)
Models: GPT-4o / o1 / o3
Strengths: Broad capability, huge ecosystem, images

🤖 Gemini (Google)
Models: Gemini 1.5 Pro / 2.0 Flash
Strengths: Very long context (1M+), multimodal

🤖 Open Source
Models: Llama 3, Mistral, Qwen
Strengths: Run locally, no API costs, full control

🤖 Compare the models
Research prompt

What are the key differences between Claude, GPT-4, and Gemini as of 2024–2025? How do they differ in context window size, reasoning ability, coding ability, and price? Help me understand when I'd choose one over another.


What LLMs Can't Do (Yet)

🤖 LLM limitations
Critical thinking prompt

What are the most important limitations of current LLMs that a developer should know? I want to understand hallucinations, context limits, reasoning gaps, and when NOT to trust an AI's output. Give me practical examples of where AI goes wrong.
