AI from the Ground Up: LLMs, Tokens, Context, Tools, and Agents – A No-Nonsense Guide

The AI world throws new terms at you every day: LLM, token, context window, prompt, tool, MCP, agent, agent skill. You might have a rough idea what each one means. But do you really understand how they work together?

This post is different. No fluff, no hype. Just the engineering reality behind today’s AI systems. By the end, you’ll know exactly why a large language model spits out answers one word at a time – and how modern agents actually get things done.


1. The Large Language Model (LLM): A Fancy Word Guessing Game

The LLM (Large Language Model) is the core engine behind almost every AI application you see today. Most major models – GPT, Claude, Gemini – share the same underlying architecture: the Transformer.

Google researchers introduced the Transformer in 2017 (paper: “Attention Is All You Need”). But OpenAI made it famous:

  • Late 2022: GPT-3.5 arrived – the first truly usable large model.
  • March 2023: GPT-4 pushed the ceiling much higher.

Today GPT models are still the benchmark, but Claude, Gemini, and others compete strongly in specific areas.

So how does an LLM actually work?

The answer is almost embarrassingly simple: it’s a text completion game.

Ask: “How good is Mark’s video?”
The model does this:

  1. Predicts the next most likely word: “Very”
  2. Appends “Very” to the input, then predicts again: “good”
  3. Continues with punctuation: “.”
  4. Final answer: “Very good.”

That’s why models generate answers one piece at a time. They don’t plan the whole sentence in advance. They just keep guessing the next token based on everything they’ve written so far.
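That guessing loop can be sketched in a few lines of Python. Here `predict_next_token` is a hypothetical stand-in for the real neural network – just a hard-coded lookup table:

```python
# Toy next-token loop. predict_next_token stands in for the real neural
# network: here it is just a hard-coded lookup table.
COMPLETIONS = {
    "How good is Mark's video?": "Very",
    "How good is Mark's video? Very": "good",
    "How good is Mark's video? Very good": ".",
}

def predict_next_token(text):
    """Return the 'most likely' next token, or None when done."""
    return COMPLETIONS.get(text)

def generate(prompt):
    text = prompt
    while (token := predict_next_token(text)) is not None:
        # Append the chosen token and guess again -- one piece at a time.
        text = text + token if token == "." else text + " " + token
        print(token)  # this is why answers stream out token by token
    return text[len(prompt):].strip()

print(generate("How good is Mark's video?"))  # Very good.
```

A real model scores every token in its vocabulary and samples from that probability distribution; only the loop structure is the same.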

Once you understand this, many odd behaviors (repetition, drifting off topic) make sense – they’re just probability glitches in the guessing game.


2. Tokens and the Tokenizer: The Translator

An LLM is a mathematical function. It only understands numbers, not words. So how does human language get in?

Enter the Tokenizer – a two-way translator:

  • Encoding: splits text into tiny pieces (tokens) and maps each token to a numeric ID.
  • Decoding: turns numeric IDs back into readable text.
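A toy tokenizer makes both directions concrete. The vocabulary and IDs below are invented; real tokenizers (e.g. byte-pair encoding) learn tens of thousands of entries from data:

```python
# Toy tokenizer with a hand-made vocabulary. Real tokenizers (e.g. BPE)
# learn their vocabularies from data; the IDs below are invented.
VOCAB = {"help": 101, "ful": 102, "程序": 103, "员": 104}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def encode(text):
    """Greedily split text into known tokens and map them to IDs."""
    ids = []
    while text:
        # Take the longest vocabulary entry that prefixes the remaining text.
        # (A real tokenizer also handles unknown bytes; this toy does not.)
        match = max((t for t in VOCAB if text.startswith(t)), key=len)
        ids.append(VOCAB[match])
        text = text[len(match):]
    return ids

def decode(ids):
    """Map IDs back to token strings and join them into text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)

print(encode("helpful"))   # [101, 102] -- two tokens, one word
print(decode([103, 104]))  # 程序员
```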

Token ≠ Word

This trips up a lot of people. Tokens are not the same as words.

Examples:

  • Chinese: “程序员” (programmer) might split into “程序” + “员” (two tokens).
  • English: “helpful” → “help” + “ful” (two tokens).
  • Some special characters or emoji can take three or more tokens each.

So when you see “100k token limit”, don’t think “100k words”.

Rough conversion (real-world estimates)

  • 1 English word ≈ 1.3 tokens
  • 1 Chinese character ≈ 0.5–0.7 tokens
  • 400k tokens ≈ 600k–800k Chinese characters (a thick book)

Token counts matter because most pricing and context limits are based on them.
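Those ratios are estimates, not exact figures, but they are good enough for back-of-the-envelope budget checks:

```python
# Back-of-the-envelope token estimates using the rough ratios above.
def estimate_tokens_english(word_count):
    return round(word_count * 1.3)   # ~1.3 tokens per English word

def estimate_tokens_chinese(char_count):
    return round(char_count * 0.6)   # ~0.5-0.7 tokens per character

print(estimate_tokens_english(1000))  # ~1300 tokens
print(estimate_tokens_chinese(1000))  # ~600 tokens
```

For real billing, use the provider's own tokenizer or token-counting endpoint rather than these approximations.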


3. Context and the Context Window: The Model’s Temporary Memory

Context = everything the model sees when processing a task. It includes:

  • Your current question
  • Previous conversation turns
  • The tokens it’s currently generating
  • Available tools
  • System prompt (invisible to you)

Context window = the maximum number of tokens that can fit into that memory.
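Because the window is finite, platforms have to decide what to keep when a conversation outgrows it – usually by dropping or summarizing the oldest turns. A minimal sliding-window sketch (token counting is faked with a word count; a real system would use the model's tokenizer):

```python
# Minimal sliding-window context management. count_tokens is faked with a
# word count; a real system would use the model's tokenizer.
def count_tokens(message):
    return len(message["content"].split())

def fit_into_window(messages, max_tokens):
    """Keep the most recent messages that fit into the window."""
    kept, used = [], 0
    for msg in reversed(messages):     # newest first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                      # oldest turns fall out of memory
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "long old question " * 10},
    {"role": "assistant", "content": "long old answer " * 10},
    {"role": "user", "content": "what about today"},
]
print(fit_into_window(history, max_tokens=10))  # only the newest turn fits
```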

Current model capacities (approximate – these figures change fast)

  • GPT-4.1 – about 1.05 million tokens (1,047,576)
  • Gemini 1.5 Pro – 1 million tokens
  • Claude Sonnet 4 – 1 million tokens (beta)

1 million tokens ≈ 1.5 million Chinese characters. That’s enough to hold most of the Harry Potter series (in Chinese translation).

Real-world problem: what if your document is even larger?

Think of a 5,000-page product manual. You can’t stuff it all into the context window. The solution: RAG (Retrieval-Augmented Generation).

  1. From the full document, extract only the fragments most relevant to your question.
  2. Send just those fragments to the model.
  3. The model answers based on that relevant slice.

RAG breaks through the context window limit and keeps costs under control.
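The retrieval step can be sketched with a crude relevance score. Real RAG systems use vector embeddings and a vector database; plain word overlap stands in for that here:

```python
# Minimal RAG retrieval: score document fragments against the question and
# keep only the best matches. Real systems use embeddings, not word overlap.
def relevance(question, fragment):
    q_words = set(question.lower().split())
    f_words = set(fragment.lower().split())
    return len(q_words & f_words)   # how many words the two share

def retrieve(question, fragments, top_k=2):
    """Return the top_k fragments most relevant to the question."""
    ranked = sorted(fragments, key=lambda f: relevance(question, f), reverse=True)
    return ranked[:top_k]

manual = [
    "Chapter 1: unboxing and setup instructions",
    "Chapter 7: how to reset the device to factory settings",
    "Chapter 12: warranty and repair policy",
]
question = "how do I reset the device"
context = retrieve(question, manual)
print(context)  # only the relevant fragments go into the model's context
```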


4. Prompt Engineering: How to Give Instructions

A prompt is the question or instruction you give the model. There are two types:

  • User prompt: what you type (e.g., “write a poem”)
  • System prompt: rules and personality set by the developer (invisible to the user)

Fuzzy vs. precise prompt

Fuzzy: “Write me a poem.”
→ Could be modern, classical, a haiku, or even lyrics.

Precise: “Write a five-character quatrain about autumn leaves. Keep the tone bright and uplifting.”
→ The output will match the exact form, topic, and tone.

The power of a System prompt

Set a system prompt: “You are a patient math teacher. Never give the answer directly. Guide the student to think step by step.”

Student asks: “What is 3+5?”
Model does not say “8”. Instead:

“Let’s think: You have 3 apples. You get 5 more. How many apples do you have now? You can count them.”

This lets the model act as a tutor, customer service agent, scriptwriter, code reviewer – just by swapping the system prompt.
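In chat-style APIs this separation shows up as message roles. The structure below mirrors the common format, though exact field names vary by vendor:

```python
# Chat-style message list: the system prompt sets behavior, the user prompt
# asks the question. Exact field names vary between vendors.
def build_messages(system_prompt, user_prompt):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

tutor = build_messages(
    "You are a patient math teacher. Never give the answer directly. "
    "Guide the student to think step by step.",
    "What is 3+5?",
)
# Swap only the system message and the same model becomes a code reviewer,
# a support agent, or a scriptwriter.
print(tutor[0]["content"][:40])
```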

Why isn’t “Prompt Engineer” a hot job anymore?

A few years ago it was trending. Now? Less so. Two reasons:

  • Low barrier: it’s mostly about “saying what you want clearly”.
  • Models got smarter: even a vague prompt often works because the model infers your intent.

So don’t chase fancy prompt patterns. Just describe your need clearly.


5. Tools and MCP: Letting the Model See the Outside World

LLMs have a fatal flaw: they cannot perceive the outside world on their own.

Ask “What’s the weather in Shanghai today?”
It will honestly say: “Sorry, I can’t get real-time weather.”

How a Tool fixes that

A tool is simply a function: input → do something → return result.

Example – weather tool:

  • Input: city + date
  • Action: call a weather API
  • Output: temperature, humidity, rain chance
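In code, the weather tool is just a function plus a machine-readable description the platform can hand to the model. Everything below is a sketch – `fetch_weather_api` is a hypothetical stand-in for a real weather API call:

```python
# A tool is a plain function plus a machine-readable description.
# fetch_weather_api is hypothetical -- a real tool would call an HTTP API.
def fetch_weather_api(city, date):
    return {"temp_c": 18, "humidity": 0.7, "rain_chance": 0.6}  # canned data

def get_weather(city, date):
    """Input: city + date. Action: call a weather API. Output: conditions."""
    return fetch_weather_api(city, date)

# The description the platform sends to the model alongside the question:
GET_WEATHER_SPEC = {
    "name": "get_weather",
    "description": "Get temperature, humidity and rain chance for a city.",
    "parameters": {"city": "string", "date": "string (YYYY-MM-DD)"},
}

print(get_weather("Shanghai", "2025-01-01"))
```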

The full tool-calling flow

Three players: the model, the tool, and the platform (the app or interface).

  1. User sends a question to the platform.
  2. Platform sends the question plus a list of available tools to the model.
  3. Model decides which tool to use and generates a tool-call instruction.
  4. Platform executes that tool.
  5. Tool returns the result to the platform.
  6. Platform sends the result back to the model.
  7. Model turns that result into a natural-language answer for the user.
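The seven steps can be sketched as a minimal platform loop. `fake_model` stands in for the LLM: it first emits a tool-call instruction, then turns the tool result into a natural-language answer:

```python
# Minimal platform loop for tool calling. fake_model stands in for the LLM:
# it either requests a tool call or produces a final answer.
TOOLS = {"get_weather": lambda city: {"city": city, "condition": "rain"}}

def fake_model(question, tool_result=None):
    if tool_result is None:
        # Step 3: the model emits a tool-call instruction.
        return {"tool": "get_weather", "args": {"city": "Shanghai"}}
    # Step 7: the model turns the tool result into a natural-language answer.
    return {"answer": f"It is {tool_result['condition']}ing in {tool_result['city']}."}

def platform(question):
    reply = fake_model(question)                          # steps 1-3
    if "tool" in reply:
        result = TOOLS[reply["tool"]](**reply["args"])    # steps 4-5
        reply = fake_model(question, tool_result=result)  # steps 6-7
    return reply["answer"]

print(platform("What's the weather in Shanghai today?"))  # It is raining in Shanghai.
```

Note the division of labor: the model never executes anything itself – it only writes the instruction, and the platform runs the tool.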

Division of labor:

  • Model: “brains” – chooses tools, synthesizes answers.
  • Tool: “hands” – does the actual work.
  • Platform: “connector” – moves messages around.

The headache: different platforms, different standards

  • ChatGPT requires OpenAI’s tool spec.
  • Claude requires Anthropic’s spec.
  • Gemini requires Google’s spec.

The same tool needs three separate implementations. That hurts.

MCP: The universal plug

MCP (Model Context Protocol) is a unified standard for tools. Think of it as USB-C for AI.

Write your tool once using MCP – and it works on every platform that supports MCP. No more re‑implementing for each vendor.


6. Agents and Agent Skills: Autonomous Workers

An Agent is more than a chatbot. It can:

  • Plan – decide what steps to take
  • Use tools – call external functions
  • Keep working until the task is done

A typical agent task

User asks: “What’s the weather like here today? And find nearby stores selling umbrellas.”

The agent’s workflow:

  1. Call location tool → get latitude/longitude.
  2. Call weather tool → check if it’s raining.
  3. Weather says “rain” → call a nearby‑store search tool for umbrella shops.
  4. Combine everything into a single answer.

You don’t have to tell it which tool to use first. The agent figures it out.
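That run can be hand-wired to show its shape. All three tools are canned stand-ins, and a real agent would choose the tool order itself at runtime rather than follow a script:

```python
# Hand-wired agent run: each tool is a canned stand-in for a real API.
def get_location():
    return {"lat": 31.23, "lon": 121.47}   # pretend GPS fix

def get_weather(lat, lon):
    return {"condition": "rain"}           # pretend forecast

def find_stores(lat, lon, selling):
    return [f"Corner shop ({selling})"]    # pretend map search

def agent(task):
    """Plan -> call tools -> branch on results -> combine into one answer."""
    loc = get_location()                            # step 1
    weather = get_weather(loc["lat"], loc["lon"])   # step 2
    parts = [f"Weather: {weather['condition']}."]
    if weather["condition"] == "rain":              # step 3: decided at runtime
        stores = find_stores(loc["lat"], loc["lon"], "umbrellas")
        parts.append(f"Umbrella shops nearby: {', '.join(stores)}.")
    return " ".join(parts)                          # step 4

print(agent("weather + umbrella shops"))
```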

Another pain point: repeating your personal rules every time

Suppose you want a “going out” assistant. Your rules:

  • Rain → bring umbrella
  • Strong UV → wear a hat
  • Bad air → wear a mask
  • Output format: summary first, then bullet list with reasons

Without Agent Skills, you’d have to repeat these rules every single time. Annoying.

Agent Skill: a pre‑written instruction manual

An Agent Skill is a Markdown file – a document the agent reads to know exactly how to handle a specific type of task.

It contains:

  • Metadata layer: name + short description (so the agent knows when to use it)
  • Instruction layer: goal, steps, decision rules, output format, examples

How to create an Agent Skill

  1. Create a .claude/skills folder (in your project or home directory).
  2. Inside it, create a subfolder – the folder name is your skill’s name.
  3. Inside that subfolder, create a SKILL.md file (all caps).
  4. Write your instructions (target, steps, rules, format, examples).
  5. The agent will automatically load and apply this skill when it matches the situation.
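For the “going out” assistant above, the skill file might look like this. The folder name going-out and every detail below are illustrative:

```markdown
---
name: going-out
description: Advise what to bring before leaving the house, based on local weather.
---

# Goal
Tell the user what to bring before going out.

# Steps
1. Call the location tool, then the weather tool for that location.
2. Apply the rules below to the result.

# Rules
- Rain → bring umbrella
- Strong UV → wear a hat
- Bad air → wear a mask

# Output format
Summary first, then a bullet list with a reason for each item.
```

The metadata layer (the block at the top) tells the agent when the skill applies; everything below it is the instruction layer.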

Once configured, you just say “I’m going out – what should I bring?”
The agent automatically:

  • Calls location tool
  • Calls weather tool
  • Applies your rules (rain → umbrella, etc.)
  • Formats the answer exactly as you specified



7. The Big Picture – All Concepts at a Glance

Concept → one-sentence explanation:

  • LLM – the core text‑generation engine (the guessing game)
  • Token – the smallest data unit; the tokenizer translates text ↔ numbers
  • Context – everything the model sees for the current task
  • Context window – the maximum number of tokens the context can hold
  • Prompt – your instruction (user prompt) or hidden rules (system prompt)
  • Tool – a function that lets the model act on the outside world
  • MCP – a universal standard for tools: write once, run anywhere
  • Agent – a system that plans, uses tools, and works until done
  • Agent Skill – a pre‑written instruction manual for the agent

8. Frequently Asked Questions (FAQ)

Does the model really “understand” what I say?
No – not in the human sense. It plays a probability game: given the text so far, guess the next token. But because it’s trained on massive data, that guess is often useful and looks like understanding.

Is a bigger context window always better?
Usually yes, but bigger means higher cost and slower responses. Choose the smallest window that comfortably fits your task.

RAG vs. larger context window – which one wins?
They’re different tools. A larger window lets you dump in raw data. RAG retrieves only what’s relevant. You can combine both: use RAG to cut costs and speed up responses, even when the window is large.

Is prompt engineering dead?
Not dead, but the bar has dropped. Most people already know the core rule: be clear and specific. No need for esoteric “spells”.

Tool vs. MCP – what’s the difference?
A tool is the actual function (e.g., “get weather”). MCP is a standard that says how to write and call tools. With MCP, you write once and use everywhere.

Agent vs. regular chat model – what’s the difference?
A chat model replies to one message at a time. An agent plans, uses multiple tools, and persists until the whole job is finished. It’s more like a junior assistant than a conversational partner.

Do I need to code to create an Agent Skill?
No. A skill is a Markdown file written in plain language. You describe the goal, steps, rules, and output format. The agent reads that file and follows it.


Final Thoughts

Once you truly understand LLM, token, context, prompt, tool, MCP, agent, and agent skill – the constant stream of new AI products and buzzwords stops being mysterious.

Smart coding assistants, automated support systems, AI agents – they all run on the same set of underlying principles. Technology moves fast, but the fundamentals are stable. Learn this framework, and you’ll be equipped to understand whatever comes next.