How do ChatGPT and other LLMs work?
- Will Tombs
Introduction
Users send 2.5 billion prompts into ChatGPT every day, and many of those questions relate directly to products, services, and businesses like yours. If customers are already asking AI for recommendations, explanations, and comparisons, the real question becomes: will your business appear in those answers?
To understand how that happens, it helps to start with the basics: what is ChatGPT, and how does it work?
What is ChatGPT?

Chat - This means the model is designed to talk with you. It understands your messages, keeps track of the conversation, and replies naturally.
GPT - GPT stands for Generative Pre-trained Transformer. Let us break it down for you in the simplest terms:
Generative - It can generate new text. You give it a prompt, and it generates a response from scratch, whether that means writing, solving, explaining, or summarising.
Pre-trained - Before you use it, it has already learnt from a huge amount of text. This learning (properly called training) teaches it language patterns, facts, and reasoning styles, so it knows how to respond without learning from you in real time.
Transformer - This is the type of AI architecture the model is built on. Think of it as the “engine” that powers how it reads, understands, and writes text.
How ChatGPT and LLMs work: Beginner - explained in 1 minute

You ask a question - You type a message just like you would to a colleague.
ChatGPT reads your message - It examines every word and determines what you’re asking and what matters most.
It uses its training and live web search to create answers - ChatGPT learns from large amounts of publicly available text, which helps it understand language patterns and common questions. Humans then improve it through reinforcement learning, teaching it which answers are better and safer. When needed, it can also run a live web search to pull in up-to-date information.
It predicts the best possible response - It generates a fresh answer based on what it has learned, similar to how a well-trained employee uses past experience to respond.
It replies in clear, human-like language - Because it understands context, it can write explanations, emails, reports, ideas, summaries, and plans.
In short:
ChatGPT reads your message → understands it → draws on past training → writes the most helpful answer it can.
Wasn’t that simple? At Buried, we simplify complex topics so they’re easy to understand and act on. Learn more here.
How ChatGPT & LLMs work: Intermediate - explained in 3 minutes
Large Language Models (LLMs) like ChatGPT produce answers using two main sources:
Training data
Live web search (only in models connected to the internet)
Most questions fall into one of these categories, and the model decides which source to use. But LLMs can also use a blend of training data and live web search to create an answer.
1. Training data: How LLMs answer general knowledge questions
What is training data?
Training data is a huge collection of text that the model learned from before you ever use it. This is taken from books, websites, articles, and other publicly available texts.
When will an LLM rely on training data?
If you ask: “How many days are in a year?” ChatGPT does not search the internet. It already knows the answer because it has seen this information many times in its training.
How it works
The model retrieves relevant patterns from what it learned.
It weighs those patterns against each other to settle on the most consistent answer.
It produces an answer without citations, because it did not fetch it from a specific source.

2. Live web search: How LLMs answer real-time or location-based questions
Some questions require fresh, up-to-date information.
Example: “Music events in London this weekend.”
In this case, the model cannot rely on training data.
What happens instead
The model searches the live web.
It retrieves multiple sources.
It grounds the information to ensure accuracy.
It creates a response with citations (because the content came from specific pages).

3. Blending training data & live web search
In some cases, LLMs combine what they already know with web search results. They use training data to understand the topic and structure the answer, then pull live information to update facts, prices, dates, or locations. This creates responses that are both accurate and up to date.
Example: “Is it raining in London today?”
The model already knows what rain is and where London is from its training. But it can’t rely on old data for today’s weather.
What happens
The model uses training data for context
It checks the live web for today’s weather
It confirms the details
It gives a clear, up-to-date answer
In summary
ChatGPT and LLMs create responses by:
Answering general knowledge questions through training data
Answering real-time or location-based questions by searching the web as needed
Blending both methods when the answer requires both context and up-to-date information
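To make that decision concrete, here is a toy sketch in Python of how such a router could look. The "needs fresh data" keyword check and both helper functions are invented for illustration; real models make this call internally and far more subtly.

```python
# Toy router for the three answer modes above. The keyword check and both
# helpers are illustrative stand-ins, not a real API.
FRESH_SIGNALS = ("today", "this weekend", "latest", "price", "near me")

def web_search(query: str) -> str:
    return f"(live results for '{query}')"   # stand-in for a real search call

def generate(prompt: str) -> str:
    return f"(model answer to: {prompt})"    # stand-in for a real model call

def answer(question: str) -> str:
    if any(signal in question.lower() for signal in FRESH_SIGNALS):
        # Blend: training data supplies context, live search supplies facts
        return generate(f"{question}\n\nUse these live results: {web_search(question)}")
    # Pure training data: nothing is fetched, so no citations are produced
    return generate(question)

print(answer("How many days are in a year?"))    # training data only
print(answer("Is it raining in London today?"))  # blended with live search
```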
How ChatGPT & large language models work: Advanced - explained in 5 minutes

Large Language Models (LLMs) like ChatGPT turn your text into numbers, analyse relationships between those numbers, and then predict the most likely next words until a full response is generated. This process happens in a series of well-defined steps.
In this section, we will break down the steps involved.
Step 1: You enter a prompt
Example: “What is AI?”
This is the starting point. The system reads your sentence exactly as you typed it.
Your input then moves into a structured processing pipeline.
Step 2: Break text into tokens (Tokenisation)
LLMs cannot process whole sentences directly. So your text is broken into tokens, which are small text units.
Example: “What is AI?” becomes:
“What”
“is”
“AI”
“?”
These aren’t always full words. Sometimes they’re parts of words, depending on how common they are in the training data.
Tokenisation is just chopping language into small chunks so the model can handle it properly — the same way you’d break something down before rebuilding it.
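If you want to see real tokens, OpenAI's open-source tiktoken library exposes the encodings several of its models use. This is a minimal sketch; the exact splits and IDs vary from tokenizer to tokenizer.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one encoding used by OpenAI models
ids = enc.encode("What is AI?")
print(ids)                              # a short list of integer token IDs
print([enc.decode([i]) for i in ids])   # the text chunk behind each ID
```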
Step 3: Convert tokens into embeddings
LLMs do not understand words. They understand numbers. Therefore, each token is converted into a long list of numbers called an embedding.
Embedding = a numerical snapshot of the token’s meaning and relationships.
Example: “AI” might map to something like: [1.2, -0.5, 0.8, 2.1, ...]
These numbers capture:
Context
Semantic meaning
Relationships to other words
Why embeddings matter
They allow the model to understand similarity. For example, “car” and “vehicle” have embeddings close to each other.
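Here is a toy illustration of that idea. The four-dimensional vectors below are made up for the example (real embeddings run to hundreds or thousands of dimensions), but the cosine-similarity maths is the standard way closeness is measured.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction (very similar meaning); lower means less similar
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

car     = np.array([1.2, -0.5, 0.8, 2.1])   # invented toy embeddings
vehicle = np.array([1.0, -0.4, 0.9, 2.0])
banana  = np.array([-0.7, 1.9, 0.1, -1.3])

print(cosine_similarity(car, vehicle))  # close to 1.0: similar meaning
print(cosine_similarity(car, banana))   # much lower: unrelated meaning
```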
Step 4: Run embeddings through transformer layers (Deep processing)
This is the heart of how ChatGPT works. A transformer contains multiple layers (sometimes dozens or hundreds). At each layer, two key things happen:
1. Attention
Attention determines which parts of your input matter most for the next step.
If you ask: “What is AI used for in business?”, the model needs to pay attention to:
“AI”
“used for”
“business”
rather than every filler word in between.
Attention lets the model assign “importance scores” so the right words influence the final answer.
Think of attention as the model’s spotlight. It highlights the most relevant parts of your prompt.
2. Deep contextual understanding
As your embeddings move through each layer, the model learns:
Relationships
Context
Tone
Intent
Structure
Each layer refines its understanding. Lower layers focus on simple patterns; higher layers handle more complex meaning.
A simple analogy: reading a sentence multiple times. Each pass gives you deeper insight.
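For readers who want to peek under the bonnet, here is a minimal sketch of scaled dot-product attention, the core calculation inside each layer. The shapes are toy values; real models run many attention “heads” in parallel over far larger vectors.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token relates to each other token
    weights = softmax(scores)        # the "importance scores", summing to 1 per token
    return weights @ V               # blend token information by importance

X = np.random.randn(3, 4)            # 3 tokens, each a 4-number toy embedding
print(attention(X, X, X).shape)      # (3, 4): one updated vector per token
```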
Step 5: Predict the next token
Once the model has processed your prompt through all layers, it must generate a response. It does so one token at a time.
How it chooses the next token
The model calculates probabilities for every possible next token.
For example, after: “Artificial intelligence is a branch of…”
The model may think:
“computer” → 0.45
“science” → 0.15
“AI” → 0.03
others → lower probabilities
It usually picks one of the most likely options, with a touch of controlled randomness so answers don’t all sound identical. Each new token is then fed back into the model to predict the next one. This repeats until the answer is complete.
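A toy version of that choice looks like this. The probabilities are the invented ones from the example above; a real model scores tens of thousands of candidate tokens at every step, and a “temperature” setting controls how adventurous the pick is.

```python
import random

next_token_probs = {"computer": 0.45, "science": 0.15, "AI": 0.03}

def pick_next_token(probs, temperature=1.0):
    # Low temperature sharpens the distribution (safer picks);
    # high temperature flattens it (more varied picks).
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights)[0]

print("Artificial intelligence is a branch of", pick_next_token(next_token_probs))
```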
Step 6: Output the final response
Once enough tokens are generated, the model stops and presents the full answer to you. From your perspective, it looks seamless. Behind the scenes, billions of calculations have happened in seconds.
Putting it all together - The full flow
You type a message.
The text is split into tokens.
Tokens are converted into numerical meanings (embeddings).
Those embeddings pass through multiple transformer layers.
The system predicts one token at a time.
The loop continues until the response is complete.
You get a clear, natural-language answer.
How LLMs learn [LLM training process explained]
LLMs learn in several stages, each improving how the model understands language and produces useful answers.
1. Pre-training
This is the foundation. The model is trained on massive amounts of publicly available text to learn patterns in language, such as how sentences flow, how questions are asked, and how ideas connect.
It doesn’t learn facts in a database sense; it learns patterns that help it predict text.
2. SFT (Supervised Fine-Tuning)
Human experts then provide example questions and high-quality answers. The model studies these pairs to learn how to respond in a clearer, more helpful tone.
This is where the model first learns to behave more like a chatbot.
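One supervised fine-tuning example might look something like this. The format is illustrative; every lab structures its datasets differently.

```python
# A single SFT training pair: a prompt plus a high-quality human-written answer
sft_example = {
    "prompt": "Explain tokenisation to a non-technical reader.",
    "ideal_answer": "Tokenisation chops text into small chunks called tokens "
                    "so a language model can process them one piece at a time.",
}
```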
3. RLHF (Reinforcement Learning from Human Feedback)
Here, humans rank multiple AI-generated answers. The model learns which responses are safer, more accurate, and more aligned with human expectations.
This process greatly improves usefulness and reduces harmful outputs.
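In miniature, the training data for this stage is a set of human preferences between candidate answers, along these illustrative lines:

```python
# One RLHF preference record: the model is nudged towards "chosen"
# and away from "rejected". Content is invented for illustration.
preference_example = {
    "prompt": "Is it safe to mix bleach and ammonia?",
    "chosen": "No. Mixing them releases toxic chloramine gas. Never combine them.",
    "rejected": "Sure, mixing them makes a stronger cleaner.",
}
```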
4. RAG (Retrieval-Augmented Generation)
In this optional step, the model retrieves information from external sources, like documents or live web search, before generating a response. It keeps answers fresh, factual, and grounded.
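The pattern itself is simple: retrieve first, then generate with the retrieved text inside the prompt. Here is a minimal sketch; both helper functions are hypothetical stand-ins so the flow runs end to end.

```python
def search_documents(question: str, top_k: int = 3) -> list[str]:
    # Stand-in for a real retriever (vector database, web search, etc.)
    return ["Source 1: ...", "Source 2: ...", "Source 3: ..."][:top_k]

def llm_generate(prompt: str) -> str:
    # Stand-in for a real model call
    return f"(answer grounded in the prompt)\n{prompt}"

def rag_answer(question: str) -> str:
    context = "\n".join(search_documents(question))
    prompt = (
        "Answer using only the sources below, and cite them.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm_generate(prompt)

print(rag_answer("Music events in London this weekend"))
```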
We at Buried have discussed training data and RAG in our video embedded in the article GEO vs SEO - here’s a glimpse of it. While we suggest you watch the entire video to understand the related concept of GEO, you can also skip to 3:51 for Will’s expert insight on training data.
How does ChatGPT's memory work?
ChatGPT doesn’t “remember” information the way humans do. Instead, it works with a context window, a temporary workspace that stores the conversation so far.
Every time you send a message, your entire dialogue is fed back into the model. ChatGPT then analyses this context to decide what is relevant, what to reference, and how to produce its next response.
This means memory is not long-term. Once information falls outside the context window, the model cannot access it.
It is similar to a whiteboard: useful for real-time collaboration, but wiped clean once space runs out.
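A toy version of the whiteboard looks like this. Counting words stands in for counting tokens, which is a simplification; real systems measure the budget in model tokens.

```python
def fit_context(messages: list[str], budget: int = 50) -> list[str]:
    kept, used = [], 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = len(message.split())      # simplified "token" count
        if used + cost > budget:
            break                        # older messages fall off the whiteboard
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order

chat = ["(old) We discussed pricing.", "(new) Can you summarise our plan?"]
print(fit_context(chat, budget=8))       # only the newest message still fits
```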
Do all LLMs work like ChatGPT? (And how they compare)
Yes. Most modern AI assistants, such as Google Gemini, Anthropic’s Claude, and Perplexity AI, follow the same basic paradigm as ChatGPT. They are built on transformer-based, generative large language models that convert text into tokens, process them through deep layers, and generate responses.
Like ChatGPT, they learn from huge datasets during training and can be paired with retrieval mechanisms such as Retrieval-Augmented Generation (RAG) for up-to-date factual answers.
However, they differ in focus, design priorities, and how they handle real-time data.
ChatGPT is optimised for broad conversation and content creation. Its deep research, shopping research, and web search functionalities help ensure accuracy through real-time data.
Gemini, on the other hand, emphasises multimodal inputs (text, images, code) and deep integration with Google products. You can expect stunning illustrations or images generated through its Nano Banana Model.
Claude prioritises safe, nuanced reasoning and long contexts. You can upload a file for context and choose from its different modes for everyday or complex tasks to get responses to your prompts.
Finally, Perplexity blends AI chat with real-time search and citations to behave more like a research assistant than a standalone conversational model.
LLM comparison: ChatGPT, Google Gemini, Claude & Perplexity
| Model | Core Strengths | Typical Use Cases | Real-Time Data / Search | Notes |
| --- | --- | --- | --- | --- |
| ChatGPT (OpenAI) | Versatile generative text, conversational depth | Creative writing, strategy, dialogue | Optional - through RAG or browsing (web search available) | Best general-purpose assistant |
| Google Gemini | Multimodal (text + images), strong image creation, great reasoning | Mixed-media tasks, coding, complex queries | Advanced thinking mode for deeper research | Excellent with multimodal and Google tools |
| Claude (Anthropic) | Nuanced reasoning and long context windows | Analytical work, business logic | Web search and extended thinking modes | Focus on aligned, careful responses |
| Perplexity AI | Search-like accuracy with citations | Research, factual queries, summaries | Deep research and pro search | Combines retrieval + generation |
How ChatGPT and LLMs differ from traditional search engines
ChatGPT is a generative model, while Google is a retrieval engine.
Both systems answer queries, but they operate on completely different mechanisms. LLMs generate responses based on training data and, when enabled, live web searches. Traditional search engines retrieve results by crawling, indexing, and ranking pages according to hundreds of algorithmic signals.
These differences break down across four essential layers: keyword behaviour, data sources, how each system processes a query, and how organic search optimisation influences visibility.
LLMs answer questions using what they’ve learned and sometimes live web results. Google works by indexing websites and ranking them, deciding which pages to show based on technical setup, content relevance, and authority.
LLMs can generate direct answers from what they’ve learned, sometimes without citations, or - when connected to live search - by fanning out queries* and referencing high-authority sources. Google, however, doesn’t generate answers at all; it only retrieves and ranks pages from its index.
Understanding this distinction is crucial for modern SEO and GEO strategies. As AI-search evolves, brands must optimise not only for Google’s algorithm, but also for how LLMs interpret, retrieve, and cite information.
*Query fan-out is when an AI sends your question to multiple web sources simultaneously to gather fresh information before generating an answer.
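In code-shaped terms, fan-out could look something like the sketch below. The sub-queries and the web_search helper are invented for illustration; real systems generate sub-queries with the model itself and run them in parallel.

```python
def web_search(query: str) -> list[str]:
    return [f"(result for '{query}')"]   # stand-in for a real search call

def fan_out(question: str) -> list[str]:
    sub_queries = [question, f"{question} reviews", f"{question} 2025"]  # invented expansions
    results = []
    for q in sub_queries:                # real systems run these concurrently
        results.extend(web_search(q))
    return results

print(fan_out("music events in London this weekend"))
```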
How to gain brand visibility in ChatGPT & other LLMs
To get started, reach out to the Buried team today. We’d love to help optimise your website through our industry-leading GEO and SEO services and take your business visibility to the next level.
