If you've spent any time using ChatGPT, Claude, or Gemini, you've probably run into the word "token" at some point. Maybe you saw it in an error message, maybe you noticed it on a pricing page, or maybe someone mentioned it and you nodded along without really knowing what they meant. That's completely fine, and honestly more common than people admit.
Tokens are one of those concepts that sounds more complicated than it actually is. Once you understand what they are, a lot of things about how AI models work start to make much more sense, including why they sometimes cut off mid-sentence, why longer conversations cost more, and why pasting a huge block of text into a prompt isn't always a great idea.
So let's go through it properly, without the complicated words.
The model doesn't read words, it reads pieces
Here's the thing most people get wrong from the start. When you type something into an AI chatbot, the model doesn't read your message the way you do. It doesn't see words. It sees tokens, which are small chunks of text that can be a full word, part of a word, a punctuation mark, a space, or sometimes just a single character.
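The reason for this comes down to how these models were built. A vocabulary with a separate entry for every possible word in every language, including made-up words, technical terms, code, and abbreviations, would be impossibly large, and it would still miss any word it had never seen. Instead, each model works with a fixed vocabulary of tokens, typically somewhere between tens of thousands and a few hundred thousand entries, chosen to cover the most common patterns in text. Anything that isn't in the vocabulary as a whole gets built from smaller pieces. When you send a message, that message gets broken down into these tokens before the model ever reads it.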
A simple example: the word "unhappiness" might be split into "un" and "happiness", or even into three pieces, depending on the model. The word "cat" is probably just one token. The word "supercalifragilistic" is definitely more than one. And a word in a less common language might get broken into many more tokens than the same idea expressed in English, which actually has real cost implications if you're building something that handles multiple languages.
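To make the splitting idea concrete, here is a toy tokenizer in Python. It is a deliberately simplified stand-in, not how any particular model works: real tokenizers (often variants of byte-pair encoding) learn their vocabularies from huge amounts of text and apply them differently. The function `greedy_tokenize` and the two tiny vocabularies are made up for illustration. At each position it takes the longest piece it knows and falls back to a single character, so the same word can split differently depending on the vocabulary.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Toy tokenizer (hypothetical): at each position, take the longest
    vocabulary entry that matches; if none matches, take one character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, shrinking until one matches.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens


# Two made-up vocabularies standing in for two different models.
VOCAB_A = {"un", "happiness", "happy", "cat"}
VOCAB_B = {"un", "happ", "iness", "cat"}

print(greedy_tokenize("unhappiness", VOCAB_A))  # ['un', 'happiness']
print(greedy_tokenize("unhappiness", VOCAB_B))  # ['un', 'happ', 'iness']
print(greedy_tokenize("cat", VOCAB_A))          # ['cat']
# Nothing in the toy vocabulary matches, so every character becomes a token.
print(len(greedy_tokenize("supercalifragilistic", VOCAB_A)))  # 20
```

The last line shows the fallback at work: with no familiar pieces to lean on, an unfamiliar word costs one token per character. That's the toy version of why rare words, and text in languages a tokenizer covers less well, tend to use more tokens.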
Why does this matter at all?
If you're just using a chatbot casually, it doesn't matter much. But the moment you start thinking about costs, limits, or building something with an AI API, tokens become very important very fast.
Every AI model has what's called a context window, which is the maximum number of tokens it can take into account at one time. This includes everything: your instructions to the model, the conversation history, and the response it's generating. Once a conversation outgrows that limit, something has to give. Chat apps typically drop or condense the oldest messages, often without warning you, so the model quietly loses track of whatever fell outside the window, which can lead to some confusing behavior if you're not paying attention. If you're calling an API directly, a request that's too big for the window is usually rejected with an error instead.
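Here is a minimal Python sketch of how a chat app might drop older messages so a conversation fits in the context window. It assumes a made-up 50-token window and estimates tokens with the rough four-characters-per-token rule for English rather than a real tokenizer. `estimate_tokens` and `fit_to_window` are hypothetical helpers, not part of any library, and real apps may summarize or refuse instead of trimming.

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb for English text: about four characters per token.
    # A real app would use the model's own tokenizer instead.
    return max(1, round(len(text) / 4))


def fit_to_window(messages: list[str], context_window: int, reserve_for_reply: int) -> list[str]:
    """Keep the newest messages whose estimated size fits in the window,
    leaving room for the model's reply. Older messages are silently dropped."""
    budget = context_window - reserve_for_reply
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        size = estimate_tokens(message)
        if used + size > budget:
            break  # this message and everything older falls outside the window
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


# Three messages of about 20 estimated tokens each, a hypothetical 50-token
# window, and 10 tokens reserved for the reply: only the two newest fit.
history = ["a" * 80, "b" * 80, "c" * 80]
print([m[0] for m in fit_to_window(history, context_window=50, reserve_for_reply=10)])
# ['b', 'c']
```

Note that the reply's space is reserved up front: because the response shares the same window, a long answer leaves less room for history.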
You can see exactly how this plays out across different models using the Context Window Visualizer on Prompt Toolbox, which shows you how much space each model gives you and how fast a real conversation fills it up.
Tokens and money are directly connected
If you're using any AI model through an API, you're paying per token. Not per message, not per minute, per token. Both the tokens you send (input) and the tokens the model writes back (output) count toward your bill, though they're usually priced differently, with each output token typically costing more than each input token.
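This is why understanding tokens isn't just a technical thing. It's practical. A long set of instructions that you send with every single request adds up fast. And because most chat APIs don't remember earlier requests, each new message in a conversation is usually sent along with the whole history so far, so you pay for those earlier tokens again on every turn. Over a long conversation, that can double or triple your costs without you noticing.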
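A quick Python calculation shows how this adds up. The numbers are invented for illustration: a 500-token set of instructions, and ten turns where you write about 100 tokens and the model replies with about 300. The helper `input_tokens_over_conversation` is hypothetical and simply assumes that, when history is resent, every earlier message and reply goes out again with each request.

```python
def input_tokens_over_conversation(
    instruction_tokens: int,
    turns: list[tuple[int, int]],
    resend_history: bool = True,
) -> int:
    """Total input tokens billed across a conversation (hypothetical model).

    Each turn is (user_tokens, reply_tokens). Every request includes the
    instructions and the new user message; if resend_history is True it also
    includes every earlier message and reply.
    """
    total = 0
    history = 0
    for user_tokens, reply_tokens in turns:
        total += instruction_tokens + user_tokens + (history if resend_history else 0)
        history += user_tokens + reply_tokens
    return total


turns = [(100, 300)] * 10  # ten turns: ~100 tokens in, ~300 tokens out
print(input_tokens_over_conversation(500, turns, resend_history=False))  # 6000
print(input_tokens_over_conversation(500, turns, resend_history=True))   # 24000
```

Even without history, the repeated instructions account for 5,000 of the 6,000 input tokens. Resending history quadruples the total in this made-up case, and the gap keeps widening as the conversation grows, since every turn carries everything before it.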
According to OpenAI's official documentation, one token is roughly four characters of English text, which works out to about three quarters of a word. So a hundred words is somewhere around 130 to 150 tokens, and a thousand words lands around 1,300 to 1,500 tokens. This is a good starting point for estimating, but the only way to know for sure is to count them with the tokenizer of the model you're actually using.
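The rule of thumb turns into a pair of one-line estimators. These Python helpers are hypothetical and only as good as the rule itself: approximate, English-only, and not tied to any specific model. For exact numbers you'd count with the model's own tokenizer; for OpenAI models, for example, that's the open-source tiktoken library.

```python
def tokens_from_characters(text: str) -> int:
    # Rule of thumb: about four characters of English text per token.
    return round(len(text) / 4)


def tokens_from_words(word_count: int) -> int:
    # Rule of thumb: one token is about three quarters of a word,
    # so tokens ≈ words / 0.75.
    return round(word_count / 0.75)


print(tokens_from_words(100))                # 133
print(tokens_from_words(1000))               # 1333
print(tokens_from_characters("hello world"))  # 3 (an estimate, not a real count)
```

Both word-based estimates land at the low end of the ranges above, and real counts can drift further from them for code, numbers, or non-English text.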
Different models count tokens differently
This is something that trips people up. GPT-4o, Claude, and Gemini don't all use the same system for breaking text into tokens. Each has its own tokenizer (the component that does the splitting), which means the same piece of text can produce different token counts depending on which model you're using.
In practice the differences are usually small for plain English text, but they can get bigger with code, with text in other languages, with lots of numbers, or with unusual formatting. If you're switching between models and trying to keep your costs predictable, it's worth checking the token count for each one specifically rather than assuming they'll match.
The output tokens people forget about
When thinking about how many tokens a request uses, it's easy to focus only on what you're sending. But the model's response also uses tokens, and those count too.
If you ask the model to write a long article, summarize a document, or generate a detailed report, the response can easily be two or three times longer than what you sent. That all adds to your total. This is especially relevant when you're using the API and trying to keep costs under control, because the model's output is usually priced higher than your input.
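To see why output matters so much, here is a small Python cost breakdown for a single request. The prices are placeholders, not any provider's real rates: $1 per million input tokens and $4 per million output tokens. The request sends 500 tokens and gets back a reply three times as long. `cost_breakdown` is a hypothetical helper.

```python
def cost_breakdown(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars for one request."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost, output_cost


# Placeholder prices: $1 per million input tokens, $4 per million output tokens.
input_cost, output_cost = cost_breakdown(500, 1_500, 1.0, 4.0)
share = output_cost / (input_cost + output_cost)
print(f"input ${input_cost:.4f}, output ${output_cost:.4f}, output share {share:.0%}")
# input $0.0005, output $0.0060, output share 92%
```

With a reply only three times longer than the prompt, output accounts for about 92% of this made-up bill, so trimming the response usually saves more than trimming the prompt.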
One practical thing you can do is give the model a clear instruction about response length. Not just "be concise" (models interpret that very loosely) but something more specific like "respond in no more than three paragraphs" or "keep your answer under 150 words." It won't always be perfect, but it makes a real difference most of the time.
Tokens aren't going away
As AI models get more capable, the role of tokens might shift a little. Many models now handle images, audio, and video as well, and those inputs are processed and counted in their own ways. But for text-based AI, tokens remain the basic unit of how information gets read and priced, and that's not changing anytime soon.
Understanding tokens won't make you an AI expert overnight, but it gives you a much clearer picture of what's actually happening when you interact with these models. And if you're building something with AI, even something small, that understanding pays off pretty quickly.
