// LAB NOTES
Tokens, Dollars, and Why Long Chats Get Expensive
A lab note on token cost efficiency.
The short version
- An AI charges you by the token. A token is a small chunk of text: roughly 4 characters of English, or about three quarters of a word.
- You pay for input (what you send) and output (what you get back). Output costs more.
- Every new message in a chat re-sends the whole chat so far. So a long chat costs more on every turn, and it can also get less accurate.
- Match the model to the job. Use a small, cheap model for simple work. Save the big model for hard problems.
- A clear question with the right context up front beats a long back-and-forth. Good questions save money.
What a token is
A token is a piece of a word. Short common words like "lab" are often one token. Longer words get split into two or three. In English, about 1,000 tokens is about 750 words. A short paragraph is around 100 tokens.
This matters because the model does not count words. It counts tokens. The bill counts tokens too. So when we talk about cost, we talk about tokens.
A rough way to guess: take your word count and multiply by 1.33. That gives you a close-enough token count.
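The rule of thumb above can be written as a tiny helper. This is only a sketch: the function names are made up for illustration, and the 1.33 ratio is the rough English-text figure from this note, not a real tokenizer. For an exact count you would use the provider's own token counter.

```python
TOKENS_PER_WORD = 1.33  # rough English-only ratio from this note (an assumption)

def estimate_tokens(word_count: int) -> int:
    """Guess a token count from a word count. Not a real tokenizer."""
    return round(word_count * TOKENS_PER_WORD)

def estimate_tokens_in_text(text: str) -> int:
    """Count whitespace-separated words, then apply the same ratio."""
    return estimate_tokens(len(text.split()))

print(estimate_tokens(750))  # close to 1,000
print(estimate_tokens_in_text("How many tokens is this short question?"))
```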
How tokens become dollars
There are two prices, and they are different:
- Input: the text you send (your question, your files, the chat history).
- Output: the text the model sends back.
Prices are listed per 1,000,000 tokens. People write that as "per million." Here is the formula:
cost = (input tokens / 1,000,000 x input price) + (output tokens / 1,000,000 x output price)
Let us do a real one. As of June 2026, Anthropic lists Claude Sonnet at $3 per million input tokens and $15 per million output tokens. (Always check the provider's pricing page for today's number. Prices change.)
Say you send 2,000 tokens and get back 1,000 tokens:
- Input: 2,000 / 1,000,000 x $3 = $0.006
- Output: 1,000 / 1,000,000 x $15 = $0.015
- Total: about $0.021, or 2 cents.
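Here is the same arithmetic as a short function, in case you want to plug in your own numbers. The function name is hypothetical, and the prices are the ones quoted above, so swap in today's prices before trusting the result.

```python
def chat_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars. Prices are dollars per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return input_cost + output_cost

# The worked example: 2,000 tokens in, 1,000 tokens out, at $3 and $15.
cost = chat_cost(2_000, 1_000, input_price=3.00, output_price=15.00)
print(f"${cost:.3f}")  # $0.021
```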
One chat is cheap. The trouble starts when you run a thousand chats, or one very long chat. Small numbers add up fast.
Notice one thing in those prices. Output costs 5 times as much as input. The same ratio holds across the current Claude models. So long, rambling answers cost more than short ones. If you only need the answer, ask for just the answer.
Why long chats cost more and get worse
This is the part most people miss, so read it twice.
The model has no memory of its own between turns. Each time you send a new message, the model re-reads the entire chat from the top. The whole history goes back in as input, every single time.
So the chat grows like a snowball. Turn 1 might send 300 tokens. By turn 30, you might be sending 15,000 tokens with every message, because the input now includes everything you and the model said before.
Here is the same question in a short chat and in a long one, using the Sonnet prices above.
Short, fresh chat. Your question is 1,000 tokens. The answer is 500 tokens.
- (1,000 / 1,000,000 x $3) + (500 / 1,000,000 x $15) = $0.003 + $0.0075 = about $0.011 (1 cent).
Long chat. The history is now 15,000 tokens. You add the same 1,000-token question. The answer is still 500 tokens.
- (16,000 / 1,000,000 x $3) + (500 / 1,000,000 x $15) = $0.048 + $0.0075 = about $0.056 (5 to 6 cents).
Same question. About 5 times the cost. And you pay that bloated input on every later turn, not just once.
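To see the snowball turn by turn, here is a rough simulation. It assumes every question is 300 tokens and every answer is 200 tokens, which are made-up sizes for illustration, and it uses the Sonnet prices quoted above. Real chats vary, so treat the output as a shape, not a forecast.

```python
INPUT_PRICE = 3.00    # dollars per million input tokens (from this note)
OUTPUT_PRICE = 15.00  # dollars per million output tokens (from this note)

def simulate_chat(turns, question_tokens, answer_tokens):
    """Return one (input_tokens, cost) pair per turn of a growing chat."""
    history = 0  # tokens from all earlier questions and answers
    results = []
    for _ in range(turns):
        # Each turn re-sends the whole history plus the new question.
        input_tokens = history + question_tokens
        cost = (input_tokens / 1_000_000 * INPUT_PRICE
                + answer_tokens / 1_000_000 * OUTPUT_PRICE)
        results.append((input_tokens, cost))
        # The new question and its answer both join the history.
        history = input_tokens + answer_tokens
    return results

chat = simulate_chat(turns=30, question_tokens=300, answer_tokens=200)
first_input, first_cost = chat[0]
last_input, last_cost = chat[-1]
total = sum(cost for _, cost in chat)

print(f"turn 1:  {first_input:>6,} input tokens, ${first_cost:.4f}")
print(f"turn 30: {last_input:>6,} input tokens, ${last_cost:.4f}")
print(f"whole chat: ${total:.2f}")
```

With these sizes, turn 30 sends 14,800 input tokens, close to the 15,000 in the example above. The per-turn cost keeps climbing even though each question and answer stays the same size.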
It is not only about money. A long chat that wanders across many topics also gets less sharp. The important stuff gets buried in old, off-topic text. The model has to split its attention across all of it. It can also get stuck on something you said early on that no longer fits what you need now.
Two problems, one fix. When your goal changes, start a new chat. Carry over only the short summary or the file you actually need. Not the whole transcript.
Rule of thumb: one objective per chat. New objective, new chat. Name the chat after its objective so you can find it later.
Which model for which job
Think in three tiers. The names below are current Claude models, but the idea works anywhere.
- Small and fast (cheapest). Example: a Haiku-class model, about $1 per million input. Good for simple, clear, high-volume work: sorting text, pulling out fields, simple formatting, quick answers.
- Middle (balanced). Example: a Sonnet-class model, about $3 per million input. A good default for most real work.
- Large (priciest). Example: an Opus-class model, about $5 per million input. Save it for the hard stuff: tricky reasoning, messy problems, and answers where a mistake costs more than the tokens.
The move is simple. Start with the cheaper model. Step up only when it fails your own check. Do not reach for the biggest model out of habit. That habit is how bills balloon.
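The "start cheap, step up on failure" move can be sketched as a loop. Everything here is hypothetical: the model names are labels, the `ask` functions are stand-ins for whatever client you actually use, and `passes_check` is your own test of the answer. No real provider API is called.

```python
def answer_with_cheapest(task, models, passes_check):
    """Try models from cheapest to priciest. Stop at the first good answer.

    models: list of (name, ask) pairs, cheapest first, where ask(task)
            returns an answer string.
    passes_check: your own function that returns True for a good answer.
    """
    for name, ask in models:
        answer = ask(task)
        if passes_check(answer):
            return name, answer
    return None  # nothing passed; a person needs to look at this one

# Stand-in models for the demo. Replace these with real calls.
def small_model(task):
    return ""    # pretend the small model came back empty

def middle_model(task):
    return "42"  # pretend the middle model got it right

def large_model(task):
    return "42"

tiers = [("small", small_model), ("middle", middle_model), ("large", large_model)]
result = answer_with_cheapest("What is 6 x 7?", tiers, passes_check=lambda a: a == "42")
print(result)  # ('middle', '42')
```

The large model is never called in this run, which is the point. The check matters more than the loop: if you cannot say what a passing answer looks like, you cannot tell when the cheap model was good enough.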
There is a stronger version of this idea. A cheap model with great context often beats an expensive model with poor context. The next section is about that.
Ask a better question, give better context
A vague question gets a vague answer. Then you spend 5 more turns fixing it. Each turn costs tokens and time.
A clear question with the right context can get the answer in one turn. That is cheaper and faster.
Give the model what it needs to win:
- The goal. What are you trying to do.
- The limits. What it must or must not do.
- The format. What you want the answer to look like.
- An example, if you have one.
This up-front context is leverage. A little more input buys a lot more output quality, and usually fewer total turns. That is the trade you want.
But more is not always better. The right context beats the most context. Dump in too much off-topic text and you are back to the long-chat problem: higher cost, lower focus. Aim for relevant, not huge.
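One way to make the goal, limits, format, and example a habit is a small template. This is a sketch only: the function name and the labels are made up, and the wording of a good prompt depends on your task.

```python
def build_prompt(goal, limits, answer_format, example=None):
    """Put the goal, limits, format, and optional example in one message."""
    parts = [
        f"Goal: {goal}",
        f"Limits: {limits}",
        f"Format: {answer_format}",
    ]
    if example:  # the example is optional, so only add it when given
        parts.append(f"Example: {example}")
    return "\n".join(parts)

prompt = build_prompt(
    goal="Pull the sample ID and date out of each lab log line.",
    limits="Do not guess. Write UNKNOWN if a field is missing.",
    answer_format="One line per entry: sample_id, date",
    example="S-104, 2026-06-02",
)
print(prompt)
```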
More ways to save
- Say how long. "Keep it short" or "just the code, no explanation" cuts output tokens. Output is the pricey part.
- Reuse a big context with caching. Some providers let you cache a long document or a set of instructions so you do not pay full price to send it again every turn. When you reuse the same big context a lot, this can remove most of the cost of that repeated input.
- Batch the non-urgent jobs. Some providers offer a lower rate (often about half) for work you can wait on.
- Count the cost of a redo. A cheap answer that is wrong, and makes you redo the work, was not cheap. Price the mistakes, not just the tokens.
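The last point, pricing the mistakes, can be put into numbers. This sketch adds the expected cost of a redo to the token cost. All the figures below are invented for illustration: the failure rates and the $2.00 redo cost are assumptions, not measurements.

```python
def effective_cost(token_cost, failure_rate, redo_cost):
    """Token cost plus the expected cost of fixing a bad answer.

    failure_rate: share of answers you have to redo (0.0 to 1.0).
    redo_cost: what one redo costs you in dollars, including your time.
    """
    return token_cost + failure_rate * redo_cost

# Invented numbers: the cheap answer fails more often.
cheap = effective_cost(token_cost=0.005, failure_rate=0.30, redo_cost=2.00)
big = effective_cost(token_cost=0.05, failure_rate=0.05, redo_cost=2.00)

print(f"cheap model: ${cheap:.3f} per task")
print(f"big model:   ${big:.3f} per task")
```

With these made-up numbers the cheap model ends up costing more per task. Change the failure rates to what you actually see and the answer may flip, which is why the check is worth running.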
What it means
- Tokens are money. Input and output both cost. Output costs more.
- Long chats cost more on every turn and can get less accurate. Reset when the goal changes.
- Start with the cheaper model. Step up only when it fails your check.
- Spend your effort on the question and the context, not on more turns.
- Always check today's prices on the provider's page.
Try it yourself
- Find your model's input and output prices, per million tokens.
- Take a recent chat. Guess its tokens: word count x 1.33.
- Put the numbers into the formula: cost = (input / 1,000,000 x input price) + (output / 1,000,000 x output price).
- Now do it twice for the same task: once in a long chat, once in a short fresh chat. Compare the cost. You will usually find the fresh chat is cheaper, and often the answer is just as good or better.
- Bonus: run the same task on a small model and a big model. Can you really tell the difference for that task? If not, the small model wins.