CELAYA SOLUTIONS RESEARCH / LAB NOTES


Tokens, Dollars, and Why Long Chats Get Expensive

A lab note on token cost efficiency.

The short version

What a token is

A token is a piece of a word. Short common words like "lab" are often one token. Longer words get split into two or three. In English, about 1,000 tokens is about 750 words. A short paragraph is around 100 tokens.

This matters because the model does not count words. It counts tokens. The bill counts tokens too. So when we talk about cost, we talk about tokens.

A rough way to guess: take your word count and multiply by 1.33. That gives you a close-enough token count.
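
The rule of thumb above can be sketched in a few lines of Python. This is only an estimate: the function name and the choice to round up are assumptions made for illustration, and a real tokenizer will give somewhat different counts.

```python
import math

TOKENS_PER_WORD = 1.33  # rule of thumb from this note; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated word count times 1.33, rounded up."""
    word_count = len(text.split())
    return math.ceil(word_count * TOKENS_PER_WORD)


sample = "A token is a piece of a word."
print(len(sample.split()), "words, roughly", estimate_tokens(sample), "tokens")
```

For anything that touches a budget, use the provider's own token counter rather than this guess.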

How tokens become dollars

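There are two prices, and they are different:

  1. The input price: what you pay for the tokens you send in.
  2. The output price: what you pay for the tokens the model writes back.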

Prices are listed per 1,000,000 tokens. People write that as "per million." Here is the formula:

cost = (input tokens / 1,000,000 x input price) + (output tokens / 1,000,000 x output price)

Let us do a real one. As of June 2026, Anthropic lists Claude Sonnet at $3 per million input tokens and $15 per million output tokens. (Always check the provider's pricing page for today's number. Prices change.)

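Say you send 2,000 tokens and get back 1,000 tokens:

  Input:  2,000 / 1,000,000 x $3  = $0.006
  Output: 1,000 / 1,000,000 x $15 = $0.015
  Total:  $0.006 + $0.015 = $0.021, or about 2 cents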

One chat is cheap. The trouble starts when you run a thousand chats, or one very long chat. Small numbers add up fast.
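
Here is a minimal sketch of the cost formula as a Python function, so you can see how one cheap chat scales. The function and variable names are made up for this example, and the prices are the example Sonnet prices quoted in this note, not a live lookup.

```python
def chat_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request. Prices are in dollars per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Example prices from this note; check the provider's pricing page before relying on them.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

one_chat = chat_cost(2_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
print(f"one chat:    ${one_chat:.3f}")
print(f"1,000 chats: ${one_chat * 1_000:.2f}")
```

The same 2,000-in, 1,000-out request that costs about two cents once costs about $21 when it runs a thousand times.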

Notice one thing in those prices. Output costs 5 times as much as input ($15 versus $3 per million). That is true across the current models. So long, rambling answers cost more than short ones. If you only need the answer, ask for just the answer.

Why long chats cost more and get worse

This is the part most people miss, so read it twice.

The model has no memory of its own between turns. Each time you send a new message, the model re-reads the entire chat from the top. The whole history goes back in as input, every single time.

So the chat grows like a snowball. Turn 1 might send 300 tokens. By turn 30, you might be sending 15,000 tokens with every message, because the input now includes everything you and the model said before.
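
The snowball can be simulated with a short sketch. It assumes every message you send is 300 tokens and every reply is 200 tokens. The reply size is an assumption chosen so that turn 30 lands near the 15,000 figure above. Real chats are less regular.

```python
def input_tokens_per_turn(turns, user_tokens, reply_tokens):
    """Input tokens sent on each turn when the full history is resent every time."""
    history = 0
    sent = []
    for _ in range(turns):
        sent.append(history + user_tokens)      # whole history plus the new message
        history += user_tokens + reply_tokens   # the reply joins the history too
    return sent


per_turn = input_tokens_per_turn(turns=30, user_tokens=300, reply_tokens=200)
print("turn 1 input: ", per_turn[0])
print("turn 30 input:", per_turn[-1])
print("total input over 30 turns:", sum(per_turn))
```

With these numbers, turn 30 alone sends 14,800 input tokens, and the 30 turns together send 226,500. You typed only 9,000 of those yourself.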

Here is the same question in a short chat and in a long one, using the Sonnet prices above.

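Short, fresh chat. Your question is 1,000 tokens. The answer is 500 tokens. Input is 1,000 / 1,000,000 x $3 = $0.003. Output is 500 / 1,000,000 x $15 = $0.0075. The turn costs about $0.0105.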

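Long chat. The history is now 15,000 tokens. You add the same 1,000-token question. The answer is still 500 tokens. Input is 16,000 / 1,000,000 x $3 = $0.048. Output is 500 / 1,000,000 x $15 = $0.0075. The turn costs about $0.0555.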

Same question. About 5 times the cost. And you pay that bloated input on every later turn, not just once.

It is not only about money. A long chat that wanders across many topics also gets less sharp. The important stuff gets buried in old, off-topic text. The model has to split its attention across all of it. It can also get stuck on something you said early on that no longer fits what you need now.

Two problems, one fix. When your goal changes, start a new chat. Carry over only the short summary or the file you actually need. Not the whole transcript.

Rule of thumb: one objective per chat. New objective, new chat. Name the chat after its objective so you can find it later.

Which model for which job

Think in three tiers. The names below are current Claude models, but the idea works anywhere.

The move is simple. Start with the cheaper model. Step up only when it fails your own check. Do not reach for the biggest model out of habit. That habit is how bills balloon.
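
The "start cheap, step up on failure" move can be sketched like this. Everything here is hypothetical: the tier names, the stand-in "models" (plain functions, so the sketch runs without any API), and the check. In real use, each tier would call an actual model and the check would be your own test of the answer.

```python
from typing import Callable, Optional, Sequence, Tuple


def cheapest_passing(
    prompt: str,
    tiers: Sequence[Tuple[str, Callable[[str], str]]],
    passes_check: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try tiers from cheapest to most expensive; return the first answer that passes."""
    for name, ask in tiers:
        answer = ask(prompt)
        if passes_check(answer):
            return name, answer
    return None, None  # nothing passed; time to rethink the prompt or the check


# Stand-in "models", ordered cheapest first (hypothetical, for illustration only).
tiers = [
    ("small", lambda prompt: "not sure"),
    ("medium", lambda prompt: "42"),
    ("large", lambda prompt: "42"),
]

name, answer = cheapest_passing("What is 6 x 7?", tiers, lambda a: a.strip() == "42")
print(name, answer)
```

The point of the design is the ordering: the largest tier is only reached when every cheaper tier has failed your check.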

There is a stronger version of this idea. A cheap model with great context often beats an expensive model with poor context. The next section is about that.

Ask a better question, give better context

A vague question gets a vague answer. Then you spend 5 more turns fixing it. Each turn costs tokens and time.

A clear question with the right context can get the answer in one turn. That is cheaper and faster.

Give the model what it needs to win:

This up-front context is leverage. A little more input buys a lot more output quality, and usually fewer total turns. That is the trade you want.

But more is not always better. The right context beats the most context. Dump in too much off-topic text and you are back to the long-chat problem: higher cost, lower focus. Aim for relevant, not huge.

More ways to save

What it means

Try it yourself

  1. Find your model's input and output prices, per million tokens.
  2. Take a recent chat. Guess its tokens: word count x 1.33.
  3. Put the numbers into the formula: cost = (input / 1,000,000 x input price) + (output / 1,000,000 x output price).
  4. Now do it twice for the same task: once in a long chat, once in a short fresh chat. Compare the cost. You will usually find the fresh chat is cheaper, and often the answer is just as good or better.
  5. Bonus: run the same task on a small model and a big model. Can you really tell the difference for that task? If not, the small model wins.
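
Steps 2 to 4 can be run as a short script. This is a sketch with made-up word counts and the example Sonnet prices from this note. Swap in your own numbers. The helper names are assumptions for illustration.

```python
def words_to_tokens(words):
    """Step 2: rough token guess from a word count."""
    return round(words * 1.33)


def cost(input_tokens, output_tokens, input_price, output_price):
    """Step 3: the cost formula, with prices in dollars per 1,000,000 tokens."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


# Hypothetical inputs; replace with your own chat and your model's prices.
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00
question_words, answer_words, history_words = 750, 375, 11_250

answer_tokens = words_to_tokens(answer_words)
fresh_chat = cost(words_to_tokens(question_words), answer_tokens,
                  INPUT_PRICE, OUTPUT_PRICE)
long_chat = cost(words_to_tokens(history_words + question_words), answer_tokens,
                 INPUT_PRICE, OUTPUT_PRICE)

# Step 4: compare the same task in a fresh chat and in a long one.
print(f"fresh chat: ${fresh_chat:.4f}")
print(f"long chat:  ${long_chat:.4f}")
print(f"ratio:      {long_chat / fresh_chat:.1f}x")
```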