// Cost Report 001 / Token and API Spend

Token and API Cost Report

Neos is an autonomous agent that runs a real back office. It never sleeps, and it rents a frontier model for every task, so it also never stops billing. We tracked that bill to the penny across a stretch of real days. Here is exactly where the money goes, and the plan to take most of it back.

Reference: CSR-001 / Cost Report
Location and date: El Paso, TX / Jun 2026
Scope: Model API spend, the memory upgrade, and the in-house path

Back to case studies

~$2,300 to $3,000

Per month, to the model provider, at the measured run-rate.

62%

Of the bill is cache-write: the agent re-loading the same instructions at every wake up.

~166

Scheduled job fires per weekday, each a fresh cold start.

~$2,000/mo

The savings target once the Token Program is wired in.

01 / Where the bill comes from

The model is cheap to think. It is expensive to re-brief.

Neos runs on Anthropic's Claude Sonnet 4.6 for every task: 30 recurring scheduled jobs plus all interactive chat. The bill is not driven by the model thinking. It is driven by re-loading the agent's roughly 130,000-token briefing on every single job, even when nothing has changed. The expensive part is the back-office automation, not the markets desk.

Measured cost, June 3 to 7

Date	Total tokens	Est. cost	Note
Jun 3 (Tue)	3.3M	$4.35	instance initialized
Jun 4 (Wed)	79.7M	$93.73	migration and build day
Jun 5 (Thu)	131.8M	$141.06	migration and build day
Jun 6 (Sat)	4.0M	$3.71	jobs mostly idle
Jun 7 (Sun, partial)	19.4M	$30.42	partial day
5-day total	238M	$273.49

Run-rate: about $117 per active weekday, roughly $585 a week, roughly $2,300 to $3,000 a month, roughly $28,000 a year.

Data note: logs start June 3, so there is no earlier baseline, and some sessions show missing cost entries, so the real cost is likely higher. All dollar figures are estimates at Sonnet 4.6 list rates ($3 in, $15 out, $0.30 cache-read, $3.75 cache-write per million tokens).

02 / Where every dollar goes

Nearly two-thirds is re-reading, not real work

Cache write$169.94 / 62%

Cache read$56.91 / 21%

Output$46.42 / 17%

Input$0.02 / under 1%

Read this as: nearly two-thirds of the bill is the agent re-caching the same context (its handbook, memory notes, and tool catalogs, about 130,000 tokens) on each of roughly 166 cold starts a day, much of it for jobs that find nothing to do. In billing terms these are cache writes, and they cost far more per word than simply reading a saved copy. That is a configuration cost, and it is fixable.

03 / Which groups cost what

The cost is the bookkeeping fleet, not the trading

There are about 166 fires a weekday. Per-fire cost is dominated by the roughly 130,000-token cold cache plus reads and output. The figures below are estimates allocated by fire-volume and output weight, anchored to the measured totals.

Group	Fires/weekday	Model	Est. $/mo	Share
Back-office automation (email to project board to bookkeeping)	~143	Sonnet 4.6	~$1,860	~80%
Markets desk (alerts plus end of day)	~14	Sonnet 4.6	~$120	~5%
Interactive, dev, nightly	varies	Sonnet 4.6	~$200 to $600	~10 to 20%
End-of-day field photos to dispatch (4 workers)	~4	Sonnet 4.6	~$48	~2%
Misc daily routines	~5	Sonnet 4.6	~$50	~2%

Back-office~$1,860

Interactive, dev~$400

Markets desk~$120

Misc and photos~$98

Key finding: the cost is the bookkeeping fleet, not the trading. The 7 hourly email-to-board parsers alone are about $1,365 a month. Every one runs full-context Sonnet, even to confirm there is no new email.

04 / The memory upgrade, up first then down

The foundation that lets the bill drop

A recent memory upgrade nudged the bill slightly up at first, about $70 a month, because the nightly process promoted new facts into the memory file (now about 21KB), and that bigger context is re-cached on all 166 daily jobs. Light-context is still off on all 30 jobs, so none of the new search is reducing load yet.

Once wired into the Token Program it turns sharply down. With light-context and on-demand memory lookup, a job boots a minimal 20,000 to 30,000-token context and only queries memory when needed, turning the 62% cache-write line from 130K down to about 25K per fire, a 4 to 6 times cut. Plainly: that work did not lower the bill yet, it built the foundation that lets the bill drop.

05 / Before and after, the Token Program

Two levers: cheaper models, and stop re-loading the brain

The fleet plus Token Program is 35 hours at $125, which is $4,375. The 8 agents are already written; this is finishing, wiring, and the savings work. It pulls two levers: route simple jobs to Haiku 4.5, and stop re-loading the whole brain at every wake up (light-context).

Now (all Sonnet, full cache)~$3,000/mo

After Token Program~$1,000/mo

Proposal line	Lever	Effect
Set each agent's model, cheap to top	Email and board triage to Haiku 4.5	~70% off that cluster
Token savings: jobs, caching, context	Attacks the 62% cache-write	130K to 25K per fire
30-day tracking and dashboard	Visibility	find and kill waste
Wire and route the 8-agent fleet	Specialization	min viable model per job

Pays for itself: targets about $1,800 to $2,100 a month saved on about $3,000 a month of usage, roughly a 2-month payback, then about $22,000 to $25,000 a year ongoing. Savings start immediately because the token work is sequenced first.

06 / North star

Our own model, and the API line falls toward zero

The goal is to stop renting intelligence and run our own pre-trained model, so the API cost falls toward zero, with the frontier model kept only when a hard problem genuinely needs it. Today's spend is data acquisition, not burn. Each of the 166 daily runs is a labeled task-to-action trajectory: we are paying to generate our own training set while the assistant does real work.

The pieces already exist. The memory system distills runs into durable facts; the Token Program's model-routing doubles as an easy-versus-hard labeler, exactly the split needed to train the in-house model on the bulk and set the escalation boundary. Sequence: cut waste now (Haiku plus light-context, about $2,000 a month); keep accumulating the corpus; train on the easy bulk; cut over, so the in-house model handles roughly 143 jobs a weekday at near-zero marginal cost and the frontier model is called only on the hard tail.

End state: the roughly $28,000 a year provider line collapses toward self-hosting plus the occasional escalation. Same work, on infrastructure we own.

Estimate, not a bill. Per-group figures are allocations anchored to measured totals; the proposed dashboard makes them exact. Hosting and model costs are paid to vendors directly. Money and outside-sending actions stay behind a human approval, per CSR-001.

Sources

Figures come from the agent's own usage and cost ledger on the live service, its scheduled-job list, context file sizes from the workspace, and its self-reported usage for June 7, 2026. Client identity and project names are withheld; product and business names have been replaced with neutral descriptions.