// Cost Report 001 / Token and API Spend
Token and API Cost Report
Neos is an autonomous agent that runs a real back office. It never sleeps, and it rents a frontier model for every task, so it also never stops billing. We tracked that bill to the penny across five days of live operation, June 3 to 7, 2026. Here is exactly where the money goes, and the plan to take most of it back.
01 / Where the bill comes from
The model is cheap to think. It is expensive to re-brief.
Neos runs on Anthropic's Claude Sonnet 4.6 for every task: 30 recurring scheduled jobs plus all interactive chat. The bill is not driven by the model thinking. It is driven by re-loading the agent's roughly 130,000-token briefing on every single job, even when nothing has changed. The expensive part is the back-office automation, not the markets desk.
| Date | Total tokens | Est. cost | Note |
|---|---|---|---|
| Jun 3 (Wed) | 3.3M | $4.35 | instance initialized |
| Jun 4 (Thu) | 79.7M | $93.73 | migration and build day |
| Jun 5 (Fri) | 131.8M | $141.06 | migration and build day |
| Jun 6 (Sat) | 4.0M | $3.71 | jobs mostly idle |
| Jun 7 (Sun, partial) | 19.4M | $30.42 | partial day |
| 5-day total | 238M | $273.49 | |
Run-rate: about $117 per active weekday, roughly $585 a week, roughly $2,300 to $3,000 a month, roughly $28,000 a year.
Data note: logs start June 3, so there is no earlier baseline, and some sessions show missing cost entries, so the real cost is likely higher. All dollar figures are estimates at Sonnet 4.6 list rates ($3 in, $15 out, $0.30 cache-read, $3.75 cache-write per million tokens).
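The run-rate figures can be roughly reproduced from the two full weekdays in the table. This is a minimal sketch, not the ledger's own method: the function name `run_rate` and the multipliers (5 working days a week, 4 weeks a month, 48 working weeks a year) are assumptions, chosen because they land on the quoted $585 a week and roughly $28,000 a year. It reproduces only the low end of the monthly range.

```python
def run_rate(active_day_costs, days_per_week=5, weeks_per_month=4, weeks_per_year=48):
    """Extrapolate a run-rate from the cost of full active weekdays.

    The multipliers are assumptions for illustration, not values from the ledger.
    """
    # Average the active days, then round to whole dollars as the report does.
    per_weekday = round(sum(active_day_costs) / len(active_day_costs))
    per_week = per_weekday * days_per_week
    return {
        "per_weekday": per_weekday,
        "per_week": per_week,
        "per_month": per_week * weeks_per_month,
        "per_year": per_week * weeks_per_year,
    }

# Estimated cost of the two migration and build days (June 4 and June 5).
rates = run_rate([93.73, 141.06])
print(rates)
# {'per_weekday': 117, 'per_week': 585, 'per_month': 2340, 'per_year': 28080}
```

The initialization day, the idle Saturday, and the partial Sunday are left out on purpose, since none of them represents a normal working day.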
02 / Where every dollar goes
Nearly two-thirds is re-reading, not real work
Nearly two-thirds of the bill is the agent re-caching the same context (its handbook, memory notes, and tool catalogs, about 130,000 tokens) on each of roughly 166 cold starts a day, much of it for jobs that find nothing to do. In billing terms these are cache writes, and at list rates a cache write costs 12.5 times as much per token as a cache read ($3.75 versus $0.30 per million tokens). That is a configuration cost, and it is fixable.
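To make the cold-start cost concrete, the sketch below prices one full context load at the list rates quoted in the data note. It assumes every fire writes the whole 130,000-token context to cache, which is a simplification; the helper name `token_cost` and the constants are illustrative, not taken from the ledger.

```python
# List rates quoted in the data note, in USD per million tokens.
CACHE_WRITE_PER_MTOK = 3.75
CACHE_READ_PER_MTOK = 0.30

CONTEXT_TOKENS = 130_000  # approximate briefing size from the report
FIRES_PER_DAY = 166       # approximate cold starts per weekday


def token_cost(tokens: int, rate_per_mtok: float) -> float:
    """Dollar cost of `tokens` at a per-million-token rate."""
    return tokens * rate_per_mtok / 1_000_000


write_per_fire = token_cost(CONTEXT_TOKENS, CACHE_WRITE_PER_MTOK)
read_per_fire = token_cost(CONTEXT_TOKENS, CACHE_READ_PER_MTOK)
daily_writes = write_per_fire * FIRES_PER_DAY

print(f"cache write per fire: ${write_per_fire:.4f}")
print(f"cache read per fire:  ${read_per_fire:.4f}")
print(f"write/read ratio:     {write_per_fire / read_per_fire:.1f}x")
print(f"daily cache writes:   about ${daily_writes:.0f}")
```

Under these assumptions a cold start costs about $0.49 in cache writes against about $0.04 for a cache read, and 166 of them come to roughly $81 a day. Treat that as an upper-bound illustration rather than a reconstruction of the reported cache-write share, since some fires may hit a warm cache.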
03 / Which groups cost what
The cost is the bookkeeping fleet, not the trading
There are about 166 fires (individual job runs) a weekday. Per-fire cost is dominated by the cold-start cache write of the roughly 130,000-token context; cache reads and output make up the rest. The figures below are estimates allocated by fire volume and output weight, anchored to the measured totals.
| Group | Fires/weekday | Model | Est. $/mo | Share |
|---|---|---|---|---|
| Back-office automation (email to project board to bookkeeping) | ~143 | Sonnet 4.6 | ~$1,860 | ~80% |
| Markets desk (alerts plus end of day) | ~14 | Sonnet 4.6 | ~$120 | ~5% |
| Interactive, dev, nightly | varies | Sonnet 4.6 | ~$200 to $600 | ~10 to 20% |
| End-of-day field photos to dispatch (4 workers) | ~4 | Sonnet 4.6 | ~$48 | ~2% |
| Misc daily routines | ~5 | Sonnet 4.6 | ~$50 | ~2% |
Key finding: the cost is the bookkeeping fleet, not the trading. The 7 hourly email-to-board parsers alone are about $1,365 a month. Every one runs full-context Sonnet, even to confirm there is no new email.
04 / The memory upgrade, up first then down
The foundation that lets the bill drop
A recent memory upgrade nudged the bill up slightly at first, by about $70 a month, because the nightly process promoted new facts into the memory file (now about 21KB), and that bigger context is re-cached on all 166 daily fires. Light-context is still off on all 30 jobs, so the new memory search is not reducing load yet.
Once wired into the Token Program, it turns sharply down. With light-context and on-demand memory lookup, a job boots a minimal 20,000 to 30,000-token context and queries memory only when needed. That cuts the cache-write line (62% of the bill) from about 130K tokens to about 25K per fire, a 4 to 6 times reduction. Plainly: that work has not lowered the bill yet; it built the foundation that lets the bill drop.
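The size of the light-context cut follows directly from the token counts. The sketch below is illustrative only: it assumes the cache-write list rate from the data note applies to both the full and the light context, and the function names are hypothetical.

```python
CACHE_WRITE_PER_MTOK = 3.75  # USD per million tokens, list rate from the data note
FULL_CONTEXT = 130_000       # tokens cached per fire today (approximate)


def write_cost(tokens: int) -> float:
    """Cache-write cost in dollars for one fire loading `tokens` of context."""
    return tokens * CACHE_WRITE_PER_MTOK / 1_000_000


def cut_factor(light_tokens: int) -> float:
    """How many times smaller the light context is than the full one."""
    return FULL_CONTEXT / light_tokens


# Low end, midpoint, and high end of the proposed light-context size.
for light in (20_000, 25_000, 30_000):
    print(
        f"{light:>6,} tokens: {cut_factor(light):.1f}x smaller, "
        f"${write_cost(light):.3f} per fire vs ${write_cost(FULL_CONTEXT):.3f}"
    )
```

The 20K to 30K range gives a cut of about 4.3 to 6.5 times, in line with the 4 to 6 times figure, and about 5.2 times at the 25K midpoint.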
05 / Before and after, the Token Program
Two levers: cheaper models, and stop re-loading the brain
The fleet plus Token Program is 35 hours at $125 an hour, which is $4,375. The 8 agents are already written; this is finishing, wiring, and the savings work. It pulls two levers: route simple jobs to Haiku 4.5, and stop re-loading the whole brain at every wake-up (light-context).
| Proposal line | Lever | Effect |
|---|---|---|
| Set each agent's model, cheap to top | Email and board triage to Haiku 4.5 | ~70% off that cluster |
| Token savings: jobs, caching, context | Attacks the 62% cache-write | 130K to 25K per fire |
| 30-day tracking and dashboard | Visibility | find and kill waste |
| Wire and route the 8-agent fleet | Specialization | min viable model per job |
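The two levers in the table amount to a per-job configuration: which model tier to use, and whether to boot with the light context. A minimal sketch of that idea follows. Everything named here is hypothetical: the job kinds, the `JobConfig` type, and the tier labels are placeholders for illustration, not provider model identifiers or the agent's actual settings. Defaulting unlisted jobs to the top tier is also an assumption.

```python
from dataclasses import dataclass

# Tier labels are illustrative placeholders, not provider model identifiers.
CHEAP_TIER = "haiku-4.5"
TOP_TIER = "sonnet-4.6"

# Hypothetical job kinds; only email and board triage are named in the proposal.
CHEAP_JOB_KINDS = {"email_triage", "board_triage"}


@dataclass(frozen=True)
class JobConfig:
    model: str
    light_context: bool


def configure(job_kind: str, light_context: bool = True) -> JobConfig:
    """Pick the cheapest tier allowed for a job kind; unlisted kinds stay on top."""
    model = CHEAP_TIER if job_kind in CHEAP_JOB_KINDS else TOP_TIER
    return JobConfig(model=model, light_context=light_context)


print(configure("email_triage"))
print(configure("markets_alert"))
```

Keeping the default on the top tier means a job is only downgraded once someone has explicitly listed it as simple, which fits the report's "min viable model per job" goal without risking a silent quality drop.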
Pays for itself: targets about $1,800 to $2,100 a month saved on about $3,000 a month of usage, which recovers the $4,375 in roughly 2 to 2.5 months, then about $22,000 to $25,000 a year ongoing. Savings start immediately because the token work is sequenced first.
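The payback arithmetic can be checked in a few lines. This sketch uses only figures stated in the report (35 hours at $125, and the $1,800 to $2,100 monthly savings target); the function name is illustrative.

```python
def payback_months(one_time_cost: float, monthly_saving: float) -> float:
    """Months of savings needed to recover a one-time cost."""
    return one_time_cost / monthly_saving


BUILD_COST = 35 * 125  # hours x hourly rate = 4375

for saving in (1_800, 2_100):
    months = payback_months(BUILD_COST, saving)
    print(f"${saving:,}/mo saved: payback {months:.1f} months, ${saving * 12:,} a year")
# $1,800/mo saved: payback 2.4 months, $21,600 a year
# $2,100/mo saved: payback 2.1 months, $25,200 a year
```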
06 / North star
Our own model, and the API line falls toward zero
The goal is to stop renting intelligence and run our own pre-trained model, so the API cost falls toward zero, with the frontier model kept only when a hard problem genuinely needs it. Today's spend is data acquisition, not burn. Each of the 166 daily runs is a labeled task-to-action trajectory: we are paying to generate our own training set while the assistant does real work.
The pieces already exist. The memory system distills runs into durable facts; the Token Program's model-routing doubles as an easy-versus-hard labeler, exactly the split needed to train the in-house model on the bulk and set the escalation boundary. Sequence: cut waste now (Haiku plus light-context, about $2,000 a month); keep accumulating the corpus; train on the easy bulk; cut over, so the in-house model handles roughly 143 jobs a weekday at near-zero marginal cost and the frontier model is called only on the hard tail.
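As a rough illustration of routing doubling as a labeler, the sketch below tags each recorded run by the tier that handled it and splits the corpus accordingly. The `Trajectory` type, its fields, and the sample runs are invented for illustration; the report does not describe the actual log schema.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task: str
    actions: list[str] = field(default_factory=list)
    routed_tier: str = "top"  # "cheap" or "top", as chosen by model routing


def difficulty(run: Trajectory) -> str:
    """Treat the routing decision as the easy-versus-hard label."""
    return "easy" if run.routed_tier == "cheap" else "hard"


def split_corpus(runs):
    """Separate runs into the easy bulk and the hard tail."""
    easy = [r for r in runs if difficulty(r) == "easy"]
    hard = [r for r in runs if difficulty(r) == "hard"]
    return easy, hard


# Invented sample runs, for illustration only.
runs = [
    Trajectory("parse inbox", ["no new email"], "cheap"),
    Trajectory("reconcile ledger", ["flag mismatch", "request approval"], "top"),
]
easy, hard = split_corpus(runs)
print(len(easy), "easy,", len(hard), "hard")
```

In this framing the easy list is the candidate training set for the in-house model, and the hard list marks where escalation to the frontier model would still apply.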
End state: the roughly $28,000 a year provider line collapses toward self-hosting plus the occasional escalation. Same work, on infrastructure we own.
Estimate, not a bill. Per-group figures are allocations anchored to measured totals; the proposed dashboard makes them exact. Hosting and model costs are paid to vendors directly. Money and outside-sending actions stay behind a human approval, per CSR-001.
Sources
Figures come from the agent's own usage and cost ledger on the live service, its scheduled-job list, context file sizes from the workspace, and its self-reported usage for June 7, 2026. Client identity and project names are withheld; product and business names have been replaced with neutral descriptions.