The week agents started paying for things
Agentic commerce quietly shipped. Plus: a small model beats a big model (again), and the regulatory shoe I've been waiting on finally drops.
Two independent teams this week shipped agents that can actually complete a purchase end‑to‑end: not in a demo, but in production, with real cards. The interesting part is not the capability; it is that both teams quietly moved their checkout flows to a policy‑gated architecture that treats the agent like an untrusted employee with spending limits. This is the right pattern. Expect every serious B2B surface to follow it in the next six months.
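For readers who want something concrete, here is a minimal sketch of what "policy‑gated" could look like. Neither team has published its design, so everything below is an assumption made for illustration: the names SpendingPolicy, PurchaseRequest and checkout are hypothetical, and the rules (a merchant allowlist, a per‑purchase cap, a daily cap) are simply the obvious ones for an "untrusted employee with spending limits".

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass(frozen=True)
class PurchaseRequest:
    """What the agent asks for. Hypothetical shape, for illustration only."""
    agent_id: str
    merchant: str
    amount: Decimal


@dataclass
class SpendingPolicy:
    """Assumed rules: merchant allowlist, per-purchase cap, daily cap."""
    per_purchase_limit: Decimal
    daily_limit: Decimal
    allowed_merchants: frozenset
    spent_today: Decimal = Decimal("0")

    def evaluate(self, req: PurchaseRequest) -> tuple:
        """Return (approved, reason). Never touches the payment rail."""
        if req.merchant not in self.allowed_merchants:
            return False, "merchant not allowlisted"
        if req.amount <= 0:
            return False, "amount must be positive"
        if req.amount > self.per_purchase_limit:
            return False, "exceeds per-purchase limit"
        if self.spent_today + req.amount > self.daily_limit:
            return False, "exceeds daily limit"
        return True, "approved"


def checkout(policy: SpendingPolicy,
             req: PurchaseRequest,
             charge: Callable[[PurchaseRequest], None]) -> str:
    """The agent can only reach `charge` through the policy gate."""
    approved, reason = policy.evaluate(req)
    if not approved:
        return reason
    charge(req)                       # stand-in for the real card call
    policy.spent_today += req.amount  # record spend only after a charge
    return reason
```

The design choice worth noticing is that the agent never holds the card. It holds a request, and the gate decides whether that request ever becomes a charge. A real system would also need persistence, a daily reset, and an audit log, none of which this sketch attempts.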
A 7B open model from a small lab is beating a frontier model on domain‑specific legal extraction, at roughly 1/40th the cost per call. The lesson is not 'small models win'. It is 'the narrower the task, the earlier small models win'. If your product is narrow, stop defaulting to a frontier model.
The EU AI Act's first enforcement actions are landing, and the pattern is clear: regulators are going after documentation gaps, not model quality. If your system card is three paragraphs and a link, start writing now. I'll cover this at length in Brief 48.
Spent this week fighting a long‑horizon memory bug in Geoffrey. Turns out the 'memory' layer was dutifully remembering things the user had asked it to forget. Hard‑won lesson: tombstone everything, and assume every cache layer lies to you at least once a quarter.
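A minimal sketch of the tombstone idea, for anyone who has not hit this bug yet. This is not Geoffrey's actual memory layer; MemoryStore and its methods are hypothetical names, and the stale cache is simulated by deliberately not evicting on delete. The point it illustrates: a forget writes a tombstone, and every read checks the tombstone before trusting whatever copy it found.

```python
class MemoryStore:
    """Toy memory layer with a cache that is allowed to go stale."""

    def __init__(self):
        self._records = {}     # key -> (value, written_at): source of truth
        self._cache = {}       # key -> (value, written_at): may be stale
        self._tombstones = {}  # key -> deleted_at
        self._clock = 0        # logical clock, avoids wall-clock ties

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def remember(self, key, value):
        entry = (value, self._tick())
        self._records[key] = entry
        self._cache[key] = entry

    def forget(self, key):
        # Write the tombstone first; it is the durable record of the delete.
        self._tombstones[key] = self._tick()
        self._records.pop(key, None)
        # Deliberately no cache eviction here: we assume the cache lies.

    def recall(self, key):
        deleted_at = self._tombstones.get(key)
        for layer in (self._cache, self._records):
            entry = layer.get(key)
            if entry is None:
                continue
            value, written_at = entry
            # Anything written before the tombstone is dead, wherever it lives.
            if deleted_at is not None and written_at < deleted_at:
                continue
            return value
        return None


store = MemoryStore()
store.remember("allergy", "peanuts")
store.forget("allergy")
stale_copy_exists = "allergy" in store._cache  # the cache still holds it
recalled = store.recall("allergy")             # the tombstone wins anyway
```

Because the tombstone carries a timestamp, remembering the same key again after a forget still works: the new write is newer than the tombstone, so it is returned.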
If you only read one thing this week, read the post‑mortem from the team whose agent bought the wrong conference ticket five times. It is the most honest write‑up of an AI failure I have seen this year.