What happens when your AI tokens talk?
AI inference data is quietly going to become the most unforgiving performance review in history
Disclaimer: The views here are my own and do not represent my employer or anyone else.
Tom Tunguz wrote something quietly important last week - his inference spend went from $7k to $100k annualized in two quarters, and he framed it as the emergence of a fourth compensation component: salary, bonus, equity, and now tokens.
As I finished the article, focused on the idea that inference and token spend is potentially turning into a kind of soft compensation data, it also occurred to me that, for the first time in history, thinking is leaving a trace.
Every prior attempt to measure knowledge work collapsed into proxies, outputs, and a lot of theater. For example, OKRs measured what you shipped, not how you thought. Stack rankings, something I continue to detest, measured perception as much as performance. Even the most sophisticated people analytics tools were fundamentally backward-looking, reconstructing cognitive effort from artifacts that were already weeks old by the time anyone looked at them.
The work itself, the actual motion of a mind engaging with a problem, left nothing behind. Inference spend changes the texture of that problem in a way that’s easy to underestimate.
For the growing layer of work that runs through observable infrastructure, what is emerging is a real-time record of cognitive activity:
A solutions engineer who spends three hours in Claude iterating on a technical proof-of-concept, running fifteen variations before landing on the right architecture, leaves a very different token signature than someone who generates one response and pastes it into a deck unchanged.
Or a marketing strategist who uses AI to pull competitive intelligence, stress-test messaging against six different buyer personas, and rewrite a positioning brief four times before sending it to the CMO looks nothing like someone who asks AI to clean up a paragraph.
The signature reveals how fast someone moves from problem to action, how deeply they iterate, and whether their tool usage reflects genuine problem-solving or only the appearance of it. These signals are imperfect, but once signals exist, organizations find ways to use them (e.g. GitHub commit logs for engineering productivity, email response time as a performance and team-satisfaction signal, Slack activity metrics becoming a management tool, and so on).
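To make the contrast between the two workers above concrete, here is a minimal, entirely hypothetical sketch of how a "token signature" might be distilled from raw inference logs. The `InferenceEvent` fields and the summary metrics are my own invention for illustration, not any vendor's actual schema or any real analytics product:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class InferenceEvent:
    """One hypothetical inference call from a usage log."""
    user: str
    session_id: str
    timestamp: datetime
    prompt_tokens: int
    completion_tokens: int

def token_signature(events: list[InferenceEvent]) -> dict:
    """Collapse a user's inference log into a crude signature:
    how many work sessions, how many iterations per session,
    and total token spend. Deliberately simplistic."""
    by_session: dict[str, list[InferenceEvent]] = {}
    for e in events:
        by_session.setdefault(e.session_id, []).append(e)
    sessions = len(by_session)
    total_tokens = sum(e.prompt_tokens + e.completion_tokens for e in events)
    return {
        "sessions": sessions,
        "avg_iterations_per_session": len(events) / sessions if sessions else 0.0,
        "total_tokens": total_tokens,
    }

# The iterating solutions engineer: fifteen calls in one session.
engineer = [
    InferenceEvent("se", "poc-1", datetime(2025, 1, 6), 100, 200)
    for _ in range(15)
]
# The one-shot user: a single call, pasted into a deck unchanged.
one_shot = [InferenceEvent("ms", "deck-1", datetime(2025, 1, 6), 100, 200)]

print(token_signature(engineer))
print(token_signature(one_shot))
```

Even this toy version separates the two profiles cleanly, which is the point: the signal does not need to be sophisticated to be used.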
This lineage adds a lot of resolution to the gap between perceived and actual contribution. Every organization carries this gap: between the person whose perceived contribution matched their actual output, and the person believed to be indispensable on the strength of presence, confidence, and the complexity they added to simple things. That gap persisted because knowledge work was unobservable. Inference data doesn't close it overnight, but it introduces a pressure that compounds quietly over time.
I think this matters most for the knowledge worker whose value lived in the white space: the person who coordinated, held institutional memory, and shaped decisions without owning them. That contribution is real, but it was evaluated mainly through social proof rather than evidence, and opacity was protective. That opacity is now something you have to actively maintain.
The deeper irony worth talking about is that the same AI infrastructure compressing the cost of execution is simultaneously making the quality of judgment more legible than it has ever been. These two forces are usually discussed separately, one as a productivity story and the other as a surveillance concern, but they are the same phenomenon viewed from different angles. The token is both the unit of production and the unit of measurement, and that duality is entirely new.
Tom got to 12% of his original inference cost (roughly $12K of the $100K annualized) over a weekend with identical performance. The person still burning $100K in tokens had better be producing something meaningfully different, and now there are tools to check.
We spent fifty years trying to measure knowledge work and failed because thinking left no trace. The token doesn’t solve that completely, but it solves enough that the underlying assumptions of how careers are built and how value is attributed are quietly due for revision. Worth thinking carefully about what those tokens will say about you 🤖

