ChatGPT, Claude, Gemini. Every model you've used runs on a single paper published in 2017. I read it. This is what it actually says, and what I built once I understood it.
The paper
Before 2017, AI models processed text like a reader with amnesia. One word at a time, with earlier words fading at every step.
Eight researchers, most of them at Google Brain and Google Research, threw that out. Their paper, "Attention Is All You Need," built an entire architecture on one idea: instead of reading sequentially, every word simultaneously asks every other word how much it matters. The answer reshapes the meaning.
Take: The bank by the river flooded after three days of heavy rain.
"Bank" has two meanings. The model looks at "river," "flooded," "rain." Those words vote. The financial meaning collapses. Riverbank survives.
Every word. Same time. Milliseconds. What comes out isn't your sentence. It's a compressed version: each word encoded by context.
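The voting can be sketched in a few lines of plain Python. This is a toy version of the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, under loud assumptions: the two-dimensional word vectors are invented for illustration (axis 0 loosely "water", axis 1 loosely "money"), and the same vectors stand in for queries, keys, and values, where the real model learns separate projections for each.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Each word scores every word (itself included), then normalizes.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The new vector is a weighted mix of all the value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical vectors: [water-ness, money-ness]. "bank" starts ambiguous.
words = ["bank", "river", "flooded", "rain"]
vectors = [[1.0, 1.0], [2.0, 0.0], [1.5, 0.0], [1.8, 0.1]]

outputs, weights = attention(vectors, vectors, vectors)
bank_out = outputs[0]
print("bank before:", vectors[0], "bank after:", bank_out)
```

After the mix, the vector for "bank" leans toward the water axis and away from the money axis. Nothing was looked up in a dictionary; the neighbors did the work.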
Compressence
There's a concept for this: compressence. Compress an idea until only the essence remains. Any further and you lose the meaning.
Think of it as a bell curve. "Bank" in isolation spreads across a distribution: financial institution at the center, highest probability, riverbank further out. Add "river" and "flood" and the curve shifts. The financial meaning falls to the tail. Riverbank becomes the peak. Attention moves the curve. Compressence is the peak.
Applied to the 2017 paper, the smallest version that keeps the meaning:
Meaning lives in relationships, computed simultaneously, at scale.
Remove "simultaneously" and you're back to sequential models. Remove "relationships" and you have no attention. Remove "at scale" and you have a toy.
What nobody tells you
The model has no memory.
Every new chat starts from zero. What feels like continuity is the conversation history pasted back in. The model reads it fresh each time.
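A minimal sketch of that replay, assuming a hypothetical `fake_model` function in place of a real API call. The `Chat` class and its names are illustrative, not any vendor's client library.

```python
def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: it sees only the prompt text."""
    lines = prompt.splitlines()
    return f"(reply based on {len(lines)} lines of context)"

class Chat:
    def __init__(self, model):
        self.model = model
        self.history = []  # kept by the application, not by the model

    def send(self, user_message: str) -> str:
        self.history.append(f"User: {user_message}")
        prompt = "\n".join(self.history)  # the whole transcript, resent every turn
        reply = self.model(prompt)
        self.history.append(f"Assistant: {reply}")
        return reply

chat = Chat(fake_model)
first = chat.send("My name is Ada.")
second = chat.send("What is my name?")

# A new chat has an empty history, so the same question arrives with no context.
fresh = Chat(fake_model).send("What is my name?")
print(first, second, fresh, sep="\n")
```

The second turn works only because the application pasted the first turn back in. The fresh chat gets the question alone.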
It doesn't think either. It predicts a plausible next token (roughly a word or word fragment) given everything in its context window. Confidence is a statistical property of training data, not a signal of truth.
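Here is a rough sketch of that prediction step, assuming made-up scores for three candidate words. Real models score tens of thousands of tokens; the numbers below are invented so that the popular wrong answer outranks the right one.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(tokens, logits, rng, temperature=1.0):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0], probs

# Hypothetical scores for the word after "The capital of Australia is".
tokens = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 2.5, 0.5]  # invented: the common mistake scores highest

rng = random.Random(0)
choice, probs = sample_next(tokens, logits, rng)
for token, p in zip(tokens, probs):
    print(f"{token}: {p:.2f}")
print("sampled:", choice)
```

The highest probability here belongs to a wrong answer. The distribution reports what text tends to follow, not what is true.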
Precision: of the answers it gives, how many are right. Recall: of the right answers that exist, how many it gives. A model can fail at both while sounding completely certain.
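The two numbers are simple to compute. A small worked example, with invented answer sets standing in for what a model returned and what was actually correct:

```python
def precision_recall(returned, correct):
    """Precision = hits / returned; recall = hits / correct."""
    returned, correct = set(returned), set(correct)
    hits = returned & correct
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(correct) if correct else 0.0
    return precision, recall

# Hypothetical: the model states 4 facts and 1 is right; 5 right answers existed.
model_answers = {"a", "x", "y", "z"}
right_answers = {"a", "b", "c", "d", "e"}

p, r = precision_recall(model_answers, right_answers)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.25 recall=0.20
```

Low on both counts, and nothing in the tone of the output would tell you so.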
No memory. No reasoning. No plan. Pattern matching at scale.
Most users cluster at the center of a bell curve: same prompts, same outputs. The 100x engineers are in the tail. Not because they have better models. Because they understand one thing: the model samples; it doesn't think. What it samples from depends entirely on what you give it.

What you can build from that
If the model has no memory, engineer one. If it only knows what's in the context window, design that window deliberately. If it's pattern matching on text, your text has to be precise.
I keep years of notes in Obsidian: journals, book highlights, article clips, half-finished ideas. I connected Claude directly to that folder and wrote CLAUDE.md: a file that tells it how to operate on my notes. My tracks, my rules, my structure. It never touches raw notes. It only writes to the wiki layer it maintains.
I built 22 commands. Each one is a markdown file, a reusable procedure that teaches the model how to do something specific. In my old setup, this would be a Python script. Now it's a markdown file. Fat with context. The system running them is deliberately thin: reads files, manages context, calls the model, returns output.
/today reads my notes, calendar, and open threads. Writes a daily plan based on what's actually happening, not a template someone else designed.
/context builds a snapshot of who I am right now: role, projects, open decisions. Every new session starts with Claude reading it. The model never asks me to catch it up.
/emerge surfaces ideas my notes imply but I've never written down. First run found a pattern across 18 months I hadn't consciously noticed.
Intelligence lives in the commands. The harness stays thin.
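To make "thin" concrete, here is a sketch of a harness in that shape: read files, assemble context, call the model, return output. It is not the setup described above. The `commands/` and `notes/` folder names, the function names, and the `echo_model` stand-in are all assumptions for illustration; only CLAUDE.md comes from the real setup.

```python
import tempfile
from pathlib import Path
from typing import Callable

def load_command(commands_dir: Path, name: str) -> str:
    """A command is just a markdown file, e.g. commands/today.md."""
    return (commands_dir / f"{name}.md").read_text(encoding="utf-8")

def build_prompt(rules: str, command: str, notes: list[str]) -> str:
    """Everything the model will know goes into this one string."""
    parts = ["# Rules", rules, "# Command", command, "# Notes", *notes]
    return "\n\n".join(parts)

def run_command(vault: Path, name: str, model: Callable[[str], str]) -> str:
    rules = (vault / "CLAUDE.md").read_text(encoding="utf-8")
    command = load_command(vault / "commands", name)
    notes = [p.read_text(encoding="utf-8")
             for p in sorted((vault / "notes").glob("*.md"))]
    return model(build_prompt(rules, command, notes))

# Demo against a throwaway folder, with a stand-in model that echoes its prompt.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "commands").mkdir()
    (vault / "notes").mkdir()
    (vault / "CLAUDE.md").write_text("Never edit raw notes.", encoding="utf-8")
    (vault / "commands" / "today.md").write_text("Write a daily plan.", encoding="utf-8")
    (vault / "notes" / "2024-01-01.md").write_text("Call the bank.", encoding="utf-8")
    echo_model = lambda prompt: prompt  # hypothetical; a real call goes here
    result = run_command(vault, "today", echo_model)

print(result)
```

The harness has no opinions. Changing what the system does means editing a markdown file, not this code.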
Compressence test on what I built: The model only knows what's in the window. Fill the window with everything that matters.
Five years of notes. 22 commands. One file that tells the model who I am. The architecture told me how to build it.
Vague prompts produce vague answers. Not because the model is dumb. Because you gave it nothing to compound.
The new job
Every enterprise needs someone to own this. Not centrally. At the team level.
The job: find workflows where compute changes the math completely. Not 10% faster. 100x more volume. Every inbound lead processed, not just the top 10%. Every contract reviewed, not just the flagged ones. Every customer onboarded at the same quality, not just whoever gets the senior rep.
The hard part isn't the agents. It's the context. What does the agent need to know, in what form, at what step? Where does the human stay in the loop?
Two categories: latent and deterministic. Judgment, synthesis, pattern recognition: these belong in the model's context space. Numbers, facts, SQL, contract clauses: these must stay deterministic, outside the model's reach. The job is knowing which is which and building the boundary. Blur it and you get a system nobody trusts.
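One way to draw that boundary, sketched under assumptions: the contract line items, the `contract_total` and `summarize_contract` names, and the `fake_model` stand-in are invented for illustration. The figure is computed in code and never passes through the model; the model only gets the part that calls for judgment.

```python
from typing import Callable

LineItem = tuple[str, int, float]  # (description, quantity, unit price)

def contract_total(line_items: list[LineItem]) -> float:
    """Deterministic side: arithmetic done in code, never by the model."""
    return round(sum(qty * price for _, qty, price in line_items), 2)

def summarize_contract(line_items: list[LineItem], notes: str,
                       model: Callable[[str], str]) -> dict:
    total = contract_total(line_items)
    # Latent side: the model reads the notes and judges; it is told to leave
    # figures alone, and the total is attached afterward by the code.
    prompt = ("Summarize the risks in these notes. "
              "Do not state any figures.\n\n" + notes)
    return {"total": total, "summary": model(prompt)}

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return "Renewal clause is ambiguous."

items = [("seats", 10, 49.5), ("support", 1, 1200.0)]
result = summarize_contract(items, "Auto-renews unless cancelled.", fake_model)
print(result)  # {'total': 1695.0, 'summary': 'Renewal clause is ambiguous.'}
```

If the summary is wrong, a human can argue with it. If the total is wrong, it is a bug with a stack trace. Keeping those failure modes separate is most of what makes the system trustworthy.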
The skill is the same whether you're building for yourself or a team of 500: the model reads; it doesn't think. Design around the constraint.
It's already a job. Most companies just haven't written the description yet.
What to do with this
The prompt isn't a question. It's the only document the model will ever read about you. Write it like one.
Most people type a question and wait. They're using a search engine that writes sentences instead of returning links. The ones getting real leverage treat the prompt as a brief: full context, clear constraints. Give it nothing and it invents.
Stop treating this like a conversation. Start treating it like a spec.
Notes
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762
- The benchmark was WMT 2014 English-to-German translation. The model scored 28.4 BLEU, beating the previous state of the art by more than 2 points at a fraction of the training cost.
- Several of the eight authors went on to found their own companies, some of them now competing with Google. Vaswani co-founded Adept and later Essential AI (in 2024 Amazon hired much of Adept's team and licensed its technology). Gomez co-founded Cohere. Uszkoreit co-founded Inceptive. The paper that gave Google's competitors their foundation was written largely by Google employees on Google time.
- Precision and recall are borrowed from information retrieval, where they measure search quality. The framing is deliberate: the model has the same failure modes as a search index, just less visible.
- The term "compressence" comes from Chris Begg, via a profile at whatgotyouthere.com. Begg uses it to describe compression to essence: compress any further and the meaning breaks.
- The pattern /emerge surfaced: more time spent building infrastructure than writing. Eighteen months of daily notes made it visible.