I gave my LLM a Cortex
Every LLM session boots blind. Close the window and it forgets everything. The notes, the transcripts, the decisions: all of it piles up write-only, never read back. So you re-explain yourself. Every single time.
I stopped patching this with bigger prompts and built a brain instead.
Cortex is a memory layer that sits between your files and any model. One principle holds it up:
Files own the truth. The brain owns the connections.
The intent
The goal was never a smarter model. It was a model that stops forgetting.
A working setup already produces everything a memory needs. Notes get written. Conversations get logged. Decisions get recorded. Work ships. The substance is all there. What is missing is the connective tissue: a way to read it back, organised, at the exact moment it matters, without a human remembering to go and fetch it.
Bigger context windows do not solve this. They are rented, not owned, and they reset. The fix is a layer that persists underneath the model and survives every session, every tool, and every model swap.
The philosophy
Five rules shaped every decision.
- Files are canonical. The brain never owns content. It is a derived index you can delete and rebuild from your files at any time. Lose the index, lose nothing.
- Nothing is destroyed. Importance decays over time. Rows do not. Old thinking fades in weight but is always walkable back to its source.
- Event-triggered, not command-driven. The brain loads context at session start and re-indexes when files change. You never have to remember to ask. The brain reminds you.
- Index the thinking, not the actions. Reasoning, decisions, prompts, and outputs go in. Tool noise and mechanical logs stay out. Signal only.
- The model is the replaceable part. Nothing here is tied to one provider. The contract is the surface. The model is configuration.
The system around the AI is the intelligence.
Why a brain, not a database
A flat vector database can store everything and surface nothing useful. The structure is the point. Borrowing from human neuroanatomy gives the memory shape, and the shape is what makes retrieval feel like recall instead of search.
Regions. Every memory lives somewhere. Language comprehension, language production, vision, episodic events, spatial and project structure, and identity each map to a region. Retrieval comes back grouped the way the memory is organised, not as one undifferentiated pile.
Tracts. Regions are wired together by typed edges. Some are deterministic: a prompt wires to its answer, an image wires to its description, everything in one project wires together. Others accumulate: any two memories retrieved together enough times grow a permanent edge between them. Fire together, wire together. Cold edges decay.
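The accumulating edges can be sketched in a few lines. This is a minimal illustration, not the actual Cortex code: the `Tracts` class, the constants, and the pruning threshold are all hypothetical, and the promotion of a well-used edge to a permanent one is left out for brevity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tuning constants; the post does not give real values.
STRENGTHEN = 1.0     # weight added each time two memories are retrieved together
DECAY = 0.9          # multiplier applied to every edge on each decay pass
PRUNE_BELOW = 0.05   # edges weaker than this are dropped (the edge, not the memories)


class Tracts:
    """Undirected weighted edges between memory ids."""

    def __init__(self):
        self.weights = defaultdict(float)

    def co_activate(self, retrieved_ids):
        """Strengthen the edge between every pair retrieved in the same result set."""
        for a, b in combinations(sorted(set(retrieved_ids)), 2):
            self.weights[(a, b)] += STRENGTHEN

    def decay(self):
        """Cool every edge and remove the ones that have gone cold."""
        for pair in list(self.weights):
            self.weights[pair] *= DECAY
            if self.weights[pair] < PRUNE_BELOW:
                del self.weights[pair]

    def neighbours(self, memory_id):
        """Return (other_id, weight) pairs for one memory, strongest first."""
        out = []
        for (a, b), weight in self.weights.items():
            if memory_id == a:
                out.append((b, weight))
            elif memory_id == b:
                out.append((a, weight))
        return sorted(out, key=lambda item: -item[1])
```

Calling `co_activate` after each retrieval and `decay` on a schedule is enough to make frequently paired memories drift together while one-off pairings fade.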
Consolidation. Raw capture is expensive and noisy, so the brain sleeps. Over time, raw sessions distil into session digests, then daily summaries, then long-lived signal, then promoted facts. Importance rises as content proves durable. A year of raw thinking compresses roughly 37-fold into a small, dense, queryable store, with a full lineage walk back to the original on demand.
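One way to picture the lineage walk, as a sketch under assumptions: the tier names come from the paragraph above, but the `Memory` record, the `consolidate` and `lineage` helpers, and the idea of passing the summary in as a string (in practice a model would write it) are all hypothetical.

```python
from dataclasses import dataclass, field

# Tier names follow the consolidation stages described above; the data model is assumed.
TIERS = ["raw", "session_digest", "daily_summary", "long_lived", "promoted_fact"]


@dataclass
class Memory:
    id: str
    tier: str
    text: str
    sources: list = field(default_factory=list)  # ids this memory was distilled from


def consolidate(store, ids, new_id, summary):
    """Distil memories into one at the next tier up. Sources are kept, never deleted."""
    tiers = {store[i].tier for i in ids}
    if len(tiers) != 1:
        raise ValueError("consolidate one tier at a time")
    level = TIERS.index(tiers.pop())
    if level + 1 >= len(TIERS):
        raise ValueError("already at the top tier")
    store[new_id] = Memory(new_id, TIERS[level + 1], summary, list(ids))
    return store[new_id]


def lineage(store, memory_id):
    """Walk back from a consolidated memory to the raw memories it came from."""
    memory = store[memory_id]
    if not memory.sources:
        return [memory.id]
    found = []
    for source_id in memory.sources:
        found.extend(lineage(store, source_id))
    return found
```

Because `consolidate` only ever adds rows, the dense summary and the raw material it came from coexist, which is what makes the walk back possible.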
Modes. The same brain thinks differently on demand. A creative mode weights the retrieval differently from a deep-reasoning mode or a fast-recall mode. Neuromodulation, as software. One brain, several postures.
How it decides what to pull
This is the question everyone asks. The honest answer: nothing picks a single right memory. The brain ranks everything and surfaces the top of the stack.
There is no query you type. At the start of a session the query is your situation: which project you are in, what you touched last, the task in front of you. That becomes the search.
From there it pulls candidates by meaning, then re-ranks them on several signals stacked together:
- Region and layer. Identity and core rules sit high and never fade. Episodic memory is scored by how recent it is.
- Importance. It decays over time, so old noise sinks while durable facts stay near the top.
- Co-activation. Anything retrieved together before gets a boost. Fire together, wire together. This is how the brain learns your patterns.
- Graph walk. Once a memory is in, it follows the edges (a note to the decision it led to, an image to its caption) and pulls the connected pieces too.
A token budget caps the result so it never floods the window. And the whole stack reweights with mode: creative pulls differently from deep, or from fast-recall. Same brain, different posture.
Nothing decides the answer. The brain surfaces the best-ranked context for where you are, and it sharpens the more you use it.
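The stacked re-ranking can be sketched as a weighted sum followed by a greedy fill of the token budget. Everything concrete here is an assumption made for illustration: the weights per mode, the 30-day half-life, the `Candidate` fields, and the choice of a linear score. The graph walk is omitted.

```python
from dataclasses import dataclass

# Hypothetical per-mode weights; the post names the modes but not the numbers.
MODE_WEIGHTS = {
    "creative": {"similarity": 0.3, "importance": 0.2, "recency": 0.1, "coactivation": 0.4},
    "deep": {"similarity": 0.4, "importance": 0.4, "recency": 0.1, "coactivation": 0.1},
    "fast": {"similarity": 0.6, "importance": 0.1, "recency": 0.3, "coactivation": 0.0},
}
HALF_LIFE_DAYS = 30.0           # assumed decay half-life
PINNED_REGIONS = {"identity"}   # regions that sit high and never fade


@dataclass
class Candidate:
    text: str
    region: str
    similarity: float    # 0..1, from the meaning-based candidate pull
    importance: float    # 0..1, before decay
    age_days: float
    coactivation: float  # 0..1, how often it was retrieved alongside the current context
    tokens: int


def score(c, mode):
    """Combine the signals; age fades importance and recency unless the region is pinned."""
    w = MODE_WEIGHTS[mode]
    fade = 1.0 if c.region in PINNED_REGIONS else 0.5 ** (c.age_days / HALF_LIFE_DAYS)
    return (
        w["similarity"] * c.similarity
        + w["importance"] * c.importance * fade
        + w["recency"] * fade
        + w["coactivation"] * c.coactivation
    )


def select(candidates, mode, token_budget):
    """Take candidates in ranked order, skipping any that would overflow the budget."""
    ranked = sorted(candidates, key=lambda c: score(c, mode), reverse=True)
    chosen, used = [], 0
    for c in ranked:
        if used + c.tokens <= token_budget:
            chosen.append(c)
            used += c.tokens
    return chosen
```

Switching `mode` changes only the weights, so the same candidates come back in a different order: one brain, several postures.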
How to set it up
The pattern is small, and every piece is swappable. Nothing below needs a paid API. Local defaults run it offline.
your files/            # canonical: notes, transcripts, decisions, outputs
index.db               # derived: SQLite + a vector extension
hooks/
  session-start        # run retrieval, inject top-k context into turn 1
  on-change            # re-index a file the moment it is edited
jobs/
  nightly              # consolidate raw -> digests -> facts (the "sleep")
adapters/
  injected-context     # default: drop context into the prompt preamble
  mcp-server           # for tools that speak Model Context Protocol
The seven moves:
- Keep your files canonical. Your notes and transcripts stay the source of truth. The brain reads them; it never replaces them.
- Build a derived index. SQLite plus a vector extension is enough. Chunk each file, embed it, and tag it with a region and a memory layer.
- Wire the deterministic edges. Prompt to answer. Image to caption. Everything inside one project to everything else inside it.
- Let co-activation do the rest. When two chunks keep getting retrieved together, strengthen the edge between them. Let unused edges fade.
- Auto-inject at session start. A start-up hook runs retrieval and drops the top results into the model's first turn, organised by region. No one has to ask.
- Re-index on change, consolidate at night. A file-change hook keeps the index fresh. A nightly job distils raw capture into durable facts.
- Stay model-agnostic. Expose the brain through injected context, a thin adapter, or an MCP server. Any model can read it. The next model can too.
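The first two moves and the session-start hook fit in a short sketch. It uses only Python's standard `sqlite3` module. The table layout, the function names, and the paragraph-level chunking are assumptions, and the bag-of-words `embed` is a stand-in for a real embedding model and vector extension, chosen only so the example runs offline with no dependencies.

```python
import json
import math
import re
import sqlite3
from collections import Counter


def embed(text):
    """Stand-in embedding: normalised bag of words. Swap in a real model in practice."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def build_index(files):
    """Rebuild the derived index from scratch. `files` maps path -> (region, layer, text)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE chunks (path TEXT, region TEXT, layer TEXT, text TEXT, embedding TEXT)"
    )
    for path, (region, layer, text) in files.items():
        for chunk in text.split("\n\n"):  # naive chunking: one chunk per paragraph
            chunk = chunk.strip()
            if chunk:
                db.execute(
                    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
                    (path, region, layer, chunk, json.dumps(embed(chunk))),
                )
    db.commit()
    return db


def session_context(db, situation, top_k=3):
    """Session-start hook: rank chunks against the situation, return them grouped by region."""
    query = embed(situation)
    scored = []
    for region, text, raw in db.execute("SELECT region, text, embedding FROM chunks"):
        vector = json.loads(raw)
        similarity = sum(weight * vector.get(word, 0.0) for word, weight in query.items())
        scored.append((similarity, region, text))
    scored.sort(key=lambda row: row[0], reverse=True)
    grouped = {}
    for _, region, text in scored[:top_k]:
        grouped.setdefault(region, []).append(text)
    return "\n".join(
        f"[{region}]\n" + "\n".join(f"- {t}" for t in texts)
        for region, texts in grouped.items()
    )
```

The string returned by `session_context` is what an injected-context adapter would place in the prompt preamble. Because `build_index` reads only from the files, deleting the database and calling it again loses nothing.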
What it unlocks for any LLM
The difference is not subtle. It is the gap between a tool that starts cold and one that starts knowing.
- Every session boots with the right context already loaded, shaped the way a brain organises it.
- The brain volunteers what is relevant instead of waiting to be queried. Recall, not search.
- It learns your patterns. Over weeks it notices which things fire together and surfaces them before you think to look.
- A year of raw thinking lives in a small, dense, rebuildable store, with nothing lost.
- It works across every model you use today, and the one you switch to next month.
A model with no memory is a stranger you brief from scratch every morning. A model with a Cortex is a collaborator who remembers.
The model is the replaceable part. The brain is what persists.