How to Think About Agents

A practical framing page for understanding what agents are good at, where they fail, and why the harness, memory, and initialisation strategy matter more than model brand.

CREATED: 2026.06.09 · UPDATED: 2026.06.09 · 18 MIN READ

It may sound presumptuous to write a page called “How to Think About Agents”. I do not mean it that way. The difficulty is that agents sit in an awkward conceptual space. They can write, reason, call tools, remember things, inspect files, search the web, and carry out long chains of work. In some narrow situations they are already faster and more diligent than people. Yet they are not people, and the places where they are unlike people matter enormously.

The practical problem is not that agents are mysterious. It is that they are easy to anthropomorphise in exactly the wrong places. If you treat an agent as a clever colleague with a stable inner life, you will over-trust it. If you treat it as a mere autocomplete box, you will miss most of what makes the technology useful. The balance point is to see the agent as a cultivated working system: a model, a harness, a set of tools, a memory strategy, and a discipline of review.

This page is a framing guide. It is meant to help you decide what agents are good for, where they should be constrained, and why serious agent work is less about picking the fashionable model and more about building the right operating environment around it.

What a companion agent is

A companion agent is not a single chat thread. It is an agent, or more often a small cluster of agents, configured with enduring memory, curated skills, recurring work, and boundaries that fit a person or organisation.

It may have daily input tasks: email triage, market research, competitor monitoring, financial tracking, news scanning, customer feedback analysis. It may have evaluation tasks: ranking opportunities, surfacing anomalies, checking whether yesterday’s work produced anything useful. It may have output tasks: drafting notes for review, preparing content, updating a dashboard, or communicating with the world after approval.

The important word is cultivated. A companion agent becomes useful through repeated interaction, correction, procedure design, and memory hygiene. You are not merely asking questions. You are shaping a working relationship between your aims and a machine system that can act on your behalf.

That cultivation also includes boundaries. Some tasks should be automated. Some should be drafted but not sent. Some should be researched but not decided. Some should require explicit human approval every time. Posting, purchasing, publishing, deleting, and contacting people are not the same kind of action as summarising a document. A serious companion agent makes those distinctions structurally, not as an afterthought.
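
One way to make those distinctions structural is a small policy table that the harness consults before any action runs. The sketch below is illustrative only: the action names, the `Policy` levels, and the `may_execute` helper are my own assumptions, not the API of any particular harness. Unknown actions fall back to the strictest policy.

```python
from enum import Enum


class Policy(Enum):
    AUTOMATE = "automate"                   # run without asking
    REQUIRE_APPROVAL = "require_approval"   # a human must approve every time


# Hypothetical action names. Drafting is automated; sending is gated.
ACTION_POLICY = {
    "summarise_document": Policy.AUTOMATE,
    "draft_email": Policy.AUTOMATE,
    "send_email": Policy.REQUIRE_APPROVAL,
    "publish_post": Policy.REQUIRE_APPROVAL,
    "delete_file": Policy.REQUIRE_APPROVAL,
    "make_purchase": Policy.REQUIRE_APPROVAL,
}


def policy_for(action: str) -> Policy:
    """Unknown actions fall back to the strictest policy."""
    return ACTION_POLICY.get(action, Policy.REQUIRE_APPROVAL)


def may_execute(action: str, human_approved: bool = False) -> bool:
    """True only if the harness may carry out the action right now."""
    if policy_for(action) is Policy.AUTOMATE:
        return True
    return human_approved
```

The design choice worth noticing is the default. An action nobody thought to classify is treated as one that needs a human, so the table fails closed rather than open.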

The LLM is becoming the commodity layer

The first, perhaps controversial, point is also one of the most practically useful: the large language model is becoming a commodity layer.

That does not mean models are identical. They are not. Claude, ChatGPT, Grok, Gemini, DeepSeek, Qwen, Llama, and specialist code or reasoning models each have their own character, latency, price, context length, tool behaviour, and failure modes. At any given moment one model may be materially better for a task. But the frontier does not stay still. One provider moves ahead, another catches up, open-weight models narrow the gap, pricing changes, routing layers improve, and users learn to switch.

OpenRouter’s State of AI 2025 report makes the point for me. It describes a market serving more than 300 active models from over 60 providers and reports that open-weight models reached roughly one third of total usage by late 2025. The picture is not one of a single model winning out: model use is fragmenting and specialising as people route work to whatever model fits the job, cost, and workflow.

So the investment question changes. Model choice can be significant, but be careful about investing all your working practice in one model brand. The real investment of time goes into the harness and the ecosystem, which take time to build and mature: the local code, permissions, network of skills, memory, review loops, and publication paths that let you change models without losing the working system you have cultivated.

Diagram distinguishing the LLM, the harness, and the wider agent ecosystem
The LLM is the pattern engine. The harness turns model output into controlled work. The ecosystem is the durable layer you cultivate over time.

A chatbot gives you a conversation with a model. An agent harness gives your model digital arms and legs and allows the model, in a controlled way, to act: read files, call APIs, run commands, inspect a knowledge base, maintain a task board, send messages, create drafts, or ask for approval. A companion agent ecosystem such as Hermes goes further. It adds persistent memory, curated skills, scheduled work, domain-specific profiles, observability, and review gates.

When you create a companion agent you cultivate it. You mould it to your preferences. You invest time inputting what may amount to tens of thousands of directives, opinions, facts, corrections, workflows, and limits. As it develops, that investment starts to compound. If the model improves, the harness can use it. If a model becomes too expensive, too censored, too weak at tool use, or simply no longer the best fit, the harness can route around it.

That is the strategic point. If memory, skills, workflows, and habits are locked inside a single vendor’s chat product, your accumulated advantage is fragile. Once you have cultivated the harness and ecosystem, changing the LLM at the centre is less traumatic than people suppose. The new model is not identical, but it often fits into the same working structure and produces a recognisably similar level of performance. The reasoning engine changes. The cultivated system remains.
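
The idea that the reasoning engine changes while the cultivated system remains can be sketched in a few lines. This is a toy under stated assumptions: a "model" is reduced to a function from prompt text to reply text, and `Harness`, `switch_model`, and the two stand-in models are hypothetical names, not any provider's SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a model is just a function from prompt text to reply text.
ModelFn = Callable[[str], str]


@dataclass
class Harness:
    models: Dict[str, ModelFn]
    active: str
    memory: List[str] = field(default_factory=list)  # lives outside any model

    def switch_model(self, name: str) -> None:
        if name not in self.models:
            raise KeyError(f"unknown model: {name}")
        self.active = name

    def ask(self, task: str) -> str:
        # The same cultivated memory is placed in front of whichever model is active.
        prompt = "\n".join(self.memory + [task])
        return self.models[self.active](prompt)


# Stand-in models that only report how many prompt lines they received.
def model_a(prompt: str) -> str:
    return f"A saw {len(prompt.splitlines())} lines"


def model_b(prompt: str) -> str:
    return f"B saw {len(prompt.splitlines())} lines"
```

Because the memory list belongs to the harness, calling `switch_model("b")` changes who answers without changing what the answerer is told.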

Comparing an LLM to human capability

A large language model is trained on enormous quantities of written and code-like material. It absorbs patterns from books, documentation, forums, articles, transcripts, tutorials, source code, and the growing body of AI-generated text. It does not know these things as a person knows them, but it has learned a remarkable map of how people express facts, procedures, arguments, styles, errors, and repairs. One way to think about an LLM is as an entity that can call upon and combine previously recorded thought: human thought, and increasingly other machine-generated thought, preserved in the world of data.

That makes an LLM extremely good at work where the method has already been expressed somewhere in the commons. It can draft a legal-style clause because many such clauses exist. It can explain a database migration because many people have explained migrations. It can debug familiar code paths, summarise research, transform formats, write scaffolding, classify emails, compare options, produce first drafts, and follow procedural instructions with inhuman patience.

A chess example is useful because it marks a boundary. A general LLM may know a great deal about openings, famous games, and standard tactical motifs because those patterns have been written down. But unless it is equipped with a chess engine, it is weak where the next move has to be calculated from first principles in a position not already represented in the written record.

The same boundary appears in the research literature. Apple’s 2025 paper, The Illusion of Thinking, tested large reasoning models in controlled puzzle environments such as Tower of Hanoi, river crossing, checker jumping, and blocks world. The important result is not that models sometimes made mistakes. Everyone makes mistakes. The important result is that performance could collapse beyond certain levels of complexity, and that the models often failed to apply explicit algorithms consistently even when they had enough token budget to continue. A 2026 survey, Large Language Model Reasoning Failures, gives smaller examples of the same family of problem: models failing simple working-memory and rule-switching tasks such as n-back sequences or sticking with a previous answer pattern after the rule changes.

This is why I am cautious about calling agents “creative” in the human sense. An agent can appear original because it recombines a vast library of expressed procedures. It can also be genuinely useful in producing candidates a human would not have had time to explore. But that is not the same as a reliable faculty for ground-up discovery. Where the answer requires exact computation, embodied judgement, emotional intuition, or first-principles invention, the agent needs tools, tests, or human supervision.

That limitation is less limiting than many suppose. Most work, especially mid-level office work, does not require invention. It requires the correct application of known procedures under imperfect conditions. Agents tend to be strong here because diligence, iteration, memory retrieval, and systematic exploration compensate for the absence of human inspiration. Edison’s line about one percent inspiration and ninety-nine percent perspiration is overused, but in this domain it lands. Agents are good at the perspiration.

So do not ask an unequipped agent to be a genius. Ask it to be a tireless procedural worker inside a system that supplies the right context, tools, checks, and escalation points. Get the agent to do the grunt work. Keep the places that require judgement visible.

The blank instant before a session starts

It is sobering to realise that before a session starts, the LLM likely knows nothing about you in the personal sense, unless you have a Wikipedia page and the model has seen it. It has general training. It may have broad knowledge of public facts. It may have learned patterns from many kinds of writing and reasoning. But it does not know your history, your preferences, your projects, your permissions, your private files, your standards, or the meaning of the conversation you are about to have unless that information is placed into the session context.

If you want to have a conversation with an agent where there is some level of relationship and understanding of who you are, all of that has to be supplied at the outset of every conversation.

That is one of the central facts of agent work, and it shapes how a personal agent is cultivated. Your task is to create an environment for the strategic seeding of context: just enough relevant information, just enough disclosure of tools, just enough procedural direction, and just enough memory retrieval to maximise the agent’s ability to complete the task it has been allocated for that session.

Context “before the session” is a blank window. The agent starts with a generalised view of the world as represented through the internet and other training material. From that blank slate, the system has to create a meaningful starting point for useful work.

Agent tasks might be anything:

  • Scheduled triage of new email
  • Daily scheduled news and social media search for stories relating to your business or interests
  • Ingestion of an academic paper you have found interesting
  • Expenses gathering from email and supplier sources
  • Marketing channel analysis

At the start of a reasoning turn, the harness assembles an input package. It may include the system prompt, developer instructions, tool descriptions, memory entries, skill summaries, retrieved notes, selected files, previous chat messages, and any observations returned from tool calls. That whole package is sent to the model. The model then produces a response. Sometimes the response is text for the user. Sometimes it is a request for a tool call. The harness decides whether that call is allowed, executes it outside the model, records the result, and sends the observation back into the next reasoning turn.

Diagram of context assembly and the reason-act-observe tool loop
An agent acts through a loop: assemble context, reason, request an approved action, observe the result, and reason again.

This is often called a ReAct loop: reason, act, observe, repeat. The model does not directly operate your computer. The harness does. The model proposes actions in a structured way, and the harness turns approved proposals into real effects.
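
The loop described above can be sketched as follows. It is a minimal illustration, not a real harness: the model is a scripted stand-in, and `run_loop`, the `Step` tuple shape, and the `read_file` tool are names I have invented for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumption: a model step is ("tool", tool_name, argument) or ("final", text, "").
Step = Tuple[str, str, str]


def run_loop(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    allowed: Set[str],
    context: List[str],
    max_turns: int = 5,
) -> str:
    for _ in range(max_turns):
        kind, name, arg = model(context)          # reason
        if kind == "final":
            return name
        if name not in allowed or name not in tools:
            observation = f"DENIED: {name}"       # the harness, not the model, decides
        else:
            observation = tools[name](arg)        # act, outside the model
        context.append(f"observation[{name}]: {observation}")  # observe
    return "stopped: turn limit reached"


def scripted_model(context: List[str]) -> Step:
    """Stand-in model: asks for one file read, then finishes."""
    observations = [c for c in context if c.startswith("observation[")]
    if not observations:
        return ("tool", "read_file", "notes.txt")
    return ("final", f"Done after {len(observations)} observation(s)", "")


def fake_read_file(path: str) -> str:
    """Stand-in tool that pretends to read a file."""
    return f"contents of {path}"
```

Calling `run_loop(scripted_model, {"read_file": fake_read_file}, {"read_file"}, ["task: summarise notes.txt"])` runs one tool call and then returns the final text. Passing an empty `allowed` set shows the other half of the design: the proposal is still made, but the harness records a denial instead of an effect.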

This distinction matters for safety. It also matters for capability. A model with no tools is limited to what it can infer from context. A model with the right tools can inspect the current world, retrieve private knowledge, run tests, verify claims, and leave durable artefacts behind. But those powers are not inside the model. They are in the system around it.

Context is the agent’s working memory

Context is where the agent’s relationship with the task exists. If the agent needs to know your goal, it must be in context. If it needs to know that a file is read-only, that must be in context. If it needs to know which memory store to query, which approval gate to respect, or which voice to write in, that has to be in context or retrievable from something the context points toward.

This creates a catch-22. An agent cannot search for what it does not know it needs. It does not know what it does not know about your world. Without some initial structure, it may never retrieve the decisive note, tool, convention, or constraint.

The temptation is to solve this by loading everything. That is usually sub-optimal.

Large context windows are useful, but they are not magic. First, they are expensive. Late in a long session, even a trivial message may cause the whole accumulated conversation to be processed again. The exact mechanics vary by system and caching strategy, but the practical pattern is clear enough: long sessions make every additional turn heavier. A polite “thanks” near the end of an overgrown session can be surprisingly wasteful because it drags the accumulated context through another turn.

Second, large context dilutes attention. I think of the model’s reasoning over context as a flashlight. When the context is small and relevant, the light can be close to the material. The agent sees the details. When the context is huge and unfocused, the beam spreads out. The agent may still have the information somewhere in view, but its practical grip on the details is weaker. It misses constraints, confuses neighbouring concepts, or follows an old instruction that should have been superseded.

The trick is strategic initialisation of context.

A good initialisation package does not try to be a complete biography of the user or a complete archive of the organisation. It gives the agent just enough to start well: the mission, the role it is playing, the relevant constraints, the tools it can use, the memory or knowledge bases it should query, the approval boundaries, and the specific outputs expected from this run.

Too little context leaves the agent guessing. Too much context makes the session costly and vague. The balance point is an initialisation packet that tells the agent where it is, what matters, and where to go next.
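
As a rough sketch, the packet can be written down as a small data structure with one field per item in that list. The class name `InitPacket`, its field names, and the rendered layout are assumptions made for illustration; a real harness will have its own format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InitPacket:
    mission: str
    role: str
    constraints: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    knowledge_bases: List[str] = field(default_factory=list)
    approval_required_for: List[str] = field(default_factory=list)
    expected_outputs: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Turn the packet into the text placed at the start of a session."""
        sections = [
            ("Mission", [self.mission]),
            ("Role", [self.role]),
            ("Constraints", self.constraints),
            ("Tools", self.tools),
            ("Knowledge bases to query", self.knowledge_bases),
            ("Requires human approval", self.approval_required_for),
            ("Expected outputs", self.expected_outputs),
        ]
        lines: List[str] = []
        for title, items in sections:
            if not items:
                continue  # skip empty sections rather than padding the context
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)

    def word_count(self) -> int:
        """Crude size check; words are only a proxy for tokens."""
        return len(self.render().split())
```

The `word_count` method is there to keep the packet honest. If it keeps growing from run to run, that is usually a sign that material belongs in a knowledge base the packet points to, not in the packet itself.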

Diagram showing agent profiles as efficient start points that assemble task-specific context
Profiles are efficient start points. Each one prioritises a different mixture of prompts, memory, skills, tools, and review gates before the agent begins work.

This is why cultivating a personal agent is not merely a matter of writing one perfect system prompt. You end up cultivating a range of start points. A research profile may need source acquisition tools, sceptical appraisal habits, and a memory map of current interests. A writing profile may need voice rules, publication constraints, and evidence standards. A deployment profile may need repository conventions, test commands, rollback paths, and hard approval gates. The intelligence of the system is partly in choosing the right start point before the first token is generated.

A mental model for compounding context consumption

An LLM agent does not have persistent working memory in the human sense. Every time it generates a response, it attends over the active context window: the system prompt, developer instructions, conversation history, current user message, tool descriptions, tool results, retrieved memory, and any other material the harness has placed in front of it.

That creates a compounding cost structure. Suppose each turn adds roughly the same amount of material. After ten turns the next response is not processing only the eleventh message. It is processing the accumulated stack that came before it. After twenty turns the stack is larger again. In a simple unmanaged session, cumulative token processing grows like a triangle: first a little context, then more, then more again. The exact curve depends on caching, model architecture, summarisation, retrieval, and provider billing, but the pressure is real.
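
The triangle is easy to check with arithmetic. The function below models only the simplest case described above: an unmanaged session with no caching or summarisation, where every turn re-reads the whole history. The function name and the token figures are illustrative assumptions, not measurements from any provider.

```python
def cumulative_tokens(turns: int, tokens_per_turn: int, base_tokens: int = 0) -> int:
    """Total tokens processed across a session in which each turn
    re-reads the full accumulated context (no caching, no compaction)."""
    total = 0
    context = base_tokens  # e.g. system prompt and tool descriptions
    for _ in range(turns):
        context += tokens_per_turn  # this turn's new material joins the stack
        total += context            # and the whole stack is processed again
    return total
```

With 500 tokens added per turn and no base context, ten turns process 27,500 tokens in total and twenty turns process 105,000. Doubling the length of the session roughly quadruples the work, which is the triangle in numbers.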

This is why a tiny late message can be surprisingly expensive. Near the beginning of a session, “thanks” is just a small courtesy. Near the end of a bloated session, it may force the system to carry an enormous accumulated history through another turn. If the session is close to the context limit, even a tiny addition can trigger summarisation, retrieval, eviction, or lossy compression. The new message is small. The structure it lands on is not.

Caching and context-management tricks soften the shape, but they do not abolish it. Prefix caching may avoid recomputing some earlier material. Summaries may compress the history. Retrieval may bring back only selected pieces. Those are useful mitigations. They are not a licence to treat the conversation as an infinite notebook.

The practical conclusion is simple: design agent conversations as if context pressure is present from the start. Use short, high-signal turns. Move durable decisions into files, task boards, skills, and knowledge stores. Restart cleanly when the session has become too heavy. Context hygiene is not politeness. It is engineering.

Compaction and context drift

Long agent sessions eventually hit a limit. Sometimes the hard limit is the model’s context window. Sometimes the practical limit arrives earlier: the agent starts to repeat itself, forgets a constraint, confuses a previous branch of work with the current one, or becomes oddly confident about a summary that has lost the texture of the original evidence.

Most systems respond with some form of compaction. They summarise the session, drop low-value detail, retain decisions and artefacts, and continue from a compressed state. This is useful and often necessary. It is also dangerous if treated as lossless. A compacted summary can omit the thing that later turns out to matter. It can preserve a decision but lose the doubt attached to it. It can carry forward a confident simplification instead of the original messy evidence.
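
A deliberately crude sketch shows why compaction is lossy. Real systems usually summarise with a model; this one simply keeps entries labelled as decisions or artefacts, keeps the most recent turns, and replaces the rest with a count. The `Entry` type, its `kind` labels, and the `compact` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    kind: str  # hypothetical labels: "decision", "artefact", "chatter", "summary"
    text: str


KEEP = {"decision", "artefact"}


def compact(history: List[Entry], keep_recent: int = 2) -> List[Entry]:
    """Keep recent entries verbatim; from older ones keep only decisions and artefacts."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    kept = [entry for entry in older if entry.kind in KEEP]
    dropped = len(older) - len(kept)
    summary = (
        [Entry("summary", f"{dropped} older entries dropped during compaction")]
        if dropped
        else []
    )
    return summary + kept + recent
```

If an older entry recorded a decision and the next one recorded a doubt about it as ordinary chatter, the compacted history keeps the decision and silently discards the doubt. That is exactly the failure described above, reproduced in miniature.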

Skilled agent users become better at ending and restarting sessions. They learn when to ask for a handoff note, when to preserve source files, when to create a task with explicit acceptance criteria, and when to re-initialise a fresh agent rather than nursing a bloated context forward. The goal is not a single endless conversation. The goal is continuity of work across clean starts.

This is why durable artefacts matter: task boards, source files, skill documents, knowledge-store records, logs, tests, and publication receipts. The conversation is not the system. The system is the set of artefacts that allows the next conversation to begin intelligently.

Skills, memory, and the second brain

A skill is a reusable procedure made available to an agent. It may contain frontmatter describing when it should be used, a short operating procedure, pitfalls, templates, scripts, and references. Skill frontmatter matters because it lets the harness or the agent decide which procedures to load without reading an entire library every time.
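
To illustrate why frontmatter matters, the sketch below selects skills by looking only at small headers and never opens the full skill documents. The `SkillHeader` fields, the `use_when` keywords, and the keyword-overlap matching are assumptions chosen for brevity; real harnesses may match on descriptions or let the model choose.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SkillHeader:
    name: str
    use_when: List[str]  # trigger keywords taken from the skill's frontmatter
    path: str            # where the full procedure lives; not read during selection


def select_skills(task: str, headers: List[SkillHeader]) -> List[str]:
    """Return names of skills whose trigger keywords appear in the task text."""
    words = set(re.findall(r"[a-z0-9]+", task.lower()))
    return [
        header.name
        for header in headers
        if words & {keyword.lower() for keyword in header.use_when}
    ]


# Hypothetical skill library, described by headers only.
HEADERS = [
    SkillHeader("email-triage", ["email", "inbox"], "skills/email-triage.md"),
    SkillHeader("expenses", ["expenses", "receipts", "invoice"], "skills/expenses.md"),
    SkillHeader("paper-ingest", ["paper", "arxiv"], "skills/paper-ingest.md"),
]
```

For a task such as "Triage the morning email inbox." only the email skill matches, so only that one procedure needs to be loaded into context.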

Memory is different. Memory stores durable facts that should influence future sessions: user preferences, environment facts, project conventions, and stable lessons. A good memory system is selective. If everything is remembered, the agent drowns. If nothing is remembered, the agent remains a stranger.

A second brain goes further. It is a structured knowledge base with source material, summaries, links, indexes, and increasingly compressed representations of important domains. Andrej Karpathy has described his own wiki as a way to maintain a personal knowledge base; the same principle applies to companion agents. The agent needs a map of what exists, compressed enough to search and orient from, with links back to fuller source material when detail matters.

There are many strategies for this. You can maintain index files, summarised domain pages, immutable source records, vector search, tag systems, graph links, and recurring review passes. None is perfect. The hard problem is overlapping concern. If two memories, two skills, or two second-brain pages partially contradict each other, a stochastic agent may choose different interpretations on different days. This is why curation is not clerical work. It is part of the intelligence of the system.
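
One small curation aid is a pass that flags overlapping concerns before the agent meets them. The sketch assumes, optimistically, that stored facts can be reduced to a source, a key, and a value; the record shape and the `find_conflicts` helper are hypothetical, and real contradictions are rarely this tidy.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: each record is (source, key, value).
Record = Tuple[str, str, str]


def find_conflicts(records: List[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Return keys that carry more than one distinct value, with their sources."""
    by_key: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, key, value in records:
        by_key[key].append((source, value))
    return {
        key: entries
        for key, entries in by_key.items()
        if len({value for _, value in entries}) > 1
    }
```

A memory entry that says the spelling preference is British and a writing skill that says American would both be returned under the same key, which gives the human curator something concrete to resolve.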

Skills also decay. A procedure that worked for last year’s model may hold back this year’s model. Anthropic has reported, in its own customer research and product writing, that old prompting habits can prevent users from getting the best out of newer models. That is exactly what one should expect. A companion agent is not configured once. It is pruned, sharpened, tested, and occasionally simplified.

Harness choices

The leading personal companion-agent options change quickly, but the categories are already visible.

Claude-style cowork products are convenient and powerful, especially when the provider’s interface, model, and hosted tools are enough. The trade-off is lock-in. Your workflows, memories, and habits tend to accumulate inside one company’s boundary.

Open-source harnesses such as Hermes and OpenClaw point in the other direction. They are evolving rapidly, rougher in places, and more demanding to operate, but they let the user treat models as interchangeable components. The harness can call different providers, use local tools, preserve durable files, integrate with task boards, and make the user’s own knowledge system the centre of gravity.

The right choice depends on the user. If you want convenience and can accept the boundary of a single vendor, a hosted cowork product may be right. If you are building a long-term companion system around your own memory, workflows, and business processes, the open harness model is more strategically interesting.

Profiles should follow work, not theatre

It is tempting to model agents on human roles: researcher, writer, assistant, analyst, engineer. Sometimes that is useful. But I do not think the strongest companion-agent design simply imitates an office org chart.

Agents have context limits. They benefit from focus. A human role may combine judgement, memory, politics, taste, tacit knowledge, and social context in one person. An agent profile should often be narrower. It should be initialised around a vertical domain or a horizontal capability, with a clear toolset and a clear handoff contract.

Vertical profiles know a domain: a product, a market, a publication, a customer segment, a codebase, a compliance area. Horizontal profiles perform a kind of work across domains: retrieval, appraisal, drafting, visual production, QA, deployment, monitoring. A strong system often needs both. The mistake is to create a cast of theatrical personalities while ignoring the actual shape of the work.

The question is not “What would we call this agent if it were a person?” The question is “What context does this agent need to start well, what tools may it use, what artefact must it leave behind, and where should a human decision interrupt the flow?”

The practical conclusion

Agents are best understood as systems for turning structured context into controlled work. The LLM supplies a broad pattern engine. The harness supplies tools, permissions, loops, memory, and observability. The ecosystem supplies continuity across time.

This is why the fashionable model of the month matters less than it seems. The deeper advantage is learning how to initialise context, curate memory, maintain skills, split work across focused profiles, and preserve evidence in durable artefacts.

If you want to use agents seriously, do not start by asking how intelligent the model is in the abstract. Ask what system will surround it. Ask what it will know at the first turn. Ask what it can safely do. Ask where its memory comes from, how it will be corrected, how it will be tested, and how its work will survive the end of the conversation.

That is the shift from chatting with AI to cultivating companion agents.
