Context Made Intelligent
Rethinking retrieval systems for LLMs using hierarchical semantic decision trees
One persistent challenge in LLM-based systems is managing context: deciding what data the model should “know”, when that information is crucial, and how to deliver it. This is especially critical in workflows that involve nuanced decision-making over evolving bodies of knowledge. In these environments, the cost of fragmented context is high. Decisions are rarely made by pulling isolated facts; they’re made by applying a strategic lens over dynamic information - evaluating companies against a live investment thesis, for example, or applying institutional memory across strategy, portfolio, and market shifts.
A quick recap of the three approaches most commonly used for personalizing general-purpose models to specific workflows or knowledge domains (in increasing order of complexity):
Prompt engineering: Crafting detailed instructions, examples, and heuristics in the prompt to guide the model’s output behavior.
Retrieval Augmented Generation (RAG): Fetching relevant external data, typically using vector similarity or sparse search, and injecting it into the model’s context window at inference time.
Fine-tuning: Modifying the model’s internal weights by training it on custom datasets, embedding persistent knowledge or behaviors into the model itself.
The dream for me was to have a model that has all the context of everything that goes on at Montage, deeply understands our thesis and opinions, and stays up to date on recent news and the market. That’s where RAG falls short, since it can only pull slices from the database. Even with very accurate retrieval algorithms, it doesn’t give the model a holistic view. That’s totally fine in many applications where you only need the specific documents from your database to inform the LLM, but for a model that “knows everything”, a more fitting approach would be to fine-tune pre-trained models like GPT on your own data. The thing is, it’s a lot of work and prevents you from switching between models easily without fine-tuning again.
I’ve experimented with various retrieval pipelines and prompt engineering setups in the past (and written about them here in past Substack pieces), and they all hit similar limitations. Out-of-the-box LLMs are impressive at surface-level tasks, but they struggle with deep reasoning aligned to a specific worldview or strategic context.
So I wanted to talk about an approach that I’ve found to work well, bridging these gaps while still maintaining simplicity: using a hierarchical semantic decision tree to guide the model with relevant context at each node.
Instead of retrieving documents at query time or hardcoding knowledge into the prompt, this approach builds a persistent tree structure that represents a thematic map of the world. Each node encodes increasingly specific categories and is embedded with relevant context, which in my example are memos, meeting notes, portfolio examples, market updates, and so on.
As the model processes new inputs, it traverses the tree, narrowing from broad categories to detailed sub-themes, using embedded context at each node to guide the path. This structure allows for:
Persistent memory of your worldview and priorities
Dynamic growth, as new nodes are created to represent emerging themes
Traceability of reasoning, enabling meta-analysis or training of downstream models
This approach gives you more control over the context, while still leveraging the flexibility that comes with LLM decision making and analysis.
Here’s a look at how I designed and implemented this system for this specific use case, where it fits well, and what tradeoffs come with it.
I. Designing the Tree
The foundation of this system is a persistent, interpretable structure that encodes how we think about the world, like a thematic map that evolves with our strategy, investment theses, and market insights. To start, it’s implemented as a simple JSON file.
Using JSON may sound unsophisticated compared to more advanced options like knowledge graphs or structured databases, but it comes with important advantages at this stage. Flat files are easy to version-control, inspect, and manually tweak. When you’re building out the initial taxonomy, deciding how to break down categories, defining node-level context - it’s incredibly helpful to have a human-readable structure you can navigate and edit directly. It mirrors how we already think about taxonomies: as nested lists of concepts, from broad themes down to granular subcategories.
The process begins with manual scaffolding. You start by defining high-level categories that reflect your strategic lenses. For this use case, the top level is a set of sectors to categorize companies into - Financial services, Healthcare, Commerce, and so on. These are broad domains where we actively invest and track trends. From there, you break each one down into increasingly specific sub-themes. These might come from IC discussions, strategy documents, or simply from patterns you observe in the market.
A key design principle here is explicit child nodes. Every node in the tree includes a children dictionary, even if it’s currently empty. This ensures clarity in traversal and prevents ambiguity, especially in cases where multiple sub-themes might share a name (like “Payments” under both Fintech and Commerce). By explicitly structuring the hierarchy, you make sure that context is always tied to a clear path - no accidental overlaps or collisions.
Each node also has a meta field where you embed relevant context. This is where the tree becomes more than just a taxonomy, and turns into a living knowledge base. You can enrich each node with:
Portfolio company examples
Strategic notes or internal IC comments
Investment thesis descriptions
Market updates or news snippets
Recent deal activity
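To make this concrete, here’s a minimal sketch of what a couple of nodes might look like in the taxonomy file, shown as the equivalent Python dict. The field names (children, meta, and the keys inside meta) and the example entries are illustrative assumptions, not a fixed schema.

```python
# Minimal sketch of the taxonomy structure (mirrors a taxonomy.json file).
# All field names and example values here are illustrative, not a fixed schema.
taxonomy = {
    "Financial services": {
        "meta": {
            "description": "Companies building financial products or infrastructure.",
            "interest_level": "high",
            "portfolio_examples": ["ExampleCo"],  # hypothetical portfolio company
            "notes": ["Watching embedded finance closely."],
        },
        "children": {
            "Payments": {
                "meta": {"description": "Payment rails, APIs, and processing."},
                "children": {},  # explicit, even when currently empty
            },
            "Fintech Infrastructure": {
                "meta": {"description": "Core banking, ledgers, compliance tooling."},
                "children": {},
            },
        },
    },
    "Healthcare": {
        "meta": {"description": "Healthtech, care delivery, and biotech-adjacent tools."},
        "children": {},
    },
}
```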
Effectively, you’re embedding context at the decision point, rather than relying on a global vector database to retrieve it at query time. This approach keeps the reasoning traceable and structured. The model doesn’t need to “remember” everything up front, as it can traverse the tree and narrow down categories while progressively accessing richer context as it gets closer to a decision.
In a sense, the tree acts as a structured alternative to vector retrieval: lightweight, interpretable, and designed for human-in-the-loop iteration.
II. Tree Traversal Logic
Once the tree is designed, the next challenge is deciding how the model reasons its way through it. Starting at the root node, the model is presented with a set of immediate child categories. For each decision point, the model is given:
The company description (or whatever input is being classified)
A structured summary of the current node’s children, including their descriptions, investment interest levels, existing portfolio examples, and any relevant notes
Optional context from the current node itself, like strategic insights or recent news snippets
With this information, the model selects the child category that best fits the input. In our use-case, this decision is a semantic judgment where the model has to cut through surface-level noise and think about what the company truly does.
After selecting a child, the traversal recurses into that node and repeats the process. This continues until either:
The model reaches a leaf node (i.e., a category with no further children), or
The model determines that none of the current children are a good fit, in which case it suggests creating a new category at that level.
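As a rough sketch of what this traversal loop can look like in code, assuming the taxonomy dict from earlier; the `ask_llm` helper is a placeholder for whatever LLM client you actually use, and the prompt wording is just one way to frame the decision:

```python
import json

def ask_llm(prompt: str) -> str:
    """Placeholder for your LLM client call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def classify(description: str, node: dict, path: list[str] | None = None) -> list[str]:
    """Recursively walk the tree, returning the decision trace as a list of node names."""
    path = path or []
    children = node.get("children", {})
    if not children:  # leaf node: nothing further to narrow into
        return path

    # Summarize the immediate children for the prompt.
    options = "\n".join(
        f"- {name}: {child.get('meta', {}).get('description', '')}"
        for name, child in children.items()
    )
    prompt = (
        f"Company description:\n{description}\n\n"
        f"Current category: {' > '.join(path) or 'root'}\n"
        f"Node context: {json.dumps(node.get('meta', {}))}\n\n"
        f"Which of these sub-categories fits best? Reply with the exact name, "
        f"or NONE if nothing fits.\n{options}"
    )
    choice = ask_llm(prompt).strip()
    if choice == "NONE" or choice not in children:
        return path  # candidate for a new-node proposal (see section IV)
    return classify(description, children[choice], path + [choice])

# Usage: wrap the top-level dict as a root node before traversing.
# trace = classify("A startup building payment APIs for marketplaces",
#                  {"meta": {}, "children": taxonomy})
```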
At every step, the path taken is logged. This “decision trace” might look like: ["AI", "AI Applications", "Agentic Interfaces"]. Maintaining this trace is crucial for a few reasons:
Interpretability: You can see exactly how and why a classification was made, step by step.
Auditability: If a categorization feels off, you can trace back through the model’s path to diagnose where the reasoning went astray.
Training downstream models: Over time, these traces form a structured dataset of your decision-making patterns. This provides a foundation for potential directions like training meta-models that learn to mimic your “taste” in specific sectors.
One important aspect of traversal is handling ambiguity. There are cases where a company might reasonably fit under multiple sub-categories, or where the model isn't entirely confident about which path to take. There are several ways to handle this:
You can prompt the model to return ranked options with rationale, highlighting why certain paths are stronger fits.
You can allow inputs to be tagged under multiple categories if they straddle different themes, which is useful for cross-sector plays.
For edge cases where the model’s confidence is low, you can flag the decision for human review, adding a manual oversight layer to the system.
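One way to implement the ranked-options and human-review variants is sketched below. The response structure, confidence threshold, and tie-breaking rule are all assumptions for illustration; in practice you’d ask the model to return this shape as JSON and parse it.

```python
from dataclasses import dataclass

@dataclass
class RankedChoice:
    category: str
    confidence: float  # 0.0 - 1.0, as reported by the model
    rationale: str

REVIEW_THRESHOLD = 0.6  # below this, route the decision to a human

def needs_review(choices: list[RankedChoice]) -> bool:
    """Flag for human review when the top pick is weak or two picks are nearly tied."""
    if not choices:
        return True
    top = max(choices, key=lambda c: c.confidence)
    runners_up = [c for c in choices if c is not top]
    too_close = any(top.confidence - c.confidence < 0.1 for c in runners_up)
    return top.confidence < REVIEW_THRESHOLD or too_close
```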
This traversal process is deliberately lightweight, not requiring a dense retrieval backend or complex schema enforcement. Instead, it relies on a well-structured tree, context-rich prompts, and the model’s ability to reason within defined boundaries. It’s a system that scales with both human and machine collaboration, allowing you to evolve the taxonomy over time without losing clarity or traceability.
III. Context Injection
A key design principle of this system is giving the model just enough context to make informed decisions. This balance is what allows the traversal process to stay accurate, interpretable, and cost-efficient.
At each decision point in the tree, the model is presented with a curated snapshot of context. Specifically, when a node is being evaluated, its own metadata is injected into the prompt: strategic summaries, recent news snippets, portfolio examples, and any notes or cautions flagged during past investment discussions. This ensures the model isn’t just making decisions based on category names; it’s reasoning with the same context a human investor would consider.
That said, a conservative default is to keep the model’s view focused on the current node and its immediate children. You want the model to reason locally, evaluating context in manageable slices rather than being overwhelmed by the entire tree’s contents. This is especially important as the tree grows deeper. If you inject too much ancestry or reveal large subtrees at once, you risk diluting focus, bloating token usage, and introducing unnecessary noise into the decision-making process.
To avoid prompt flooding, the metadata at each node is designed to be compact and distilled. These aren’t full memos or IC documents; they’re strategic summaries, quick notes, and bite-sized insights that give the model just enough signal to reason effectively. Think of it as the cheat sheet an investor would scribble in the margins, not the full investment memo - for now.
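Here’s a sketch of how node metadata might be distilled into a compact prompt block. The truncation limits and field names are arbitrary choices that just illustrate the “cheat sheet” idea:

```python
def format_node_context(node: dict, max_notes: int = 3, max_chars: int = 300) -> str:
    """Distill a node's meta into a short, bounded prompt block."""
    meta = node.get("meta", {})
    lines = []
    if meta.get("description"):
        lines.append(f"Description: {meta['description'][:max_chars]}")
    if meta.get("portfolio_examples"):
        lines.append("Portfolio: " + ", ".join(meta["portfolio_examples"][:5]))
    for note in meta.get("notes", [])[:max_notes]:  # only the most recent few notes
        lines.append(f"Note: {note[:max_chars]}")
    return "\n".join(lines)
```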
If a node starts to accumulate too much context, that’s a signal to split it into smaller, more focused sub-categories. For example, if “Fintech Infrastructure” becomes overloaded with examples and nuances, you can break it down into more granular leaves like “Payments APIs,” “Embedded Finance,” or “Compliance Automation.” This pruning keeps the tree agile and ensures that context injection remains sharp and purposeful.
By controlling context at the node level, you create a system that scales gracefully. The model isn’t tasked with holding your entire worldview in its head at once, but reasons progressively, informed by the context that matters most at each step.
IV. Autonomy and Node Creation
A well-structured taxonomy needs to evolve as your worldview evolves. New themes emerge, markets shift, and companies start pitching ideas that don’t quite fit into existing categories. The challenge is allowing the model enough autonomy to recognize and propose new nodes without letting the tree sprawl uncontrollably.
In this system, dynamic node creation is allowed, but only under clear conditions. Specifically, the model can propose new categories when it reaches a leaf node or an explicitly permitted layer. The traversal logic is designed so that at each step, the model is nudged to find a fit within existing categories. Only when no existing option makes sense does it consider creating a new branch.
When proposing a new node, the model uses both semantic reasoning and heuristic guidance. It’s not just free-writing names; it’s instructed to think critically about recurring patterns in the input. For example, if the model keeps encountering companies building “AI Tools for Engineering” under the broader “AI Applications” category, it can suggest creating a dedicated sub-theme to capture this emerging space.
But autonomy comes with guardrails. Every time the model proposes a new node, it must also provide a justification, clearly explaining why none of the existing categories are a sufficient fit. This justification becomes part of the trace, giving you visibility into the model’s reasoning and allowing for a human-in-the-loop review before the tree is actually modified. This step is crucial to prevent category bloat, redundant paths, or overlapping branches that dilute the clarity of the taxonomy.
Expanding the tree is also an opportunity to understand how the model interprets your taxonomy. When the model suggests new categories based on recurring patterns it encounters, it reveals how it “thinks” about your domain. These suggestions aren’t always perfect, but they offer valuable signals about emerging themes or areas where your existing structure may be too rigid. Rather than fighting these model-driven suggestions, it’s often more productive to co-create the structure with the model, adjusting the tree in ways that align with both your strategic lenses and the model’s semantic intuition. This collaborative dynamic keeps the taxonomy both grounded and adaptive.
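A minimal sketch of how a new-node proposal could be captured for human review, rather than mutating the tree directly. The field names and the review queue are assumptions, not a fixed design:

```python
from dataclasses import dataclass, field

@dataclass
class NodeProposal:
    parent_path: list[str]  # e.g. ["AI", "AI Applications"]
    name: str               # e.g. "AI Tools for Engineering"
    justification: str      # why no existing child is a sufficient fit
    example_inputs: list[str] = field(default_factory=list)

pending_review: list[NodeProposal] = []

def propose_node(proposal: NodeProposal) -> None:
    """Queue the proposal for human review instead of editing the tree directly."""
    pending_review.append(proposal)

def approve(proposal: NodeProposal, tree: dict) -> None:
    """Apply an approved proposal by adding an empty child node at the parent path."""
    node = {"meta": {}, "children": tree}
    for step in proposal.parent_path:
        node = node["children"][step]
    node["children"][proposal.name] = {
        "meta": {"description": proposal.justification},
        "children": {},
    }
```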
Another important consideration is deduplication. Some themes naturally span multiple categories. “Payments,” for instance, is relevant in Fintech, Commerce, and even Healthcare. In these cases, you have a few strategies:
Mirroring context across nodes: You can allow “Payments” to exist as a child under each relevant parent, embedding context that’s specific to the vertical.
Canonical context stores: Alternatively, you can maintain a single, shared context store for cross-cutting themes and point each node to it. This keeps context centralized while allowing multiple paths to reach it.
Multi-tagging inputs: In some cases, it’s cleaner to tag companies under multiple categories rather than trying to force a single path. This is especially useful for companies that are true cross-sector plays.
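The canonical-store option might look like the sketch below: cross-cutting themes keep their context in one shared place, and each node carries a lightweight reference to it. The context_ref key is just an illustrative convention:

```python
# Shared context for themes that appear under multiple parents.
shared_context = {
    "payments": {
        "description": "Payment rails, processing, and money movement.",
        "notes": ["Interchange pressure is reshaping margins."],
    }
}

# Each "Payments" node points at the shared entry instead of duplicating it.
payments_under_fintech = {
    "meta": {"context_ref": "payments", "vertical_notes": ["B2B payment flows"]},
    "children": {},
}
payments_under_commerce = {
    "meta": {"context_ref": "payments", "vertical_notes": ["Checkout conversion"]},
    "children": {},
}

def resolve_meta(node: dict) -> dict:
    """Merge shared context with any vertical-specific notes on the node."""
    meta = dict(node.get("meta", {}))
    ref = meta.pop("context_ref", None)
    if ref:
        meta = {**shared_context.get(ref, {}), **meta}
    return meta
```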
The key is to maintain a balance between structure and flexibility. You want the tree to grow dynamically, reflecting new insights and emerging themes, but always within a controlled framework that keeps the reasoning process transparent and aligned with your strategic lenses.
V. Injecting New Data
A taxonomy like this is a living structure that needs to evolve as new information comes in. Whether it’s a fresh investment memo, a founder update, or market news, the system needs a way to absorb new data without starting from scratch each time. The goal is to keep the tree current, continuously enriching it, without having to reindex or rebuild the entire context layer.
Mapping unstructured inputs into this structure is where LLMs become incredibly useful. You can use the same traversal logic that analyzes companies to route snippets of information into the appropriate nodes. The model analyzes the snippet, reasons through the category tree, and determines where it belongs. This turns the tree into a self-updating knowledge base, where context is continuously layered in without requiring human curators to manually sort every piece of data.
One of the major advantages of this system over vector-based retrieval setups is that it doesn’t require periodic re-indexing or re-embedding of data. When you update a node with new information, it’s instantly part of the traversal context. There’s no lag, no expensive batch processing. You’re working with a lightweight, append-only structure where updates are transparent, immediate, and directly tied to decision points.
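Routing a new snippet reuses the same traversal and then appends to the target node’s metadata. A rough sketch, assuming the classify helper and taxonomy dict from earlier and a notes list on each node:

```python
import json

def ingest_snippet(snippet: str, tree: dict, path_to_file: str = "taxonomy.json") -> list[str]:
    """Route a snippet to a node via traversal, append it, and persist the tree."""
    root = {"meta": {}, "children": tree}
    trace = classify(snippet, root)  # same traversal used for classifying companies
    node = root
    for step in trace:
        node = node["children"][step]
    node.setdefault("meta", {}).setdefault("notes", []).append(snippet)
    with open(path_to_file, "w") as f:  # flat file: the write *is* the re-index
        json.dump(tree, f, indent=2)
    return trace
```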
This keeps the system nimble. As the tree evolves, context builds up organically, node by node, without disrupting the overall architecture. The model doesn’t need to “remember” everything, it simply needs to know how to traverse the tree and access the right context when making a decision.
Tradeoffs and Mitigations
Building a persistent decision framework like this comes with inevitable tradeoffs. You’re giving the model a structured scaffold to reason within, which creates clarity and traceability, but also boundaries. If those boundaries become too rigid, you risk overfitting: the model might miss creative fits simply because a category doesn’t exist yet, or because the tree structure frames the decision too narrowly.
On the flip side, when node summaries are too sparse or poorly framed, you open yourself up to false positives - situations where the model confidently follows an incorrect path, simply because it lacks enough context to challenge the framing. And as the taxonomy grows, tree bloat becomes a real concern. Too many branches dilute focus, overwhelm traversal, and turn context injection into a bloated, expensive process.
These problems require active stewardship:
Regular pruning and simplification of nodes to keep the tree lean and focused.
Auditing traversal traces to spot surprising or incorrect paths, which can reveal blind spots in how context is framed.
Leveraging LLMs themselves to suggest node merges or restructuring, analyzing usage patterns and semantic overlaps to propose cleaner, more efficient hierarchies.
As this system evolves, it’s clear that flat JSON files will eventually become limiting. Moving the taxonomy into a more dynamic structure, such as a graph database like Neo4j or a PostgreSQL setup with JSONB columns, opens up new possibilities. You’d be able to query the tree in ways that are currently manual and tedious:
“Fetch all nodes where more than three insights were added in the past month.”
“List all subcategories under Fintech where our investment status is marked as ‘high’.”
It also enables more dynamic context slicing. Instead of statically formatting long JSON blobs into prompts, you could pull only the most relevant context on the fly, keeping prompts lean and adaptive.
Another powerful extension is closing the feedback loop. Right now, the system guides categorization and reasoning, but it doesn’t yet learn from outcomes. Over time, feeding back signals like investment outcomes (tracked, passed, funded) into the tree can refine future recommendations. For example, if certain paths consistently lead to “pass” outcomes, that signal could inform how node context is framed, or even prompt a restructuring of how that space is categorized.
On a practical level, there’s also a question of cost and efficiency. The recursive traversal model is deliberate, but it does incur higher token usage compared to flat retrieval systems. One mitigation here is to precompute traversal path probabilities for shallow levels, batching early decisions into a single prompt. For instance: “Out of the following 3-level paths, which is most relevant?” For most cases, this would eliminate the need for step-by-step recursion, only falling back to a more detailed traversal when ambiguity arises. It’s a simple adjustment that can significantly reduce both latency and API costs.
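A sketch of that batching idea: enumerate the shallow paths up front and resolve them in a single call, falling back to step-by-step traversal only when needed. The depth limit and prompt wording are illustrative, and `ask_llm` is the same placeholder as earlier:

```python
def enumerate_paths(node: dict, depth: int, prefix: list[str] | None = None) -> list[list[str]]:
    """Collect every path through the tree up to a fixed depth."""
    prefix = prefix or []
    children = node.get("children", {})
    if depth == 0 or not children:
        return [prefix] if prefix else []
    paths = []
    for name, child in children.items():
        paths.extend(enumerate_paths(child, depth - 1, prefix + [name]))
    return paths

def batched_classify(description: str, tree: dict, depth: int = 3) -> str:
    """Resolve the first few levels in one LLM call instead of step-by-step recursion."""
    paths = enumerate_paths({"children": tree}, depth)
    listing = "\n".join(" > ".join(p) for p in paths)
    prompt = (
        f"Company description:\n{description}\n\n"
        f"Out of the following {depth}-level paths, which is most relevant? "
        f"Reply with the exact path.\n{listing}"
    )
    return ask_llm(prompt)
```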
Ultimately, this approach is a balancing act. You’re giving up some of the brute-force flexibility of unstructured retrieval in exchange for a system that reflects how you actually think. The structure isn’t there to constrain the model but to guide it, ensuring that as your worldview evolves, the system evolves with you. The key is to treat it as a living framework that grows, prunes, and adapts, just like your investment strategy.
Of course, the system is still inherently non-deterministic. LLMs reason probabilistically, and no amount of structure will 100% guarantee the same output every time. But what this approach offers is visibility - you always get a trace of how the decision was made. If a path feels off, you can inspect it, understand where the reasoning drifted, and adjust context or structure accordingly. It’s not about enforcing deterministic outputs, but about keeping the reasoning process interpretable, so that you’re never operating in the dark.
Wrapping up
We’ve walked through a structured approach to context engineering using a hierarchical semantic decision tree, a system that acts like a smart, evolving junior reasoning assistant. By aligning model reasoning with the way strategic decisions are made in investment workflows, it provides a scalable, interpretable framework for navigating complex thematic context.
While the use case here has been investment evaluation, the approach generalizes to any system that benefits from:
A persistent worldview or strategic lens
Structured, thematic context
Traceable reasoning paths
Modular, evolving knowledge graphs
Potential applications could span:
Exploratory algorithm development
Medical diagnostics
Government policy assessments
Due diligence and risk evaluations
Scientific literature synthesis
Security assessments
Hypothesis exploration in scientific research
Legal strategy advisors
This isn’t the only way to approach structured reasoning. Adjacent methods like Graph-RAG (Retrieval-Augmented Generation with Knowledge Graphs) offer another path, bringing relational depth into the retrieval layer. Where our semantic tree structures context explicitly before prompting, Graph-RAG automates the traversal of connected entities, enabling multi-hop reasoning across relationships.
Moving forward, a key challenge will be ensuring that the tree remains scalable and manageable as it grows. Maintaining it as a static JSON file will only go so far. I’m interested in building tooling that makes it easier to visualize, edit, and interact with the structure, perhaps evolving it into a lightweight graph database that still preserves the intentionality of this design. Another direction I want to explore is leveraging the decision traces themselves and using them as a dataset to train small models that can assist in routing or flagging ambiguous cases, turning traces into a learning loop.
I’d love to connect with others who are experimenting with these ideas, or who have deep experience building structured reasoning systems or working with Graph RAG. If you’re working on context engineering, graph-augmented retrieval, or anything in this space, please reach out!
