Keeping Cool in a Hot Space: A No-BS Framework for AI Products

by Noah Williams in VC Unpacked

March 20, 2025

In a fast-moving space like AI, how do we as investors determine what’s hype and what’s the real deal?

We find ourselves asking this question all the time, so in this short essay we’ll propose a framework for segmenting today’s AI companies. To keep things digestible, we’ll limit our discussion strictly to products built around large language models. For each category, we’ll provide a working definition, observations from our dealflow, and the open questions we still have.

Note: AI is an evolving space where definitions are hotly debated and the lines between different products blur constantly. The categories and examples we provide here aren’t conclusive or exhaustive, but we think they’re helpful to share as a starting point. 

Tools

Best example: Granola (also our team’s favorite)

We think of tools as standalone products that focus on a single workflow like notetaking, video editing, or research. Most of the time, tools are triggered manually and have well-defined start/stop points (e.g., transcribing a fixed-length video call, generating a video, or compiling research). Granola is a great example: our meetings are transcribed, enhanced with AI, and sent around to the different apps we use.
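
To make the definition concrete, here’s a minimal sketch of a tool in this sense – a Granola-style meeting notes pipeline with a clear start and stop. Every function here is a hypothetical stub for illustration, not any product’s actual API.

    # A "tool" in this framing: manually triggered, single workflow, with a
    # well-defined start and stop. All functions are hypothetical stubs.

    def transcribe(audio_path: str) -> str: ...          # start: fixed-length input
    def enhance_with_llm(transcript: str) -> str: ...    # one model pass
    def send_to_apps(notes: str, apps: list[str]) -> None: ...

    def run_tool(audio_path: str) -> None:
        transcript = transcribe(audio_path)
        notes = enhance_with_llm(transcript)
        send_to_apps(notes, ["notes_app", "chat", "crm"])  # stop: outputs delivered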

Although we love using them, building investment conviction in tools is tough. Consider how just a few months ago, generating research with up-to-date sources was unique to Perplexity, whereas today it’s common across a variety of tools like Gemini, Grok, and ChatGPT. Similar “feature bleed” hits early-stage companies in many forms and can quickly erode what might look like a competitive advantage.

In general, what tools gain from foundation model improvements, they lose equally in their proximity to them. Many a startup has been absorbed by a hyperscaler bumping up against its use case – hence the (possibly deprecated) meme of OpenAI killing startups with every new model release. Breakout products like Granola seem to have decent staying power on user experience alone, but we’ve seen countless other “cool” tools made obsolete. The safest place to build seems to be right at the edges of a hyperscaler’s roadmap, where users can’t get a similar workflow from the base model alone.

Essential questions we like to ask:

  1. How do you respond to foundation model improvements?
  2. Is the workflow you focus on defensible?

Assistants

Best example: Glean (and now Onyx)

Assistants are the messy middle of AI right now and represent the vast majority of companies we see. Whereas tools mainly solve isolated problems or replace workflows, most assistants today act more like an “intelligence layer” wrapped around a customer’s data. Architecturally, they tend to be an internal chat interface that can access documents and connect to other software.

What’s strange about assistants is that the workflows built around them are typically very “light” – if they can be called workflows at all. Common patterns include generating documents or proposals, summarizing information, and finding documents with retrieval-augmented generation. The base models themselves do most of the heavy lifting, which is nice from a macro model-advancement perspective but raises questions about long-term defensibility and durability.
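
For the technically curious, the pattern usually reduces to something like the sketch below: retrieve a few relevant internal documents, then hand them to a base model as context. Everything here (call_llm, the toy corpus, the keyword-overlap ranking) is a simplified stand-in, not any vendor’s actual stack.

    # Minimal sketch of an "intelligence layer" assistant: retrieval-augmented
    # generation over internal documents. call_llm is a hypothetical stand-in
    # for whatever base model the product wraps.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for a base model API call")

    # Toy corpus; real products index drives, wikis, tickets, CRMs, etc.
    DOCS = {
        "q3_board_deck": "Q3 revenue grew 40% quarter over quarter ...",
        "pricing_policy": "Enterprise discounts require VP approval ...",
    }

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Rank docs by naive keyword overlap (real systems use vector search)."""
        words = set(query.lower().split())
        ranked = sorted(
            DOCS.values(),
            key=lambda text: len(words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def answer(query: str) -> str:
        """The 'light' workflow: fetch context, let the base model do the rest."""
        context = "\n".join(retrieve(query))
        return call_llm(f"Context:\n{context}\n\nQuestion: {query}")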

Most investors today believe the “moat” for these products will come from proprietary data. While this is convenient (and logical), it is often just an extension of classic vertical SaaS thinking. In our evaluations of companies, we find that the quality and depth of “proprietary” data vary greatly.

Essential questions we like to ask:

  1. What exactly is the proprietary data you capture? What makes it impactful?
  2. What workflow do you most want to augment/replace, and why is it valuable?
  3. What does this product evolve into on a five to ten year timeline? 

Agents

Best example: Deep Research

Agents are the true wild west in technology right now. We think there are far fewer true agents in-market than it seems, but the lines are starting to blur as assistant products gradually build in agentic features. In our view, the frontier of agents is centered on autonomy and customer trust: these are systems that are set up with guidelines – not rules – and that operate without human intervention or oversight by design.
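
As a rough illustration of what “guidelines, not rules” means in practice, consider the loop below: the model is steered by a natural-language policy and chooses its own actions, with only a step budget as a backstop. call_llm and the tool registry are hypothetical placeholders, not any product’s real interface.

    # Sketch of an autonomous agent loop: behavior is steered by guidelines in
    # the prompt, not hard-coded branching. All names here are hypothetical.

    GUIDELINES = (
        "Resolve the customer's refund request. Stay under $500 without "
        "escalating; when uncertain, prefer partial refunds over denials."
    )

    def call_llm(prompt: str) -> dict:
        raise NotImplementedError("stand-in for a model that picks the next action")

    TOOLS = {
        "lookup_order": lambda order_id: {"total": 120, "status": "delivered"},
        "issue_refund": lambda order_id, amount: {"refunded": amount},
    }

    def run_agent(task: str, max_steps: int = 10) -> str:
        history = []
        for _ in range(max_steps):
            # The model reads guidelines + history and decides what to do next.
            step = call_llm(f"{GUIDELINES}\nTask: {task}\nHistory: {history}")
            if step["action"] == "done":
                return step["summary"]
            result = TOOLS[step["action"]](**step["args"])
            history.append((step["action"], result))
        return "stopped: step budget exhausted"  # a backstop, not a rule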

The big challenge we’ve observed so far is that people outside the tech community don’t necessarily trust agents yet. The technology to deploy autonomous agents (within some boundary of acceptable error) is already here, but many enterprise customers still want deeper observability into, and control over, how LLM-based systems actually arrive at decisions.

Pricing models are also an open question, as token costs are highly variable for multi-agent systems, even on a per-query or per-task basis. We’ve seen a variety of paradigms, from outcome-based (where a completed task earns revenue) to hybrid (a blended subscription with rate limits) to 100% “token markup” business models that look more like API-as-a-product. The ascendancy of reasoning models and the additional cost of test-time scaling only add to this complexity.
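
Some back-of-the-envelope arithmetic shows why this is hard. In the sketch below (all numbers invented for illustration), the same task can consume wildly different token counts, and each paradigm allocates that variance differently: outcome-based pricing absorbs it in margin, hybrids cap it with rate limits, and token markup passes it through.

    # Toy unit economics for a single agent task under the three paradigms
    # above. Every number is invented for illustration; prints per-task margin.

    TOKEN_COST_PER_1K = 0.01                      # blended provider rate, $
    TOKENS_PER_TASK = [5_000, 40_000, 200_000]    # same task, variable usage

    def outcome_based(tokens: int) -> float:
        """Flat fee per completed task; margin absorbs token variance."""
        return 2.00 - tokens / 1000 * TOKEN_COST_PER_1K

    def hybrid(tokens: int, tasks_per_month: int = 100) -> float:
        """Subscription amortized per task; rate limits cap total usage."""
        return 99.00 / tasks_per_month - tokens / 1000 * TOKEN_COST_PER_1K

    def token_markup(tokens: int) -> float:
        """Resell tokens at a markup; stable margin, looks like an API."""
        cost = tokens / 1000 * TOKEN_COST_PER_1K
        return cost * 1.3 - cost

    for t in TOKENS_PER_TASK:
        print(f"{t:>7} tokens | outcome {outcome_based(t):+5.2f} | "
              f"hybrid {hybrid(t):+5.2f} | markup {token_markup(t):+5.2f}")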

We find that the most exciting work in AI is happening at these edges, where agents are allowed an almost “scary” level of responsibility. The question in these cases is less about what workflow an agent replaces and more about what new abilities someone gains when using (or working with) one.

Essential questions we like to ask:

  1. What’s the underlying cost structure for an agent?
  2. How do customers want to pay? How are they willing to pay?
  3. To what extent do your customers trust agents?

Again, these are just the rough categories we see most companies fall into. We’re most excited, of course, by the founders breaking this mold and building entirely different product paradigms. We’ll be tracking this space closely and look forward to updating this framework as things evolve.