How LLMs Choose Which Brands to Recommend
AI Visibility · LLM Optimization · Brand Strategy

AI assistants don't pick brands randomly. Here are the five signals that determine whether your brand gets recommended or replaced.

Rankry Team

Ask ChatGPT “What’s the best project management tool for a remote team?” five times. You’ll likely see the same three or four brands appear in most responses, sometimes in a different order, but from a remarkably consistent pool. Then ask the same question about a niche category where no dominant players exist. The model hesitates. It hedges. It may recommend brands you’ve never heard of, or invent features that don’t exist.

This contrast reveals something important: language models don’t search a database of rankings when you ask for a recommendation. They construct an answer based on statistical patterns in their training data and retrieved sources, and the strength of those patterns varies enormously across brands and categories.

Understanding how these patterns form, and what makes some brands appear consistently while others remain invisible, is the foundation of any serious AI visibility strategy.

Five signals that shape LLM recommendations

When a language model generates a brand recommendation, the output is shaped by a set of overlapping signals. None of them work in isolation. Think of them as ingredients in a weighted recipe, where the proportions shift depending on the query, the model, and whether web search is active.

1. Entity weight in the training corpus

The most fundamental signal is how deeply a brand is embedded in the data the model learned from. During training, LLMs process billions of documents: web pages, articles, forum discussions, technical documentation, academic papers, product reviews. Every time your brand appears in this corpus, it strengthens the model’s internal representation of your brand as an entity.

But raw frequency isn’t enough. What matters is frequency in relevant, high-quality contexts. A brand mentioned 10,000 times in spam forums carries less weight than one mentioned 500 times across industry publications, expert roundups, and trusted review platforms.

Research from MIT CSAIL suggests that brands appearing fewer than 50 times across high-trust sources fail to be recognized by LLMs in 72% of queries. There’s an effective threshold of presence below which a brand simply doesn’t register as a meaningful entity in the model’s knowledge.

This creates what some researchers call the “shadow entity” problem: your brand data exists somewhere in the training corpus, but it lacks sufficient density or association with authoritative sources for the model to retrieve it when a relevant query comes in. The brand is technically “known” but practically invisible.
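The quality-weighted frequency idea above can be sketched in a few lines. This is a toy illustration, not a real model internal: the source trust weights and brand figures are invented for the example, echoing the article's point that 10,000 spam mentions can carry less weight than 500 well-placed ones.

```python
# Illustrative source-type trust weights (invented values).
SOURCE_TRUST = {
    "industry_publication": 1.0,
    "review_platform": 0.8,
    "forum": 0.3,
    "spam": 0.02,
}

def effective_weight(mentions: dict[str, int]) -> float:
    """Sum mention counts weighted by the trust of each source type."""
    return sum(count * SOURCE_TRUST.get(src, 0.1)
               for src, count in mentions.items())

spam_heavy = {"spam": 10_000}                                   # many low-trust mentions
quality = {"industry_publication": 300, "review_platform": 200}  # fewer, better-placed

assert effective_weight(spam_heavy) < effective_weight(quality)
```

The asymmetry is the whole point: raw mention volume is easy to inflate, but the weighted sum only grows meaningfully through high-trust placements.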

2. Co-occurrence patterns

LLMs learn by association. When your brand consistently appears alongside specific concepts, the model builds statistical connections between them. If “Notion” repeatedly appears in contexts involving “team wiki,” “documentation,” and “knowledge base,” the model develops strong associations between Notion and those use cases.

This is why category positioning matters so much in the AI context. A brand that tries to be everything to everyone may end up with weak co-occurrence signals across many categories, rather than strong signals in the ones that matter most.

The practical implication: the way you describe your product across your website, documentation, press coverage, and third-party mentions should consistently reinforce the same set of concept associations. Not through keyword stuffing, but through genuine topical focus in your content and communications.
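A crude way to see co-occurrence strength is pointwise mutual information (PMI) over a toy corpus. Real training-data association is vastly more complex, but PMI captures the intuition that consistent pairing, not mere frequency, builds a brand-concept link. The documents below are made up.

```python
import math

docs = [
    "notion team wiki documentation",
    "notion knowledge base documentation",
    "notion team wiki",
    "asana task tracking",
]

def pmi(brand: str, concept: str, docs: list[str]) -> float:
    """PMI of brand and concept co-occurring within the same document."""
    n = len(docs)
    p_b = sum(brand in d for d in docs) / n    # P(brand appears)
    p_c = sum(concept in d for d in docs) / n  # P(concept appears)
    p_bc = sum(brand in d and concept in d for d in docs) / n
    if p_bc == 0:
        return float("-inf")  # never co-occur: no association at all
    return math.log2(p_bc / (p_b * p_c))

assert pmi("notion", "wiki", docs) > 0        # consistent pairing
assert pmi("notion", "task", docs) == float("-inf")  # no association
```

A positive PMI means the pair co-occurs more often than chance would predict; a brand spread thinly across many concepts drags every individual PMI toward zero.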

3. Sentiment and framing

It’s not just whether you’re mentioned. It’s how. LLMs absorb the sentiment of the contexts where your brand appears. Positive reviews, satisfied customer stories, favorable analyst coverage, and constructive case studies all contribute to a positive entity sentiment in the model’s representation.

Negative signals work the same way. Unresolved complaint threads on Reddit, critical reviews on G2, or controversy covered in industry press can shift how the model frames your brand. When a user asks for a recommendation, the model isn’t just checking if your brand is relevant. It’s synthesizing an overall impression that includes sentiment, and that impression influences whether it recommends you, mentions you with caveats, or skips you entirely.

One nuance worth noting: commercial LLMs are tuned toward a positive, recommendatory tone. When a user asks “what’s the best X?”, the model is structurally inclined to recommend rather than warn. This means negative sentiment needs to be quite strong and widespread to actively suppress a brand from recommendations. But it can absolutely affect positioning within the recommendation list and the language used to describe you.
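The framing behavior described above can be sketched as a simple aggregation. The thresholds, labels, and review scores are invented; the point is that mild negativity changes how a brand is framed well before it suppresses the brand outright.

```python
def framing(mentions: list[tuple[str, float]]) -> str:
    """mentions: (source, sentiment in [-1, 1]). Returns a framing label."""
    if not mentions:
        return "unknown"
    net = sum(score for _, score in mentions) / len(mentions)
    if net > 0.3:
        return "recommend"
    if net > -0.3:
        return "recommend with caveats"  # mentioned, but hedged
    return "omit"

# One negative Reddit thread is outweighed by positive coverage (net = 0.4).
reviews = [("g2", 0.6), ("reddit", -0.2), ("press", 0.8)]
assert framing(reviews) == "recommend"

# Strong, widespread negativity is what actually suppresses a brand.
assert framing([("reddit", -0.9), ("g2", -0.5)]) == "omit"
```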

4. Source authority and citation depth

Not all mentions carry equal weight. A feature in a respected industry publication contributes more to entity authority than a self-published blog post. A mention in well-sourced technical documentation carries more signal than a passing reference in a listicle.

This mirrors how traditional domain authority works in SEO, but with an important difference. In SEO, authority flows through backlinks between pages. In LLM recommendations, authority is encoded in the model’s understanding of source quality. The model has learned, through training patterns, which sources tend to produce reliable information, and it weights mentions from those sources accordingly.

The Princeton GEO study found that content enriched with source citations and statistical evidence improved visibility in AI-generated responses by 30 to 40% over baseline. The mechanism is straightforward: content that cites its sources and backs claims with data gets cited more often by AI systems, because the model has learned that well-sourced content is more trustworthy.

For brands, this means third-party coverage often matters more than owned content. An independent analyst mentioning your product in a comparative report contributes more to your entity authority than your own blog post making the same claims. The model treats independent validation differently from self-promotion.

5. Retrieval layer dynamics

Everything above describes how the model’s baseline “opinion” of your brand is formed through training. But many AI interactions also involve a real-time retrieval component through RAG (Retrieval-Augmented Generation), where the model searches the web and incorporates fresh sources into its response.

For a detailed breakdown of how RAG retrieval works, including indexation latency and the pipeline from publication to LLM inclusion, see our deep dive: Daily vs. Weekly: How Often Should You Track Brand Visibility in LLMs.

The key point for brand recommendations specifically: when retrieval is active, recently published content can influence results. A new product review on a major publication, a fresh industry report, or updated comparison content can surface in the model’s response within hours to days, depending on the source’s crawl priority.

But retrieval doesn’t override training. It supplements it. A brand with weak entity weight in the training corpus won’t suddenly dominate recommendations just because it published a new blog post that got indexed. The retrieval layer adds fresh context, but the parametric layer provides the foundational understanding of which brands are relevant and trustworthy.
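The "retrieval supplements, doesn't override" relationship can be pictured as a weighted blend. The weight value and scores are illustrative assumptions, not how any production system actually combines the two layers.

```python
def blended_score(parametric: float, retrieved: float,
                  alpha: float = 0.7) -> float:
    """alpha weights the trained baseline; (1 - alpha) the fresh context."""
    return alpha * parametric + (1 - alpha) * retrieved

# A weak entity with a great new blog post still trails an established brand.
newcomer = blended_score(parametric=0.1, retrieved=0.9)   # ≈ 0.34
incumbent = blended_score(parametric=0.8, retrieved=0.2)  # ≈ 0.62
assert newcomer < incumbent
```

Fresh content moves the needle, but only within the range the parametric baseline allows.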

Position within the recommendation matters

When an LLM lists multiple brands in a recommendation, the order isn’t random. Research on position bias in LLMs shows that the first brand mentioned in a recommendation receives disproportionate attention from users. This is the primacy effect applied to AI-generated content: users tend to focus on and trust the first option presented.

The position a brand occupies in a recommendation is influenced by the combined strength of all five signals above. Brands with the strongest entity weight, most relevant co-occurrence patterns, most positive sentiment, deepest citation authority, and freshest retrieval presence tend to appear first.

For marketers, this means that “being mentioned” and “being recommended first” are different goals with different levels of impact. Tracking not just presence but position across AI models gives a much more accurate picture of competitive standing.
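The gap between "being mentioned" and "being recommended first" is easy to make concrete. With repeated samples of the same prompt (the brand lists below are made up), two brands can have identical presence but very different position:

```python
# Ranked brand lists from four hypothetical samples of one prompt.
responses = [
    ["notion", "asana", "clickup"],
    ["asana", "notion", "trello"],
    ["notion", "clickup", "asana"],
    ["notion", "asana"],
]

def presence_rate(brand: str, responses: list[list[str]]) -> float:
    """Share of responses that mention the brand at all."""
    return sum(brand in r for r in responses) / len(responses)

def first_rate(brand: str, responses: list[list[str]]) -> float:
    """Share of responses where the brand is recommended first."""
    return sum(r[0] == brand for r in responses) / len(responses)

# Both brands appear in 100% of responses...
assert presence_rate("notion", responses) == 1.0
assert presence_rate("asana", responses) == 1.0
# ...but one leads three times as often.
assert first_rate("notion", responses) == 0.75
assert first_rate("asana", responses) == 0.25
```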

Why some brands are invisible

The flip side of understanding what drives recommendations is understanding why some brands don’t appear at all.

The most common reason is insufficient entity density. The brand simply hasn’t built enough presence in the types of sources that LLMs learn from. This is different from having low web traffic or poor SEO rankings. A brand can have a well-optimized website and strong Google rankings while being nearly absent from the broader ecosystem of publications, forums, documentation hubs, and review platforms that feed into LLM training data.

Another common pattern is entity fragmentation. The brand is described inconsistently across different sources: different names, different product descriptions, different positioning statements. The model can’t build a coherent entity representation from contradictory signals, so it defaults to brands with clearer, more consistent profiles.
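Fragmentation is also a measurement problem: before you can even count a brand's presence, scattered aliases need to be consolidated. A minimal normalization sketch, with a made-up alias map:

```python
# Hypothetical alias map for a brand referred to inconsistently.
ALIASES = {"acme app": "acme", "acme.io": "acme", "acme inc": "acme"}

def canonical(name: str) -> str:
    """Collapse a raw mention to its canonical entity name."""
    name = name.strip().lower()
    return ALIASES.get(name, name)

mentions = ["Acme App", "acme.io", "Acme Inc", "acme"]
counts: dict[str, int] = {}
for m in mentions:
    c = canonical(m)
    counts[c] = counts.get(c, 0) + 1

# Four fragmented references collapse into one coherent entity count.
assert counts == {"acme": 4}
```

Models face the analogous problem during training, except no one hands them the alias map; inconsistent naming simply dilutes the entity.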

A third scenario is category crowding. In competitive categories, the model has limited “recommendation slots” and the brands with the strongest combined signals occupy them. A newer or smaller brand with modest entity weight simply gets crowded out, even if its product is competitive.

What actually moves the needle

Given how these signals work, some common marketing activities matter more for AI visibility than others.

High impact: Earning coverage in publications that LLMs treat as authoritative. Industry reports, analyst coverage, expert roundups on established platforms, technical documentation referenced by developers. These build entity authority at the source level the model trusts most.

High impact: Consistent entity definition across every surface you control or influence. Your brand name, product category, key use cases, and differentiators should be described in the same way across your website, documentation, directory listings, review profiles, and press coverage. Consistency strengthens co-occurrence patterns.

Medium impact: Owned content that demonstrates genuine expertise. Blog posts, guides, and research that get cited by other sources create a secondary authority signal. The content itself may not directly influence the model, but the citations it generates do.

Lower impact than expected: Social media activity. While social platforms are part of the broader web, their content tends to be ephemeral and less weighted in LLM training data compared to structured, long-form content on established domains.

Lower impact than expected: Paid advertising. LLMs don’t process ads the way search engines do. Advertising spend doesn’t directly influence entity weight in the model’s training data, though the brand awareness it creates may indirectly lead to more organic mentions.

Measuring your current position

At Rankry, we approach this measurement through what we call the semantic core: 100+ carefully constructed prompts that cover the full range of ways users ask about your category. Running these across five major LLMs reveals not just whether you’re mentioned, but where you rank in recommendation order, which competitors appear alongside you, and how your positioning varies across models and query types.

The value of this approach is that it separates signal from noise. A single query to one model tells you very little, because LLM outputs are probabilistic (for a technical explanation of why, see Daily vs. Weekly: How Often Should You Track Brand Visibility in LLMs). A large sample across models and prompt variations reveals durable patterns: where your brand is strong, where it’s weak, and where competitors are gaining ground.
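The sampling logic is straightforward to sketch. Everything here is a hypothetical placeholder: the model names, prompts, brand pool, and the `run_prompt` stub standing in for a real API call. The structure, many runs per model-prompt cell aggregated into a visibility share, is the part that matters.

```python
import random

MODELS = ["model_a", "model_b", "model_c"]
PROMPTS = ["best PM tool", "PM tool for remote teams", "notion alternatives"]

def run_prompt(model: str, prompt: str) -> list[str]:
    """Stand-in for a real LLM call; returns a ranked brand list."""
    pool = ["notion", "asana", "clickup", "trello"]
    return random.sample(pool, k=3)  # probabilistic output, like real LLMs

def visibility_share(brand: str, runs_per_cell: int = 20) -> float:
    """Fraction of all sampled responses that mention the brand."""
    hits = total = 0
    for model in MODELS:
        for prompt in PROMPTS:
            for _ in range(runs_per_cell):
                total += 1
                hits += brand in run_prompt(model, prompt)
    return hits / total
```

A single call to `run_prompt` is nearly meaningless on its own; only the aggregate share across 180 samples approximates a stable signal, which is exactly why one-off spot checks mislead.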

Understanding how LLMs select brands for recommendation isn’t academic. It’s the operational foundation for any strategy aimed at improving your presence in AI-generated answers. The signals are identifiable, the measurement is possible, and the brands that act on this understanding now are building advantages that compound over time.
