The Practical Path to AI-Powered Search

Why Search Is Broken (and How AI Fixes It)

Most website search is frustrating. Users search for "how to cancel my subscription" and get articles titled "Subscription Plans." They search for "the thing that holds the door open" and get nothing. The gap between what users mean and what search systems understand is one of the oldest unsolved problems in information retrieval.

AI is closing this gap — not through magic, but through specific, well-understood techniques that can be added to existing search infrastructure. This article walks through the practical path from keyword search to AI-enhanced search, explaining what each improvement buys and what it costs.

Where Traditional Search Fails

BM25 (and its predecessors) works by matching query terms to document terms. It's good at exact matching and surprisingly robust to noise. It fails at:

Vocabulary mismatch: User says "cancel subscription," document says "terminate membership"
Conceptual queries: "What are the risks of this medication?" won't match a document that says "contraindications and side effects"
Implicit intent: "I want to build a chatbot" — the user needs resources, not a definition of chatbots
Natural language: Questions and conversational queries perform worse than keyword-style queries
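
The vocabulary-mismatch failure is easy to reproduce. The following is a minimal sketch of Okapi BM25 scoring in plain Python, not a production implementation. The whitespace tokenizer, the parameter defaults (k1 = 1.5, b = 0.75), and the two sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase/whitespace tokenizer; real engines also stem and strip punctuation.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term the document lacks contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "how to terminate your membership",   # the page the user actually needs
    "subscription plans and pricing",     # shares a word with the query, but is off-topic
]
scores = bm25_scores("cancel subscription", docs)
```

Here the first document scores exactly zero because it shares no term with the query, while the pricing page gets a positive score for the single word "subscription". That is the "Subscription Plans" problem from the introduction in miniature.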

The Three AI Enhancements That Matter

Modern AI-enhanced search typically involves three techniques, each addressing a different part of the failure taxonomy:

1. Query Expansion

Query expansion uses an LLM to broaden the query before it hits the search index. Given "cancel subscription," the LLM generates: "cancel membership, terminate account, stop payments, unsubscribe, deactivate account, end renewal." All of these expanded terms are sent to BM25 along with the original query, so a document that says "terminate membership" can now match. The result is higher recall.

The key insight: you don't need vector search to get semantic search benefits. LLM-powered query expansion improves the recall of your existing BM25 index by translating user language into document language. The cost is one LLM API call per query, which means a per-query fee and some added latency before results appear.
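
As a rough sketch of that expansion step, the code below builds a prompt, hands it to an LLM, and parses the reply into a de-duplicated phrase list. The prompt wording, the function names, and the `llm` callable are all hypothetical. `fake_llm` is a canned stand-in so the example runs without network access, and in practice you would replace it with a call to your provider's API.

```python
from typing import Callable


def build_expansion_prompt(query: str, max_terms: int = 6) -> str:
    # Hypothetical prompt wording; tune it for your own content.
    return (
        f"List up to {max_terms} alternative search phrases for the query below, "
        "using wording a help-center article might use. "
        "Reply with a single comma-separated line.\n"
        f"Query: {query}"
    )


def expand_query(query: str, llm: Callable[[str], str], max_terms: int = 6) -> list[str]:
    """Return the original query followed by de-duplicated expansion phrases."""
    reply = llm(build_expansion_prompt(query, max_terms))
    phrases = [query]
    seen = {query.lower()}
    for raw in reply.split(","):
        phrase = raw.strip().strip('"').lower()
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    # Keep the original query plus at most max_terms expansions.
    return phrases[: max_terms + 1]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (assumption: the model replies with one comma-separated line).
    return "cancel membership, terminate account, stop payments, unsubscribe, Unsubscribe"


expanded = expand_query("cancel subscription", fake_llm)
```

Each phrase in `expanded` can then be run against the existing index and the result lists merged. How the merge is done (an OR query, or taking the best score per document) depends on the search engine in use.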

2. AI Overviews

Once you have search results, an LLM can synthesize them into a direct answer to the user's question — an "AI overview" similar to what Google and Bing now show. The LLM reads the top N results, extracts relevant information, and writes a paragraph-length response that cites specific documents.

Done well, AI overviews dramatically improve the search experience for informational queries. Users who would otherwise have to read five articles to find their answer get it in a short paragraph. The AI overview also signals which documents contain relevant information, helping users decide whether to dig deeper.
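
One way to set up the overview step is to number the top results, pass them to the LLM as the only permitted sources, and ask for inline citations. The sketch below assumes a simple `SearchResult` shape and a generic `llm` callable. Both are hypothetical, as are the prompt wording and the example.com URLs, and `fake_llm` returns a canned answer so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


def build_overview_prompt(question: str, results: list[SearchResult], top_n: int = 5) -> str:
    # Number each source so the model can cite it as [1], [2], ...
    sources = "\n".join(
        f"[{i}] {r.title} ({r.url})\n{r.snippet}"
        for i, r in enumerate(results[:top_n], start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources inline as [1], [2], and so on. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Question: {question}\n\nSources:\n{sources}"
    )


def ai_overview(
    question: str,
    results: list[SearchResult],
    llm: Callable[[str], str],
    top_n: int = 5,
) -> str:
    if not results:
        return ""  # nothing to summarize, so skip the LLM call entirely
    return llm(build_overview_prompt(question, results, top_n))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return "To cancel, open Billing and choose Cancel plan [1]."


results = [
    SearchResult("How to cancel your plan", "https://example.com/cancel", "Open Billing, then choose Cancel plan."),
    SearchResult("Subscription plans", "https://example.com/plans", "Compare the monthly and annual plans."),
    SearchResult("Refund policy", "https://example.com/refunds", "Refunds are available within 14 days."),
]
overview = ai_overview("how do I cancel my subscription", results, fake_llm)
```

Restricting the model to the retrieved sources and asking it to admit when they do not contain the answer is a design choice aimed at keeping the overview grounded in the site's own content.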

3. Semantic Reranking

BM25 retrieves documents that contain query terms; a cross-encoder reranker re-scores the top-K results based on semantic relevance. Cross-encoders read both the query and document together, enabling much more nuanced relevance judgments than BM25's term-matching.

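The practical workflow: BM25 retrieves around 100 candidates quickly, and the cross-encoder re-scores only the top 20, because it has to run the model once for every query-document pair. This hybrid approach keeps the speed of lexical retrieval while bringing ranking quality close to that of vector search. Reranking only reorders what BM25 returned, so it cannot recover a document the first stage missed.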
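
The two-stage workflow can be sketched as a small function that re-scores only the head of the candidate list. The `score_pair` callable is an assumption: in a real system it would wrap a cross-encoder model (for example one loaded through the sentence-transformers library), while `toy_scorer` here is a hand-written stand-in with a tiny synonym table so the example runs without any model.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 20,
) -> list[str]:
    """Re-score the first top_k candidates; leave the rest in first-stage order."""
    head = list(candidates[:top_k])
    tail = list(candidates[top_k:])
    # list.sort is stable, so ties keep their original (BM25) order.
    head.sort(key=lambda doc: score_pair(query, doc), reverse=True)
    return head + tail


def toy_scorer(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: counts query words matched directly or via a synonym.
    synonyms = {
        "cancel": {"cancel", "terminate"},
        "subscription": {"subscription", "membership"},
    }
    doc_words = set(doc.lower().split())
    return float(sum(1 for w in query.lower().split() if synonyms.get(w, {w}) & doc_words))


# Candidates in the order a first-stage retriever might return them.
candidates = [
    "subscription plans and pricing",
    "gift subscription faq",
    "how to terminate your membership",
    "contact us",
]
reranked = rerank("cancel subscription", candidates, toy_scorer, top_k=3)
```

With `top_k=3`, the membership page moves to the front and "contact us" stays where the first stage put it, since it was never re-scored. Keeping `top_k` small bounds the number of model calls per query.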

Scolta: These Three Techniques in Production

Scolta, developed by Tag1 Consulting, implements all three techniques for Drupal websites (and can be adapted for other CMS platforms). The architecture uses Pagefind for static site search — fast, client-side, no server required — plus the Anthropic Claude API for query expansion and AI overview generation.

What makes Scolta notable is what it doesn't require: no vector database, no embedding infrastructure, no GPU serving, no semantic search overhaul. It layers AI capabilities onto existing BM25 search, making it practical for organizations with existing content infrastructure. The full approach is detailed in the "Practical Path to AI Search" series at tag1.com/how-to/.

The Certificate in AI-Powered Search at Meridian AI uses Scolta as a primary case study throughout the curriculum, examining its architecture as an example of pragmatic AI system design that prioritizes production readiness over theoretical elegance.

When to Go Further: Vector Search

Query expansion plus reranking delivers most of the search quality improvement available to content-heavy websites. But some use cases genuinely benefit from dense retrieval (vector search): very large corpora with high vocabulary diversity, multimodal search (images + text), or applications requiring semantic clustering. For those cases, the Certificate program covers Pinecone, Weaviate, pgvector, and the full RAG stack.

The key decision criterion: if your search is working reasonably well and you need to make it better, start with AI query expansion. It costs one LLM API call per query and can double recall. If you've exhausted what query expansion can do, then consider the much heavier investment in vector infrastructure.