At MadCap Software, we recognize the incredible importance of a user’s ability to search for and find the content they need in today’s fast-paced business environment. As the speed of change accelerates, organizations must deliver new and current content across the enterprise at an unprecedented pace. In their moments of need, people rely on content search to get the most accurate and up-to-date content for their need. However, when content search fails to deliver quality results, it can lead to wasted time, errors, rework, safety issues, and ultimately dissatisfaction among users and employees. That’s why at MadCap Software we’ve recently introduced AI Search, a search experience that harnesses the power of semantic content analysis to revolutionize content discovery and serve as a stable foundation for many other AI solutions.

But what exactly does “semantic analysis” mean, and how did we decide that semantic search was a valuable-enough AI technology to implement for our enterprise customers?

Why Semantic Search Matters

When we set out to develop AI Search, we knew that we wanted to do more than just improve the user experience of the MadCap Software search functionality or only marginally increase the quality search results for end uses. We wanted to do everything we could to ensure that users are able to find the content that is most relevant to their query, even if their queries don’t have exact keyword matches with the most relevant content. To do this, we needed to provide a centralized database of the semantic value of each piece of content in our customers’ content inventories – not only for use by the AI Search feature itself, but also the foundation for powering other AI technologies and solutions yet to be implemented by either MadCap Software or our customers. This semantic value of each piece of content is called an “embedding”. 

What are Semantic Embeddings? 

At its core, semantic search is all about understanding the meaning behind words, both in the content and in the user’s search query. In traditional keyword-based searches, algorithms look at individual terms to determine relevance. But what if we could capture the essence of each piece of content – its underlying themes, concepts, and relationships? 

That’s where semantic embeddings come in. Embeddings are mathematical representations of a piece of content that distill its essential characteristics into a compact vector. Think of them as having a numerical summary or representation of the meaning behind a piece of content. The language of the content is also not as important, since the content will mean the same thing regardless of the language in which it’s written. This allows us to compare and contrast different pieces of content based on their shared meanings. As you might be able to imagine, once we have a numerical representation of a piece of content, those numbers are much easier for a computer to analyze and compare as opposed to large blobs of text, meaning that we are able to get higher-quality search results overall. 

We can engage in a thought experiment and think of these embeddings as the content’s coordinates on a map (albeit, a complicated, multi-dimensional map). When each piece of content is given coordinates on a map, it means that you can look at the map at a bird’s-eye view and really get an understanding of what your content inventory looks like: what subjects of content you have and do not have, what content you need, what a piece of content’s neighborhood looks like, etc. 

Illustration Of Ai Search For Bike Safety

Each piece of content is given a set of numbers, called ”Embeddings”, that serve as a summary of the content’s semantic meaning and can work a bit like map coordinates.

How Semantic Search Differs from Conventional Keyword-Based Systems 

Conventional search systems generally rely on keyword matching between the user’s search query and the target content. These systems extract specific terms from both the content inventory then use these keywords to create an index that can be searched against (the “search index”). When a user inputs a search query, keywords are extracted and matched to the target content via the search index. 

How Conventional Search Systems Work: 

  1. Search Index creation: The system creates a massive database of indexed terms (e.g., words, phrases) extracted from the content inventory, which ranks and maps different indexed terms to different pieces of content. 
  2. Query processing: When a user submits a search query, the algorithm extracts relevant keywords and uses these to search the index for matching results. 
  3. Ranking and retrieval: The top matches are then ranked based on factors like relevance, popularity, or recency. 

While conventional search systems can deliver high-quality results in many cases, they have limitations: 

Pros: 

  • Efficiency: Simple and fast to implement, often with low computational costs. 
  • Transparency: Easy to explain how results are generated and reproduce results (based on keyword matches or other explicit rules). 
  • Robustness: Works reliably for well-defined or structured queries. 
  • Lower Entry Barrier: Requires little training and infrastructure compared to semantic search systems. 

Cons: 

  • Keyword Dependence: May fail to capture nuances or intent in both content and search queries, returning irrelevant results. 
  • Lack of Adaptability: Limited ability to learn from user interactions or adapt to new contexts. 
  • Rigid Querying: Less effective for natural language or imprecise queries. 

How Semantic Search Systems Work: 

Semantic search systems take a different approach. They extract more abstract representations of the content inventory (semantic embeddings) and user queries, then use these to create a vector store. 

Here’s how it works: 

  1. Vector creation: The system analyzes the semantic meaning behind each piece of content in the content inventory and then generates and stores semantic vectors in the vector store (this is like a conventional search system’s “search index”). 
  2. Query processing: When a user submits a query, the system creates a corresponding semantic vector based on the query terms. 
  3. Matching and ranking: The system then compares the query vector to those in the vector store, returning results that are most semantically similar. 

Due to the fact that they are comparing numbers instead of blobs of text, semantic search systems are able to easily and automatically handle things like synonyms between the search query and content, as well as more nuanced content subjects. 

Pros:

  • Understanding Intent: Captures the meaning behind queries rather than matching exact keywords, improving relevance. 
  • Synonym Recognition: Recognizes synonyms, contextual nuances, and relationships between concepts, offering more flexible results. 
  • Natural Language Queries: Handles conversational or complex queries effectively. 
  • Dynamic Ranking: Can adapt based on patterns in user behavior for personalized or context-aware results. 

Cons:

  • Computational Cost: Requires significant processing power and can be more resource-intensive than keyword-based search. 
  • Explainability: Results can seem opaque, making it harder to diagnose issues, reproduce results, or explain choices to end-users. 
  • Barrier to Entry: Requires significant training and infrastructure compared to conventional search systems. 
Illustration Of Location Symbols On A Map Showing Search Query Answers

The embeddings for each piece of content and the user’s search query can be imagined as coordinates on a map, allowing for the system to find the ”nearest” pieces of content to the query.

If we return to the example of content embeddings working like coordinates on a map, and if you consider that the user’s search query is given the same type of coordinates when it is processed, well then you can think of a search query like asking the maps application on your phone to “show me content near me”. The search system pinpoints the query on the map then looks in the vicinity to find the most closely-located content.

The Power of Semantic Analysis 

So why is semantic analysis so valuable and why did MadCap Software choose to implement it? As stated previously, this analysis is useful to more than just semantic search functionality, it has the ability to power other innovative AI technologies and solutions. For example: 

  • Semantic duplication detection: By analyzing the semantic embeddings, we can identify duplicate or near-duplicate content across different sources. 
  • Content recommendation engines: We can combine these embeddings with information about user preference, performance, or behavior to suggest highly-relevant content at the right moment of need. 
  • RAG chatbots: These semantic embeddings can be leveraged by “retrievers” in RAG systems allowing for LLM-powered chatbots to access the context of your proprietary content library when it is generating answers, greatly improving the accuracy of GenAI chatbots when dealing with organization-specific content. 

Leveraging MadCap Software’s Semantic Data 

MadCap Software is doing the work of analyzing content inventories to capture the semantic meaning of the content within and we want to make sure our customers can leverage that data in powerful ways. First and foremost, MadCap Software’s AI Search functionality empowers organizations to keep up with that blistering speed of change, allowing them to keep their employees well-informed, well-trained, and satisfied. But beyond our AI Search functionality, we are also working on tools and APIs that will allow MadCap Software customers to leverage this semantic data in their own enterprise AI solutions – whether it’s for a RAG chatbot, a content recommendation engine, or something we never imagined! As always, MadCap Software remains committed to facilitating customer innovation however we can!