
Moving beyond keyword search with cognitive automation

Blog: Capgemini CTO Blog

With cognitive automation being adopted rapidly, intelligent information retrieval will be a key requirement of many future enterprise applications. Insight engines can meet this requirement and extend what a search application is able to do. Internally, an insight engine draws on a set of cognitive tech enablers to address some of the challenges faced by today’s search applications.

One such challenge is how to handle false negatives. Quite often, a search application returns only a fraction of the relevant items. When searching for a data science case study, for example, only the case studies containing the phrase “data science” are returned. Ideally, we should also be able to retrieve case studies containing related phrases, such as “natural language processing” or “machine learning.” The difficulty is that compiling a set of related phrases for a specific domain demands significant time and effort from a subject matter expert. In some cases, building this kind of repository manually may be practically impossible.

Is there a way to auto-retrieve related words or phrases that may not necessarily be available in a standard language dictionary but are specific to a domain? Is it possible to enhance search results by matching underlying topics rather than keywords?

In keyword-based search, every keyword is treated as a discrete, independent entity, and any relationship between individual words is not captured. In other words, the context in which the keyword is used is lost. While this representation retrieves an initial set of relevant results, there is a high chance of missing many other relevant results. Moreover, this representation rarely scales well, because it needs more data to be processed effectively and efficiently.
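To make the false-negative problem concrete, here is a minimal sketch of exact-phrase matching. The case studies, their ids, and the `keyword_search` helper are all invented for illustration and are not taken from any real search application.

```python
# Hypothetical mini-corpus; ids and text are invented for illustration.
CASE_STUDIES = {
    "cs1": "data science platform for retail demand forecasting",
    "cs2": "machine learning models for fraud detection",
    "cs3": "natural language processing for contract review",
    "cs4": "warehouse logistics optimization",
}


def keyword_search(phrase, documents):
    """Return ids of documents that contain the exact phrase (case-insensitive)."""
    needle = phrase.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


hits = keyword_search("data science", CASE_STUDIES)
print(hits)  # ['cs1']
```

Only `cs1` comes back. The machine learning and natural language processing case studies are arguably relevant to the same query, but because the phrase itself never appears in them, they are missed.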

The solution proposed here addresses this challenge by moving beyond keyword search: it mines the underlying topic and uses it to retrieve a larger set of relevant results. It applies a set of unsupervised machine-learning algorithms to infer the topic associated with a keyword in a search string by looking at neighboring words. It then retrieves a word cloud associated with that topic. This extended set of words can be used to enhance the search query so that it returns a larger set of relevant results. While the underlying conceptual model can be implemented in any technology that supports such functionality, the solution proposes a set of open-source tech enablers that can be combined into a cost-effective implementation.
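The post does not name the algorithms involved, so the sketch below swaps in a deliberately simple stand-in: it counts which words appear near the keyword across a corpus and treats the most frequent neighbors as the related set. The corpus, the function names, and the `window` and `top_k` parameters are all assumptions made for illustration; a real system would more likely rely on a learned topic model or word embeddings.

```python
from collections import Counter

# Hypothetical mini-corpus, invented for illustration only.
DOCS = {
    "d1": "data science team used machine learning models",
    "d2": "machine learning models improved forecasting",
    "d3": "data science project applied machine learning",
    "d4": "warehouse trucks moved pallets overnight",
}


def related_terms(keyword, documents, window=3, top_k=2):
    """Rank words by how often they occur within `window` tokens of `keyword`."""
    counts = Counter()
    for text in documents.values():
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            lo, hi = max(0, i - window), i + window + 1
            counts.update(t for t in tokens[lo:hi] if t != keyword)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


def expand_query(keyword, documents, **kwargs):
    """Return the keyword followed by its related terms."""
    return [keyword] + related_terms(keyword, documents, **kwargs)


def search_any(terms, documents):
    """Return ids of documents containing at least one of the terms."""
    wanted = set(terms)
    return [doc_id for doc_id, text in documents.items()
            if wanted & set(text.lower().split())]


expanded = expand_query("science", DOCS)
print(expanded)                       # ['science', 'data', 'machine']
print(search_any(["science"], DOCS))  # ['d1', 'd3']
print(search_any(expanded, DOCS))     # ['d1', 'd2', 'd3']
```

With the expanded query, `d2` is retrieved even though it never mentions "science", while the unrelated `d4` is still excluded. Raw neighbor counts are crude (common words would dominate on real text), which is one reason a production system would likely use a proper unsupervised model instead.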

The primary focus of this solution is to use topic-based search to improve the sensitivity of a search application, that is, to reduce the number of relevant items it misses. Along similar lines, a wide range of cognitive tech enablers exists around insight engines. These enablers span multiple phases of a search-based solution, whether defining, designing, indexing, or querying. Depending on the context and the problem at hand, a suitable set of enablers can be selected to enhance existing search-based solutions.
