Moving beyond keyword search with cognitive automation
Blog: Capgemini CTO Blog
In light of the rapid adoption of cognitive automation, intelligent information retrieval will be a key requirement of many future enterprise applications. Insight engines can cater to these requirements and enhance the functionality of a search application. Internally, an insight engine leverages a set of cognitive tech enablers to address some of the challenges faced by today’s search applications.
One such challenge is how to handle false negatives. Quite often, a search application returns only a fraction of the relevant items. When searching for a data science case study, for example, only the case studies containing the phrase “data science” are returned. Ideally, we should also be able to retrieve case studies containing related phrases, such as “natural language processing” or “machine learning.” The difficulty is that compiling a set of related phrases for a specific domain demands significant time and effort from a subject matter expert. In some cases, building this kind of repository manually may be practically impossible.
Is there a way to auto-retrieve related words or phrases that may not necessarily be available in a standard language dictionary but are specific to a domain? Is it possible to enhance search results by matching underlying topics rather than keywords?
In keyword-based search, every keyword is treated as a discrete, independent entity, so any relationship between individual words is lost. In other words, the context in which a keyword is used is not captured. While this representation retrieves an initial set of relevant results, there is a high chance of missing many other relevant results. Moreover, this representation rarely scales well, because it requires more data for effective and efficient processing.
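To make the false-negative problem concrete, here is a minimal sketch of literal keyword matching. The function name `keyword_search` and the toy corpus `CASE_STUDIES` are hypothetical and exist only for illustration; they do not come from any particular search product.

```python
def keyword_search(query, documents):
    """Return the ids of documents that contain the query phrase verbatim."""
    needle = query.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


# Hypothetical toy corpus of case-study titles (illustrative data only).
CASE_STUDIES = {
    "cs1": "Data science case study: churn prediction for a telecom",
    "cs2": "Machine learning case study: demand forecasting in retail",
    "cs3": "Natural language processing case study: contract review",
    "cs4": "Case study: migrating a data centre to the cloud",
}

hits = keyword_search("data science", CASE_STUDIES)
print(hits)  # only cs1 matches; cs2 and cs3 are missed
```

The machine learning and natural language processing case studies are arguably relevant to the query, yet literal matching cannot surface them because neither contains the exact phrase.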
The current solution aims to address this challenge by moving beyond keyword search: it mines the underlying topic and uses it to retrieve a larger set of relevant results. The solution uses a set of unsupervised machine-learning algorithms to mine the topic associated with a keyword in a search string by looking at its neighboring words. It then retrieves a word cloud associated with that topic. This extended set of words can be used to enhance the search query so that it returns a larger set of relevant results. While the underlying conceptual model can be implemented in any technology that supports such functionality, the current solution proposes a set of open-source tech enablers that can be combined into a cost-effective implementation.
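The idea of using neighboring words to find related terms and then widening the query can be sketched roughly as follows. This is not the solution's actual algorithm, which the post does not name. It is a simple stand-in that assumes words sharing the same neighbors are related, and it works on single tokens rather than phrases. All function names, the `window` and `top_n` parameters, and the three-sentence corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_terms(term, vectors, top_n=3):
    """Return up to top_n words whose neighbors most resemble those of `term`."""
    if term not in vectors:
        return []
    target = vectors[term]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties alphabetical
    return [word for score, word in scored[:top_n] if score > 0]


def expand_query(query, vectors, top_n=3):
    """Append related terms to the original query terms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for candidate in related_terms(term, vectors, top_n):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical three-sentence corpus (illustrative data only).
CORPUS = [
    "nlp models classify customer text",
    "ml models classify customer text",
    "cloud servers host web apps",
]
vectors = cooccurrence_vectors(CORPUS)
print(expand_query("nlp", vectors, top_n=1))
```

In this toy corpus, "nlp" and "ml" appear next to exactly the same words, so "ml" ranks as the closest term and is added to the query, while the unrelated cloud vocabulary is left out. A production system would more likely rely on a proper topic model or learned word embeddings trained on a domain corpus; the sketch only illustrates the principle of context-driven query expansion.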
The primary focus of the current solution is to use topic-based search to improve the sensitivity of a search application, that is, the share of relevant items it actually returns. Similarly, a wide range of cognitive tech enablers exists around insight engines. These enablers span multiple phases of a search-based solution: defining, designing, indexing, and querying. Depending on the context and the problem at hand, a suitable set of enablers can be selected to enhance existing search-based solutions.