Securing LLMs Starts with Securing the Information Behind Them

Blog: OpenText Blogs

Artificial intelligence is rapidly transforming how organizations operate, and large language models (LLMs) are at the center of this change. IT leaders and AI teams are constantly looking for ways to connect LLMs to enterprise information to improve productivity, automate processes, and accelerate decision-making.

Yet, one critical challenge is often overlooked: the security and quality of the information feeding these models.

An LLM is only as valuable as the information it can access. While many organizations focus on selecting the right model, far fewer pay enough attention to the information that powers it, and that is a critical error.

The Hidden Risk in Enterprise AI

As organizations connect LLMs to document repositories, collaboration platforms, and business systems, they gain access to valuable knowledge. However, they also introduce new risks. Sensitive customer data (Social Insurance Numbers, health records, home addresses, and so on), intellectual property, financial information (such as credit card numbers and CVVs), and regulated records can easily be exposed if the proper safeguards are not in place. At the same time, incomplete or inaccessible content can reduce the accuracy and reliability of AI-generated responses.

In my experience, securing AI is not just about protecting the model. It's about understanding what information exists, where it resides, and whether it should be accessible to an LLM in the first place. Wouldn't you agree?

Most enterprise data environments contain vast amounts of unstructured information spread across thousands of file types, many of which were never designed for AI consumption. So, before organizations can effectively govern AI, they need visibility into their information, and this is where content extraction becomes essential.

Solutions such as OpenText File Content Extraction help organizations unlock information from more than 2,200 file formats, including metadata, embedded content, archived files, and legacy documents. By making this information accessible and searchable, organizations can provide LLMs with richer, more complete knowledge while maintaining total control over what is exposed to the end user.

Understanding What Needs Protection

Discovering information is only the first step. Organizations must also identify sensitive content before it reaches AI systems. Traditional security tools often focus on file locations and permissions, but AI requires a deeper understanding of the content itself. Customer records, employee information, financial data, healthcare information, and intellectual property may all exist within documents that appear harmless on the surface, creating potential risk to enterprises and users.

This is where technologies such as OpenText Named Entity Recognition play an important role: they identify and classify sensitive entities within structured and unstructured content. With this information, organizations can better determine what should be available to AI models and what should remain protected, never shared outside the organization or surfaced in AI search results.
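To make the idea concrete, here is a deliberately simple, rule-based stand-in for entity recognition, assuming text is screened just before it is handed to an LLM. Regular expressions find candidate entities, and a Luhn checksum (used by both payment card numbers and Canadian Social Insurance Numbers) filters out random digit strings; matches are then replaced with labels. This is not how OpenText Named Entity Recognition works internally: the pattern set, the labels, and the find_entities/redact helpers are hypothetical, and a real system would rely on trained models with far broader coverage.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set standing in for a real NER service: label -> regex.
# Illustrative only; production entity recognition covers far more types.
PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),  # 13-19 digits
    "SIN": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"),      # 9 digits
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}
LUHN_CHECKED = {"CREDIT_CARD", "SIN"}


@dataclass(frozen=True)
class Entity:
    label: str
    start: int
    end: int
    text: str


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_entities(text: str) -> list[Entity]:
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            digits = re.sub(r"\D", "", m.group())
            # Drop digit runs that fail the checksum (likely false positives).
            if label in LUHN_CHECKED and not luhn_valid(digits):
                continue
            found.append(Entity(label, m.start(), m.end(), m.group()))
    # Resolve overlaps: earliest start wins, longer match breaks ties.
    found.sort(key=lambda e: (e.start, -(e.end - e.start)))
    result, last_end = [], -1
    for e in found:
        if e.start >= last_end:
            result.append(e)
            last_end = e.end
    return result


def redact(text: str) -> str:
    """Replace each detected entity with its label, e.g. [EMAIL]."""
    out, pos = [], 0
    for e in find_entities(text):
        out.append(text[pos:e.start])
        out.append(f"[{e.label}]")
        pos = e.end
    out.append(text[pos:])
    return "".join(out)
```

For example, redact("Card 4111 1111 1111 1111, SIN 046 454 286, contact jane.doe@example.com.") returns "Card [CREDIT_CARD], SIN [SIN], contact [EMAIL].", while a nine-digit string that fails the checksum, such as "123 456 789", is left untouched.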

One of the biggest lessons I have learned, both from AI projects and as a user of AI tools, is that successful AI initiatives depend on trusted information, not just powerful models; the model is "only the technical part of the challenge". Organizations need a framework that enables them to discover information, extract and enrich content, identify sensitive data, and apply governance policies before information is exposed to AI systems.

When these practices are in place, organizations gain more than security. They also improve the quality of AI outcomes by providing models with better, more relevant information. The result is more accurate responses, stronger compliance, reduced risk, and greater confidence in AI-generated insights.
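The last step of that framework, applying governance policies before content reaches the model, can be sketched as a simple gate. This is a minimal illustration assuming each document has already been discovered, extracted, and tagged with entity labels and an access list; the Document type, the label names, and the allow/redact/exclude policy are hypothetical, not an OpenText API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALLOW = "allow"      # pass content to the LLM as-is
    REDACT = "redact"    # pass content only after masking entities
    EXCLUDE = "exclude"  # never pass content to the LLM


# Hypothetical policy: entity label -> action. Unknown labels fall back to
# ALLOW in this sketch; a stricter deployment might default to EXCLUDE.
POLICY = {
    "EMAIL": Action.REDACT,
    "HOME_ADDRESS": Action.REDACT,
    "CREDIT_CARD": Action.EXCLUDE,
    "HEALTH_RECORD": Action.EXCLUDE,
}

# When a document holds several entity types, the strictest action wins.
SEVERITY = {Action.ALLOW: 0, Action.REDACT: 1, Action.EXCLUDE: 2}


@dataclass
class Document:
    doc_id: str
    allowed_groups: frozenset[str]  # who may read the source document
    entity_labels: set[str] = field(default_factory=set)


def decide(doc: Document, user_groups: set[str]) -> Action:
    # Honor existing permissions first: the AI layer must not widen access.
    if not (doc.allowed_groups & user_groups):
        return Action.EXCLUDE
    actions = [POLICY.get(label, Action.ALLOW) for label in doc.entity_labels]
    return max(actions, key=lambda a: SEVERITY[a], default=Action.ALLOW)


def gate(docs: list[Document], user_groups: set[str]) -> dict[str, Action]:
    """Return the action to take for each document for this user."""
    return {d.doc_id: decide(d, user_groups) for d in docs}
```

Checking permissions before entity policy reflects the point above: sensitivity labels decide what an LLM may see, but they should never let a user reach content the underlying repository would already deny them.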

The Future of AI Depends on Trusted Information

As AI adoption accelerates, the conversation is shifting from which model is best to whether the information behind it can be trusted.

The organizations that succeed will be those that build a secure information foundation—one that allows AI to access valuable business knowledge while protecting sensitive data.

The future of enterprise AI is not simply about smarter models. It is about ensuring the information that powers them is secure, governed, and trusted from the start.

To find out more about OpenText File Content Extraction, OpenText Named Entity Recognition, and other software solutions you can embed in your application, take a look at our OEM Solutions.

The post Securing LLMs Starts with Securing the Information Behind Them appeared first on OpenText Blogs.