April 30, 2007
By Jerome Pesenti, Chief Scientist and Co-founder, Vivisimo

Restricted Access: Is Your Enterprise Search Application Secure?

As search capabilities move out of applications and across the organization, the potential for security breaches increases greatly. Yet, despite the buzz around search these days, security is one aspect that few have paid much attention to.

When implemented carelessly, search engines can uncover flaws in existing security frameworks and can either expose restricted content itself or confirm the existence of hidden information to unauthorized users. Either scenario can have dire consequences.

Before introducing an enterprise search solution, organizations must carefully document their requirements, understand how their search solution handles security, test extensively before deployment and monitor performance after the rollout.

Search Engine Security Defined

Search engine security is a form of access control restricted to the context of a search application and is primarily about ensuring that in the course of using a search application, users can only access information they are permitted to see. Search engine security is only concerned with the information accessed through the search application and not with information delivered by other means. For example, once a user clicks on a link pointing to an independent resource, it is no longer in the domain of the search application. Further, search engine security is not about hiding links—if links to sensitive content exist, that is a problem that needs to be addressed at the source, not by hiding them in the search engine.

In addition, unlike security for the systems storing information, search engine security is only concerned with read access. Access control in general is critical for obvious reasons: confidentiality, privacy, competitive intelligence, etc. On the one hand, compared to other types of computer security, access control violations are often limited and do not have the domino effect that results from most computer security breaches. On the other hand, search engine security is often seen as the most critical of the access control issues because a search engine can expedite the process of revealing security holes.

Imagine a system with millions of files, some of which contain unprotected confidential or critical information. Without a search engine, a malicious user would have to scour all of these files one by one. With a search engine, a user can just type well-chosen keywords targeting critical or confidential information. Here is where read access comes into play. Read access involves not only being able to read the content, but also knowing that it exists. Consider, for example, the employee who searches his or her intranet using such keywords. Examples include:

List of passwords;
Social security numbers;
Salary of John Doe;
Employee reviews; and
Company secrets.

Even allowing a user to know that a file with certain content exists can be a serious security breach. Imagine searches such as: “X should be fired,” “merger with Y” or “June sales are low.” The simple title of these results could reveal extremely sensitive information.

Components of Security

A good search engine application will generally mirror a pre-existing security framework. Users, security groups and access control lists are most likely to come from external systems (Active Directory, Lotus Notes, Documentum, etc.) rather than being defined within the search engine itself. This gives us some clue as to what a proper security design should look like. Vivísimo’s philosophy is that the search engine should rely as much as possible on external processes to conduct authentication and authorization. This will decrease the complexity of the implementation, the need for synchronization and, consequently, the risk of errors.

Identification and authentication
Identification and authentication are not particular to the context of a search engine, except that they can be completely externalized, given that search engine applications usually do not carry their own security framework. Reusing an existing authentication framework can be done in two different ways:

The search application is executed after an external authentication step: users can only access the search application after having been identified and authenticated through an external process, often carried out by the web server itself or through a single sign-on application. In this case, the search application just needs to grab the user identifier, often through an environment variable. The search application has no access to or knowledge of the user credentials that were used during the authentication process; or
The search application collects the user credentials and tests them through an existing service, for example by binding to a directory service through LDAP.

The first approach is cleaner from a security standpoint as it avoids any manipulation of the security credentials. However, the second approach may be required in certain cases where the search application needs to access all of the user credentials to perform certain tasks.
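As an illustration of the first approach, here is a minimal Python sketch in which the search application only reads an identifier that an upstream layer has already established. It assumes the web server or single sign-on layer exposes the identifier in the conventional `REMOTE_USER` variable; the function names `authenticated_user` and `handle_search` are hypothetical and not part of any particular product.

```python
import os
from typing import Mapping, Optional


def authenticated_user(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the user ID set by the upstream authentication layer, or None."""
    # Assumption: the identity arrives in REMOTE_USER. The application never
    # sees the password or token that was used to authenticate.
    user = environ.get("REMOTE_USER", "").strip()
    return user or None


def handle_search(query: str, environ: Mapping[str, str]) -> str:
    """Refuse to search unless an authenticated identity is present."""
    user = authenticated_user(environ)
    if user is None:
        # Fail closed: no identity means no search at all.
        raise PermissionError("no authenticated user; refusing to search")
    return f"searching for {query!r} as {user}"
```

The point of the design is that the search code handles only an identifier, never credentials, so there is nothing sensitive for it to leak or mishandle.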

Authorization
Search engine security is only concerned with the ability to read and list. This simplifies the authorization framework. The content that is searchable and viewable through the search application by a given user should be restricted by an access control matrix. As we mentioned earlier, it is critical that search engine security not be limited to opening results. Simply allowing a user to know that a search result exists can have damaging consequences. The authorization process therefore needs to be tightly interleaved with the search process. An access control matrix can be implemented in two ways. One is to attach access information to the indexed content. The other is to check authorization for each result at query time.
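The second option, checking authorization for each result at query time, could be sketched roughly as follows in Python. Everything here is illustrative: `Result`, `authorized_results`, `TOY_ACL` and `toy_can_read` are hypothetical names, and the dictionary stands in for whatever external system actually holds the permissions.

```python
from typing import Callable, Iterable, List, NamedTuple


class Result(NamedTuple):
    doc_id: str
    title: str


def authorized_results(
    user: str,
    results: Iterable[Result],
    can_read: Callable[[str, str], bool],
) -> List[Result]:
    """Keep only the results the user may see."""
    visible = []
    for result in results:
        # Check before anything about the result (even its title) is shown.
        if can_read(user, result.doc_id):
            visible.append(result)
    return visible


# Toy stand-in for the external system of record (hypothetical data).
TOY_ACL = {"doc-1": {"alice", "bob"}, "doc-2": {"alice"}}


def toy_can_read(user: str, doc_id: str) -> bool:
    # Documents with no ACL entry are denied by default.
    return user in TOY_ACL.get(doc_id, set())
```

Because the filter runs before any part of a result is displayed, an unauthorized user learns neither the content nor the title of a restricted document.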

ACL indexing
The most straightforward way to enforce the access control matrix is to index it with the documents. The search application will add a field to each document listing the set of users allowed to see it. Once a user does a search, this search is automatically rewritten to integrate the security restriction:
[User Query] AND [users: “John Doe”]
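A small Python sketch of ACL indexing follows. The in-memory list, the `users` field name and the query syntax produced by `rewrite_query` are assumptions made for illustration; a real search engine has its own index format and query language.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    users: FrozenSet[str]  # ACL field indexed alongside the content


def rewrite_query(user_query: str, user: str) -> str:
    """Append the security restriction to the query (hypothetical syntax)."""
    # Escape backslashes and quotes so a crafted user name cannot alter the clause.
    safe = user.replace("\\", "\\\\").replace('"', '\\"')
    return f'({user_query}) AND users:"{safe}"'


def search(index: List[IndexedDoc], user_query: str, user: str) -> List[str]:
    """Toy evaluation of the rewritten query: all terms match AND user is listed."""
    terms = user_query.lower().split()
    return [
        doc.doc_id
        for doc in index
        if user in doc.users and all(term in doc.text.lower() for term in terms)
    ]
```

In this sketch the restriction is part of the query itself, so a document the user cannot see never enters the result set and cannot leak through titles or result counts.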
