April 30, 2007
By Jerome Pesenti, Chief Scientist and Co-founder, Vivisimo

Restricted Access: Is Your Enterprise Search Application Secure?

A search application should be able to implement security using this method, but in practice the method can be inefficient, and the existence of aggregative objects like groups makes it unnecessary. Imagine an organization with hundreds of thousands of employees in which most documents are readable by most employees. Hundreds of thousands of tokens would need to be added to each document! In the Vivísimo Velocity Search Platform, the cost of a security token is roughly the cost of an additional word in the document. Each document would therefore grow by the equivalent of hundreds of thousands of words. So while the method is feasible, it is not very efficient.

A more efficient approach is to leverage aggregative objects like groups, which are often part of the underlying security framework. Groups allow the decomposition of the access control matrix into two separate matrices—access control lists (ACLs) and security groups.
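To make the decomposition concrete, here is a minimal Python sketch. All document names, usernames and the INTRANET/all-staff group are hypothetical; it only illustrates how an ACL table plus a group table can stand in for the full access control matrix, and how many fewer tokens each document then carries.

```python
# Illustrative data only: the documents, users and "INTRANET/all-staff"
# group are assumptions made for this sketch.
groups = {
    "INTRANET/marketing": {"johndoe", "janeroe"},
    "INTRANET/all-staff": {"johndoe", "janeroe", "bobsmith"},
}
acls = {
    "pricing.doc": {"INTRANET/marketing"},
    "handbook.doc": {"INTRANET/all-staff"},
}


def expand_matrix(acls, groups):
    """Rebuild the user-level access control matrix from ACLs and groups."""
    matrix = {}
    for doc, allowed_groups in acls.items():
        readers = set()
        for group in allowed_groups:
            readers |= groups.get(group, set())
        matrix[doc] = readers
    return matrix


matrix = expand_matrix(acls, groups)

# Tokens stored per document when indexing one token per authorized user...
per_user_tokens = sum(len(readers) for readers in matrix.values())
# ...versus one token per authorized group.
per_group_tokens = sum(len(allowed) for allowed in acls.values())
```

In this toy case the user-level matrix needs five tokens and the group-level ACLs need two; the gap widens as groups grow to thousands of members.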

The ACLs are much sparser than the access control matrix. This is guaranteed by the fact that the information they contain has been entered by humans—often the owner of the content deciding which group can access it. The sparsity of ACLs makes indexing them less costly. Searching is now a two-step process. First, given a user-name, the search application will collect all its associated groups:

"John Doe"
INTRANET/johndoe
INTRANET/marketing
PUBLIC/sites etc.

Then the search application will use these groups to restrict the user search:

[User Query]
AND groups:
[INTRANET/johndoe OR
INTRANET/marketing OR
PUBLIC/sites etc]
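The two steps above can be sketched in Python as follows. The in-memory directory, the function names and the query syntax are assumptions made for illustration; they do not reproduce the query language of any particular search platform.

```python
def groups_for_user(username, directory):
    """Step 1: collect the groups associated with a username."""
    return sorted(directory.get(username, []))


def restrict_query(user_query, user_groups):
    """Step 2: AND the user query with a filter on the user's groups."""
    if not user_groups:
        # A user with no groups must not fall through to an unrestricted search.
        raise PermissionError("user belongs to no groups")
    clause = " OR ".join(f'"{group}"' for group in user_groups)
    return f"({user_query}) AND groups:({clause})"


# Hypothetical stand-in for a directory service.
directory = {
    "johndoe": ["INTRANET/johndoe", "INTRANET/marketing", "PUBLIC/sites"],
}

secured_query = restrict_query("budget", groups_for_user("johndoe", directory))
```

Refusing to build a query for a user with no groups is a deliberate choice in this sketch: failing closed is safer than silently dropping the security clause.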

The first step can be done in many different ways. Following the Vivísimo philosophy of using existing services whenever possible, this information should be collected at query time when practical. This is possible when using a standard LDAP-based directory service; its light weight usually ensures a good response time. If the directory service is too slow, some automated caching mechanism should be put in place. When no directory service exists, the information in the group matrix can be mirrored in a search index. The first step would then be to look up the username in the index.
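One possible shape for such a caching mechanism is a small time-to-live cache in front of the directory lookup. The sketch below is an assumption-laden illustration: the class name, the fetch_groups callable and the default of 300 seconds are all hypothetical, and a real deployment would call its own directory service inside fetch_groups.

```python
import time


class CachedGroupLookup:
    """Cache group lookups for a bounded time to spare a slow directory."""

    def __init__(self, fetch_groups, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch_groups    # callable: username -> list of groups
        self._ttl = ttl_seconds       # maximum age of a cached answer
        self._clock = clock           # injectable so the cache can be tested
        self._cache = {}              # username -> (timestamp, groups)

    def groups_for(self, username):
        now = self._clock()
        entry = self._cache.get(username)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]           # still fresh: skip the directory call
        groups = self._fetch(username)
        self._cache[username] = (now, groups)
        return groups
```

The time-to-live bounds how stale a user's group membership can be, so it should be chosen against the organization's freshness requirements rather than purely for speed.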

There are two drawbacks to this approach. The first drawback is related to the mirroring of security information. Whenever information is duplicated, it needs to be synchronized and synchronization always involves discrepancies. In this particular case, the security information is never modified at the search engine level, so the discrepancies will only be related to the synchronization lag, i.e., the time between a security setting change and the time the security tokens are updated in the index.

When the access control matrix can be decomposed into ACLs and a group matrix, the group matrix is often, time-wise, the most critical component. Indeed, if a user (e.g., an employee) leaves the organization or changes positions, it is critical to reflect this change immediately. An implementation relying on a directory service at query time will allow this.

The ACLs, on the other hand, usually reflect much more granular security information which is not updated very frequently. The most critical case occurs when confidential documents have been unexpectedly exposed or even indexed. The use of a blacklist, which should be reflected immediately in the search results, offers a solution to this problem although it is certainly imperfect from a functionality standpoint.
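A blacklist of this kind can be applied as a last filter on the result list, independently of the index. The following Python sketch assumes results are dictionaries with a url field and that the blacklist is a set of URLs; both are hypothetical representations chosen for illustration.

```python
def apply_blacklist(results, blacklist):
    """Drop any result whose URL has been blacklisted, preserving order."""
    return [result for result in results if result["url"] not in blacklist]


# Hypothetical search results and blacklist.
results = [
    {"url": "http://intranet/handbook.doc", "title": "Handbook"},
    {"url": "http://intranet/salaries.xls", "title": "Salaries"},
]
blacklist = {"http://intranet/salaries.xls"}

visible = apply_blacklist(results, blacklist)
```

Because the filter runs at query time, adding a URL to the blacklist hides the document on the very next search, without waiting for the index to be rebuilt. Result counts and paging computed before the filter may no longer match what the user sees, which is one sense in which the solution is imperfect.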

The second drawback is that the search application must have intricate knowledge of the underlying authorization process. It needs to be able to reconstruct the access control matrix by understanding how the underlying system represents it. Different information storage systems have very different representations of this matrix, and the connectors between the search application and storage systems will need to reproduce it correctly.

No Silver Bullet

Organizations have taken a variety of approaches to search engine security, from simple identification and authentication to ACL indexing. Other options include query-time result-by-result authorization and federated search. Each approach has benefits and drawbacks that make it the right or wrong approach for any given organization. Often a mixed approach that combines all of the above is the best answer.

The bottom line is that search engine security is one of the most challenging problems for enterprises. Its sheer complexity makes it critical to do it right, and it often requires a custom approach for each organization. There is no silver bullet; search applications therefore need to offer a variety of security tools that are easily configurable while still being able to handle the most complex and demanding security requirements. Each organization must understand its needs, its IT infrastructure and its security goals in order to find the solution that fits best.

In general, a good search engine application will mirror a pre-existing security framework and rely as much as possible on external processes to conduct authentication and authorization. This will decrease the complexity of the implementation, the need for synchronization and, consequently, the risk of errors. Furthermore, it needs to be flexible but as out-of-the-box as possible, in order to avoid excessive configuration complexity, which is a sure recipe for introducing errors that have potentially dire security consequences.

Next Steps

So how should organizations proceed when considering deploying a search engine? Here are a few simple steps to handle the security conundrum:

1. Collect all necessary information beforehand to define the specifications as exhaustively as possible. Design choices need to be made early on. Determine the different dimensions of the search security requirements: username and security group domains, ACL granularity, freshness requirements, etc.;

2. Create a test environment that is as realistic as possible and able to demonstrate the feasibility of all the security aspects identified in the specifications;

3. Do not buy a product on paper; test it in a real environment to see if it can handle all of your security requirements. Pay great attention to the ease of configuration, the likelihood of committing configuration errors and the expertise of the support staff;

4. Test before you launch. A search engine is a great tool to expose security holes in your pre-existing framework. Before deploying your search application organization-wide, employ a small set of trusted users to try to identify security problems and resolve them before the launch; and

5. Monitor the use and the performance of the application and modify its configuration accordingly.
