Saturday, September 8, 2007

Challenges of an Enterprise Search implementation

Enterprise Search is, to quote Wikipedia, "the practice of identifying and enabling specific content across the enterprise to be indexed, searched, and displayed to authorized users". The goal is to give users a single search field while still searching all kinds of content from all kinds of data sources. The challenge is that "content" comes in many formats and from different kinds of data sources. It may, for instance, be:
- other internal web sites
- your own extranets & internet sites
- file shares
- customer records held in a CRM system
- business information in an internal database
- letters and reports in Document Management Systems
- people/contact information in a telephony system

Another challenge is that the information often has various levels of security classification, so only authorized users should get hits on a search. Sounds fairly simple, right? But that means firstly that the user making the search needs to identify himself, and secondly that all the systems need to be able to identify that user correctly. Not an easy task when different base systems have their own user database with their own user identification solution. Across an enterprise, numerous user databases may be in use.
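To make the identification problem concrete, here is a minimal sketch of filtering search hits per user. It assumes a hand-maintained identity map from one enterprise login to the account name used in each base system, and it assumes every hit carries the set of accounts allowed to read it. The names `Hit`, `IDENTITY_MAP` and `trim_hits`, as well as the users and systems, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str         # name of the base system the hit came from
    title: str
    allowed: frozenset  # accounts in that system allowed to read the item


# Hypothetical identity map: one enterprise login -> account name per base system.
IDENTITY_MAP = {
    "anna": {"crm": "anna.s", "dms": "A1001"},
    "bob": {"crm": "bob.k"},
}


def trim_hits(user, hits):
    """Keep only the hits the user may read in the system they came from."""
    accounts = IDENTITY_MAP.get(user, {})
    visible = []
    for hit in hits:
        local_account = accounts.get(hit.source)
        # No account in that system means no access to its content.
        if local_account is not None and local_account in hit.allowed:
            visible.append(hit)
    return visible
```

In a real installation the map would more likely come from a directory service than from a dictionary, but the principle stays the same: the search layer must know who the user is in every system it searches.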

Information in base systems is not structured in a "search-friendly" format. A relational database is pretty useless as is when it comes to extracting relevance-sorted search results from a free text search. You probably need to do some work to make these systems available as a good data source in an enterprise search solution.
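One common way to do that work is to copy the text out of the database rows into an inverted index and rank matches with a simple weighting. The sketch below is only an illustration of the idea, not how any particular search product does it: the rows are passed in as a plain dictionary of row id to text, and the scoring is a basic term frequency times inverse document frequency. The function names are my own.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(rows):
    """rows: dict of row id -> free text. Returns term -> {row id: term count}."""
    index = defaultdict(dict)
    for row_id, text in rows.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][row_id] = count
    return index


def search(index, total_rows, query):
    """Return (row id, score) pairs, best match first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that occur in few rows weigh more than terms found everywhere.
        idf = math.log(1 + total_rows / len(postings))
        for row_id, count in postings.items():
            scores[row_id] += count * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The index has to be rebuilt or updated when the rows change, which is part of the "some work" mentioned above.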

How do you create a search result when hits come from many different data sources? In what order should the search hits be sorted? You need a way to understand the relevance weight of each search hit from each data source when assembling the total, final search result to be presented to the user.
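One possible approach, sketched here under my own assumptions, is to rescale the raw scores from each data source to a common 0..1 range and then multiply by a weight per source before sorting everything together. The min-max rescaling and the per-source weights are illustrative choices, and the function names are hypothetical.

```python
def normalize(hits):
    """hits: list of (doc_id, raw_score). Returns scores rescaled to 0..1."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All hits scored the same, so treat them as equally relevant.
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in hits]


def merge(results_by_source, source_weights):
    """Combine hits from all sources into one list, best hit first."""
    merged = []
    for source, hits in results_by_source.items():
        weight = source_weights.get(source, 1.0)  # unknown sources get weight 1.0
        for doc, score in normalize(hits):
            merged.append((source, doc, score * weight))
    merged.sort(key=lambda hit: hit[2], reverse=True)
    return merged
```

Note that min-max rescaling always pushes the weakest hit of a source down to zero, even if it was a decent match, so this is a starting point rather than a finished ranking model.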

Lastly, performance issues need to be addressed. The users of an enterprise search expect response times of no more than a few seconds; they are spoiled by the Google performance of "0,047 seconds" ;-). The base systems participating in the search solution are rarely prepared for this scenario.
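If the base systems are queried live, one way to protect the response time is to ask all of them in parallel and give up on those that do not answer within a time budget. The sketch below assumes each data source is wrapped in a plain Python function that takes the query and returns a list of hits; `federated_search` and the two-second default are my own illustrative choices.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def federated_search(sources, query, timeout_s=2.0):
    """sources: dict of name -> callable(query) returning a list of hits.

    Returns (results per source that answered in time, names of skipped sources).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    # Do not block on slow sources; their threads finish in the background.
    pool.shutdown(wait=False)
    results = {}
    for future in done:
        # Sources that raised an error are left out of the results.
        if future.exception() is None:
            results[futures[future]] = future.result()
    skipped = sorted(futures[f] for f in not_done)
    return results, skipped
```

The user then gets whatever arrived in time, and the list of skipped sources can be shown as a note next to the result. The alternative is to index the content ahead of time so that the base systems are not involved at query time at all.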

In an upcoming post I'll describe an enterprise search solution architecture that I recently implemented for a large organisation, addressing the challenges described above.

© Copyright 2007, Tomas Elfving
