Xapian Search for Drupal

Here at Trellon, clients come to us all the time looking for ways to make knowledge more accessible through their web sites. Search is a primary tool for exposing data, and the performance of Drupal's search engine is less than optimal in certain situations, so we developed a module that replaces Drupal's native search features with the Xapian search engine. Here's why we did it.

Reason

A common challenge for Drupal sites is working with documents in different formats and getting them into the search engine. Drupal does not natively index PDFs and Word documents, even though they are among the most commonly exchanged text formats on the Internet (outside of HTML). This presents problems for sites where content is driven by document uploads, and has led to some sub-optimal solutions and messy UI workflow patterns in the past.

Another challenge is performance. While Drupal scales well in terms of serving pages through the use of memcache, CDNs and opcode caching, search functionality is another matter entirely. The organic nature of searches often defies attempts to bring this important subsystem to scale in high-traffic environments. Alternative search engines, like Google's Site Search and Yahoo's web services, often need to be used to handle the demands that come with high utilization.

To attempt to solve these problems, Trellon spent some time working with alternative search engine solutions for Drupal, and one product we came across that can get the job done is Xapian. We have just released a module that replaces Drupal's native search with Xapian, and it can be downloaded from its Drupal project page.