About the implementation of Elasticsearch in Koha

Koha search form

It has now been several weeks since our first customers went live with the Elasticsearch indexing engine in Koha. We opened the upgrade to four volunteers. Many thanks to the users who took part, for their activity and their patience, and to everyone who contributes in one way or another to the community.

Here is a status report on the implementation to date.
The deployed version, 18.11.x, includes a few extra patches needed for trouble-free daily use.

Activity on Bugzilla is steady and we take part when we can. To give an idea, we have spent about a hundred days on Elasticsearch in Koha, including 24 days funded by BULAC to support them with enhancements and community submissions. They were the first in France to take the plunge, long before us. We also took the time to train a team on Elasticsearch with Jolicode. This training gave a significant boost to our learning and showed us the range of possibilities this search solution offers. We also made an initial analysis of the choices made in Koha that could be improved in the future.

Elasticsearch has become the leading search engine, with a strong reputation and wide popularity. It is free software and its community is strong. The documentation is of high quality. The core of Elasticsearch is of course search, but it is used for other purposes too. Its architecture and implementation are real assets. Getting started with it is quick, but a quality implementation takes time. It works in near real time: indexed resources become searchable almost immediately. Once the configuration is fine-tuned, the engine is very efficient and returns highly relevant results. But the most important thing is that it is now implemented and usable with Koha!

This first trial allowed us to:
– deploy a technical infrastructure that can accommodate the next databases
– fix the main malfunctions
– automate what can be automated, so that more time can be spent on what matters

“It was better before”
– In the bibliographic catalog, searching on rejected (non-preferred) forms does not work (yet)
– The search syntax in Koha is too sensitive (this is due to the use of “query string”, which is a poor fit here; we are working on moving to “simple query string” and on reviewing how the queries are built)
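
To illustrate the “query string” versus “simple query string” point, here is a minimal sketch of the two request bodies. It is not Koha's actual code: the helper names and the `title` field are hypothetical. Only the `query_string` and `simple_query_string` query types and their `query`, `fields` and `default_operator` parameters come from the Elasticsearch query DSL. The strict form returns an error on malformed input, while the simple form ignores the invalid parts of the query.

```python
import json


def strict_query(user_input, fields):
    """Build a `query_string` body: strict syntax, errors on malformed input."""
    return {
        "query": {
            "query_string": {
                "query": user_input,
                "fields": fields,
                "default_operator": "AND",
            }
        }
    }


def lenient_query(user_input, fields):
    """Build a `simple_query_string` body: invalid syntax is ignored."""
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": fields,
                "default_operator": "AND",
            }
        }
    }


if __name__ == "__main__":
    # An unbalanced quote: typical user input that trips a strict parser.
    user_input = 'harry "potter'
    print(json.dumps(strict_query(user_input, ["title"]), indent=2))
    print(json.dumps(lenient_query(user_input, ["title"]), indent=2))
```

The two bodies differ only in the query type, which is why the change mostly concerns how user input is tolerated rather than what is searched.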

“It’s better now”
– Configuration interface for indexed fields and mappings with UNIMARC data
– A first-rate search engine that no longer needs to prove itself, used by some of the biggest names such as Wikipedia, eBay, Facebook, GitHub, etc.
– Very rich ecosystem: other software built on Elasticsearch to offer the best possible user experience (Kibana, Cerebro, etc.)
– Efficient relevance calculation (with Zebra, only the number of occurrences counts; with Elasticsearch, it goes much further: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html)
– Weighting of results according to the weight given to chosen fields
– Text analysis algorithms for different languages, maintained by the communities (for example stemming, which identifies and groups terms with the same root)
– Management for the system administrator: performance, storage, backups (clustering)
– Technical tools for work and analysis (REST API, Kibana)
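
The field weighting mentioned in this list can be expressed directly in a query with Elasticsearch's `field^boost` notation. The sketch below is only an illustration: the helper names, field names and weights are hypothetical and are not Koha's configured mappings. Only the `multi_match` query type and the `^` boost syntax come from Elasticsearch.

```python
def weighted_fields(weights):
    """Turn {"title": 3} into ["title^3"], the Elasticsearch field boost syntax."""
    return [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in weights.items()
    ]


def weighted_search(user_input, weights):
    """Build a `multi_match` body whose score favors the heavier fields."""
    return {
        "query": {
            "multi_match": {
                "query": user_input,
                "fields": weighted_fields(weights),
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical weights: a match in the title counts three times as much.
    print(weighted_search("asimov foundation", {"title": 3, "author": 2, "subject": 1}))
```

With these assumed weights, a record matching in its title would rank above one matching only in its subject, all else being equal.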

“It will be better soon”
– Although Elasticsearch is an excellent tool, its implementation in Koha still requires some adjustments
– Exact search for authorities
– Rebuilding the index without interrupting the search service (technically possible, but still to be implemented, notably with Elasticsearch aliases)
– Potential improvements still to be written:
* highlight display
* management of text analysis configuration
* search in rejected forms
* joins
* facet configurations
* suggestions
* associated manifestations
* taking full advantage of the engine's power without worrying about backward compatibility with Zebra
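
The index rebuild without loss of service mentioned in this list usually relies on an alias: searches go through the alias, a new index is built alongside the old one, and the alias is then moved in a single atomic request. The sketch below only builds the body of that request (sent as `POST /_aliases`). The function name and the index and alias names are hypothetical, and this is not how Koha does it today.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the `_aliases` body that moves `alias` from old to new atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: searches always target the "koha_biblios" alias.
    body = alias_swap_actions("koha_biblios", "koha_biblios_v1", "koha_biblios_v2")
    print(json.dumps(body, indent=2))
```

Because both actions are applied in one request, there is no moment at which the alias points to no index, so searches keep working during the switch.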

“It’s important to know”
– In the current version, it is quite possible to switch back to Zebra in a few clicks, since the two solutions are kept compatible (which in turn complicates development).
– From a system administration point of view: Zebra is hosted on each Koha machine. Elasticsearch, by contrast, centralizes all the indexes in a single cluster, which allows better management and better handling of failures.
– Setting up and running an Elasticsearch cluster requires a dedicated infrastructure and new skills (this is good news, because those skills can be used for other purposes)
– Today's implementation can be improved. It is integrated into the core of Koha and is being enriched to give users a better search experience.

In conclusion: we will continue the gradual deployment of Elasticsearch for our customers. If you want to get started on your own, we suggest you get training first, so that you can understand and implement the solution in a sustainable way. And if you would like our support, do not hesitate to contact us!
