By Raghavendran Pedapati
Apache Solr is a web application built around Lucene. Lucene is a powerful search library that provides full-text indexing. The key to Lucene's search is its inverted index, a keyword-centric data structure that maps each word to the pages containing it (word -> pages) rather than each page to its words (page -> words).
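To make the word -> pages idea concrete, here is a minimal sketch of an inverted index in plain Python. It is not Lucene's implementation; the function name `build_inverted_index` and the sample pages are made up for illustration, and tokenization is just a lowercase whitespace split.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the sorted list of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample data, keyed page -> words.
pages = {
    "page1": "solr is built on lucene",
    "page2": "lucene provides full-text indexing",
}

index = build_inverted_index(pages)
print(index["lucene"])  # pages that contain the word "lucene"
```

Looking up a word is then a single dictionary access, instead of a scan over every page.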
Solr takes advantage of Lucene's strengths, such as the inverted index, spellchecking, hit highlighting and advanced analysis/tokenization, and builds on them with its own API to become a powerful search application in its own right. One of Solr's advanced features is faceting, i.e. grouping search results into categories based on key terms and showing a numerical count for each.
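As a rough illustration of what faceting produces, the sketch below counts how many results fall under each value of a field. This is plain Python rather than Solr's faceting API; the `facet_counts` helper, the `category` field and the sample results are assumptions made for the example.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(doc[field] for doc in results if field in doc)


# Hypothetical search results.
results = [
    {"title": "Solr in Practice", "category": "books"},
    {"title": "Lucene Internals", "category": "books"},
    {"title": "Search Conference Talk", "category": "videos"},
    {"title": "Untitled Note"},  # no category, so it is skipped
]

facets = facet_counts(results, "category")
for value, count in facets.most_common():
    print(value, count)
```

A search interface would typically show these counts next to each category so that users can narrow their results.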
This makes Solr a strong choice for programmers who want to build sophisticated and efficient search applications, since it also makes scaling and distribution easier.
The DataImportHandler (DIH) is a mechanism for importing structured data from a data store into Solr. It is often used with relational databases, but it can also handle XML through its XPathEntityProcessor. We can pass incoming XML through an XSL stylesheet, as well as parse and transform the XML with the built-in DIH transformers. That gives us three options: translate our arbitrary XML to Solr’s standard input XML format via XSL, map and transform the arbitrary XML to the Solr schema fields directly in the DIH config file, or combine both approaches. DIH is flexible.
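The idea of mapping arbitrary XML onto schema fields with XPath expressions can be sketched outside Solr using Python's standard library. This is not the DIH itself and not its config format; the `xml_to_docs` function, the field names and the sample XML are hypothetical, and `xml.etree.ElementTree` supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from schema field name to a path inside each record.
FIELD_XPATHS = {"id": "id", "title": "title", "author": "meta/author"}

# Hypothetical input XML in an arbitrary (non-Solr) layout.
SAMPLE_XML = """
<catalog>
  <book><id>1</id><title> Solr Basics </title><meta><author>Jane</author></meta></book>
  <book><id>2</id><title>Lucene</title></book>
</catalog>
"""


def xml_to_docs(xml_text, record_path, field_xpaths):
    """Turn each matching record element into a flat field -> value dict."""
    root = ET.fromstring(xml_text)
    docs = []
    for record in root.findall(record_path):
        doc = {}
        for field, xpath in field_xpaths.items():
            value = record.findtext(xpath)
            if value is not None:  # skip fields missing from this record
                doc[field] = value.strip()
        docs.append(doc)
    return docs


docs = xml_to_docs(SAMPLE_XML, "book", FIELD_XPATHS)
for doc in docs:
    print(doc)
```

Each resulting dict corresponds to one document to be indexed; in the DIH, the equivalent mapping would live in its config file rather than in application code.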
In this post I will discuss how to set up the Solr DIH to index XML files for search.