Apache Lucene indexing example
Now let's take a look at the overall Lucene searching process. The fundamental concepts are index, document, field and term. An index contains a collection of documents. The index stores statistics about terms in order to make term-based search more efficient.

Internally, Lucene refers to documents by an integer document number. The first document added to an index is numbered zero, and each subsequent document added gets a number one greater than the previous. Note that a document's number may change, so caution should be taken when storing these numbers outside of Lucene. In particular, numbers may change in the following situations:

The numbers stored in each segment are unique only within the segment, and must be converted before they can be used in a larger context. The standard technique is to allocate each segment a range of values, based on the range of numbers used in that segment. To convert a document number from a segment to an external value, the segment's base document number is added. To convert an external value back to a segment-specific value, the segment is identified by the range that the external value is in, and the segment's base value is subtracted. For example, two five-document segments might be combined, so that the first segment has a base value of zero, and the second of five. Document three from the second segment would then have an external value of eight.

When documents are deleted, gaps are created in the numbering. These are eventually removed as the index evolves through merging. Deleted documents are dropped when segments are merged. A freshly-merged segment thus has no gaps in its numbering.

Search indexes are defined by a JavaScript function. This is run over all of your documents, in a similar manner to a view's map function, and defines the fields that your search can query. The index resembles an RDBMS in that it needs fast lookup for keys, while the bulk of the data resides on secondary storage.

So far we have seen all the components of Lucene indexing. In this section we will create an index of documents using Lucene indexing. Consider a project where students are submitting their yearly magazine articles. The input console consists of options for the student name, the title of the article, the category of the article and the body of the article.
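
The conversion between segment-local and external document numbers described above can be sketched in a few lines of plain Java. This is only an illustration of the base-offset arithmetic, not Lucene's own API: the class name SegmentDocNumbers and its methods are hypothetical, and the sketch assumes every segment is non-empty and ignores deletions.

```java
/**
 * Illustrative sketch only, NOT part of Lucene. The class and method names
 * are hypothetical. Each segment is given a contiguous range of external
 * document numbers, starting at its base value.
 */
final class SegmentDocNumbers {
    // bases[i] is the external number of local document 0 in segment i.
    private final int[] bases;
    // sizes[i] is the number of document slots in segment i.
    private final int[] sizes;
    // Total number of document slots across all segments.
    private final int total;

    SegmentDocNumbers(int... segmentSizes) {
        bases = new int[segmentSizes.length];
        sizes = segmentSizes.clone();
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            if (segmentSizes[i] <= 0) {
                throw new IllegalArgumentException("segments must be non-empty");
            }
            bases[i] = next;          // this segment starts where the last one ended
            next += segmentSizes[i];
        }
        total = next;
    }

    /** Segment-local number to external number: add the segment's base. */
    int toExternal(int segment, int localDoc) {
        if (localDoc < 0 || localDoc >= sizes[segment]) {
            throw new IllegalArgumentException("local doc out of range: " + localDoc);
        }
        return bases[segment] + localDoc;
    }

    /** Finds the segment whose range contains the external number. */
    int segmentOf(int externalDoc) {
        if (externalDoc < 0 || externalDoc >= total) {
            throw new IllegalArgumentException("external doc out of range: " + externalDoc);
        }
        int pos = java.util.Arrays.binarySearch(bases, externalDoc);
        // Exact hit: externalDoc is the first document of segment pos.
        // Otherwise pos encodes the insertion point; the owning segment
        // is the one just before that insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** External number to segment-local number: subtract the segment's base. */
    int toLocal(int externalDoc) {
        return externalDoc - bases[segmentOf(externalDoc)];
    }

    public static void main(String[] args) {
        // Two five-document segments: bases are 0 and 5.
        SegmentDocNumbers numbers = new SegmentDocNumbers(5, 5);
        System.out.println(numbers.toExternal(1, 3)); // prints 8
        System.out.println(numbers.segmentOf(8));     // prints 1
        System.out.println(numbers.toLocal(8));       // prints 3
    }
}
```

The main method reproduces the example from the text: document three of the second segment (index 1 here, since segments are counted from zero) maps to external value eight, and converting eight back yields segment 1, local document 3.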