A brief introduction to how text is indexed and searched.

Searching in two steps

  1. Relevancy Ranking: a pattern recognition model that runs across the entire dataset.
  2. Coverage: a rules-based search applied to the top* results from step 1, scoring based on lexical features, sorting, and boosting.

Relevancy Ranking can be used on its own, but it is typically paired with Coverage for more precise control of the search results and truncation.

*The number of top results is set by CoverageDepth, which defaults to 500 and can be configured to deep search the entire dataset.
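The two steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: `two_step_search`, `relevancy_score`, and `coverage_score` are hypothetical names, not part of the Indx API, and the only value taken from the text is the default depth of 500.

```python
from typing import Callable, List, Sequence

# Default taken from the text above; the constant name is illustrative only.
COVERAGE_DEPTH = 500


def two_step_search(
    query: str,
    docs: Sequence[str],
    relevancy_score: Callable[[str, str], float],  # hypothetical stand-in for step 1
    coverage_score: Callable[[str, str], float],   # hypothetical stand-in for step 2
    coverage_depth: int = COVERAGE_DEPTH,
) -> List[int]:
    """Return document indices, best match first."""
    # Step 1: score every document in the dataset and rank by relevancy.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: relevancy_score(query, docs[i]),
        reverse=True,
    )

    # Step 2: re-score only the top `coverage_depth` results.
    # sorted() is stable, so ties keep their step 1 order.
    head, tail = ranked[:coverage_depth], ranked[coverage_depth:]
    head = sorted(head, key=lambda i: coverage_score(query, docs[i]), reverse=True)

    return head + tail
```

Setting `coverage_depth` to the size of the dataset corresponds to the deep search case, at the cost of running the second step on every document.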

:pixl-clr-pattern-recognition: Unique Relevancy Ranking model

When indexing, a relevancy model is generated by extracting features from all documents.

The features contain essential parameters such as frequency and rarity, which are mapped as vectors into a multi-dimensional hypersphere.

When searching, the features of the search query are mapped into this space, and the search results are arranged in the order of the applied distance measure.
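As a rough illustration of the idea, the sketch below builds a toy model from whitespace tokens. It is an assumption-heavy stand-in, not Indx's actual feature extraction: frequency is modelled as term count, rarity as an inverse document frequency weight, the hypersphere as unit-length vectors, and the distance measure as cosine distance. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(tokens: List[str], idf: Dict[str, float]) -> Vector:
    """Weight tokens by frequency * rarity and scale to unit length."""
    tf = Counter(t for t in tokens if t in idf)  # tokens unknown to the model are ignored
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_model(docs: List[str]) -> Tuple[Dict[str, float], List[Vector]]:
    """Indexing: extract features from all documents."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency: rarer tokens get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return idf, [embed(toks, idf) for toks in tokenized]


def search(query: str, idf: Dict[str, float], vectors: List[Vector]) -> List[int]:
    """Searching: map the query into the same space and order by distance."""
    q = embed(query.lower().split(), idf)

    def cosine_distance(v: Vector) -> float:
        # Both vectors are unit length, so the dot product is the cosine.
        return 1.0 - sum(w * v.get(t, 0.0) for t, w in q.items())

    return sorted(range(len(vectors)), key=lambda i: cosine_distance(vectors[i]))
```

A short usage example:

```python
idf, vectors = build_model(["the quick brown fox", "the lazy dog", "quick quick fox"])
order = search("lazy dog", idf, vectors)  # index 1 ("the lazy dog") comes first
```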

Indx performs a search on entire sentences, not just the underlying keywords. This gives better matching across words, for incomplete names, and for spelling variations.

:pixl-clr-coverage: Coverage function

A function that detects exact matches on tokens in the query and lifts them to the top of the result list. Coverage is a collection of algorithms that work together, and it complements the core **algorithm that always runs first.
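The lifting behaviour can be illustrated with a minimal sketch. It assumes whitespace tokenization and a simple count of shared tokens as the score; `lift_exact_matches` is a hypothetical name, and the real Coverage function combines several algorithms rather than this single rule.

```python
from typing import List


def lift_exact_matches(query: str, ranked_docs: List[str]) -> List[str]:
    """Move documents with more exact query-token matches toward the top."""
    query_tokens = set(query.lower().split())

    def matched(doc: str) -> int:
        # Number of query tokens that appear verbatim in the document.
        return len(query_tokens & set(doc.lower().split()))

    # The sort is stable, so documents with the same match count keep
    # the order they were given by the earlier ranking step.
    return sorted(ranked_docs, key=matched, reverse=True)
```

For example, given the ranked list `["jon smith", "john smyth", "john smith"]` and the query `"john smith"`, the last entry matches both tokens exactly and moves to the top, while the other two keep their relative order.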

Coverage detects: