📕

C# .NET API with linked class library v3.0

This is the API for use with the self-hosted class library.

💡
indxSearchLibs has no third-party dependencies other than Microsoft .NET. The current .NET version is 7.
💻
Typical use case

🚀 Create an instance of the system

⛓ Insert with an array of documents

🪄 Create Index

🟢 Check ready status

🔎 Search

⛓ Insert one or more documents without re-indexing

🗑 Delete one or more documents without re-indexing

🪄 (re-index if system status says it is required)

All of these functions can be run multiple times, for example to perform incremental loading.

Any variation of this pattern can be run continuously over a long period of time to ensure that the index and data are up to date.
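The steps above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it uses only the signatures documented on this page, the using directive for the Indx namespace is left out because the page does not name it, Algorithm is assumed to be an enum, and the log prefixes, sample documents and polling interval are made up for illustration. NullLogger comes from Microsoft.Extensions.Logging.Abstractions and simply discards log entries.

```csharp
using System.Threading;
using Microsoft.Extensions.Logging.Abstractions;
// Assumption: also add a using directive for the namespace that contains IndxSearchEngine.

// Create an instance. 109 is the configuration number this page names for most cases.
using IIndxSearchEngine engine = new IndxSearchEngine(
    "demo",                                  // hypothetical log prefix
    NullLogger<IndxSearchEngine>.Instance,   // logger that discards all entries
    109);

// Insert an array of documents: key, segment number, indexed text, client information.
engine.Insert(new[]
{
    new Document(1, 0, "Hamlet, Prince of Denmark", "https://example.com/hamlet"),
    new Document(2, 0, "Macbeth", "https://example.com/macbeth"),
});

// Start indexing (the only non-blocking call) and wait until searching is allowed.
engine.IndexAsync();
while (!engine.Status.SearchIsAllowed)
{
    Thread.Sleep(50);
}

// Search with the algorithm recommended for most applications.
var query = new SearchQuery(
    "hamlet",
    Algorithm.ProprietaryRelevancyRanking,   // assumed to be an enum member
    10,                                      // maxNumberOfRecordsToReturn
    1000,                                    // timeOutLimitMilliseconds (recommended value)
    true,                                    // removeDuplicates
    "demo-search");                          // hypothetical log prefix for this search
SearchResult result = engine.Search(query);

// Incremental changes without re-indexing.
engine.Insert(new Document(3, 0, "Othello", "https://example.com/othello"));
engine.Delete(2);

// Re-index only if the status says it is required.
if (engine.Status.ReIndexRequired)
{
    engine.IndexAsync();
}
```

A production loop would normally bound the wait and check the other status flags (for example InvalidState) instead of polling indefinitely.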

API

🎛️
Interfaces
⌨️ IIndxSearchEngine

IIndxSearchEngine is the API for the Indx search engine. The constructor of the class that implements this interface takes three required parameters:

  • A configuration number, obtained from Indx and depending on the application.
  • A prefix text to be added to log entries.
  • A logger, if you choose to inject one, that implements the ILogger interface from Microsoft.Extensions.Logging.Abstractions. Most loggers, such as NLog and log4net, support this. For Linux users who typically log to the console, we can supply such a logger.

All methods are thread-safe. All methods block except IndexAsync.

A search will always run on the thread and thus the logical CPU it is called from.

public interface IIndxSearchEngine : IDisposable
{
	SystemStatus Status { get; }

	bool Insert(Document[] documents);
	bool Insert(Document document);

	void IndexAsync();

	SearchResult Search(SearchQuery query);

	void Delete(long foreignKey);
	void DeleteAllDocuments();
}
Status gives a complete status of the system. See the SystemStatus class for its properties.
Insert has two variants: one for arrays and one for single documents. Insert can be called multiple times to add documents.
IndexAsync runs an asynchronous indexing. Read Status.IndexProgressPercent to see the progress from 0% to 100%. The bool Status.IsReadyToSearch will return true when the indexing is complete.
Search is the search function. See the class SearchResult to read the result of the search. See the SearchQuery class for a description of the arguments.
Delete deletes the document with the given foreign key. If multiple documents have the same key, they will all be deleted.
DeleteAllDocuments deletes all documents.
⚖️ IMerger

The merger is used to make a weighted merge between the results of two or more IIndxSearchEngine instances.

public enum MergeAlgorithm
{
	Regular,
	SumScore
}
public interface IMerger
{
	bool AddInstance(
		IIndxSearchEngine instance, 
		float weigthToApply, 
		KeyFilter includeFilter, 
		KeyFilter excludeFilter, 
		WordFilter wordIncludeFilter, 
		WordFilter wordExcludeFilter
	);
	SearchResult Search(
		string soughtText, 
		int noOfRecordsToReturn, 
		int timeOutMillisec, 
		string logPrefix, 
		bool removeDuplicates
	);
}
MergeAlgorithm is an enum that selects how weighted merging between different instances of IndxSearchEngine is performed. A typical use case is to weight titles over body text.
AddInstance adds a search instance to the merger. The number of instances is unlimited. The different instances can have different (or similar) filters. weightToApply must be in the range [0..1].
Search performs a merged search and returns a SearchResult in the usual way. The Regular algorithm is the most used; combined with the removal of duplicates, it results in the documents being sorted by maximum score value. SumScore may suit some applications, which can be discussed with Indx. The arguments are otherwise as described in the SearchQuery class.
📡
Classes
⚙️ IndxSearchEngine (constructor)

To search in a set of documents, the client needs an instance of the search system. To create an instance you will need a configuration number. The configuration determines which algorithms are used and how sizes are allocated. It can also adjust latency versus memory usage. Indx is also able to create custom configurations for enterprise customers.

Advanced use cases may include the usage of the StringReplacer and TokenizerSetup classes.

public IndxSearchEngine(
	string logPrefix, 
	ILogger<IndxSearchEngine> logger, 
	int configurationNumber,
	bool? usesCaseSensitiveSearch=null,
	StringReplacer replacer=null,
	TokenizerSetup tokenizerSetup=null
)
ILogger is the logging interface from Microsoft.Extensions.Logging.Abstractions. It is supported by most loggers, such as NLog and log4net.
int configurationNumber is a configuration number. This will be supplied from Indx, based on use case. Config number 109 is used for most cases.
StringReplacer makes it possible to replace or remove strings from the search. Typically used for national characters, for example an ‘ø’ that should be replaced with ‘oe’.
TokenizerSetup can be used to index entire words or tokens in addition to the usual pattern recognition indexing.
📄 Document

A document is the entity being searched for. A document can describe everything from a product in an online store, to a page in The Collected Works of Shakespeare.

Each document contains the following fields:

  • The text to be searched for.
  • A client annotation: a text field where the user can enter their own information, for example thumbnails, URIs and the like.
  • A segment number, which can be used to describe a part of a body text. Examples could be line numbers on a page or page numbers in a book.
  • A foreign key or identifier in the form of a number. The field is optional, but is required for one of the filter types, or if the system is to support dynamic deletion and insertion.
💡
If update, delete or filter functions are required, a foreign key (DocumentKey) is needed. Note, however, that Indx indexing is so fast that in most cases it is more efficient to run DeleteAllDocuments followed by an upload and indexing of the updated documents.
🥷
Aliases can be added by using the same DocumentKey for multiple documents.
public class Document
{
	public Document(Document document);
	
	public Document(
		long documentKey, 
		int segmentNumber, 
		string documentTextToBeIndexed, 
		string documentClientInformation
	);
}
long DocumentKey is a 64-bit foreign key, needed to support update, delete and filter.
int SegmentNumber is used by the client to add extra information, such as line numbers in a book, when searching in large body texts.
string DocumentTextToBeIndexed is the text to be searched and indexed.
string DocumentClientInformation is for free use by the client, e.g. for thumbnail IDs, URIs and other references. This is not indexed.
🔎 SearchQuery
Without filters
public class SearchQuery
{
	public SearchQuery(
		string soughtText, 
		Algorithm algorithm, 
		int maxNumberOfRecordsToReturn,
		int timeOutLimitMilliseconds, 
		bool removeDuplicates, 
		string logPrefix
	);
}
string soughtText refers to the text (input) to be searched for.
Algorithm describes which search algorithm should be used. Indx's own ProprietaryRelevancyRanking is the one used for most applications. The available values are ProprietaryRelevancyRanking, ProprietaryCombined, UnlimitedFastLevenshtein, JaccardOfAllChars, JaccardOfCharSet, Jaro and JaroWinkler.
int maxNumberOfRecordsToReturn defines the maximum number of documents to be returned.
int timeOutLimitMilliseconds sets waiting time in case of overloaded CPU. We recommend 1000 ms.
bool removeDuplicates removes all duplicates of documents with the same foreign key. Only the one with the best score value is returned in the results list. If foreign keys are not in use, this can be set to false.
string logPrefix is a prefix for the log entry of each search. If the instance of the search engine has had a logger injected into its constructor, every search will be logged. The log prefix is then placed first in the log text, and can be used by the client to add statistical information, for example for use in dashboards.
With filters
public class SearchQuery
{
	public SearchQuery(
		string soughtText, 
		Algorithm algorithm, 
		int maxNumberOfRecordsToReturn,
		int timeOutLimitMilliseconds, 
		KeyFilter keyIncludeFilter, 
		KeyFilter keyExcludeFilter,
		WordFilter wordIncludeFilter, 
		WordFilter wordExcludeFilter, 
		bool removeDuplicates, 
		string logPrefix
	);
}
string soughtText refers to the text (input) to be searched for.
Algorithm describes which search algorithm should be used. Indx’s own ProprietaryRelevancyRanking is the one used for most applications. The available values are ProprietaryRelevancyRanking, ProprietaryCombined, UnlimitedFastLevenshtein, JaccardOfAllChars, JaccardOfCharSet, Jaro and JaroWinkler.
int maxNumberOfRecordsToReturn defines the maximum number of documents to be returned.
int timeOutLimitMilliseconds sets waiting time in case of overloaded CPU. We recommend 1000 ms.
keyIncludeFilter and keyExcludeFilter are inclusive and exclusive filters based on the foreign key field in the Document.
wordIncludeFilter and wordExcludeFilter are word-based inclusive and exclusive filters. They are fully dynamic and based on text that can, for example, be entered by the end user. See the WordFilter class for details.
bool removeDuplicates removes all duplicates of documents with the same foreign key. Only the one with the best score value is returned in the results list. If foreign keys are not in use, this can be set to false.
string logPrefix is a prefix for the log entry of each search. If the instance of the search engine has had a logger injected into its constructor, every search will be logged. The log prefix is then placed first in the log text, and can be used by the client to add statistical information, for example for use in dashboards.
🔑 KeyFilter

Filter class based on lists of 64-bit foreign keys that categorize the documents.

The class supports Boolean binary operators, which cannot be defined on a C# interface, so the class itself is presented as part of the API.

public class KeyFilter
{
	public KeyFilter(long[] foreignKeys);
	public static KeyFilter operator &(KeyFilter x, KeyFilter y);
	public static KeyFilter operator |(KeyFilter x, KeyFilter y);
	public void AddKey(long key);
	public void AddKeys(long[] foreignKeys);
	public long[] GetKeys();
	public bool InFilter(long key);
}
KeyFilter accepts 64-bit foreign keys and can be used for both inclusive and exclusive filtering. Keys can be added to the filter either as single values or as arrays. See the Document class for further use of this.
operator & returns a filter that combines the argument x and y. If x represents a category (e.g. male), while y represents an occupation (e.g. taxi driver), this filter will result in a search result with male taxi drivers.
operator | returns a filter that combines x with the argument y. If x represents a category (e.g. male), while y represents an occupation (e.g. taxi driver), this filter will result in a search result with males or taxi drivers. This could include, for example, female taxi drivers or men with other occupations.
AddKey and AddKeys add one or more new keys. No exception occurs even if the same key is added several times.
GetKeys returns the keys in the filters.
InFilter checks whether a key will be included in a search with this filter.
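The behaviour of the two operators can be illustrated with plain sets of keys. The sketch below is a stand-in model, not the library's implementation: it assumes, as the taxi driver example describes, that & keeps only keys found in both filters and | keeps keys found in either. The key values are made up.

```csharp
using System;
using System.Linq;

// Stand-in key lists. With the library these would be passed to new KeyFilter(long[]).
long[] male = { 1, 2, 3 };
long[] taxiDrivers = { 3, 4 };

// Models x & y: only keys present in both filters (male taxi drivers).
long[] maleTaxiDrivers = male.Intersect(taxiDrivers).ToArray();

// Models x | y: keys present in either filter (males or taxi drivers).
long[] maleOrTaxiDriver = male.Union(taxiDrivers).ToArray();

Console.WriteLine(string.Join(", ", maleTaxiDrivers));   // 3
Console.WriteLine(string.Join(", ", maleOrTaxiDriver));  // 1, 2, 3, 4
```

With the documented API the corresponding expressions would be new KeyFilter(male) & new KeyFilter(taxiDrivers) and new KeyFilter(male) | new KeyFilter(taxiDrivers).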
💬 WordFilter

WordFilter is a word-based dynamic filter that can be predefined, but can also be set up by the user. It can be used to set up both inclusive and exclusive filters.

public class WordFilter
{
  public WordFilter(string word);
  public static WordFilter operator &(WordFilter x, WordFilter y);
  public static WordFilter operator |(WordFilter x, WordFilter y);
}
WordFilter takes a single word as a string and is passed as an argument to SearchQuery, where it can be used either inclusively or exclusively. The word must match exactly. A word filter is typically used to narrow a search by including or excluding specific words.
The & operator makes it possible to supply a list of words to the filter, which in this case will require all the words to be present, or not present, for an inclusive or exclusive filter, respectively.
The | operator makes it possible to supply a list of words to the filter, which in this case will require at least one of the words to be present, or not present, for an inclusive or exclusive filter, respectively.
🔦 SearchResult

SearchResult is the class that contains the search result. This is an array of SearchRecord, but the class also contains some error flags.

public class SearchResult
{
	public SearchResult(
		SearchRecord[] records,
		bool illegalHeapNumber, 
		bool invalidState, 
		bool timedOut, 
		bool invalidArgument
	);
}
SearchRecord[] records is the search result, in the form of documents with their scores.
bool illegalHeapNumber is only applicable for special configurations that will be documented separately.
bool invalidState is returned true when searching before indexing or during indexing. In this case the search will be unsuccessful.
bool timedOut is returned true if there are not enough CPU resources to complete the search within the specified time.
bool invalidArgument returns true if SearchQuery is null.
📃 SearchRecord

The search record is the document that is delivered in the search response. It is delivered as an array in SearchResult and consists of the same document that was uploaded with the addition of score.

public class SearchRecord : Document
{
	public SearchRecord(
		byte metricScore, 
		Document document
	) : base(document);
}
SearchRecord instances are returned as an array in SearchResult.
byte metricScore is the result of pattern recognition, where 255 is the best result, i.e. identical similarity. The score ranges from 0 to 255. An integer scale is used because some algorithms, such as Levenshtein distance, give their result as a number of typing errors. A one-letter mistake is then scored 254. Most configurations are supplied with Indx's own algorithm.
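The Levenshtein case can be made concrete with a small calculation. The sketch below assumes the score is 255 minus the edit distance, clamped at 0. This page only confirms two points of that mapping (255 for identical text, 254 for one mistake), so the general formula and the helper names EditDistance and ToMetricScore are illustrative. It is not Indx's own algorithm.

```csharp
using System;

// Classic dynamic-programming Levenshtein distance between two strings.
static int EditDistance(string a, string b)
{
    var previous = new int[b.Length + 1];
    var current = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) previous[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int insertion = current[j - 1] + 1;
            int deletion = previous[j] + 1;
            current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
        }
        // The row just computed becomes the previous row for the next character.
        (previous, current) = (current, previous);
    }
    return previous[b.Length];
}

// Assumed mapping: 255 for identical strings, one point less per typing error.
static byte ToMetricScore(int distance) => (byte)Math.Max(0, 255 - distance);

Console.WriteLine(ToMetricScore(EditDistance("hamlet", "hamlet")));  // 255
Console.WriteLine(ToMetricScore(EditDistance("hamlet", "hamlit")));  // 254
```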
🟢 SystemStatus

Class used to get the status of the system.

public class SystemStatus
{
	public int DocumentCount { get; set; }
	public int IndexProgressPercent { get; set; }
	public bool InvalidHeapId { get; set; }
	public bool InvalidState { get; set; }
	public bool ReIndexRequired { get; set; }
	public int SearchCounter { get; set; }
	public bool SearchIsAllowed { get; set; }
	public DateTime TimeOfInstanceCreation { get; set; }
	public DateTime TimeOfLastIndexBuid { get; set; }
	public bool TooLongSearchText { get; set; }
	public bool TooLongClientText { get; set; }
	public bool TooManyDocuments { get; set; }
	public bool UnknownConfigurationError { get; set; }
	public string Version { get; set; }
}
int DocumentCount returns the number of documents uploaded and indexed.
int IndexProgressPercent returns the progress of the indexing. This can, for example, be tied to a progress bar when indexing large datasets.
bool InvalidHeapId is only relevant for certain configurations and will then be described separately for each individual configuration.
bool InvalidState will return true if IndexAsync is called when DocumentCount == 0 or if the search function is called before IndexAsync has been called at least once.
bool ReIndexRequired is returned true when a fraction over a limit of the documents has been deleted or inserted after the last indexing (call to IndexAsync). The limit is set in the configuration. Omitted reindexing may affect search results.
int SearchCounter returns the number of calls made to the search function.
bool SearchIsAllowed indicates that the search engine has documents indexed and that Search can be called.
DateTime TimeOfInstanceCreation returns the timestamp of the call to the constructor.
DateTime TimeOfLastIndexBuild returns the timestamp of the last call to IndexAsync.
💡
The configuration will have a maximum length for the search text, the client text, and the number of documents that can be loaded.
bool TooLongSearchText is returned true if the maximum length of the search text is exceeded. If that happens, the text will be truncated. This alarm is not reset until a call to DeleteAllDocuments is made.
bool TooLongClientText is returned true if the maximum length of the client text is exceeded. If that happens, the text will be truncated. This alarm is not reset until a call to DeleteAllDocuments is made.
bool TooManyDocuments is returned true if the number of indexed documents exceeds the maximum limit defined in the configuration. The alarm is reset by deletion and subsequent re-indexing.
bool UnknownConfigurationError is returned true if the instance is created with an invalid configuration number.
string Version returns the version number of IndxSearchLib.
🧵 StringReplacer

Makes it possible to replace or remove substrings from the search. One may, for example, replace national characters like 'ø' with 'oe'. An instance of this class is passed to the IndxSearchEngine constructor. This only affects the indexed text. No changes are made to the text fields of the Document class. The two lists must have the same length. Empty strings are allowed in replacedBy, but not in toReplace.

public class StringReplacer 
{
	public StringReplacer(
		string[] toReplace, 
		string[] replacedBy);
}
// Example usage
var replacer = new StringReplacer(
	new string[] { "à", "é", "ô", "ö", "è", "ê", "ç", "î", "ü", "ð", "ä", "ñ" },
	new string[] { "a", "e", "o", "o", "e", "e", "c", "i", "u", "d", "a", "n" });
string[] toReplace is the list of strings to be swapped.
string[] replacedBy is the list of new strings to replace the selection.
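To see what such a pair of lists does to a text before indexing, the replacement can be modelled with plain string operations. This is a stand-in sketch only: it assumes the pairs are applied one after another in list order, which this page does not state, and the helper name Normalize and the sample text are hypothetical.

```csharp
using System;

string[] toReplace = { "ø", "é", "-" };
string[] replacedBy = { "oe", "e", "" };   // an empty string removes the match

// Apply each pair in order, as a model of what the indexed text would look like.
static string Normalize(string text, string[] targets, string[] replacements)
{
    if (targets.Length != replacements.Length)
        throw new ArgumentException("The two lists must have the same length.");
    for (int i = 0; i < targets.Length; i++)
        text = text.Replace(targets[i], replacements[i]);
    return text;
}

Console.WriteLine(Normalize("smørbrød café-au-lait", toReplace, replacedBy));
// smoerbroed cafeaulait
```

string.Replace rejects an empty search string, which matches the rule that empty strings are not allowed in toReplace.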
🌀 TokenizerSetup

The Indx search engine does not have a concept of words or, in more general terms, tokens. It can, however, be passed a tokenizer through the TokenizerSetup class if one wants to index entire tokens in addition to the regular pattern used by IndxSearchEngine. For most use cases this is not needed. Indexing tokens will tend to increase the weight of exact word hits.

public class TokenizerSetup
{
	public TokenizerSetup(
		string delimiter=" ", 
		bool removeDuplicateTokens = false
	);
	public TokenizerSetup(
		string[] delimiters, 
		bool removeDuplicateTokens = false
	);
	public string[] Delimiters { get; set; }
	public int MaxSizeOfTokenTobeIndexed { get; set; }
	public bool RemoveDuplicateTokens { get; set; }
	public char Replacement { get; set; }
	public bool UseLeftReplacement { get; set; }
	public bool UseRigthReplacement { get; set; }
}
Use the TokenizerSetup constructors to supply the delimiter and other arguments.
string[] Delimiters supplies the list of token delimiters. Usually a single space " " is all that is needed.

For some texts, however, several delimiters may improve search. Consider characters like '-', '/', ',', '&', '"', '–', '_', '(', ')', '\t'.

int MaxSizeOfTokenTobeIndexed limits the token size to be indexed. Longer tokens will not be indexed. The default limit is set to 8 characters.
bool RemoveDuplicateTokens is usually not set, but may improve search results when many repeated words occur in an irregular and insignificant pattern. This may happen when several fields are concatenated into one text to be indexed. For product data, for example, one may use several fields like title, type and description. Words in the type field may then also occur in both title and description, resulting in a biased search result. Setting removeDuplicateTokens=true will avoid this effect.
char Replacement is typically not altered for most use cases with regular text.
bool UseLeftReplacement and UseRightReplacement tend to increase the weight of indexed tokens, and are usually left unaltered (set to true).
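The effect of the delimiter list, the token size limit and duplicate removal can be modelled with ordinary string splitting. This is a sketch of the idea only, not the library's tokenizer: the helper name Tokenize and the sample text are hypothetical, and the size limit is assumed to be inclusive (tokens of exactly 8 characters are kept).

```csharp
using System;
using System.Linq;

// Split on the delimiters, drop tokens above the size limit, optionally drop repeats.
static string[] Tokenize(string text, string[] delimiters, int maxTokenSize, bool removeDuplicateTokens)
{
    var tokens = text
        .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
        .Where(token => token.Length <= maxTokenSize);   // longer tokens are not indexed
    return (removeDuplicateTokens ? tokens.Distinct() : tokens).ToArray();
}

string[] separators = { " ", "-", "/" };
string sample = "red running-shoes red size 42/43 lightweight";

Console.WriteLine(string.Join("|", Tokenize(sample, separators, 8, true)));
// red|running|shoes|size|42|43
```

Here "lightweight" is dropped because it exceeds 8 characters, and the second "red" is dropped because duplicate removal is switched on.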

General

All methods are thread-safe. All methods block except IndexAsync. A search will always run on the thread and the logical CPU it is called from.

Exception handling

If an exception occurs, it will be written to the log and then raised to the client. The stack trace will be obfuscated. Send it to us; we have tools to de-obfuscate it. It also helps if we can see the search text and the data set in which the search was made.