📗

Rest API v3.2

This is the RESTful API documentation for calling Indx over HTTP.

Version 3.2.

☁️
indxRestAPI runs on Azure servers in 🇳🇴 Norway East. The server runs on 🌱 100% renewable energy.
💻
Typical use case

📡 Connect to the endpoint, log in and receive a token

⛓ Insert with an array of documents

🪄 Create Index

🟢 Check ready status

🔎 Search

⛓ Insert one or more documents without re-indexing

🗑 Delete one or more documents without re-indexing

🪄 (re-index if system status says it is required)

All of these functions can be run multiple times, for example to perform incremental loading.

Any variation of this pattern can be run continuously over a long period of time to ensure that the index and data stay up to date.

How to use indx Rest API

📡 Indx Rest API endpoint: https://api.indx.co

Variables in use

${API_URL}
the url to the endpoint
${USER}
your username (e-mail)
${PASSWORD}
your password
${TOKEN}
your retrieved token string
${CONFIG}
a config value, most often this should be 100
${HEAP_ID}
an integer value identifying your dataset, for example 0 or 1
${DOCUMENT_KEY}
a foreign key for each record
🔓 Log in and retrieve your access token
👩‍💻
If you do not have a username and password, request developer access here

/api/Login

curl -X POST \
	"${API_URL}/api/Login?UserEmail=${USER}&UserPassword=${PASSWORD}" \
	-H 'accept: */*' \
	-d ''
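
The login call passes the credentials as query parameters, so an e-mail address or password containing characters such as `@`, `&` or a space must be percent-encoded. The sketch below only builds the request with the Python standard library and does not send it. The function name `build_login_request` and the sample credentials are illustrative assumptions, and the format of the login response is not described here, so extracting the token is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://api.indx.co"  # endpoint stated in this document


def build_login_request(user: str, password: str, api_url: str = API_URL) -> Request:
    """Build (but do not send) the POST /api/Login request."""
    # urlencode percent-encodes reserved characters so they cannot
    # break the query string.
    query = urlencode({"UserEmail": user, "UserPassword": password})
    return Request(
        f"{api_url}/api/Login?{query}",
        data=b"",  # empty body, as in the curl example
        headers={"accept": "*/*"},
        method="POST",
    )


# Hypothetical credentials, for illustration only.
login_request = build_login_request("dev@example.com", "p&ss word")
```

Passing `login_request` to `urllib.request.urlopen` would perform the call.
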
⚙️ Create an instance of indx

A heap (an instance with a dataset) is required. Every API call will refer to this heap ID. You can create multiple heaps. This function will also ask for a configuration number. Config number 100 is used for most cases.

PUT /api/Search/{heapId}/{configuration}

curl -X PUT \
	"${API_URL}/api/Search/${HEAP_ID}/${CONFIG}" \
	-H 'accept: */*' \
	-H "Authorization: Bearer ${TOKEN}"
💡
If you try to create a heap with an ID that is already in use, you will get this error message:
"title": "Unprocessable Entity",
"status": 422,

If you want to reuse the ID, you first need to run DELETE /api/Search/{heapId}.
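
As a rough Python counterpart to the curl call above, the following sketch builds the authorized PUT request for creating a heap and checks a status code for the 422 case. The helper names `build_create_heap_request` and `heap_id_in_use` are assumptions made for this example, and a 422 may have other causes than an ID collision.

```python
from urllib.request import Request

API_URL = "https://api.indx.co"  # endpoint stated in this document


def build_create_heap_request(token: str, heap_id: int, config: int = 100,
                              api_url: str = API_URL) -> Request:
    """Build (but do not send) PUT /api/Search/{heapId}/{configuration}."""
    return Request(
        f"{api_url}/api/Search/{heap_id}/{config}",
        headers={"accept": "*/*", "Authorization": f"Bearer {token}"},
        method="PUT",
    )


def heap_id_in_use(status_code: int) -> bool:
    """True for the 422 status that this document associates with a heap ID
    that already exists."""
    return status_code == 422
```
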

📄 Insert data

To load documents into your heap, add data as JSON. A document is the unit of information used by indx, with the following fields:

documentKey is a 64-bit integer key that identifies the document.
documentTextToBeIndexed is the searchable text.
documentClientInformation holds text that should not be searchable. Use it to store information you typically need to display or process with your record, as JSON or in any other format.
int segmentNumber lets the client add extra information such as line numbers in a book. It can also describe one part of a body text that is split into several records.

The recommended maximum length of documentTextToBeIndexed is 80 characters. Longer texts should be split into several records that are distinguished by segmentNumber. This ensures optimal pattern recognition capabilities.
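
One way to follow the 80-character recommendation is to split a long text on word boundaries and number the pieces. This is a minimal sketch, not an official indx tool: the function name `split_into_segments` is hypothetical, and numbering segments from 0 is an assumption based on the examples in this document.

```python
import textwrap

MAX_TEXT_LENGTH = 80  # recommended maximum stated above


def split_into_segments(document_key: int, text: str, client_info: str = "",
                        max_length: int = MAX_TEXT_LENGTH) -> list[dict]:
    """Split one long text into several records that share a documentKey."""
    # textwrap.wrap breaks on whitespace, collapses runs of whitespace, and
    # breaks over-long words, so no chunk exceeds max_length characters.
    chunks = textwrap.wrap(text, width=max_length)
    return [
        {
            "deleted": False,
            "documentClientInformation": client_info,
            "documentKey": document_key,
            "documentTextToBeIndexed": chunk,
            "segmentNumber": number,  # assumption: segments count from 0
        }
        for number, chunk in enumerate(chunks)
    ]
```

The resulting list can be sent as the body of the array insert call.
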

An array of records

PUT /api/Search/array/{heapId}

curl -X PUT \
	"${API_URL}/api/Search/array/${HEAP_ID}" \
	-H 'accept: */*' \
	-H "Authorization: Bearer ${TOKEN}" \
	-H 'Content-Type: application/json' \
	-d '[
  {
    "deleted": false,
    "documentClientInformation": "Large airport, KLAX",
    "documentKey": 0,
    "documentTextToBeIndexed": "Los Angeles International Airport KLAX",
    "segmentNumber": 0
  },
  {
    "deleted": false,
    "documentClientInformation": "Medium airport, ENTO",
    "documentKey": 1,
    "documentTextToBeIndexed": "Sandefjord lufthavn Torp ENTO",
    "segmentNumber": 0
  }
]'
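
The same array insert can be prepared from Python. The sketch below serializes a list of records and builds the PUT request without sending it. The function name `build_array_insert_request` is an assumption for this example. The records are the two airports from the curl call above.

```python
import json
from urllib.request import Request

API_URL = "https://api.indx.co"  # endpoint stated in this document


def build_array_insert_request(token: str, heap_id: int, documents: list[dict],
                               api_url: str = API_URL) -> Request:
    """Build (but do not send) PUT /api/Search/array/{heapId}."""
    # ensure_ascii=False keeps non-ASCII text readable; the bytes are UTF-8.
    body = json.dumps(documents, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{api_url}/api/Search/array/{heap_id}",
        data=body,
        headers={
            "accept": "*/*",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


airports = [
    {"deleted": False, "documentClientInformation": "Large airport, KLAX",
     "documentKey": 0,
     "documentTextToBeIndexed": "Los Angeles International Airport KLAX",
     "segmentNumber": 0},
    {"deleted": False, "documentClientInformation": "Medium airport, ENTO",
     "documentKey": 1,
     "documentTextToBeIndexed": "Sandefjord lufthavn Torp ENTO",
     "segmentNumber": 0},
]
```
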

A single record

PUT /api/Search/single/{heapId}

curl -X PUT \
	"${API_URL}/api/Search/single/${HEAP_ID}" \
	-H 'accept: */*' \
	-H "Authorization: Bearer ${TOKEN}" \
	-H 'Content-Type: application/json' \
	-d '{
  "deleted": false,
  "documentClientInformation": "Large airport, KLAX",
  "documentKey": 0,
  "documentTextToBeIndexed": "Los Angeles International Airport KLAX",
  "segmentNumber": 0
}'
🪄 Index your data

After inserting, you need to run the index call before you can search. Indexing runs asynchronously and can be monitored with the status call (see Check system status).

GET /api/Search/DoIndex/{heapId}

curl -X GET \
	"${API_URL}/api/Search/DoIndex/${HEAP_ID}" \
	-H 'accept: text/plain' \
	-H "Authorization: Bearer ${TOKEN}"
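
Because indexing is asynchronous, a client typically polls the status call until systemState reports 3 (ready to search). The sketch below shows such a loop. How the status is fetched is left to a caller-supplied function, so `fetch_status`, the polling interval and the timeout are all assumptions of this example.

```python
import time
from typing import Callable

READY = 3  # systemState value documented as "ready to search"


def wait_until_ready(fetch_status: Callable[[], dict],
                     poll_seconds: float = 1.0,
                     timeout_seconds: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() repeatedly until the heap reports it is ready.

    fetch_status is a hypothetical callable that performs
    GET /api/Search/{heapId} and returns the decoded JSON as a dict.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status.get("systemState") == READY:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("heap did not become ready in time")
        sleep(poll_seconds)
```

The value of indexProgressPercent in each intermediate status could be forwarded to a progress bar.
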
🟢 Check system status

GET /api/Search/{heapId}

curl -X GET \
	"${API_URL}/api/Search/${HEAP_ID}" \
	-H 'accept: text/plain' \
	-H "Authorization: Bearer ${TOKEN}"

Example response:

{
  "errorMessage": "",
  "documentCount": 123456,
  "indexProgressPercent": 100,
  "invalidArgument": false,
  "invalidHeapId": false,
  "invalidState": false,
  "reIndexRequired": false,
  "searchCounter": 0,
  "secondsToIndex": 0,
  "systemState": 3,
  "timeOfInstanceCreation": "2024-02-27T13:50:10.532Z",
  "timeOfLastIndexBuid": "2024-02-27T13:50:10.532Z",
  "tooLongClientText": false,
  "tooLongSearchText": false,
  "tooManyDocuments": false,
  "unknownConfigurationError": false,
  "version": "3.2.0.2"
}
int DocumentCount returns the number of documents uploaded and indexed.
int IndexProgressPercent returns the progress of the indexing. This can, for example, be tied to a progress bar when indexing large datasets.
bool InvalidHeapId is only relevant for certain configurations and is described separately for each of them.
bool InvalidState returns true if DoIndex is called when DocumentCount == 0, or if the search function is called before DoIndex has been called at least once.
bool ReIndexRequired returns true when the fraction of documents deleted or inserted since the last indexing exceeds a limit. The limit is set in the configuration. Skipping the re-index may affect search results.
int SearchCounter returns the number of calls made to the search function.
int SecondsToIndex returns the indexing time in seconds.
int SystemState returns the state the system is in: 0 = created, not loaded; 1 = loading; 2 = indexing; 3 = ready to search.
DateTime TimeOfInstanceCreation returns the timestamp of the call to the constructor.
DateTime TimeOfLastIndexBuild returns the timestamp of the last indexing call.
💡
The configuration will have a maximum length for the search text, the client text, and the number of documents that can be loaded.
bool TooLongSearchText is returned true if the maximum length of the search text is exceeded. If that happens, the text will be truncated. This alarm is not reset until a call to DeleteAll is made.
bool TooLongClientText is returned true if the maximum length of client text is exceeded. If that happens, the text will be truncated. This alarm is not reset until a DeleteAll call is made.
bool TooManyDocuments is returned true if the number of documents that are indexed exceeds the maximum limit defined in configuration. The alarm is reset by deletion and subsequent re-indexing.
bool UnknownConfigurationError is returned true if the instance is created with an invalid configuration number
string Version returns version number of IndxSearchLib
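
The status fields above can be condensed into a short summary on the client side. The following sketch maps systemState to its documented meaning and collects every boolean flag that is set. The function name `summarize_status` and the shape of its return value are assumptions chosen for this example.

```python
SYSTEM_STATES = {
    0: "created, not loaded",
    1: "loading",
    2: "indexing",
    3: "ready to search",
}

# Boolean fields from the example status response.
STATUS_FLAGS = (
    "invalidArgument", "invalidHeapId", "invalidState", "reIndexRequired",
    "tooLongClientText", "tooLongSearchText", "tooManyDocuments",
    "unknownConfigurationError",
)


def summarize_status(status: dict) -> dict:
    """Reduce a decoded status response to state, readiness and raised flags."""
    state = status.get("systemState")
    return {
        "state": SYSTEM_STATES.get(state, f"unknown ({state})"),
        "ready": state == 3,
        "raised_flags": [flag for flag in STATUS_FLAGS if status.get(flag)],
    }
```

A client could, for instance, trigger DoIndex again whenever `"reIndexRequired"` appears in `raised_flags`.
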
🔎 Search query

POST /api/Search/{heapId}

curl -X POST \
	"${API_URL}/api/Search/${HEAP_ID}" \
	-H 'accept: text/plain' \
	-H "Authorization: Bearer ${TOKEN}" \
	-H 'Content-Type: application/json' \
	-d '{
  "algorithm": 1,
  "maxNumberOfRecordsToReturn": 20,
  "soughtText": "string",
  "coverageSetup": {}
}'
Advanced query
curl -X POST \
	"${API_URL}/api/Search/${HEAP_ID}" \
	-H 'accept: text/plain' \
	-H "Authorization: Bearer ${TOKEN}" \
	-H 'Content-Type: application/json' \
	-d '{
  "algorithm": 0,
  "keyExcludeFilter": null,
  "keyIncludeFilter": null,
  "logPrefix": "",
  "maxNumberOfRecordsToReturn": 20,
  "removeDuplicates": true,
  "soughtText": "string",
  "timeOutLimitMilliseconds": 1000,
  "wordExcludeFilter": null,
  "wordIncludeFilter": null,
  "coverageSetup": {
    "lcsTopErrorTolerance": 0,
    "lcsTopMaxRepetions": 0,
    "lcsErrorTolerance": 0,
    "lcsMaxRepetitions": 0,
    "lcsBottomErrorTolerance": 0,
    "lcsBottomMaxRepetitions": 0,
    "lcsWordMinWordSize": 3,
    "lcsWordLcsErrorTolerance": 0,
    "lcsWordLcsMaxRepetitions": 0,
    "coverageMinWordHitsAbs": 1,
    "coverageMinWordHitsRelative": 0,
    "coverageQLimitForErrorTolerance": 5,
    "coverageLcsErrorToleranceRelativeq": 0.2
  },
  "numberOfRecordsForAppliedAlgorithm": 500
}'

Required:

algorithm selects which search algorithm should be used. Indx’s own RelevancyRanking is the one used for most applications. This is number 0. Version 3.1 introduced a new algorithm called Coverage. It detects exact matches on tokens in the query and lifts them to the top of the result list. Coverage is a collection of algorithms that work together, and it complements RelevancyRanking. ⚡️ The Coverage algorithm is number 1. It also returns an argument LcsBottomIndex (coverageBottomIndex in the search results) that can be used to truncate the list.
string soughtText refers to the text (input) to be searched for.
int maxNumberOfRecordsToReturn defines the maximum number of documents to be returned.
coverageSetup defines properties for the Coverage algorithm. It only applies when algorithm number 1 is used. 😌 For a simple query, the coverageSetup object can be sent empty ({}). “LCS” is an acronym for “Longest Common Substring”.
coverageSetup default values:
    lcsTopErrorTolerance = 0,
    lcsTopMaxRepetions = 0,
    lcsErrorTolerance = 0,
    lcsMaxRepetitions = 0,
    lcsBottomErrorTolerance = 0,
    lcsBottomMaxRepetitions = 0,
    lcsWordMinWordSize = 3,
    lcsWordLcsErrorTolerance = 0,
    lcsWordLcsMaxRepetitions = 0,
    coverageMinWordHitsAbs = 1,
    coverageMinWordHitsRelative = 0,
    coverageQLimitForErrorTolerance = 5,
    coverageLcsErrorToleranceRelativeq = 0.2
lcsTopErrorTolerance, lcsErrorTolerance, lcsBottomErrorTolerance, and lcsWordLcsErrorTolerance rank correct strings (no tokenization) higher on the list. Each takes an argument for error tolerance on the prefix and suffix of the string.
lcsTopMaxRepetions, lcsMaxRepetitions, lcsBottomMaxRepetitions, and lcsWordLcsMaxRepetitions enable repetition counting in the Coverage function. The value sets the number of repetitions that should be ranked, for example 2. The record with the most repetitions is ranked higher. In most situations these parameters should all have the same value.
lcsWordMinWordSize defines the minimum size of a detectable token. The default is 3. Values between 2 and 5 are recommended.
coverageMinWordHitsAbs defines the minimum number of words that must be detected before LcsBottomIndex is returned with an index to truncate on.
coverageMinWordHitsRelative defines a relative number to the one found. Usually 0.
coverageQLimitForErrorTolerance sets the minimum number of characters the query must have before the relative error tolerance is applied.
coverageLcsErrorToleranceRelativeq is the relative error tolerance. 0.2 means 20%, so the system allows one error for every 5 characters.
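
To make the interplay of coverageQLimitForErrorTolerance and coverageLcsErrorToleranceRelativeq concrete, the arithmetic can be sketched as follows. This is an interpretation of the two descriptions above, not the actual indx implementation: rounding down, and treating the limit as inclusive, are both assumptions.

```python
import math


def allowed_errors(query: str, q_limit: int = 5,
                   relative_tolerance: float = 0.2) -> int:
    """Estimate how many errors a query of this length would be allowed."""
    length = len(query)
    # Assumption: queries shorter than the limit get no relative tolerance.
    if length < q_limit:
        return 0
    # Assumption: round down, i.e. one error per full 5 characters at 0.2.
    # The small epsilon guards against floating-point results like 2.9999.
    return math.floor(length * relative_tolerance + 1e-9)
```

With the default values, a 4-character query allows no errors, a 5-character query allows one, and a 10-character query allows two.
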

Optional:

int timeOutLimitMilliseconds sets waiting time in case of overloaded CPU. We recommend 1000 ms.
keyIncludeFilter and keyExcludeFilter are including and excluding filters based on the foreign key field. See “Key filters” for how to use them.
wordIncludeFilter and wordExcludeFilter are word-based including and excluding filters. They are completely dynamic and based on text that can, for example, be entered by the end user. See “Word filters” for how to use them.
bool removeDuplicates removes all duplicates of documents with the same foreign key. Only the one with the best score value is returned in the results list. If foreign keys are not in use, this can be set to null.
🔦 Search results
{
  "illegalHeapNumber": false,
  "invalidArgument": false,
  "invalidState": false,
  "coverageBottomIndex": 0,
  "searchRecords": [
    {
      "metricScore": 204,
      "deleted": false,
      "documentClientInformation": "string",
      "documentKey": 0,
      "documentTextToBeIndexed": "string1",
      "segmentNumber": 0
    },
    {
      "metricScore": 204,
      "deleted": false,
      "documentClientInformation": "string",
      "documentKey": 1,
      "documentTextToBeIndexed": "string1",
      "segmentNumber": 0
    },
    {
      "metricScore": 180,
      "deleted": false,
      "documentClientInformation": "string",
      "documentKey": 2,
      "documentTextToBeIndexed": "string2",
      "segmentNumber": 0
    }
  ],
  "timedOut": false
}
bool illegalHeapNumber is only applicable for special configurations that will be documented separately.
bool invalidArgument returns true if SearchQuery is null.
bool invalidState returns true when searching before or during indexing. In this case the search is unsuccessful.
(new in v3.1) coverageBottomIndex returns an index at which the result list can be truncated. The value is greater than -1 if the Coverage algorithm is in use and its requirements are met, for example that one or more words or strings are matched exactly.
searchRecords is the search result, in the form of documents with their scores.
deleted returns true for a document that has been deleted without the heap having been saved.
byte metricScore is the result of the pattern recognition. It ranges from 0 to 255, where 255 is the best result, i.e. an identical match. Integers are used because some algorithms, such as Levenshtein distance, give their result as a number of typing errors. A one-letter mistake is then scored 254. Most configurations are supplied with Indx's own algorithm.
bool timedOut is returned true if there are not enough resources to complete the search within the specified time.
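
A client might post-process the search response along these lines. The sketch truncates the list at coverageBottomIndex when that value is greater than -1, and converts a metricScore into an error count for algorithms that score by typing errors. Whether the index itself is included in the truncated list is not stated here, so the inclusive cut is an assumption, as are the function names.

```python
def truncate_results(response: dict) -> list[dict]:
    """Return searchRecords, cut at coverageBottomIndex when it is set."""
    records = response.get("searchRecords", [])
    bottom = response.get("coverageBottomIndex", -1)
    if bottom > -1:
        # Assumption: the record at the index itself is kept.
        return records[: bottom + 1]
    return records


def typing_errors(metric_score: int) -> int:
    """Error count for algorithms that score 255 minus the number of typing
    errors (e.g. Levenshtein distance); not meaningful for other algorithms."""
    return 255 - metric_score
```
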
🔑 Key filters

A query with a filter is made by filling the keyExcludeFilter or keyIncludeFilter arrays in the Search function. These accept 64-bit foreign keys. Indx supports excluding filters, meaning “do not include”.

POST /api/Search/{heapId}

curl -X POST \
	"${API_URL}/api/Search/${HEAP_ID}" \
	-H 'accept: text/plain' \
	-H "Authorization: Bearer ${TOKEN}" \
	-H 'Content-Type: application/json' \
	-d '{
  "algorithm": 0,
  "keyExcludeFilter": null,
  "keyIncludeFilter": [0, 1, 2, 3, 4],
  "maxNumberOfRecordsToReturn": 20,
  "soughtText": "string",
  "coverageSetup": {}
}'
💬 Word filters

WordFilter is a word-based dynamic filter that can be predefined or set up by the user. It can be used to set up both inclusive and exclusive filters.

WordFilter takes a single word as a string and is passed as an argument to SearchQuery, either inclusively or exclusively. The word must match exactly.

POST /api/Search/{heapId}

curl -X POST \
	"${API_URL}/api/Search/${HEAP_ID}" \
	-H 'accept: text/plain' \
	-H "Authorization: Bearer ${TOKEN}" \
	-H 'Content-Type: application/json' \
	-d '{
  "algorithm": 0,
  "maxNumberOfRecordsToReturn": 20,
  "soughtText": "string",
  "wordExcludeFilter": ["secret", "admin"],
  "wordIncludeFilter": null,
  "coverageSetup": {}
}'
💾 Save heap for persistence on the server

In order for your dataset to be persistent on the server you will need to save it. This is typically called after insertions or deletions.

PUT /api/Search/{heapId}

curl -X PUT \
	"${API_URL}/api/Search/${HEAP_ID}" \
	-H 'accept: */*' \
	-H "Authorization: Bearer ${TOKEN}"
🗑️ Delete a single document

Delete a document with a given key.

Remember to Save Heap for this to be persistent.

DELETE /api/Search/{heapId}/{documentKey}

curl -X 'DELETE' \
  "${API_URL}/api/Search/${HEAP_ID}/${DOCUMENT_KEY}" \
  -H 'accept: */*' \
  -H "Authorization: Bearer ${TOKEN}"
🛑 Delete instance

DeleteHeap will delete the entire heap (instance), including all contained documents. It takes effect immediately and the data cannot be recovered.

DELETE /api/Search/{heapId}

curl -X 'DELETE' \
  "${API_URL}/api/Search/${HEAP_ID}" \
  -H 'accept: */*' \
  -H "Authorization: Bearer ${TOKEN}"