@elizaos/core v1.0.0-beta.51 / BM25

Class: BM25

Implements the Okapi BM25 (Best Matching 25) ranking function for information retrieval.

BM25 ranks documents based on the query terms appearing in each document, considering term frequency (TF) and inverse document frequency (IDF). It improves upon basic TF-IDF by incorporating:

  • Term Frequency Saturation (k1): Prevents overly frequent terms from dominating the score.
  • Document Length Normalization (b): Penalizes documents that are longer than average, assuming longer documents are more likely to contain query terms by chance.

Key Components:

  • Tokenizer: Processes text into terms (words), handles stop words and stemming.
  • Document Indexing: Stores document lengths, term frequencies per document, and overall document frequency for each term.
  • IDF Calculation: Measures the informativeness of a term based on how many documents contain it.
  • Scoring: Combines TF, IDF, document length, and parameters k1/b to calculate relevance.
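The way k1 and b interact in the scoring step can be sketched with the textbook Okapi BM25 per-term formula. This is an illustrative sketch, not the library's code: `bm25TermScore` is a hypothetical helper, and the defaults `k1 = 1.2` and `b = 0.75` are common conventions assumed here rather than values confirmed for `BM25Options`.

```typescript
// Hypothetical helper (not part of @elizaos/core): scores one query term
// against one document using the standard Okapi BM25 per-term formula.
function bm25TermScore(
  tf: number,           // occurrences of the term in the document
  idf: number,          // inverse document frequency of the term
  docLength: number,    // token count of the document
  avgDocLength: number, // average token count across the index
  k1 = 1.2,             // assumed default for term frequency saturation
  b = 0.75              // assumed default for length normalization
): number {
  if (tf <= 0 || avgDocLength <= 0) return 0;
  // b = 0 disables length normalization; b = 1 applies it fully.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  // The tf term saturates: it approaches (k1 + 1) as tf grows.
  return (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
}

// Same document length, increasing term frequency: the score grows sublinearly.
const scoreOnce = bm25TermScore(1, 1, 100, 100);
const scoreTenTimes = bm25TermScore(10, 1, 100, 100);
// Same term frequency, document twice the average length: the score drops.
const scoreLongDoc = bm25TermScore(1, 1, 200, 100);
console.log({ scoreOnce, scoreTenTimes, scoreLongDoc });
```

A document's total score for a query is the sum of this value over the query terms it contains.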

Constructors​

BM25()​

BM25(docs?, options?): BM25

Creates a new BM25 search instance.

Parameters​

• docs?: any[]

Optional array of initial documents (objects with string fields) to index.

• options?: BM25Options = {}

Configuration options for BM25 parameters (k1, b), tokenizer (stopWords, stemming, minLength), and field boosts.

Returns​

BM25

Defined in​

packages/core/src/search.ts:1071

Properties​

termFrequencySaturation​

readonly termFrequencySaturation: number

Term frequency saturation parameter (k1).

Defined in​

packages/core/src/search.ts:1046


lengthNormalizationFactor​

readonly lengthNormalizationFactor: number

Document length normalization factor (b).

Defined in​

packages/core/src/search.ts:1048


tokenizer​

readonly tokenizer: Tokenizer

Tokenizer instance used for processing text.

Defined in​

packages/core/src/search.ts:1050
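To make the tokenizer's role concrete, here is a minimal sketch of stop-word and minimum-length filtering. `simpleTokenize` is a hypothetical function, not the library's `Tokenizer`: it assumes lowercasing and ASCII-only splitting, and it leaves out stemming entirely.

```typescript
// Hypothetical stand-in for a tokenizer (not the @elizaos/core Tokenizer).
// Assumptions: lowercases input, splits on non-alphanumeric ASCII, no stemming.
function simpleTokenize(
  text: string,
  stopWords: Set<string> = new Set(),
  minLength = 2
): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length >= minLength && !stopWords.has(token));
}

const tokens = simpleTokenize("The quick, brown fox!", new Set(["the"]));
console.log(tokens);
```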


documentLengths​

documentLengths: Uint32Array

Array storing the length (number of tokens, adjusted by field boosts) of each document.

Defined in​

packages/core/src/search.ts:1052


averageDocLength​

averageDocLength: number

Average length of all documents in the index.

Defined in​

packages/core/src/search.ts:1054


termToIndex​

termToIndex: Map<string, number>

Map from term (string) to its unique integer index.

Defined in​

packages/core/src/search.ts:1056


documentFrequency​

documentFrequency: Uint32Array

Array storing the document frequency (number of docs containing the term) for each term index.

Defined in​

packages/core/src/search.ts:1058


termFrequencies​

termFrequencies: Map<number, Map<number, number>>

Map from term index to another map storing docIndex: termFrequencyInDoc.

Defined in​

packages/core/src/search.ts:1060


fieldBoosts​

readonly fieldBoosts: object

Boost factors for different fields within documents.

Index Signature​

[key: string]: number

Defined in​

packages/core/src/search.ts:1062


documents​

documents: any[]

Array storing the original documents added to the index.

Defined in​

packages/core/src/search.ts:1064
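The index properties above fit together roughly as follows. This is a simplified sketch under stated assumptions: `TinyIndex` and `buildTinyIndex` are hypothetical names, plain `number[]` arrays stand in for the `Uint32Array` fields, input is already tokenized, and field boosts are ignored.

```typescript
// Hypothetical, simplified mirror of the index layout described above.
interface TinyIndex {
  termToIndex: Map<string, number>;
  documentFrequency: number[]; // the class stores this as a Uint32Array
  termFrequencies: Map<number, Map<number, number>>; // termIndex -> (docIndex -> tf)
  documentLengths: number[];   // the class stores this as a Uint32Array
  averageDocLength: number;
}

function buildTinyIndex(tokenizedDocs: string[][]): TinyIndex {
  const termToIndex = new Map<string, number>();
  const documentFrequency: number[] = [];
  const termFrequencies = new Map<number, Map<number, number>>();
  const documentLengths: number[] = [];

  tokenizedDocs.forEach((docTokens, docIndex) => {
    documentLengths.push(docTokens.length);
    for (const token of docTokens) {
      // Assign the next free integer index to a term seen for the first time.
      let termIndex = termToIndex.get(token);
      if (termIndex === undefined) {
        termIndex = termToIndex.size;
        termToIndex.set(token, termIndex);
        documentFrequency.push(0);
        termFrequencies.set(termIndex, new Map<number, number>());
      }
      const postings = termFrequencies.get(termIndex)!;
      const previous = postings.get(docIndex) ?? 0;
      // Count each document once per term for document frequency.
      if (previous === 0) {
        documentFrequency[termIndex] = (documentFrequency[termIndex] ?? 0) + 1;
      }
      postings.set(docIndex, previous + 1);
    }
  });

  const totalLength = documentLengths.reduce((sum, len) => sum + len, 0);
  const averageDocLength =
    documentLengths.length > 0 ? totalLength / documentLengths.length : 0;
  return { termToIndex, documentFrequency, termFrequencies, documentLengths, averageDocLength };
}

const tiny = buildTinyIndex([
  ["red", "apple", "red"],
  ["green", "apple"],
]);
console.log(tiny.averageDocLength, tiny.documentFrequency);
```

Keying the frequency maps by integer term index rather than by string keeps the per-term arrays compact, which is presumably why the class exposes `termToIndex` separately.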

Methods​

search()

search(query, topK): SearchResult[]

Searches the indexed documents for a given query string using the BM25 ranking formula.

Parameters​

• query: string

The search query text.

• topK: number = 10

The maximum number of top-scoring results to return. Defaults to 10.

Returns​

SearchResult[]

An array of SearchResult objects, sorted by descending BM25 score.

Defined in​

packages/core/src/search.ts:1209
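The final ranking step of a search like this one can be sketched as a sort followed by a cut at topK. `RankedDoc` and `topKByScore` are hypothetical names, the result shape is not the library's `SearchResult`, and dropping zero-score documents is an assumption about reasonable behavior rather than something the source confirms.

```typescript
// Hypothetical result shape; the real SearchResult type may differ.
interface RankedDoc {
  index: number;
  score: number;
}

// Ranks per-document scores in descending order and keeps at most topK entries.
// Assumption: documents with a score of 0 are treated as non-matches.
function topKByScore(scores: number[], topK = 10): RankedDoc[] {
  return scores
    .map((score, index) => ({ index, score }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const ranked = topKByScore([0.2, 0, 1.5, 0.9], 2);
console.log(ranked);
```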


searchPhrase()​

searchPhrase(phrase, topK): SearchResult[]

Searches for an exact phrase within the indexed documents, ranking documents that contain the exact sequence of tokens higher. Note: this is a basic implementation; a more sophisticated phrase search might also consider token proximity.

Parameters​

• phrase: string

The exact phrase to search for.

• topK: number = 10

The maximum number of results to return. Defaults to 10.

Returns​

SearchResult[]

An array of SearchResult objects, sorted by score, for documents containing the phrase.

Defined in​

packages/core/src/search.ts:1266
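The "exact sequence of tokens" test at the heart of a phrase search can be sketched as a sliding-window comparison. `containsPhrase` is a hypothetical helper that operates on already tokenized input; how the library combines this check with scoring is not shown here.

```typescript
// Hypothetical helper: true if phraseTokens occurs as a contiguous
// run inside docTokens, in the same order.
function containsPhrase(docTokens: string[], phraseTokens: string[]): boolean {
  if (phraseTokens.length === 0 || phraseTokens.length > docTokens.length) {
    return false;
  }
  const lastStart = docTokens.length - phraseTokens.length;
  for (let start = 0; start <= lastStart; start++) {
    let matched = true;
    for (let offset = 0; offset < phraseTokens.length; offset++) {
      if (docTokens[start + offset] !== phraseTokens[offset]) {
        matched = false;
        break;
      }
    }
    if (matched) return true;
  }
  return false;
}

const docTokens = ["the", "quick", "brown", "fox"];
console.log(containsPhrase(docTokens, ["quick", "brown"])); // contiguous match
console.log(containsPhrase(docTokens, ["quick", "fox"]));   // terms present, not adjacent
```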


addDocument()​

addDocument(doc): Promise<void>

Adds a single new document to the index. Updates all internal index structures incrementally. Note: For adding many documents, addDocumentsParallel is generally more efficient.

Parameters​

• doc: any

The document object (with string fields) to add.

Returns​

Promise<void>

Throws​

If the document is null or undefined.

Defined in​

packages/core/src/search.ts:1379


calculateIdf()​

calculateIdf(termIndex): number

Calculates the Inverse Document Frequency (IDF) for a given term index. Uses the BM25 IDF formula: log(1 + (N - n + 0.5) / (n + 0.5)), where N is the total number of documents and n is the number of documents containing the term. Adding 1 inside the logarithm keeps the IDF positive, even for terms that appear in more than half of the documents.

Parameters​

• termIndex: number

The integer index of the term.

Returns​

number

The IDF score for the term. Returns 0 if the term is not found or its document frequency is 0.

Defined in​

packages/core/src/search.ts:1468
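The IDF formula given for calculateIdf can be written out directly. `bm25Idf` below is a hypothetical standalone function that takes the counts as arguments instead of a term index; it is a sketch of the stated formula, not the method's actual body.

```typescript
// Hypothetical standalone version of the documented formula:
// log(1 + (N - n + 0.5) / (n + 0.5))
function bm25Idf(totalDocs: number, docFreq: number): number {
  // Mirrors the documented behavior of returning 0 for a term with no documents.
  if (totalDocs <= 0 || docFreq <= 0) return 0;
  return Math.log(1 + (totalDocs - docFreq + 0.5) / (docFreq + 0.5));
}

const rareTermIdf = bm25Idf(10, 1);    // term in 1 of 10 documents
const commonTermIdf = bm25Idf(10, 5);  // term in half of the documents
const everywhereIdf = bm25Idf(10, 10); // term in every document
console.log({ rareTermIdf, commonTermIdf, everywhereIdf });
```

Rarer terms get a larger IDF, and a term found in every document still gets a small positive value rather than a negative one.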


getTermFrequency()​

getTermFrequency(termIndex, docIndex): number

Retrieves the term frequency (TF) for a specific term in a specific document.

Parameters​

• termIndex: number

The integer index of the term.

• docIndex: number

The index of the document.

Returns​

number

The term frequency, or 0 if the term is not in the document or indices are invalid.

Defined in​

packages/core/src/search.ts:1494


getDocument()​

getDocument(index): any

Retrieves the original document object stored at a given index.

Parameters​

• index: number

The index of the document to retrieve.

Returns​

any

The document object.

Throws​

If the index is out of bounds.

Defined in​

packages/core/src/search.ts:1504


clearDocuments()​

clearDocuments(): void

Clears all indexed documents and resets the BM25 instance to its initial state.

Returns​

void

Defined in​

packages/core/src/search.ts:1515


getDocumentCount()​

getDocumentCount(): number

Gets the total number of documents currently indexed.

Returns​

number

The document count.

Defined in​

packages/core/src/search.ts:1528


addDocuments()​

addDocuments(docs): Promise<void[]>

Adds multiple documents by calling addDocument for each one. Documents are processed sequentially in the main thread.

Parameters​

• docs: any[]

An array of documents to add.

Returns​

Promise<void[]>

Defined in​

packages/core/src/search.ts:1537