@elizaos/core v1.0.0-beta.51 / BM25
Class: BM25
Implements the Okapi BM25 (Best Matching 25) ranking function for information retrieval.
BM25 ranks documents based on the query terms appearing in each document, considering term frequency (TF) and inverse document frequency (IDF). It improves upon basic TF-IDF by incorporating:
- Term Frequency Saturation (k1): Prevents overly frequent terms from dominating the score.
- Document Length Normalization (b): Penalizes documents that are longer than average, assuming longer documents are more likely to contain query terms by chance.
Key Components:
- Tokenizer: Processes text into terms (words), handles stop words and stemming.
- Document Indexing: Stores document lengths, term frequencies per document, and overall document frequency for each term.
- IDF Calculation: Measures the informativeness of a term based on how many documents contain it.
- Scoring: Combines TF, IDF, document length, and parameters k1/b to calculate relevance.
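The interaction of k1 and b can be sketched for a single query term as follows. This is a minimal illustration of the standard Okapi BM25 term score, not the code in search.ts; the function name `bm25TermScore` and the default values k1 = 1.2 and b = 0.75 are assumptions made for the example.

```typescript
// Hypothetical helper: standard Okapi BM25 contribution of one query term
// to the score of one document.
function bm25TermScore(
  tf: number,           // term frequency in the document
  idf: number,          // inverse document frequency of the term
  docLength: number,    // length of this document in tokens
  avgDocLength: number, // average document length across the index
  k1 = 1.2,             // assumed default: term frequency saturation
  b = 0.75              // assumed default: length normalization factor
): number {
  if (tf <= 0 || avgDocLength <= 0) return 0;
  // b = 0 disables length normalization; b = 1 applies it fully.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  return (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
}

// Saturation: ten occurrences score higher than one, but far less than tenfold.
const once = bm25TermScore(1, 1, 100, 100);
const tenTimes = bm25TermScore(10, 1, 100, 100);

// Length normalization: the same tf in a document twice the average length scores lower.
const longDoc = bm25TermScore(1, 1, 200, 100);
```

A document's full score for a query would then be the sum of this value over all query terms.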
Constructors​
BM25()​
BM25(docs?, options?): BM25
Creates a new BM25 search instance.
Parameters​
• docs?: any[]
Optional array of initial documents (objects with string fields) to index.
• options?: BM25Options = {}
Configuration options for BM25 parameters (k1, b), tokenizer (stopWords, stemming, minLength), and field boosts.
Returns​
BM25
Defined in​
packages/core/src/search.ts:1071
Properties​
termFrequencySaturation​
readonly termFrequencySaturation: number
Term frequency saturation parameter (k1).
Defined in​
packages/core/src/search.ts:1046
lengthNormalizationFactor​
readonly lengthNormalizationFactor: number
Document length normalization factor (b).
Defined in​
packages/core/src/search.ts:1048
tokenizer​
readonly tokenizer: Tokenizer
Tokenizer instance used for processing text.
Defined in​
packages/core/src/search.ts:1050
documentLengths​
documentLengths: Uint32Array
Array storing the length (number of tokens, adjusted by field boosts) of each document.
Defined in​
packages/core/src/search.ts:1052
averageDocLength​
averageDocLength: number
Average length of all documents in the index.
Defined in​
packages/core/src/search.ts:1054
termToIndex​
termToIndex: Map<string, number>
Map from term (string) to its unique integer index.
Defined in​
packages/core/src/search.ts:1056
documentFrequency​
documentFrequency: Uint32Array
Array storing the document frequency (number of docs containing the term) for each term index.
Defined in​
packages/core/src/search.ts:1058
termFrequencies​
termFrequencies: Map<number, Map<number, number>>
Map from term index to another map storing docIndex: termFrequencyInDoc.
Defined in​
packages/core/src/search.ts:1060
fieldBoosts​
readonly fieldBoosts: object
Boost factors for different fields within documents.
Index Signature​
[key: string]: number
Defined in​
packages/core/src/search.ts:1062
documents​
documents: any[]
Array storing the original documents added to the index.
Defined in​
packages/core/src/search.ts:1064
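How these properties relate to one another may be easier to see in a small sketch. The code below is an illustration under simplifying assumptions, not the class's own implementation: documents arrive already tokenized, field boosts are ignored, plain number arrays stand in for Uint32Array, and the names `SketchIndex` and `buildSketchIndex` are hypothetical.

```typescript
// Illustrative index shape mirroring the properties listed above.
interface SketchIndex {
  termToIndex: Map<string, number>;
  documentFrequency: number[]; // documented as Uint32Array on the class
  termFrequencies: Map<number, Map<number, number>>;
  documentLengths: number[];   // documented as Uint32Array on the class
  averageDocLength: number;
}

// Builds the structures from documents that are already split into tokens.
function buildSketchIndex(tokenizedDocs: string[][]): SketchIndex {
  const termToIndex = new Map<string, number>();
  const documentFrequency: number[] = [];
  const termFrequencies = new Map<number, Map<number, number>>();
  const documentLengths: number[] = [];

  tokenizedDocs.forEach((tokens, docIndex) => {
    documentLengths.push(tokens.length);
    for (const token of tokens) {
      // Assign the next free integer index to a term seen for the first time.
      let termIndex = termToIndex.get(token);
      if (termIndex === undefined) {
        termIndex = termToIndex.size;
        termToIndex.set(token, termIndex);
        documentFrequency.push(0);
      }
      let postings = termFrequencies.get(termIndex);
      if (postings === undefined) {
        postings = new Map<number, number>();
        termFrequencies.set(termIndex, postings);
      }
      const previous = postings.get(docIndex) ?? 0;
      // Document frequency counts each document once per term.
      if (previous === 0) {
        documentFrequency[termIndex] = (documentFrequency[termIndex] ?? 0) + 1;
      }
      postings.set(docIndex, previous + 1);
    }
  });

  const totalLength = documentLengths.reduce((sum, len) => sum + len, 0);
  const averageDocLength =
    documentLengths.length > 0 ? totalLength / documentLengths.length : 0;
  return { termToIndex, documentFrequency, termFrequencies, documentLengths, averageDocLength };
}

const sketchIndex = buildSketchIndex([
  ["quick", "brown", "fox"],
  ["quick", "quick", "dog"],
]);
```

Here "quick" receives term index 0, has a document frequency of 2, and a term frequency of 2 in the second document.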
Methods​
search()​
search(query, topK): SearchResult[]
Searches the indexed documents for a given query string using the BM25 ranking formula.
Parameters​
• query: string
The search query text.
• topK: number = 10
The maximum number of top-scoring results to return. Defaults to 10.
Returns​
An array of SearchResult objects, sorted by descending BM25 score.
Defined in​
packages/core/src/search.ts:1209
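The final ranking step described above (sort by descending score, keep at most topK) might look like the sketch below. The `ScoredDoc` shape and the `rankTopK` helper are hypothetical stand-ins; the actual SearchResult type is defined in search.ts, and dropping zero-score documents is an assumption of this example.

```typescript
// Hypothetical result shape used only for this illustration.
interface ScoredDoc {
  index: number; // position of the document in the index
  score: number; // accumulated BM25 score for the query
}

// Keeps documents with a positive score, sorts by descending score,
// and truncates the list to at most topK entries.
function rankTopK(scores: number[], topK = 10): ScoredDoc[] {
  return scores
    .map((score, index) => ({ index, score }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const ranked = rankTopK([0.2, 0, 1.7, 0.9], 2);
// ranked is [{ index: 2, score: 1.7 }, { index: 3, score: 0.9 }]
```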
searchPhrase()​
searchPhrase(phrase, topK): SearchResult[]
Searches for an exact phrase within the indexed documents. Ranks documents containing the exact sequence of tokens higher. Note: This is a basic implementation. More sophisticated phrase search might consider proximity.
Parameters​
• phrase: string
The exact phrase to search for.
• topK: number = 10
The maximum number of results to return. Defaults to 10.
Returns​
An array of SearchResult objects, sorted by score, for documents containing the phrase.
Defined in​
packages/core/src/search.ts:1266
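The core test behind exact phrase matching is whether the phrase's tokens occur as one contiguous run in a document's tokens. A minimal sketch of that check follows; `containsPhrase` is a hypothetical helper, and how searchPhrase itself detects and scores matches is not shown here.

```typescript
// Returns true when phraseTokens occur as one contiguous run inside docTokens.
function containsPhrase(docTokens: string[], phraseTokens: string[]): boolean {
  if (phraseTokens.length === 0 || phraseTokens.length > docTokens.length) {
    return false;
  }
  // Try every start position where the phrase could still fit.
  for (let start = 0; start <= docTokens.length - phraseTokens.length; start++) {
    let matched = true;
    for (let offset = 0; offset < phraseTokens.length; offset++) {
      if (docTokens[start + offset] !== phraseTokens[offset]) {
        matched = false;
        break;
      }
    }
    if (matched) return true;
  }
  return false;
}

const docTokens = ["the", "quick", "brown", "fox"];
const inOrder = containsPhrase(docTokens, ["quick", "brown"]); // true
const reversed = containsPhrase(docTokens, ["brown", "quick"]); // false
```

Both the document and the phrase would need to pass through the same tokenizer for the comparison to be meaningful.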
addDocument()​
addDocument(doc): Promise<void>
Adds a single new document to the index.
Updates all internal index structures incrementally.
Note: For adding many documents, addDocumentsParallel is generally more efficient.
Parameters​
• doc: any
The document object (with string fields) to add.
Returns​
Promise<void>
Throws​
If the document is null or undefined.
Defined in​
packages/core/src/search.ts:1379
calculateIdf()​
calculateIdf(termIndex): number
Calculates the Inverse Document Frequency (IDF) for a given term index. Uses the BM25 IDF formula log(1 + (N - n + 0.5) / (n + 0.5)), where N is the total number of documents and n is the number of documents containing the term. The 1 added inside the logarithm keeps the IDF positive, even for terms that appear in more than half of the documents.
Parameters​
• termIndex: number
The integer index of the term.
Returns​
number
The IDF score for the term. Returns 0 if the term is not found or its document frequency is 0.
Defined in​
packages/core/src/search.ts:1468
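The formula above translates directly into code. The sketch below takes N and n as plain arguments rather than a term index, so `bm25Idf` is a hypothetical standalone helper and not the method's actual signature.

```typescript
// IDF as described above: log(1 + (N - n + 0.5) / (n + 0.5)).
// totalDocs is N, docFreq is n.
function bm25Idf(totalDocs: number, docFreq: number): number {
  // Mirrors the documented behavior of returning 0 for a term with no document frequency.
  if (docFreq <= 0 || totalDocs <= 0) return 0;
  return Math.log(1 + (totalDocs - docFreq + 0.5) / (docFreq + 0.5));
}

const rareTerm = bm25Idf(100, 1);    // term found in 1 of 100 documents
const commonTerm = bm25Idf(100, 90); // term found in 90 of 100 documents
// rareTerm is much larger than commonTerm, and commonTerm is still above 0.
```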
getTermFrequency()​
getTermFrequency(termIndex, docIndex): number
Retrieves the term frequency (TF) for a specific term in a specific document.
Parameters​
• termIndex: number
The integer index of the term.
• docIndex: number
The index of the document.
Returns​
number
The term frequency, or 0 if the term is not in the document or indices are invalid.
Defined in​
packages/core/src/search.ts:1494
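Given the nested map layout documented for termFrequencies, a lookup with a fallback of 0 can be sketched as below. `lookupTermFrequency` is a hypothetical free function that takes the map as an argument; it only illustrates the behavior described above.

```typescript
// Looks up termIndex, then docIndex, and falls back to 0 when either is missing.
function lookupTermFrequency(
  termFrequencies: Map<number, Map<number, number>>,
  termIndex: number,
  docIndex: number
): number {
  return termFrequencies.get(termIndex)?.get(docIndex) ?? 0;
}

// Term 0 occurs three times in document 2 and nowhere else.
const sampleFrequencies = new Map<number, Map<number, number>>([
  [0, new Map<number, number>([[2, 3]])],
]);
const present = lookupTermFrequency(sampleFrequencies, 0, 2); // 3
const absent = lookupTermFrequency(sampleFrequencies, 5, 2);  // 0
```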
getDocument()​
getDocument(index): any
Retrieves the original document object stored at a given index.
Parameters​
• index: number
The index of the document to retrieve.
Returns​
any
The document object.
Throws​
If the index is out of bounds.
Defined in​
packages/core/src/search.ts:1504
clearDocuments()​
clearDocuments(): void
Clears all indexed documents and resets the BM25 instance to its initial state.
Returns​
void
Defined in​
packages/core/src/search.ts:1515
getDocumentCount()​
getDocumentCount(): number
Gets the total number of documents currently indexed.
Returns​
number
The document count.
Defined in​
packages/core/src/search.ts:1528
addDocuments()​
addDocuments(docs): Promise<void[]>
Adds multiple documents by calling addDocument for each one in turn.
Processing is sequential and runs in the main thread.
Parameters​
• docs: any[]
An array of documents to add.
Returns​
Promise<void[]>