Contextual Embeddings
Enhanced retrieval accuracy using Anthropic’s contextual retrieval technique
Contextual embeddings are an advanced Knowledge plugin feature that improves retrieval accuracy by enriching text chunks with surrounding context before generating embeddings. This implementation is based on Anthropic’s contextual retrieval techniques.
What are Contextual Embeddings?
Traditional RAG systems embed isolated text chunks, losing important context. Contextual embeddings solve this by using an LLM to add relevant context to each chunk before embedding.
Traditional vs Contextual
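At a glance, the two pipelines differ in a single step:

```text
Traditional:  document -> split into chunks -> embed each chunk in isolation
Contextual:   document -> split into chunks -> LLM prepends document-level
              context to each chunk -> embed the enriched chunk
```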
How It Works
The Knowledge plugin uses a sophisticated prompt-based approach to enrich chunks:
- Document Analysis: The full document is passed to an LLM along with each chunk
- Context Generation: The LLM identifies relevant context from the document
- Chunk Enrichment: The original chunk is preserved with added context
- Embedding: The enriched chunk is embedded using your configured embedding model
The implementation is based on Anthropic’s Contextual Retrieval cookbook example; in Anthropic’s evaluations, contextual embeddings cut the retrieval failure rate by 35%, and by 49% when combined with contextual BM25.
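As a sketch of that flow in TypeScript (illustrative only, not the plugin’s actual source; the Llm and Embed shapes and the enrichAndEmbedChunk name are assumed here, and the prompt paraphrases the one in Anthropic’s cookbook):

```typescript
// Illustrative sketch only: not the Knowledge plugin's actual source.
// Assumes an LLM client and an embedding function with these shapes.
type Llm = { complete: (prompt: string) => Promise<string> };
type Embed = (text: string) => Promise<number[]>;

async function enrichAndEmbedChunk(
  document: string,
  chunk: string,
  llm: Llm,
  embed: Embed
): Promise<{ text: string; embedding: number[] }> {
  // Steps 1-2: the LLM sees the full document plus the chunk and returns
  // a short context situating the chunk within the document.
  const prompt = `<document>
${document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
${chunk}
</chunk>
Give a short, succinct context to situate this chunk within the overall
document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.`;
  const context = await llm.complete(prompt);

  // Step 3: the original chunk is preserved; the context is prepended.
  const enriched = `${context}\n\n${chunk}`;

  // Step 4: embed the enriched text with the configured embedding model.
  return { text: enriched, embedding: await embed(enriched) };
}
```

Note that the original chunk text survives verbatim; only its retrieval representation is enriched.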
Configuration
Enable Contextual Embeddings
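Set the flag together with a text generation provider and model in your .env (placeholder values shown; these are the variables the rest of this guide refers to):

```bash
CTX_KNOWLEDGE_ENABLED=true   # must be lowercase true
TEXT_PROVIDER=...            # provider of the LLM that generates context
TEXT_MODEL=...               # model used to generate context
# plus the API key required by your chosen provider
```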
Important: Embeddings always use the model configured in useModel(TEXT_EMBEDDING) from your agent setup. Do NOT mix different embedding models; all your documents must use the same embedding model for consistency.
Recommended Setup: OpenRouter with Separate Embedding Provider
Since OpenRouter doesn’t support embeddings, you need a separate embedding provider:
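A sketch of that split setup (the model ID and the OpenAI key are examples; any embedding provider your agent supports will work):

```bash
# Context generation through OpenRouter (enables prompt caching on Claude/Gemini)
CTX_KNOWLEDGE_ENABLED=true
TEXT_PROVIDER=openrouter
TEXT_MODEL=anthropic/claude-3-haiku   # example model ID
OPENROUTER_API_KEY=sk-or-...

# Embeddings from a separate provider, e.g. the OpenAI plugin
OPENAI_API_KEY=sk-...
```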
Alternative Providers
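If you would rather not route through OpenRouter, any supported text provider works. For example (the provider names and key variables below are assumptions; check your provider plugin’s documentation for the exact values):

```bash
# Direct Anthropic (assumed provider name and key variable)
TEXT_PROVIDER=anthropic
TEXT_MODEL=claude-3-haiku-20240307
ANTHROPIC_API_KEY=sk-ant-...

# Or direct Google (assumed provider name and key variable)
TEXT_PROVIDER=google
TEXT_MODEL=gemini-1.5-flash
GOOGLE_API_KEY=...
```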
Technical Details
Chunk Processing
The plugin uses fixed chunk sizes optimized for contextual enrichment:
- Chunk Size: 500 tokens (approximately 1,750 characters)
- Chunk Overlap: 100 tokens
- Context Target: 60-200 tokens of added context
These values are based on research showing that smaller chunks with rich context perform better than larger chunks without context.
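Expressed as configuration (the names below are illustrative, not the plugin’s actual setting keys):

```typescript
// Illustrative defaults; names are not the plugin's actual setting keys.
const CHUNKING = {
  chunkTokens: 500,       // ≈ 1,750 characters at roughly 3.5 characters/token
  overlapTokens: 100,     // tokens shared between adjacent chunks
  minContextTokens: 60,   // lower bound of the LLM-added context
  maxContextTokens: 200,  // upper bound of the LLM-added context
} as const;
```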
Content-Aware Templates
The plugin automatically detects content types and uses specialized prompts:
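As a sketch of what that selection can look like (the detection rules and prompt wording here are invented for illustration, not the plugin’s actual templates):

```typescript
// Illustrative only: pick a context-generation prompt by content type.
type ContentType = "code" | "markdown" | "general";

function detectContentType(chunk: string): ContentType {
  if (/\bfunction\s+\w+\s*\(|\bclass\s+\w+|;\s*$/m.test(chunk)) return "code";
  if (/^#{1,6}\s/m.test(chunk)) return "markdown";
  return "general";
}

const TEMPLATES: Record<ContentType, string> = {
  code: "Explain what this code does and where it fits in the codebase:",
  markdown: "State which document and section this passage belongs to:",
  general: "Situate this passage within the overall document:",
};
```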
OpenRouter Caching
When using OpenRouter with Claude or Gemini models, the plugin automatically leverages caching:
- First document chunk: Caches the full document
- Subsequent chunks: Reuses cached document (90% cost reduction)
- Cache duration: 5 minutes (automatic)
This means the full document is paid for only once: every later chunk reads it from cache, so processing a 100-page document costs roughly a tenth of what it would without caching.
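To make the arithmetic concrete (illustrative numbers; on Anthropic models, cache reads are billed at about a tenth of the normal input-token rate): a 100-page document split into ~200 chunks would otherwise include the full document text in all 200 enrichment calls. With caching, call 1 writes the document to cache and calls 2 through 200 read it back at ~10% of the price, so the document-token cost drops to roughly 1 + 199 × 0.1 ≈ 21 call-equivalents instead of 200, which is where the ~90% reduction comes from.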
Example: How Context Improves Retrieval
Without Contextual Embeddings
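Consider the chunk Anthropic uses to illustrate the problem, taken from an SEC filing:

```text
The company's revenue grew by 3% over the previous quarter.
```

In isolation, the chunk never says which company or which quarter it describes, so a query like “What was ACME Corp’s revenue growth in Q2 2023?” is unlikely to retrieve it.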
With Contextual Embeddings
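After enrichment, the same chunk carries its provenance (the added context below follows Anthropic’s example; the original text is preserved underneath it):

```text
This chunk is from an SEC filing on ACME Corp's performance in Q2 2023;
the previous quarter's revenue was $314 million.

The company's revenue grew by 3% over the previous quarter.
```

The embedded text now matches queries that mention the company, the quarter, or the filing, which is exactly the context the raw chunk was missing.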
Performance Considerations
Processing Time
- Initial processing: 1-3 seconds per chunk (includes LLM call)
- With caching: 0.1-0.3 seconds per chunk
- Batch processing: Up to 30 chunks concurrently
Cost Estimation
| Document Size | Pages | Chunks | Without Caching | With OpenRouter Cache |
|---|---|---|---|---|
| Small | 10 | ~20 | $0.02 | $0.002 |
| Medium | 50 | ~100 | $0.10 | $0.01 |
| Large | 200 | ~400 | $0.40 | $0.04 |
Costs are estimates based on Claude 3 Haiku pricing. Actual costs depend on your chosen model.
Monitoring
The plugin provides detailed logging:
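To surface these logs, run with debug logging enabled (LOG_LEVEL=debug is the usual ElizaOS convention; adjust to however your deployment configures logging):

```bash
LOG_LEVEL=debug
```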
This will show:
- Context enrichment progress
- Cache hit/miss rates
- Processing times per document
- Token usage
Common Issues and Solutions
Context Not Being Added
Check if contextual embeddings are enabled:
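A quick way to do this is to inspect the flag in your environment file (plain shell, nothing plugin-specific):

```bash
grep CTX_KNOWLEDGE_ENABLED .env
# expected output: CTX_KNOWLEDGE_ENABLED=true
```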
Verify your configuration:
- CTX_KNOWLEDGE_ENABLED=true is set exactly as lowercase true (not “TRUE” or “True”)
- TEXT_PROVIDER and TEXT_MODEL are both set
- The required API key for your provider is set
Slow Processing
Solutions:
- Use OpenRouter with Claude/Gemini for automatic caching
- Process smaller batches of documents
- Use faster models (Claude 3 Haiku, Gemini 1.5 Flash)
High Costs
Solutions:
- Enable OpenRouter caching (90% cost reduction)
- Use smaller models for context generation
- Process documents in batches during off-peak hours
Best Practices
Use OpenRouter for Cost Efficiency
OpenRouter’s caching makes contextual embeddings 90% cheaper when processing multiple chunks from the same document.
Keep Default Settings
The chunk sizes and overlap are optimized based on research. Only change if you have specific requirements.
Monitor Your Logs
Enable debug logging when first setting up to ensure context is being added properly.
Use Appropriate Models
- Claude 3 Haiku: Best balance of quality and cost
- Gemini 1.5 Flash: Fastest processing
- GPT-4o-mini: Good quality, moderate cost
Summary
Contextual embeddings significantly improve retrieval accuracy by:
- Adding document context to each chunk before embedding
- Using intelligent templates based on content type
- Preserving the original text while enriching with context
- Leveraging caching for cost-efficient processing
The implementation is based on Anthropic’s proven approach and integrates seamlessly with ElizaOS’s existing infrastructure. Simply set CTX_KNOWLEDGE_ENABLED=true and configure your text generation provider to get started!