Overview
TextEmbeddingsModule provides a class-based interface for generating dense vector embeddings from text. These embeddings can be used for semantic search, similarity comparison, clustering, or as features for downstream tasks.
When to Use
Use TextEmbeddingsModule when:
- You need manual control over model lifecycle
- You’re working outside React components
- You need to generate embeddings programmatically in non-React code (services, background tasks, utility classes)
Use useTextEmbeddings hook when:
- Building React components
- You want automatic lifecycle management
- You prefer declarative state management
- You need React state integration (a hook-based sketch follows this list)
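For comparison, here is a minimal hook-based sketch. The exact fields returned by useTextEmbeddings (such as isReady and forward) and the config shape are assumptions here; see the useTextEmbeddings reference for the authoritative API.
// Hedged sketch: the same embedding call driven by the useTextEmbeddings hook.
// Field names (isReady, forward) and the config shape are assumptions; check
// the useTextEmbeddings documentation for the exact API.
import { useEffect } from 'react';
import { useTextEmbeddings } from 'react-native-executorch';
function EmbeddingExample() {
  const embeddings = useTextEmbeddings({
    modelSource: 'https://example.com/text_embedder.pte',
    tokenizerSource: 'https://example.com/tokenizer.json',
  });
  useEffect(() => {
    if (!embeddings.isReady) return;
    embeddings.forward('Machine learning is fascinating').then((vector) =>
      console.log('Embedding dimensions:', vector.length)
    );
  }, [embeddings.isReady]);
  return null; // render UI as needed
}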
Extends
TextEmbeddingsModule extends BaseModule.
Constructor
new TextEmbeddingsModule()
Creates a new text embeddings module instance.
Example
import { TextEmbeddingsModule } from 'react-native-executorch';
const embedder = new TextEmbeddingsModule();
Methods
load()
async load(
model: {
modelSource: ResourceSource;
tokenizerSource: ResourceSource;
},
onDownloadProgressCallback?: (progress: number) => void
): Promise<void>
Loads the text embeddings model and tokenizer.
Parameters
modelSource
ResourceSource
Resource location of the text embeddings model binary.
tokenizerSource
ResourceSource
Resource location of the tokenizer JSON file.
onDownloadProgressCallback
(progress: number) => void
Optional callback to track download progress (value between 0 and 1).
Example
await embedder.load(
{
modelSource: 'https://example.com/text_embedder.pte',
tokenizerSource: 'https://example.com/tokenizer.json'
},
(progress) => {
console.log(`Download: ${(progress * 100).toFixed(1)}%`);
}
);
forward()
async forward(input: string): Promise<Float32Array>
Executes the model’s forward pass and returns an embedding vector for the given text.
Parameters
input
string
The text string to embed.
Returns
A Float32Array containing the embedding vector; its length depends on the model (commonly 384, 512, or 768 dimensions).
Example
const embedding = await embedder.forward('Machine learning is fascinating');
console.log('Embedding dimensions:', embedding.length);
console.log('Embedding:', embedding);
// Float32Array(384) [0.234, -0.567, 0.891, ...]
delete()
Unloads the model from memory and releases native resources.
Example
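// Release the model when it is no longer needed
embedder.delete();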
Complete Example: Semantic Search
import { TextEmbeddingsModule } from 'react-native-executorch';
class SemanticSearchEngine {
private embedder: TextEmbeddingsModule;
private documents: Map<string, { text: string; embedding: Float32Array }> = new Map();
constructor() {
this.embedder = new TextEmbeddingsModule();
}
async initialize() {
await this.embedder.load(
{
modelSource: 'https://example.com/embedder.pte',
tokenizerSource: 'https://example.com/tokenizer.json'
},
(progress) => {
console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
}
);
console.log('Text embedder ready!');
}
async addDocument(id: string, text: string) {
const embedding = await this.embedder.forward(text);
this.documents.set(id, { text, embedding });
console.log(`Indexed document: ${id}`);
}
async search(query: string, topK: number = 5) {
const queryEmbedding = await this.embedder.forward(query);
// Calculate cosine similarity with all documents
const results = Array.from(this.documents.entries()).map(
([id, doc]) => ({
id,
text: doc.text,
similarity: this.cosineSimilarity(queryEmbedding, doc.embedding)
})
);
// Sort by similarity and return top K
return results
.sort((a, b) => b.similarity - a.similarity)
.slice(0, topK);
}
private cosineSimilarity(a: Float32Array, b: Float32Array): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
cleanup() {
this.embedder.delete();
this.documents.clear();
}
}
// Usage
const search = new SemanticSearchEngine();
await search.initialize();
// Index documents
await search.addDocument('doc1', 'React Native is a mobile framework');
await search.addDocument('doc2', 'Python is a programming language');
await search.addDocument('doc3', 'Mobile development with JavaScript');
await search.addDocument('doc4', 'Machine learning with neural networks');
// Search
const results = await search.search('mobile app development', 3);
console.log('Search results:');
results.forEach((result, i) => {
console.log(`${i + 1}. [${result.similarity.toFixed(3)}] ${result.text}`);
});
search.cleanup();
Text Clustering Example
class TextClusterer {
private embedder: TextEmbeddingsModule;
constructor() {
this.embedder = new TextEmbeddingsModule();
}
async initialize() {
await this.embedder.load({
modelSource: 'https://example.com/embedder.pte',
tokenizerSource: 'https://example.com/tokenizer.json'
});
}
async clusterTexts(texts: string[], numClusters: number) {
// Generate embeddings for all texts
console.log('Generating embeddings...');
const embeddings = await Promise.all(
texts.map(text => this.embedder.forward(text))
);
// Cluster the embeddings (placeholder implementation below)
const clusters = this.kMeansClustering(embeddings, numClusters);
// Map back to original texts
return clusters.map(cluster =>
cluster.map(idx => texts[idx])
);
}
private kMeansClustering(
embeddings: Float32Array[],
k: number
): number[][] {
// Placeholder only: texts are assigned to random clusters here;
// use a proper k-means/clustering library in production
const assignments = embeddings.map(() =>
Math.floor(Math.random() * k)
);
const clusters: number[][] = Array.from({ length: k }, () => []);
assignments.forEach((clusterIdx, textIdx) => {
clusters[clusterIdx].push(textIdx);
});
return clusters;
}
cleanup() {
this.embedder.delete();
}
}
// Usage
const clusterer = new TextClusterer();
await clusterer.initialize();
const texts = [
'I love programming in JavaScript',
'Python is great for data science',
'Mobile apps are built with React Native',
'Machine learning uses neural networks',
'TypeScript adds types to JavaScript',
'Deep learning is a subset of ML'
];
const clusters = await clusterer.clusterTexts(texts, 2);
console.log('Cluster 1:', clusters[0]);
console.log('Cluster 2:', clusters[1]);
clusterer.cleanup();
Similarity Comparison
class TextSimilarityAnalyzer {
private embedder: TextEmbeddingsModule;
constructor() {
this.embedder = new TextEmbeddingsModule();
}
async initialize() {
await this.embedder.load({
modelSource: 'https://example.com/embedder.pte',
tokenizerSource: 'https://example.com/tokenizer.json'
});
}
async compare(text1: string, text2: string): Promise<number> {
const [embedding1, embedding2] = await Promise.all([
this.embedder.forward(text1),
this.embedder.forward(text2)
]);
return this.cosineSimilarity(embedding1, embedding2);
}
async compareMany(baseText: string, comparisons: string[]) {
const baseEmbedding = await this.embedder.forward(baseText);
const results = [];
for (const text of comparisons) {
const embedding = await this.embedder.forward(text);
const similarity = this.cosineSimilarity(baseEmbedding, embedding);
results.push({ text, similarity });
}
return results.sort((a, b) => b.similarity - a.similarity);
}
private cosineSimilarity(a: Float32Array, b: Float32Array): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
cleanup() {
this.embedder.delete();
}
}
// Usage
const analyzer = new TextSimilarityAnalyzer();
await analyzer.initialize();
// Compare two texts
const similarity = await analyzer.compare(
'I enjoy playing basketball',
'I like playing sports'
);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);
// Compare one text against multiple
const comparisons = await analyzer.compareMany(
'I love programming',
[
'I enjoy coding',
'I like cooking',
'Software development is fun',
'I play guitar'
]
);
comparisons.forEach(result => {
console.log(`${(result.similarity * 100).toFixed(1)}% - ${result.text}`);
});
analyzer.cleanup();
Batch Processing
class BatchTextEmbedder {
private embedder: TextEmbeddingsModule;
constructor() {
this.embedder = new TextEmbeddingsModule();
}
async initialize() {
await this.embedder.load({
modelSource: 'https://example.com/embedder.pte',
tokenizerSource: 'https://example.com/tokenizer.json'
});
}
async embedBatch(texts: string[]): Promise<Map<string, Float32Array>> {
const results = new Map<string, Float32Array>();
for (const text of texts) {
console.log(`Processing: "${text.substring(0, 50)}..."`);
const embedding = await this.embedder.forward(text);
results.set(text, embedding);
}
return results;
}
async saveEmbeddings(texts: string[], outputPath: string) {
const embeddings = await this.embedBatch(texts);
// Convert to JSON-serializable format
const data = Array.from(embeddings.entries()).map(([text, embedding]) => ({
text,
embedding: Array.from(embedding)
}));
// Save to a file (assumes the react-native-fs package is installed)
const RNFS = require('react-native-fs');
await RNFS.writeFile(outputPath, JSON.stringify(data, null, 2));
console.log(`Saved ${data.length} embeddings to ${outputPath}`);
}
cleanup() {
this.embedder.delete();
}
}
// Usage
const batchEmbedder = new BatchTextEmbedder();
await batchEmbedder.initialize();
const texts = [
'First document text',
'Second document text',
'Third document text'
];
const embeddings = await batchEmbedder.embedBatch(texts);
console.log(`Generated ${embeddings.size} embeddings`);
// Or save to file
await batchEmbedder.saveEmbeddings(texts, '/path/to/embeddings.json');
batchEmbedder.cleanup();
Use Cases
- Semantic Search: Find documents by meaning, not just keywords
- Similarity Detection: Identify similar or duplicate content
- Question Answering: Match questions to relevant answers
- Recommendation: Recommend similar content based on user preferences
- Clustering: Group similar texts together
- Classification: Use embeddings as features for text classification
- Multilingual Search: Compare texts across languages (with multilingual models)
Performance Tips
- Embedding generation is fast (typically < 50 ms per text)
- Cache embeddings for frequently used texts
- Use vector databases (such as FAISS) for large-scale similarity search
- Normalize embeddings before storing so cosine similarity reduces to a dot product (see the sketch after this list)
- Batch processing is more efficient than individual calls
- Always call delete() when done to free memory
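The caching and normalization tips above can be combined in a small wrapper. This is an illustrative sketch only; the helper names (normalize, dot, CachedEmbedder) are not part of react-native-executorch.
import { TextEmbeddingsModule } from 'react-native-executorch';
// Scale a vector to unit length so cosine similarity becomes a plain dot product.
function normalize(v: Float32Array): Float32Array {
  let norm = 0;
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm) || 1;
  const out = new Float32Array(v.length);
  for (let i = 0; i < v.length; i++) out[i] = v[i] / norm;
  return out;
}
// Dot product of two vectors of equal length.
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}
// In-memory cache so repeated texts are embedded only once.
class CachedEmbedder {
  private cache = new Map<string, Float32Array>();
  constructor(private embedder: TextEmbeddingsModule) {}
  async embed(text: string): Promise<Float32Array> {
    const cached = this.cache.get(text);
    if (cached) return cached;
    const normalized = normalize(await this.embedder.forward(text));
    this.cache.set(text, normalized);
    return normalized;
  }
}
With normalized vectors stored in the cache, dot(a, b) gives the cosine similarity directly.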
Common Models
- sentence-transformers/all-MiniLM-L6-v2: 384 dimensions, fast and efficient
- sentence-transformers/all-mpnet-base-v2: 768 dimensions, higher quality
- BAAI/bge-small-en-v1.5: 384 dimensions, optimized for retrieval
See Also