Learn how to efficiently import data into your Orama database using various techniques and best practices.
Single Document Insertion
For inserting individual documents, use the insert method:
import { create, insert } from '@orama/orama'
const db = create({
  schema: {
    name: 'string',
    description: 'string',
    price: 'number',
    meta: {
      rating: 'number'
    }
  }
})
const id = await insert(db, {
  name: 'Noise cancelling headphones',
  description: 'Best noise cancelling headphones on the market',
  price: 99.99,
  meta: {
    rating: 4.5
  }
})
The insert method returns the document ID, which can be a user-provided string or an auto-generated unique identifier.
Batch Imports with insertMultiple
For importing large datasets, use insertMultiple with batch processing for optimal performance:
import { create, insertMultiple } from '@orama/orama'
const db = create({
  schema: {
    title: 'string',
    description: 'string',
    category: 'enum'
  }
})
const products = [
  { title: 'Product 1', description: 'Description 1', category: 'electronics' },
  { title: 'Product 2', description: 'Description 2', category: 'books' },
  // ... thousands more documents
]
// Insert with batch size control
const ids = await insertMultiple(
  db,
  products,
  1000 // Batch size: process 1000 documents at a time
)
1. Prepare your data: Ensure all documents conform to your schema definition. Inserting an invalid document throws a SCHEMA_VALIDATION_FAILURE error.
2. Choose batch size: The default batch size is 1000 documents. Adjust it based on your memory constraints and document size.
3. Execute import: Call insertMultiple with your document array and the optional batch size parameter.
4. Handle returned IDs: The method returns an array of document IDs in the same order as your input documents.
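To make the batch size parameter concrete, the following sketch shows how an array can be split into fixed-size batches. It only illustrates the idea and is not Orama's internal implementation; `toBatches` is a hypothetical helper name.

```javascript
// Hypothetical helper: split an array into consecutive batches of at most `size` items.
function* toBatches(items, size = 1000) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('Batch size must be a positive integer')
  }
  for (let start = 0; start < items.length; start += size) {
    yield items.slice(start, start + size)
  }
}

// 2,500 documents with a batch size of 1000 give batches of 1000, 1000 and 500.
const docs = Array.from({ length: 2500 }, (_, i) => ({ title: `Product ${i + 1}` }))
const batchSizes = [...toBatches(docs, 1000)].map((batch) => batch.length)
console.log(batchSizes) // [ 1000, 1000, 500 ]
```

A smaller batch size means more, shorter units of work, which is why it tends to suit large documents or tight memory limits.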
Schema Validation
Orama validates every document against your schema before insertion:
// packages/orama/src/methods/insert.ts
const errorProperty = orama.validateSchema(doc, orama.schema)
if (errorProperty) {
  throw createError('SCHEMA_VALIDATION_FAILURE', errorProperty)
}
Supported Data Types
Type | Description | Example
string | A string of characters | 'Hello world'
number | Numeric value (float or integer) | 42 or 3.14
boolean | Boolean value | true or false
enum | Enumerated value | 'drama'
geopoint | Geographic coordinates | { lat: 40.7128, lon: -74.0060 }
string[] | Array of strings | ['red', 'green', 'blue']
number[] | Array of numbers | [42, 91, 28.5]
boolean[] | Array of booleans | [true, false, false]
enum[] | Array of enums | ['comedy', 'action', 'romance']
vector[<size>] | Vector for semantic search | [0.403, 0.192, 0.830]
Importing with Vector Embeddings
For vector and hybrid search, include embeddings in your documents:
import { create, insertMultiple } from '@orama/orama'
const db = create({
  schema: {
    title: 'string',
    embedding: 'vector[512]' // 512-dimensional embeddings
  }
})
await insertMultiple(db, [
  {
    title: 'The Prestige',
    embedding: [0.938293, 0.284951, /* ... */] // 512 numbers
  },
  {
    title: 'Oppenheimer',
    embedding: [0.827391, 0.927381, /* ... */] // 512 numbers
  }
])
Use the @orama/plugin-embeddings plugin to automatically generate embeddings at insert-time! See the Plugin Embeddings documentation.
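Because the schema fixes the vector size (512 in the example above), it can help to check embedding lengths before importing. The following is a minimal sketch, assuming documents keep their vector in an `embedding` property; `findBadEmbeddings` is a hypothetical helper and not part of Orama.

```javascript
// Hypothetical helper: return the indexes of documents whose embedding
// is not an array of exactly `size` finite numbers.
function findBadEmbeddings(docs, size, property = 'embedding') {
  const bad = []
  docs.forEach((doc, index) => {
    const vector = doc[property]
    const ok =
      Array.isArray(vector) &&
      vector.length === size &&
      vector.every((value) => Number.isFinite(value))
    if (!ok) bad.push(index)
  })
  return bad
}

// Tiny 3-dimensional vectors are used here so the data fits on screen.
const movies = [
  { title: 'The Prestige', embedding: [0.9, 0.2, 0.1] },
  { title: 'Oppenheimer', embedding: [0.8, 0.9] }, // too short
  { title: 'Tenet', embedding: [0.1, NaN, 0.3] } // contains NaN
]
console.log(findBadEmbeddings(movies, 3)) // [ 1, 2 ]
```

For the schema above, the call would be `findBadEmbeddings(documents, 512)`; an empty result means every vector has the expected shape.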
Custom Document IDs
By default, Orama generates unique IDs. To use custom IDs, provide an id field:
import { create, insert } from '@orama/orama'
const db = create({
  schema: {
    id: 'string',
    name: 'string'
  }
})
await insert(db, {
  id: 'custom-id-123',
  name: 'Product Name'
})
Document IDs must be strings. Attempting to insert a document with a duplicate ID will throw a DOCUMENT_ALREADY_EXISTS error.
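Since a duplicate ID aborts the insertion, one option is to separate duplicates from a dataset before importing it. This is a sketch under the assumption that documents carry their own `id`; `splitDuplicateIds` is a hypothetical helper, and it only detects duplicates within the array, not against documents already in the database.

```javascript
// Hypothetical helper: keep the first document for each string id and collect the rest.
// Documents whose id is not a string are passed through untouched.
function splitDuplicateIds(docs) {
  const seen = new Set()
  const unique = []
  const duplicates = []
  for (const doc of docs) {
    const hasStringId = typeof doc.id === 'string'
    if (hasStringId && seen.has(doc.id)) {
      duplicates.push(doc)
      continue
    }
    if (hasStringId) seen.add(doc.id)
    unique.push(doc)
  }
  return { unique, duplicates }
}

const { unique, duplicates } = splitDuplicateIds([
  { id: 'custom-id-123', name: 'Product Name' },
  { id: 'custom-id-456', name: 'Another product' },
  { id: 'custom-id-123', name: 'Product Name (duplicate)' }
])
console.log(unique.length, duplicates.length) // 2 1
```

The `unique` array can then be passed to insertMultiple, while `duplicates` can be logged or merged according to your own rules.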
Nested Schema Properties
Orama supports nested object structures:
import { create, insert } from '@orama/orama'
const db = create({
  schema: {
    name: 'string',
    meta: {
      rating: 'number',
      reviews: 'number',
      verified: 'boolean'
    },
    tags: 'string[]'
  }
})
await insert(db, {
  name: 'Product',
  meta: {
    rating: 4.5,
    reviews: 120,
    verified: true
  },
  tags: ['electronics', 'new', 'featured']
})
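Flat sources such as spreadsheet exports often encode nesting in the column name, for example `meta.rating`. The following is a minimal sketch of turning such records into the nested shape that the schema above expects; the dotted-key convention and the `nestRecord` helper are assumptions made for illustration.

```javascript
// Hypothetical helper: expand dotted keys ('meta.rating') into nested objects.
function nestRecord(flat) {
  const nested = {}
  for (const [path, value] of Object.entries(flat)) {
    const keys = path.split('.')
    let target = nested
    // Walk (and create) an intermediate object for every key except the last.
    for (const key of keys.slice(0, -1)) {
      if (typeof target[key] !== 'object' || target[key] === null) target[key] = {}
      target = target[key]
    }
    target[keys[keys.length - 1]] = value
  }
  return nested
}

const doc = nestRecord({
  name: 'Product',
  'meta.rating': 4.5,
  'meta.reviews': 120,
  'meta.verified': true
})
console.log(JSON.stringify(doc))
// {"name":"Product","meta":{"rating":4.5,"reviews":120,"verified":true}}
```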
Language-Specific Tokenization
Specify the language for proper text analysis and stemming:
import { create, insert } from '@orama/orama'
const db = create({
  schema: {
    title: 'string',
    content: 'string'
  },
  language: 'spanish'
})
// The language can also be passed explicitly as the third argument to insert
await insert(db,
  { title: 'Título', content: 'Contenido en español' },
  'spanish'
)
Orama supports stemming and tokenization in 30 languages. See Text Analysis for the full list.
Best Practices
Use Batch Imports: Prefer insertMultiple over repeated insert calls for better performance.
Optimize Batch Size: The default of 1000 documents per batch works well for most cases. Increase it for small documents and decrease it for large ones.
Skip Hooks When Needed: Set the skipHooks argument to true to bypass plugin hooks during bulk imports for faster insertion.
Validate Data First: Validate your data structure before importing to avoid errors mid-batch.
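As a sketch of the "validate data first" advice, the helper below checks documents against a simplified schema before any insertion starts. It covers only a few scalar and array types plus nested objects, and it is not Orama's own validateSchema; `findInvalidProperty` is a hypothetical name.

```javascript
// Simplified checks for a subset of the schema types.
// Types without a check here (enum, geopoint, vector, ...) are skipped.
const checks = {
  string: (v) => typeof v === 'string',
  number: (v) => typeof v === 'number',
  boolean: (v) => typeof v === 'boolean',
  'string[]': (v) => Array.isArray(v) && v.every((x) => typeof x === 'string'),
  'number[]': (v) => Array.isArray(v) && v.every((x) => typeof x === 'number')
}

// Hypothetical helper: return the path of the first property that does not
// match the schema, or undefined when the document looks valid.
// Missing properties are treated as valid, which is an assumption of this sketch.
function findInvalidProperty(doc, schema, prefix = '') {
  for (const [key, type] of Object.entries(schema)) {
    const value = doc[key]
    const path = prefix + key
    if (value === undefined) continue
    if (typeof type === 'object') {
      if (typeof value !== 'object' || value === null || Array.isArray(value)) return path
      const nested = findInvalidProperty(value, type, path + '.')
      if (nested) return nested
    } else if (checks[type] && !checks[type](value)) {
      return path
    }
  }
  return undefined
}

const schema = { name: 'string', price: 'number', meta: { rating: 'number' } }
console.log(findInvalidProperty({ name: 'Headphones', price: 99.99, meta: { rating: 4.5 } }, schema)) // undefined
console.log(findInvalidProperty({ name: 'Headphones', price: '99.99' }, schema)) // price
console.log(findInvalidProperty({ name: 'Headphones', meta: { rating: 'high' } }, schema)) // meta.rating
```

Running such a check over the whole dataset first lets you report every problem document at once instead of discovering them one failed import at a time.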
Advanced: Using Hooks
Execute custom logic before or after insertions:
import { create } from '@orama/orama'
const db = create({
  schema: { name: 'string' },
  plugins: [{
    name: 'my-plugin',
    beforeInsert: async (orama, id, doc) => {
      console.log(`Inserting document ${id}`)
    },
    afterInsert: async (orama, id, doc) => {
      console.log(`Inserted document ${id}`)
    },
    afterInsertMultiple: async (orama, docs) => {
      console.log(`Inserted ${docs.length} documents`)
    }
  }]
})
Error Handling
Handle common insertion errors gracefully:
try {
  await insert(db, document)
} catch (error) {
  if (error.code === 'SCHEMA_VALIDATION_FAILURE') {
    // error.message describes the property that failed validation
    console.error('Document does not match schema:', error.message)
  } else if (error.code === 'DOCUMENT_ALREADY_EXISTS') {
    console.error('Document with this ID already exists')
  } else if (error.code === 'DOCUMENT_ID_MUST_BE_STRING') {
    console.error('Document ID must be a string')
  } else {
    throw error
  }
}
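For bulk imports, it can be more useful to record failures and continue than to stop at the first error. The sketch below takes the insert function as a parameter, so it assumes nothing beyond the error codes named above; `importWithReport` and the shape of the report are hypothetical.

```javascript
// Error codes that are recorded instead of aborting the import.
const RECOVERABLE_CODES = [
  'SCHEMA_VALIDATION_FAILURE',
  'DOCUMENT_ALREADY_EXISTS',
  'DOCUMENT_ID_MUST_BE_STRING'
]

// Hypothetical helper: insert documents one by one and collect failures.
// `insertOne` is any function that inserts a single document, returns its ID
// and may throw, for example (doc) => insert(db, doc).
async function importWithReport(docs, insertOne) {
  const report = { inserted: [], failed: [] }
  for (const [index, doc] of docs.entries()) {
    try {
      report.inserted.push(await insertOne(doc))
    } catch (error) {
      // Unexpected errors are rethrown rather than silently swallowed.
      if (!RECOVERABLE_CODES.includes(error.code)) throw error
      report.failed.push({ index, code: error.code, message: error.message })
    }
  }
  return report
}
```

With Orama this could be called as `await importWithReport(documents, (doc) => insert(db, doc))`. Inserting one document at a time trades the speed of insertMultiple for per-document error reporting, so it suits cleanup runs more than routine imports.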
Importing from External Sources
Import from JSON
import { readFile } from 'fs/promises'
import { create, insertMultiple } from '@orama/orama'
const data = JSON.parse(await readFile('data.json', 'utf-8'))
const db = create({ schema: { /* your schema */ } })
await insertMultiple(db, data, 1000)
Import from CSV with parsing
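import { parse } from 'csv-parse/sync'
import { readFile } from 'fs/promises'
import { insertMultiple } from '@orama/orama'
// `db` is an Orama instance created with create(), as in the previous example
const csvContent = await readFile('data.csv', 'utf-8')
// columns: true uses the header row as property names
const records = parse(csvContent, { columns: true })
await insertMultiple(db, records)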
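By default, csv-parse returns every field as a string, so schema properties declared as 'number' or 'boolean' need converting before insertion. The following is a minimal sketch of that conversion; `coerceRecord` is a hypothetical helper that handles only flat 'number' and 'boolean' properties.

```javascript
// Hypothetical helper: convert string fields to the types a flat schema declares.
function coerceRecord(record, schema) {
  const doc = { ...record }
  for (const [key, type] of Object.entries(schema)) {
    const raw = record[key]
    if (typeof raw !== 'string') continue
    if (type === 'number') {
      const parsed = Number(raw)
      // Leave empty or unparsable values untouched so validation can report them.
      if (raw.trim() !== '' && !Number.isNaN(parsed)) doc[key] = parsed
    } else if (type === 'boolean') {
      if (raw === 'true') doc[key] = true
      else if (raw === 'false') doc[key] = false
    }
  }
  return doc
}

const schema = { name: 'string', price: 'number', inStock: 'boolean' }
const row = { name: 'Headphones', price: '99.99', inStock: 'true' } // as parsed from CSV
console.log(coerceRecord(row, schema))
// { name: 'Headphones', price: 99.99, inStock: true }
```

Applied to the CSV example, this would be `records.map((record) => coerceRecord(record, schema))` before the call to insertMultiple.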
Import from an API
const response = await fetch('https://api.example.com/products')
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`)
}
const products = await response.json()
await insertMultiple(db, products, 500)
Next Steps
Search Your Data: Learn how to query your imported documents.
Performance Optimization: Optimize your database for better performance.
Data Persistence: Save and load your database state.
Vector Search: Enable semantic search with embeddings.