Build better models, faster
Training data and experimentation slow down when raw data, metadata, embeddings, features, and governance artifacts live in separate systems. LanceDB keeps them together in one versioned multimodal table, so AI teams spend less time stitching infrastructure together and more time improving datasets, testing features, and keeping GPUs fed.
LanceDB suite
The LanceDB suite includes LanceDB OSS, an open-source embedded retrieval library, and LanceDB Enterprise, a multimodal lakehouse platform for the full AI data lifecycle. OSS is easy to set up on a local machine for search and regular-scale workflows. LanceDB Enterprise is built for teams that need scale without building bespoke infrastructure for curation, feature engineering, search and retrieval, and efficient training data access.
Why teams use LanceDB
One table for the whole AI data loop
Store images, video, audio, text, annotations, embeddings, and model-generated features together in one schema-enforced table.
The same table can support dataset curation, feature backfills, experiment splits, retrieval, and training.
High-throughput data access for training
Training workloads mix fast random access with high-throughput sequential scans. LanceDB is designed for both, so
teams can shuffle data into GPU-ready batches more efficiently, improve input throughput, and iterate on experiments faster.
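To make the access pattern concrete, here is a minimal conceptual sketch of shuffled, projected, random-access reads. It is plain Python and does not use the LanceDB API: the in-memory `table` dictionary and the `shuffled_batches` and `take` helpers are hypothetical names invented for illustration. The idea is to shuffle a permutation of row indices rather than the data itself, then fetch only the columns a training step needs.

```python
import random


def shuffled_batches(num_rows, batch_size, seed):
    """Yield lists of row indices that cover every row once, in shuffled order."""
    order = list(range(num_rows))
    random.Random(seed).shuffle(order)  # seeded so an epoch is reproducible
    for start in range(0, num_rows, batch_size):
        yield order[start:start + batch_size]


def take(table, indices, columns):
    """Random-access read: fetch only the requested columns for the given rows."""
    return [{name: table[name][i] for name in columns} for i in indices]


# Hypothetical in-memory stand-in for a table, stored column by column.
table = {
    "id": list(range(10)),
    "caption": [f"caption {i}" for i in range(10)],
    "embedding": [[float(i), float(i) + 0.5] for i in range(10)],
}

# Project to the two columns the (imaginary) training step needs.
batches = [
    take(table, indices, ["id", "embedding"])
    for indices in shuffled_batches(num_rows=10, batch_size=4, seed=0)
]
```

Because only the index permutation is shuffled, the stored data never has to be rewritten between epochs. This is why a format that serves cheap random reads matters for this kind of workload.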
Fast, versatile search and retrieval
Whether the end user is a human or an agent, LanceDB powers production retrieval workloads such as semantic search,
hybrid search, RAG, agent memory, and recommendation systems. Retrieval runs against the same LanceDB tables used
for curation, feature engineering, and training workflows.
Start with your workload
Train and fine-tune models
Learn why LanceDB works well as the data layer for training workloads.
Load data into PyTorch
Use LanceDB tables and permutations for projected, shuffled, random-access training reads.
Browse ready-to-use datasets
Explore Lance-formatted multimodal datasets with raw bytes, metadata, embeddings, and indices.
Build search and retrieval
Use vector search, full-text search, hybrid search, reranking, filtering, and SQL.
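As a rough illustration of what hybrid search involves, the sketch below merges a vector-search ranking and a full-text ranking with reciprocal rank fusion, one common fusion technique. It is plain Python, not the LanceDB API, and the text above does not say which fusion method LanceDB applies. The function name, the document ids, and the constant `k=60` are assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids; each list contributes 1 / (k + rank) per id."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two separate retrievers, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]
text_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, text_hits])
```

Documents that rank well in both lists (`doc_a` and `doc_b` here) rise above documents that appear in only one. That is the behavior hybrid search is meant to provide.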
From local development to production scale
LanceDB OSS and LanceDB Enterprise share the same Lance format and table model. Start locally with the embedded OSS library, then move to Enterprise when your team needs distributed scale, managed infrastructure, private deployment, or higher-throughput curation, feature engineering, search and retrieval, and training workflows.
1. LanceDB OSS
The fastest way to get started is the open-source embedded library, with client SDKs in Python, TypeScript, and Rust. It runs locally after just a few setup steps, so you can explore datasets, curate data, and run search and retrieval workloads for agents. Start here:
Quickstart
Get started with LanceDB in minutes.
Basic Table Operations
Create tables, evolve schemas, version data, and modify rows in LanceDB.
2. LanceDB Enterprise
LanceDB Enterprise is a petabyte-scale (and beyond), distributed multimodal lakehouse platform built for search, curation, feature engineering, and high-throughput training data access workflows on top of the same core table abstraction. This eliminates the need for teams to build bespoke infrastructure to manage large multimodal datasets. To set up LanceDB Enterprise in your organization, reach out to us at contact@lancedb.com.
Built with scale, performance, and security in mind
LanceDB Enterprise is designed for very large-scale, high-performance, distributed workloads in private deployments, and can operate under strict security requirements.
Quickstart
Get started with LanceDB Enterprise in minutes.