How to create and query vector stores
Head to Integrations for documentation on built-in integrations with vector store providers.
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
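"Most similar" needs a concrete measure. Cosine similarity, the cosine of the angle between two vectors, is a common choice. The dependency-free sketch below computes it. Which measure a particular vector store uses is provider-specific, and the three-dimensional vectors here are made-up toy values rather than real embeddings, which typically have hundreds or thousands of dimensions.

```typescript
// Cosine similarity between two vectors of equal length:
// dot(a, b) / (|a| * |b|). Ranges from -1 to 1; higher means "more similar".
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat a zero vector as dissimilar to everything
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors, made up for illustration only.
const query = [1, 0, 1];
console.log(cosineSimilarity(query, [1, 0, 1])); // same direction: close to 1
console.log(cosineSimilarity(query, [0, 1, 0])); // orthogonal: 0
```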
Get started
This walkthrough showcases basic functionality related to VectorStores. A key part of working with vector stores is creating the vectors to put in them, which is usually done via embeddings. Therefore, we recommend familiarizing yourself with the text embedding model interfaces before diving into this.
This walkthrough uses a basic, unoptimized implementation called MemoryVectorStore that stores embeddings in-memory and does an exact, linear search for the most similar embeddings.
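To make "exact, linear search" concrete, here is a dependency-free sketch of the idea. It is not MemoryVectorStore's actual source: the StoredEntry type, the linearTopK helper, and the vectors are hypothetical, and cosine similarity is assumed as the score.

```typescript
// A stored entry: the original text plus its precomputed embedding.
interface StoredEntry {
  text: string;
  vector: number[];
}

// Cosine similarity; assumes equal-length, non-zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact, linear search: score every entry against the query, sort by
// descending score, and keep the first k. Cost grows linearly with the
// number of stored entries, since there is no index to narrow the search.
function linearTopK(
  entries: StoredEntry[],
  queryVector: number[],
  k: number
): Array<[StoredEntry, number]> {
  return entries
    .map((entry): [StoredEntry, number] => [entry, cosine(entry.vector, queryVector)])
    .sort((x, y) => y[1] - x[1])
    .slice(0, k);
}

// Hypothetical toy vectors standing in for real embeddings.
const entries: StoredEntry[] = [
  { text: "Hello world", vector: [0.9, 0.1, 0.0] },
  { text: "Bye bye", vector: [0.0, 0.2, 0.9] },
  { text: "hello nice world", vector: [0.8, 0.3, 0.1] },
];

const top = linearTopK(entries, [1, 0, 0], 1);
console.log(top[0][0].text, top[0][1]);
```

Because every query touches every stored vector, this approach suits small collections and tests; stores built on approximate indexes such as HNSW trade some exactness for much faster queries on large collections.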
Usage
Create a new index from texts
Install the OpenAI integration package with npm, Yarn, or pnpm:
npm install @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
const vectorStore = await MemoryVectorStore.fromTexts(
["Hello world", "Bye bye", "hello nice world"],
[{ id: 2 }, { id: 1 }, { id: 3 }],
new OpenAIEmbeddings()
);
const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);
/*
[
Document {
pageContent: "Hello world",
metadata: { id: 2 }
}
]
*/
API Reference:
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
Create a new index from a loader
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { TextLoader } from "langchain/document_loaders/fs/text";
// Create docs with a loader
const loader = new TextLoader("src/document_loaders/example_data/example.txt");
const docs = await loader.load();
// Load the docs into the vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
docs,
new OpenAIEmbeddings()
);
// Search for the most similar document
const resultOne = await vectorStore.similaritySearch("hello world", 1);
console.log(resultOne);
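/*
TextLoader loads the whole file as a single Document, so that document is the match:
[
Document {
pageContent: "<contents of example.txt>",
metadata: { source: "src/document_loaders/example_data/example.txt" }
}
]
*/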
API Reference:
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
- TextLoader from langchain/document_loaders/fs/text
Here is the current base interface all vector stores share:
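import type { Document } from "@langchain/core/documents";
import type { BaseRetriever } from "@langchain/core/retrievers";
interface VectorStore {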
/**
* Add more documents to an existing VectorStore.
* Some providers support additional parameters, e.g. to associate custom ids
* with added documents or to change the batch size of bulk inserts.
* Returns an array of ids for the documents or nothing.
*/
addDocuments(
documents: Document[],
options?: Record<string, any>
): Promise<string[] | void>;
/**
* Search for the most similar documents to a query
*/
similaritySearch(
query: string,
k?: number,
filter?: object | undefined
): Promise<Document[]>;
/**
* Search for the most similar documents to a query,
* and return their similarity score
*/
similaritySearchWithScore(
query: string,
k?: number, // defaults to 4
filter?: object
): Promise<[Document, number][]>;
/**
* Turn a VectorStore into a Retriever
*/
asRetriever(k?: number): BaseRetriever;
/**
* Delete embedded documents from the vector store matching the passed in parameter.
* Not supported by every provider.
*/
delete(params?: Record<string, any>): Promise<void>;
/**
* Advanced: Add more documents to an existing VectorStore,
* when you already have their embeddings
*/
addVectors(
vectors: number[][],
documents: Document[],
options?: Record<string, any>
): Promise<string[] | void>;
/**
* Advanced: Search for the most similar documents to a query,
* when you already have the embedding of the query
*/
similaritySearchVectorWithScore(
query: number[],
k: number,
filter?: object
): Promise<[Document, number][]>;
}
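The query-based methods can be layered on the vector-based ones: similaritySearchWithScore only needs to embed the query and hand the vector to similaritySearchVectorWithScore, and similaritySearch is the same call with the scores dropped. As we understand it, LangChain's base class builds them this way, but the toy class below is only a sketch: ToyVectorStore, Doc, EmbedQuery, and the letter-counting embedder are hypothetical, scoring is a plain dot product, and real providers implement these methods very differently.

```typescript
// Local stand-in for a document: text plus metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical embedding function: text in, vector out.
type EmbedQuery = (text: string) => Promise<number[]>;

class ToyVectorStore {
  private vectors: number[][] = [];
  private docs: Doc[] = [];
  private readonly embedQuery: EmbedQuery;

  constructor(embedQuery: EmbedQuery) {
    this.embedQuery = embedQuery;
  }

  // Like addVectors: the caller supplies precomputed embeddings.
  async addVectors(vectors: number[][], documents: Doc[]): Promise<void> {
    if (vectors.length !== documents.length) {
      throw new Error("vectors and documents must have the same length");
    }
    this.vectors.push(...vectors);
    this.docs.push(...documents);
  }

  // Like similaritySearchVectorWithScore: search with an already-embedded query.
  async similaritySearchVectorWithScore(query: number[], k: number): Promise<[Doc, number][]> {
    const scored = this.docs.map((doc, i): [Doc, number] => [
      doc,
      this.vectors[i].reduce((sum, v, j) => sum + v * query[j], 0), // dot product
    ]);
    return scored.sort((a, b) => b[1] - a[1]).slice(0, k);
  }

  // Like similaritySearchWithScore: embed the query once, then delegate.
  async similaritySearchWithScore(query: string, k = 4): Promise<[Doc, number][]> {
    return this.similaritySearchVectorWithScore(await this.embedQuery(query), k);
  }

  // Like similaritySearch: the same search with the scores dropped.
  async similaritySearch(query: string, k = 4): Promise<Doc[]> {
    const results = await this.similaritySearchWithScore(query, k);
    return results.map(([doc]) => doc);
  }
}

// Hypothetical toy embedder: counts of the letters "h", "w" and "b".
const toyEmbed: EmbedQuery = async (text) => {
  const lower = text.toLowerCase();
  return ["h", "w", "b"].map((c) => lower.split(c).length - 1);
};

async function demo(): Promise<void> {
  const store = new ToyVectorStore(toyEmbed);
  const texts = ["Hello world", "Bye bye"];
  await store.addVectors(
    await Promise.all(texts.map((t) => toyEmbed(t))),
    texts.map((t, i) => ({ pageContent: t, metadata: { id: i } }))
  );
  const [best] = await store.similaritySearch("hello world", 1);
  console.log(best.pageContent);
}

demo();
```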
You can create a vector store from a list of Documents, or from a list of texts and their corresponding metadata. You can also create a vector store from an existing index; the signature of that method depends on the vector store you're using, so check the documentation for the provider you're interested in.
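abstract class BaseVectorStore implements VectorStore {
// Instance methods from the VectorStore interface are omitted here;
// only the static factory methods are shown.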
static fromTexts(
texts: string[],
metadatas: object[] | object,
embeddings: EmbeddingsInterface,
dbConfig: Record<string, any>
): Promise<VectorStore>;
static fromDocuments(
docs: Document[],
embeddings: EmbeddingsInterface,
dbConfig: Record<string, any>
): Promise<VectorStore>;
}
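Because metadatas is typed object[] | object, a fromTexts-style factory has to handle both shapes before building documents. The sketch below shows one plausible way, assuming (as MemoryVectorStore.fromTexts appears to do) that an array is matched to texts by index and a single object is shared by every text. textsToDocuments and Doc are hypothetical names; the resulting documents would then be handed to something like fromDocuments.

```typescript
// Local stand-in for a document: text plus metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: pair each text with its metadata. An array is matched
// by index (missing entries become {}); a single object is shared by all texts.
function textsToDocuments(
  texts: string[],
  metadatas: Record<string, unknown>[] | Record<string, unknown>
): Doc[] {
  return texts.map((text, i) => ({
    pageContent: text,
    metadata: Array.isArray(metadatas) ? metadatas[i] ?? {} : metadatas,
  }));
}

// Per-text metadata, as in the fromTexts example earlier on this page.
const perText = textsToDocuments(["Hello world", "Bye bye"], [{ id: 2 }, { id: 1 }]);
// One metadata object applied to every text.
const shared = textsToDocuments(["Hello world", "Bye bye"], { source: "greetings" });
console.log(perText, shared);
```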
Which one to pick?
Here's a quick guide to help you pick the right vector store for your use case:
- If you're after something that can just run inside your Node.js application, in-memory, without any other servers to stand up, then go for HNSWLib, Faiss, LanceDB or CloseVector
- If you're looking for something that can run in-memory in browser-like environments, then go for MemoryVectorStore or CloseVector
- If you're coming from Python and want something similar to FAISS, try HNSWLib or Faiss
- If you're looking for an open-source full-featured vector database that you can run locally in a Docker container, then go for Chroma
- If you're looking for an open-source vector database that offers low-latency, local embedding of documents and supports apps on the edge, then go for Zep
- If you're looking for an open-source production-ready vector database that you can run locally (in a Docker container) or hosted in the cloud, then go for Weaviate.
- If you're already using Supabase, look at the Supabase vector store so you can use the same Postgres database for your embeddings too
- If you're looking for a production-ready vector store you don't have to worry about hosting yourself, then go for Pinecone
- If you are already utilizing SingleStore, or if you find yourself in need of a distributed, high-performance database, you might want to consider the SingleStore vector store.
- If you are looking for an online MPP (Massively Parallel Processing) data warehousing service, you might want to consider the AnalyticDB vector store.
- If you're in search of a cost-effective vector database that lets you run vector search with SQL, look no further than MyScale.
- If you're in search of a vector database that you can load from both the browser and server side, check out CloseVector. It's a vector database that aims to be cross-platform.
- If you're looking for a scalable, open-source columnar database with excellent performance for analytical queries, then consider ClickHouse.