Sunday, February 18, 2024

Vector databases for AI embeddings

In natural language processing, a sentence embedding refers to a numeric representation of a sentence in the form of a vector of real numbers which encodes meaningful semantic information.

A vector database management system (VDBMS) or simply vector database or vector store is a database that can store vectors (fixed-length lists of numbers) along with other data items. Vector databases typically implement one or more Approximate Nearest Neighbor (ANN) algorithms,[1][2] so that one can search the database with a query vector to retrieve the closest matching database records.
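
To make "closest matching records" concrete, here is a minimal sketch of an exact (brute-force) nearest-neighbor search using cosine similarity. It is not an ANN algorithm: ANN indexes approximate this result while avoiding a scan of every record. The record names, the 3-dimensional vectors, and the helper functions `cosine_similarity` and `nearest` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def nearest(query, records, k=2):
    """Score every record against the query and return the top k.

    This scans all records (exact search); an ANN index would
    trade a little accuracy to avoid the full scan.
    """
    scored = [(cosine_similarity(query, vec), rec_id)
              for rec_id, vec in records.items()]
    scored.sort(reverse=True)
    return [(rec_id, score) for score, rec_id in scored[:k]]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
records = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

results = nearest([1.0, 0.0, 0.0], records, k=2)
for rec_id, score in results:
    print(rec_id, round(score, 3))
```

With these toy vectors the query ranks "cat" first and "kitten" second, while "car" is left out because it points in a different direction.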

Vector databases can be used for similarity search, multi-modal search, recommendation engines, large language models (LLMs), etc.

vector databases

A vector database indexes and stores vector embeddings for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, horizontal scaling, and serverless deployment.
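
As a rough sketch of what CRUD operations and metadata filtering mean here, the toy in-memory store below supports upsert, get, delete, and a filtered query. It assumes Euclidean distance and exact-match filters, and it is not the API of Pinecone, Milvus, FAISS, or any other product; the class `TinyVectorStore` and its method names are hypothetical.

```python
import math


class TinyVectorStore:
    """Toy in-memory store: id -> (vector, metadata). Illustration only."""

    def __init__(self, dim):
        self.dim = dim
        self._rows = {}

    def upsert(self, rec_id, vector, metadata=None):
        """Create or update a record (the C and U of CRUD)."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self._rows[rec_id] = (list(vector), dict(metadata or {}))

    def get(self, rec_id):
        """Read a record by id, or None if it does not exist."""
        return self._rows.get(rec_id)

    def delete(self, rec_id):
        """Delete a record; returns True if something was removed."""
        return self._rows.pop(rec_id, None) is not None

    def query(self, vector, k=3, where=None):
        """Return the ids of the k closest records whose metadata
        matches every key/value pair in `where` (exact match)."""
        where = where or {}
        hits = []
        for rec_id, (vec, meta) in self._rows.items():
            if all(meta.get(key) == value for key, value in where.items()):
                hits.append((math.dist(vector, vec), rec_id))
        hits.sort()
        return [rec_id for _, rec_id in hits[:k]]


store = TinyVectorStore(dim=2)
store.upsert("a", [0.0, 0.0], {"lang": "en"})
store.upsert("b", [1.0, 1.0], {"lang": "de"})
store.upsert("c", [0.2, 0.1], {"lang": "en"})

# Only English records are considered, closest first.
english_hits = store.query([0.1, 0.1], k=2, where={"lang": "en"})
print(english_hits)

deleted = store.delete("a")
after_delete = store.query([0.1, 0.1], k=2, where={"lang": "en"})
print(after_delete)
```

Filtering before scoring keeps the example short; real systems differ in whether they filter before, during, or after the index lookup.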

LLMs rely on vector embeddings, a vector representation of data that carries semantic information. That information is critical for the model to interpret inputs and to maintain a long-term memory it can draw on when executing complex tasks.

Pinecone advertises its serverless offering as a way to deliver GenAI applications faster, at up to 50x lower cost.

Milvus (on GitHub, Apache-licensed, written largely in Go) is an open-source vector database built to power embedding similarity search and AI applications.

courses on Udemy

"Learn Vector Database using Python, Pinecone, LangChain, Open AI, Hugging Face and build out AI, ML, Chat applications" (7.5 hours)
by Dr. KM Mohsin


FAISS local vectorstore


A good, brief explanation of LLMs (AI transformer models), embeddings, and "attention", with examples.
