Monday, March 11, 2024

AI: Vector Databases, Embeddings, Tokenization, Transformers

Vector Databases: The Secret Sauce of the AI Revolution | by David Gutsch | Medium

"Ever wondered how your favorite music streaming service seems to read your mind, suggesting songs that perfectly fit your mood? Or how your online shopping platform knows just what you need, even before you do? Ever marveled at the extensible models and applications that are being built on top of LLMs like ChatGPT? The secret behind these modern marvels is not magic, but a powerful tool in the realm of databases: vector databases. Let’s embark on a journey to unravel the mysteries of these unsung heroes of the AI revolution."

A Vector Database transforms data into vectors in a multi-dimensional space. Just as a Graph Database represents relationships between data, a Vector Database represents similarity between data: the closer two vectors are, the more similar the items they encode. This makes it a powerful tool for machine learning and AI applications, enabling efficient similarity search and clustering over high-dimensional data.
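
To make "closer vectors mean more similar data" concrete, here is a minimal sketch of a nearest-neighbor lookup over a toy in-memory "vector database" using cosine similarity. The item names and 4-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and production vector databases use approximate-nearest-neighbor indexes rather than a brute-force scan.

```python
import numpy as np

# Toy "vector database": each row is the embedding of one item.
# (Illustrative 4-dimensional vectors; real embeddings are much larger.)
items = ["jazz ballad", "upbeat pop", "lo-fi study beats", "heavy metal"]
vectors = np.array([
    [0.9, 0.1, 0.3, 0.0],
    [0.2, 0.9, 0.4, 0.1],
    [0.8, 0.2, 0.9, 0.0],
    [0.1, 0.3, 0.0, 0.9],
])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(query, k=2):
    """Brute-force scan: return the k items most similar to the query vector."""
    scores = [cosine_similarity(query, v) for v in vectors]
    ranked = np.argsort(scores)[::-1][:k]
    return [(items[i], round(scores[i], 3)) for i in ranked]

# A query vector that sits near "jazz ballad" and "lo-fi study beats".
print(nearest(np.array([0.85, 0.15, 0.5, 0.0])))
```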

The Art of Embeddings: Transforming Text for Vector Databases (Part 2) | by David Gutsch | Medium

"Embeddings are a fundamental concept in deep learning that enable us to capture rich context in ones and zeros. They are powerful, flexible, and while their implementations are necessarily complex, their purpose is beautifully simple.
...
The process of transforming text into embeddings begins with tokenization, which is the process of breaking down text into smaller parts, or “tokens.” These tokens can be as small as individual characters or as large as entire sentences. However, in most cases they represent individual words or sub-words."
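
A quick way to see word-versus-sub-word tokenization in practice is to run a pretrained tokenizer over a sentence. The sketch below uses the Hugging Face transformers library and the "bert-base-uncased" checkpoint as an assumed example; any WordPiece or BPE tokenizer illustrates the same idea, and the exact token splits shown in the comments will vary by vocabulary.

```python
from transformers import AutoTokenizer  # assumes: pip install transformers

# Assumed checkpoint for illustration; downloads its vocabulary on first use.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Tokenization breaks unfamiliar words into sub-word pieces."
tokens = tokenizer.tokenize(text)
ids = tokenizer.convert_tokens_to_ids(tokens)

# Common words usually stay whole; rarer words are split into sub-word
# pieces (WordPiece marks continuations with '##'), e.g. 'token', '##ization'.
print(tokens)

# Each token maps to an integer id that indexes a row of the model's
# embedding matrix, which is where the actual embedding vectors come from.
print(ids)
```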