DraganSr: 2024-05-26

Sunday, May 26, 2024

AI course: RAG with LlamaIndex

DLAI - Building Agentic RAG with LlamaIndex @deeplearning.ai

What Is Retrieval-Augmented Generation aka RAG | NVIDIA Blogs

Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
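To make the "facts fetched from external sources" part concrete, here is a minimal sketch of the augmentation step only: retrieved text chunks are placed into the prompt ahead of the user's question. The `RetrievedChunk` type, the `buildRagPrompt` function and the instruction wording are hypothetical illustrations. Frameworks such as LlamaIndex use their own node types and prompt templates, and retrieval itself (finding the chunks) is a separate step.

```typescript
// Hypothetical shape of a retrieved chunk; real frameworks use richer node/document types.
interface RetrievedChunk {
  source: string; // where the text came from, e.g. a file name (illustrative)
  text: string;   // the chunk content
}

// Builds an "augmented" prompt: numbered context chunks first, then the question.
function buildRagPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.source}) ${chunk.text}`)
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say you don't know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const ragPrompt = buildRagPrompt("When was the blog post published?", [
  { source: "post.txt", text: "The post is dated Sunday, May 26, 2024." },
]);
console.log(ragPrompt);
```

The resulting string is what would be sent to the LLM; grounding the answer in the pasted context is what improves accuracy compared to asking the model directly.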

LlamaIndex, Data Framework for LLM Applications

LlamaIndex is the leading data framework for building LLM applications

AI LLMs: Tokens, Embeddings, Vector Databases

Strange as it may seem after trying ChatGPT,
modern AI models don't "speak" English, or any other human language.

Vectors

Vectors are sequences of numbers, usually called "arrays" in programming and "vectors" in mathematics.

For example, [123, 4, 56] or [0.34, 0.23, 0.01, 0.98], just much longer.

Digital computers "speak" numbers, and numbers only.

In particular, the GPUs and NPUs used to process this data
are optimized for very fast processing of such arrays.

So all text, images, and other content need to be "translated" into arrays of numbers,
processed, and the results translated back into a human-understandable form.
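As a simple illustration of text becoming numbers and back, the standard `TextEncoder`/`TextDecoder` APIs turn a string into UTF-8 byte values and restore it. This is only an analogy under that assumption: LLMs do not consume raw bytes like this, they use tokens and embeddings, described below.

```typescript
// Text -> numbers: each character becomes one or more UTF-8 byte values (0..255).
const bytes: Uint8Array = new TextEncoder().encode("Hello World!");
console.log(Array.from(bytes)); // [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]

// Numbers -> text: decoding the same bytes restores the original string.
const roundTrip: string = new TextDecoder().decode(bytes);
console.log(roundTrip); // Hello World!
```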

There are two different concepts related to this translation that look similar
but serve different purposes: "tokens" and "embeddings".

To make any sense of AI APIs, one needs to understand both.

Tokens

are about usage and money: the "quantity" of text sent to and received from an AI API.

Depending on the API used, there are limits on how much content (text) can be sent at once,
and the price is calculated from the number of tokens. A token is an integer ID

that maps to a few characters: usually a short word, or part of a longer word.

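For example, with the encoding used by gpt-3.5-turbo (cl100k_base), "Hello World!" is converted to [9906, 4435, 0] = ["Hello", " World", "!"] (the leading space is part of the second token)
and "Sequences" is [1542, 45045] = ["Se", "quences"]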
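For quick estimates without a tokenizer, OpenAI's documentation gives a rule of thumb of roughly 4 characters per token for common English text. The sketch below applies that heuristic; `estimateTokens` is a hypothetical helper, and the result is only an approximation (code, other languages, and unusual words tokenize differently).

```typescript
// Rough token estimate using the ~4 characters per token rule of thumb for English text.
// Approximation only; use a real tokenizer (e.g. tiktoken) for exact counts.
function estimateTokens(text: string, charsPerToken = 4): number {
  if (text.length === 0) return 0;
  return Math.ceil(text.length / charsPerToken);
}

console.log(estimateTokens("Hello World!")); // 3
```

Here the estimate happens to match the exact count of 3 tokens, but that will not hold in general.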

OpenAI Platform: Tokenizer

To measure token usage and estimate the cost of AI API calls, we can use libraries such as OpenAI's tiktoken

openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models. @GitHub (Python)

tiktoken - npm (JavaScript)

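import { encoding_for_model } from 'tiktoken'
const prompt = 'Hello World!';
const encoder = encoding_for_model('gpt-3.5-turbo');
const tokens = encoder.encode(prompt); // Uint32Array of token IDs
console.log(tokens.length); // number of tokens, the unit used for limits and pricing
encoder.free(); // release the WASM memory held by the encoder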
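Once token counts are known (for example from tiktoken), the cost of a call is simple arithmetic: input and output tokens are usually priced separately, per million (or per thousand) tokens. The `Pricing` type, `estimateCostUsd` function, and the prices below are illustrative assumptions only; check the provider's current pricing page for real values.

```typescript
// Illustrative pricing structure; real prices vary by model and change over time.
interface Pricing {
  inputPerMillion: number;  // USD per 1,000,000 input (prompt) tokens
  outputPerMillion: number; // USD per 1,000,000 output (completion) tokens
}

// cost = (inputTokens * inputPrice + outputTokens * outputPrice) / 1,000,000
function estimateCostUsd(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  return (
    (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) / 1_000_000
  );
}

const examplePricing: Pricing = { inputPerMillion: 0.5, outputPerMillion: 1.5 }; // example numbers, not current prices
console.log(estimateCostUsd(1_000, 500, examplePricing)); // 0.00125
```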

Embeddings

are about "meaning" and "similarity" of content.

Through the process of "training", AI "models" learn to map content to arrays of floating-point numbers, typically with values between -1 and 1, and to compare such arrays, called vectors, to measure "similarity" or "distance".
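One common way to measure "distance" between two vectors is Euclidean (L2) distance: the smaller the value, the more similar the content. This is a minimal sketch; `euclideanDistance` is a hypothetical helper, and many tools use cosine similarity instead (see the similarity functions further below).

```typescript
// Euclidean (L2) distance: square root of the sum of squared element-wise differences.
// Assumes both vectors have the same length (same embedding model).
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

console.log(euclideanDistance([0, 0], [3, 4])); // 5
```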

For example, "Hello World!" can be converted to a vector of about 1,500 numbers (text-embedding-3-small returns 1536 by default), like this:
"input": "Hello World!",
"embedding": [
-0.0030342294,
-0.056672804,
0.029482627,
0.042976152,
-0.040828794,
-0.025202423,
-0.012789831,
0.035228256,
-0.031571947,

...

Embeddings - OpenAI API

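import OpenAI from 'openai'
// The client reads the API key from the OPENAI_API_KEY environment variable
const openai = new OpenAI();
// Accepts one string or an array of strings; each input gets its own vector
// in response.data[i].embedding (1536 numbers by default for this model)
export async function createEmbeddings(input: string | string[]) {
    return await openai.embeddings.create({
        input: input,
        model: 'text-embedding-3-small'
    })
}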

Getting Started With Embeddings

"An embedding is a numerical representation of a piece of information, for example, text, documents, images, audio, etc. The representation captures the semantic meaning of what is being embedded

Since the embeddings capture the semantic meaning of the questions, it is possible to compare different embeddings and see how different or similar they are. Thanks to this, you can get the most similar embedding to a query, which is equivalent to finding the most similar FAQ. Check out our semantic search tutorial for a more detailed explanation of how this mechanism works."
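The FAQ example in the quote can be sketched as a brute-force search: compare the query embedding with every stored FAQ embedding and keep the one with the highest cosine similarity. The `FaqEntry` type, `mostSimilarFaq` function, and the 3-dimensional vectors are made up for illustration; real embeddings come from a model such as the one above and have ~1,500 dimensions.

```typescript
interface FaqEntry {
  question: string;
  embedding: number[]; // in practice produced by an embedding model
}

// Cosine similarity: dot product divided by the product of the magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the FAQ entry whose embedding is most similar to the query (undefined if none).
function mostSimilarFaq(query: number[], faqs: FaqEntry[]): FaqEntry | undefined {
  let best: FaqEntry | undefined = undefined;
  let bestScore = -Infinity;
  for (const faq of faqs) {
    const score = cosineSimilarity(query, faq.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = faq;
    }
  }
  return best;
}

// Toy vectors, invented for this example.
const faqs: FaqEntry[] = [
  { question: "How do I reset my password?", embedding: [0.9, 0.1, 0.0] },
  { question: "What are your opening hours?", embedding: [0.0, 0.2, 0.95] },
];
const queryEmbedding = [0.85, 0.15, 0.05]; // pretend embedding of "I forgot my password"
console.log(mostSimilarFaq(queryEmbedding, faqs)?.question); // How do I reset my password?
```

Scanning every entry is fine for small collections; vector databases exist to do this search efficiently at scale.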

LLM AI Embeddings | Microsoft Learn

Word embedding - Wikipedia

Calculating Similarity between Embeddings

// Dot product: sum of element-wise products
export function calcDotProduct(a: number[], b: number[]) {
    return a.map((value, index) => value * b[index]).reduce((sum, x) => sum + x, 0);
}

// Cosine similarity: dot product divided by the product of both magnitudes (range -1..1)
export function calcCosineSimilarity(a: number[], b: number[]) {
    const product = calcDotProduct(a, b);
    const aMagnitude = Math.sqrt(a.map(value => value * value).reduce((sum, x) => sum + x, 0));
    const bMagnitude = Math.sqrt(b.map(value => value * value).reduce((sum, x) => sum + x, 0));
    return product / (aMagnitude * bMagnitude);
}
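The cosine similarity function above divides by both magnitudes. OpenAI's documentation notes that its embeddings are normalized to length 1, so for those vectors the magnitudes are 1 and a plain dot product already gives the cosine similarity. The sketch below checks this on small example vectors; `normalize`, `dot` and `cosine` are hypothetical helper names.

```typescript
// Scales a vector to length 1 (unit vector). Assumes the vector is not all zeros.
function normalize(v: number[]): number[] {
  const magnitude = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map(x => x / magnitude);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

const a = [3, 4];
const b = [4, 3];
console.log(cosine(a, b)); // 0.96
// For unit vectors the magnitudes are 1, so the dot product alone equals cosine similarity.
console.log(dot(normalize(a), normalize(b))); // about 0.96 (up to floating-point rounding)
```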

The Building Blocks of LLMs: Vectors, Tokens and Embeddings @ TheNewStack

a vector is a single-dimensional array, in this case of numbers only

Tokens are the basic units of data processed by LLMs. In the context of text, a token can be a word, part of a word (subword), or even a character — depending on the tokenization process.

In LLMs, vectors are used to represent text or data in a numerical form that the model can understand and process. This representation is known as an embedding. Embeddings are high-dimensional vectors that capture the semantic meaning of words, sentences or even entire documents.

Vector Databases

vector databases list - Google Search

Picking a vector database: a comparison and guide for 2023

What are Vector Databases and Why Are They Important for LLMs? - KDnuggets

Top 16 Best Vector Databases for 2024 | Detailed List

Vector Database | Microsoft Learn

Chroma: the AI-native open-source embedding database

Chroma Docs

chroma-core/chroma: the AI-native open-source embedding database @GitHub

chromadb/chroma - Docker Image | Docker Hub

The vector database to build knowledgeable AI | Pinecone

Pinecone serverless lets you deliver remarkable GenAI applications faster, at up to 50x lower cost.

(online only, free tier available)

Your Guide to Vectorizing Structured Text | Pinecone

A Developer’s Guide to Approximate Nearest Neighbor (ANN) Algorithms | Pinecone

PostgreSQL + pgvector extension

Milvus

Weaviate

Faiss

Vespa

Redis

World's most downloaded vector database: Elasticsearch | Elastic

What are Vector Embeddings? | A Comprehensive Vector Embeddings Guide | Elastic

EV fast charging startup funded by Google

Google funded company plan to beat Tesla with 1,000's of 500kW chargers - YouTube

Gravity to add 500 kW EV charger trees on streets, targets Tesla @electrek

NY-based startup and EV infrastructure specialist Gravity has launched a new line of universal EV charger “trees” it hopes will bring convenient charging sessions curbside on city streets. The deployment will start modestly, but Gravity is targeting a street charging network that is “more expansive than Tesla’s current Supercharger network.”