Wednesday, October 09, 2024

AI embeddings: Cosine similarity versus dot product

classification - Cosine similarity versus dot product as distance metrics - Data Science Stack Exchange

It looks like the cosine similarity of two features is just their dot product scaled by the product of their magnitudes. When does cosine similarity make a better distance metric than the dot product?

Think geometrically. Cosine similarity only cares about the angle between vectors, while the dot product cares about both angle and magnitude. If you normalize your data to unit magnitude, the two are indistinguishable. Sometimes it is desirable to ignore the magnitude, in which case cosine similarity is the natural choice; but if magnitude carries information, the dot product is the better similarity measure. Note that neither of them is a "distance metric" in the formal sense: both grow with similarity rather than distance, and even the derived cosine distance (1 minus cosine similarity) does not satisfy the triangle inequality.
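The equivalence described above can be checked directly in a few lines of Python (a quick sketch using only the standard library; the vectors are made up for illustration):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit (Euclidean) length.
    mag = math.sqrt(dot(v, v))
    return [x / mag for x in v]

a, b = [3.0, 4.0], [1.0, 2.0]

cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
dot_of_normalized = dot(normalize(a), normalize(b))

# After normalization to unit length, the dot product *is* the cosine similarity.
print(abs(cosine - dot_of_normalized) < 1e-12)  # True
```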


How to Implement Cosine Similarity in Python | by DataStax | Medium

If the cosine similarity is 1, it means the vectors have the same direction and are perfectly similar. If the cosine similarity is 0, it means the vectors are perpendicular to each other and have no similarity. If the cosine similarity is -1, it means the vectors have opposite directions and are perfectly dissimilar.
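The three cases above can be verified with a small sketch (vectors chosen purely for illustration; results near the boundaries may be off by floating-point rounding):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)

print(cosine_similarity([1, 2], [2, 4]))    # ~ 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 1]))    # 0.0    (perpendicular)
print(cosine_similarity([1, 2], [-1, -2]))  # ~ -1.0 (opposite directions)
```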

A = [5, 3, 4]
B = [4, 2, 4]

# Calculate dot product
dot_product = sum(a*b for a, b in zip(A, B))

# Calculate the magnitude of each vector
magnitude_A = sum(a*a for a in A)**0.5
magnitude_B = sum(b*b for b in B)**0.5

# Compute cosine similarity
cosine_similarity = dot_product / (magnitude_A * magnitude_B)
print(f"Cosine Similarity using standard Python: {cosine_similarity}")
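As a sanity check, the same result can be reproduced with NumPy (assuming NumPy is available; A and B are the vectors from the snippet above):

```python
import numpy as np

A = np.array([5, 3, 4])
B = np.array([4, 2, 4])

# np.dot gives the dot product; np.linalg.norm gives the Euclidean magnitude.
cosine = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(cosine)  # ~ 0.9899
```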


embeddings dot product vs cosine Similarity - Google Search

In data science, the dot product and cosine similarity are both used to measure similarity between vectors, but they differ in how they do so:

Dot product
The dot product is a fundamental similarity measure that sums the products of corresponding elements in two vectors. It reflects both the direction and magnitude of the vectors, and can range from negative to positive infinity. A value of 0 indicates that the vectors are perpendicular, negative values indicate opposite directions, and positive values indicate alignment. Because it skips the normalization step, the dot product is cheap to compute, which makes it attractive for high-dimensional vectors.

Cosine similarity
Cosine similarity is the cosine of the angle between two vectors, or equivalently the dot product of their unit-normalized versions. Because normalization removes magnitude, it measures direction alone and always falls in the range -1 to 1. Cosine similarity is often used to quantify semantic similarity between high-dimensional embeddings.

The choice between dot product and cosine similarity depends on the kind of similarity being measured. With word-count vectors, for example, the dot product is sensitive to raw counts (and hence to document length), while cosine similarity responds only to the proportional distribution of words.
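The word-count point can be made concrete with a sketch: doubling a document's length doubles every count, which doubles the dot product but leaves the cosine similarity unchanged (the vectors below are hypothetical word counts):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [2, 1, 0]        # hypothetical word counts for a query
doc = [4, 2, 1]          # word counts for a document
doc_doubled = [8, 4, 2]  # the same document concatenated with itself

print(dot(query, doc), dot(query, doc_doubled))  # 10 20 -- dot product doubles
print(cosine(query, doc), cosine(query, doc_doubled))  # ~ equal -- cosine unchanged
```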

// Sum of element-wise products; assumes a and b have the same length.
export function dotProduct(a, b) {
    return a.map((value, index) => value * b[index]).reduce((a, b) => a + b, 0);
}

// Dot product scaled by the product of the vectors' Euclidean magnitudes.
export function cosineSimilarity(a, b) {
    const product = dotProduct(a, b);
    const aMagnitude = Math.sqrt(a.map(value => value * value).reduce((a, b) => a + b, 0));
    const bMagnitude = Math.sqrt(b.map(value => value * value).reduce((a, b) => a + b, 0));
    return product / (aMagnitude * bMagnitude);
}
