Monday, July 15, 2024

Nomic Atlas: AI data clustering tool

nomic-ai/nomic: Interact, analyze and structure massive text, image, embedding, audio and video datasets @GitHub

Python bindings for working with Nomic Atlas, the world’s most powerful unstructured data interaction platform. Atlas supports datasets from hundreds to tens of millions of points, and supports data modalities ranging from text to image to audio to video.

With Nomic Atlas, you can:
  • Generate, store and retrieve embeddings for your unstructured data.
  • Find insights in your unstructured data and embeddings all from your web browser.
  • Share and present your datasets and data findings to anyone.

Nomic Atlas: https://atlas.nomic.ai/

Interact, discover insights and build with unstructured text, image and audio data.

Course: The Complete OPENAI JS APIs Course - Build 15 Projects | Udemy


AI: llm.c vs GPT-2, $672, 24h, by Andrej Karpathy

Let's reproduce GPT-2 (1.6B): one 8XH100 node, 24 hours, $672, in llm.c · karpathy/llm.c · Discussion #677

by karpathy (Andrej)

In this post we are reproducing GPT-2 in llm.c. This is "the GPT-2", the full, 1558M parameter version that was introduced in OpenAI's blog post Better Language Models and their Implications on February 14, 2019. llm.c does so directly in C/CUDA (total of ~5,000 lines of code), without the typical training stack that would involve the Python interpreter and a significantly more complex deep learning library like PyTorch/JAX, huggingface/transformers, etc. In 2019, training GPT-2 was an involved project for an entire team and considered a big model run but, ~5 years later, due to improvements in compute (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data (e.g. the FineWeb-Edu dataset), we can reproduce this model on a single 8XH100 node in 24 hours, and for $672, which is quite incredible.
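
To put the headline numbers in perspective, here is a quick back-of-the-envelope sketch of the hourly rates they imply. It uses only the figures quoted above (8 GPUs, 24 hours, $672) and assumes the $672 covers nothing but node rental billed at a flat hourly rate; the function names are made up for illustration and are not from llm.c.

```python
# Figures quoted in the post: one node of 8 H100 GPUs, 24 hours, $672 total.
GPUS_PER_NODE = 8
TRAINING_HOURS = 24
TOTAL_COST_USD = 672


def node_hourly_rate(total_cost: float, hours: float) -> float:
    """Implied price of renting the whole node for one hour."""
    return total_cost / hours


def gpu_hourly_rate(total_cost: float, hours: float, gpus: int) -> float:
    """Implied price of a single GPU for one hour."""
    return total_cost / (hours * gpus)


# Assumption: the total is pure rental cost at a flat hourly rate.
node_rate = node_hourly_rate(TOTAL_COST_USD, TRAINING_HOURS)
gpu_rate = gpu_hourly_rate(TOTAL_COST_USD, TRAINING_HOURS, GPUS_PER_NODE)

print(f"Implied node rate: ${node_rate:.2f}/hour")
print(f"Implied GPU rate:  ${gpu_rate:.2f}/GPU-hour")
```

Under that assumption the run works out to $28 per node-hour, or $3.50 per GPU-hour.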


Feed | LinkedIn by Aleksa Gordić | LinkedIn


Andrej Karpathy - Wikipedia

Andrej Karpathy (born 23 October 1986) is a Slovak-Canadian computer scientist who served as the director of artificial intelligence and Autopilot Vision at Tesla. He co-founded and formerly worked at OpenAI, where he specialized in deep learning and computer vision.