DraganSr: 2025-04-10

Thursday, April 10, 2025

AI HW: NVIDIA DGX Spark vs AWS

NVIDIA DGX Spark US - A Grace Blackwell AI supercomputer on your desk

That is 1,000 "TOPS" (tera operations per second) of AI performance, a very large figure,

for $3,999.

This is the ASUS Ascent GX10 a NVIDIA GB10 Mini PC with 128GB of Memory and 200GbE

Apparently there are other similar systems built on the same GB10 chip, like this one from ASUS,
for $1,000 less ($2,999).
On the NVIDIA GB10 motherboard, we can see the NVIDIA GB10 chip with 10 Cortex-X925 and 10 Cortex-A725 Arm cores for 20 cores total.

here are comparable cloud-based instances with pricing

The math gets "interesting": when reserved for 3 years, that instance (inf2.8xlarge) costs $0.79 per hour.
But the grand total for 3 years (26,280 hours) is about $20,800!
So the AWS price is about 5x the NVIDIA HW price, or almost 7x the ASUS HW price!
The cost of electricity is likely nowhere near the price of the HW.
So for those with a significant AI load, owning the HW can be very cost-effective.
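The arithmetic above can be checked with a few lines of TypeScript. This is only a sketch using the prices quoted in this post; the function and constant names are made up for illustration, a year is taken as 365 days, and electricity is left out.

```typescript
// Prices as quoted in this post; names are illustrative only.
const HOURS_PER_YEAR = 24 * 365; // 8,760 hours, ignoring leap days

// A reserved instance is billed for every hour of the term.
function reservedCost(hourlyUsd: number, years: number): number {
  return hourlyUsd * HOURS_PER_YEAR * years;
}

const awsThreeYear = reservedCost(0.79, 3); // inf2.8xlarge, 3-year reserved rate
const dgxSparkUsd = 3999;                   // NVIDIA DGX Spark
const asusGx10Usd = dgxSparkUsd - 1000;     // ASUS Ascent GX10, "$1000 less"

console.log(awsThreeYear.toFixed(0));                 // 20761
console.log((awsThreeYear / dgxSparkUsd).toFixed(1)); // 5.2
console.log((awsThreeYear / asusGx10Usd).toFixed(1)); // 6.9
```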

Compute – Amazon EC2 Inf2 instances – AWS

Instance Size	Inferentia2 Chips	Accelerator Memory (GB)	vCPU	Memory (GiB)	Local Storage	Inter-Chip Interconnect	Network Bandwidth (Gbps)	EBS Bandwidth (Gbps)	On-Demand Price (USD/hour)	1-Year Reserved Instance (USD/hour)	3-Year Reserved Instance (USD/hour)
inf2.8xlarge	1	32	32	128	EBS Only	N/A	Up to 25	10	$1.97	$1.81	$0.79

AI with TypeScript: Deno, Llama, Jupyter

The Dino 🦕, the Llama 🦙, and the Whale 🐋

An environment for our language model – while you can connect up to various LLM hosting environments via APIs, we are going to leverage the Ollama framework for running language models on your local machine.
A large language model – we will use a resized version of DeepSeek R1 that can run locally.
A notebook – Jupyter Notebook for interactive code and text.
Deno – a runtime that includes a built-in Jupyter kernel. We assume a recent version is installed.
An IDE – we’ll use VSCode with built-in Jupyter Notebook support and the Deno extension.
An AI library/framework – LangChain.js to simplify interactions with the LLM.
A schema validator – we’ll structure LLM output. We will use zod for this.
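To make the list above more concrete, here is a rough sketch of asking a local Ollama server for JSON and validating the reply. It is not the code from the linked article: it calls Ollama's REST endpoint directly instead of going through LangChain.js, and a hand-written check stands in for a zod schema so the example has no dependencies. The `Summary` shape, the function names, and the model tag `deepseek-r1:8b` are assumptions.

```typescript
// The shape we would like the model to return (hypothetical example shape).
interface Summary {
  title: string;
  keywords: string[];
}

// Stand-in for a zod schema: parse the raw text and check every field.
function parseSummary(raw: string): Summary {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { title, keywords } = data as Record<string, unknown>;
  if (typeof title !== "string") throw new Error("title must be a string");
  if (!Array.isArray(keywords) || !keywords.every((k) => typeof k === "string")) {
    throw new Error("keywords must be an array of strings");
  }
  return { title, keywords: keywords as string[] };
}

// Assumes Ollama is running on its default port and the model has been pulled.
async function summarize(text: string, model = "deepseek-r1:8b"): Promise<Summary> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt: `Return JSON with "title" (string) and "keywords" (string array) for:\n${text}`,
      format: "json", // ask Ollama to constrain the reply to valid JSON
      stream: false,  // one complete reply instead of a token stream
    }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const body = await res.json();
  return parseSummary(body.response); // generated text is in the "response" field
}
```

In a Deno notebook cell, `await summarize("some text")` would then either return a typed object or throw with a message saying which field was wrong, which is the same job the zod schema does in the article's setup.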

Build a custom RAG AI agent in TypeScript and Jupyter

Retrieve and prepare several blog posts to be used by our AI agent.
Create an AI agent which has several tools:

A tool to query the blog posts in the database.
A tool to grade if the documents are relevant to the query.
The ability to rewrite and improve the query if required.

Finally, we generate a response to the query based on our collection of information.
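The control flow of such an agent (query, grade, rewrite if needed, then answer) can be sketched without any AI library. Everything below is a toy stand-in, not the article's implementation: the two sample posts are invented, keyword overlap replaces a real vector search, and the grading and rewriting steps, which the real agent would delegate to the LLM, are simple word filters.

```typescript
interface Post { title: string; text: string; }

// Hypothetical stand-in for the prepared blog posts.
const posts: Post[] = [
  { title: "Deno kernels", text: "Deno ships a Jupyter kernel for TypeScript notebooks" },
  { title: "Local models", text: "Ollama runs language models on your local machine" },
];

const words = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Tool 1: query the posts (keyword overlap instead of embedding search).
function retrieve(query: string): Post[] {
  const q = new Set(words(query));
  return posts.filter((p) => words(p.text).some((w) => q.has(w)));
}

// Tool 2: grade relevance; here, at least half the query words must appear.
function grade(query: string, doc: Post): boolean {
  const q = words(query);
  const d = new Set(words(doc.text));
  const hits = q.filter((w) => d.has(w)).length;
  return q.length > 0 && hits / q.length >= 0.5;
}

// Tool 3: rewrite the query; here, just drop filler words.
const FILLER = new Set(["please", "tell", "me", "about", "how", "do", "i", "the", "a", "what", "is"]);
function rewrite(query: string): string {
  return words(query).filter((w) => !FILLER.has(w)).join(" ");
}

// Agent loop: retrieve, grade, rewrite and retry, then generate a response.
function answer(query: string, maxRewrites = 1): string {
  let current = query;
  for (let attempt = 0; attempt <= maxRewrites; attempt++) {
    const relevant = retrieve(current).filter((p) => grade(current, p));
    if (relevant.length > 0) {
      return `Based on: ${relevant.map((p) => p.title).join(", ")}`;
    }
    current = rewrite(current);
  }
  return "No relevant posts found.";
}

console.log(answer("please tell me about ollama models")); // Based on: Local models
console.log(answer("quantum chess"));                      // No relevant posts found.
```

The first query fails grading as written (2 of 6 words match), gets rewritten to "ollama models", and passes on the retry, which is the loop the steps above describe.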