DraganSr: 202605

Saturday, May 23, 2026

AI HW: $95B IPO Cerebras vs Groq

Cerebras and Groq are both semiconductor startups challenging Nvidia by focusing on ultra-fast AI inference rather than general-purpose processing. Cerebras uses wafer-scale chips to deliver record-breaking token throughput for large models, while Groq uses smaller Language Processing Units (LPUs) optimized for extremely low latency. [1, 2, 3]

For an in-depth breakdown of how the hardware architectures and specific advantages of each chip stack up, watch these videos:

Groq vs Cerebras: Lightning Fast Inference for LLMs! - YouTube

Andrew Feldman, Cerebras Systems | SC24 - YouTube

Groq vs Cerebras (2026) - Which One Is BETTER? - YouTube

Key Differences

Feature [1, 2, 3, 4, 5]	Cerebras	Groq
Architecture	Wafer-Scale Engine (WSE): One massive chip that keeps models completely on-chip with zero off-chip memory access.	Language Processing Unit (LPU): Custom deterministic architecture with 230MB of on-chip SRAM, scaling via proprietary fabric.
Model Size Handling	Excellent. A single Cerebras device can hold multi-billion parameter models in fast SRAM, reducing complexity and points of failure.	Smaller capacity per LPU. Large models require hundreds of networked chips, introducing complex clusters.
Inference Speed	Groundbreaking. Frequently sets industry records for highest token throughput (e.g., hundreds to thousands of tokens per second).	Industry-leading latency. Highly deterministic with steady output, ideal for real-time voice, robotics, and edge applications.
Primary Focus	Heavy enterprise inference, high-performance computing (HPC), and large-scale model deployment.	Latency-sensitive inference, cloud compute (GroqCloud), and immediate edge/real-time processing.
Accessibility	Available on-premises or via inference APIs (Meta, Hugging Face, OpenRouter, Vercel).	Available via GroqCloud, on-prem (GroqRack), and through API ecosystems like Hugging Face.
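
The "hundreds of networked chips" claim in the table can be sanity-checked with simple arithmetic. The sketch below is an illustration only: it uses the 230MB of SRAM per LPU quoted above, treats 1 MB as 10^6 bytes, and picks a hypothetical 70-billion-parameter model that does not come from the source. It counts weights only and ignores activations, KV cache, and any replication a real deployment would need.

```python
import math

# 230 MB of on-chip SRAM per LPU, as quoted in the table (1 MB taken as 10**6 bytes)
LPU_SRAM_BYTES = 230 * 10**6


def chips_for_weights(params: float, bytes_per_param: float) -> int:
    """Minimum number of LPUs whose combined SRAM holds the weights alone."""
    return math.ceil(params * bytes_per_param / LPU_SRAM_BYTES)


# Hypothetical 70-billion-parameter model (an assumption for illustration)
print(chips_for_weights(70e9, 1))  # 8-bit weights  -> 305
print(chips_for_weights(70e9, 2))  # 16-bit weights -> 609
```

Even under these generous assumptions the count lands in the hundreds, which is consistent with the table's description of Groq clusters.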

For more specific details and pricing considerations, review the official Cerebras CS-3 vs Groq LPU comparison and test out models directly on GroqCloud.

Cerebras Systems - Wikipedia

AI chipmaker Cerebras Systems completed one of the largest U.S. tech IPOs in history. Trading on the Nasdaq under the ticker CBRS, the stock priced at $185 per share and opened its first day of trading at $350. [1, 2, 3]

Key Details of the Market Debut:

Date: May 14, 2026
Ticker: CBRS (Nasdaq)
Initial Pricing: $185 per share (above the initially marketed $115-$125 range)
First-Day Action: Surged as high as $385 intraday and closed up 68% at $311.07. The offering raised $5.55 billion.
Market Capitalization: Reached approximately $67 billion at the close of its first day. [1, 2, 3, 4, 5, 6]
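
The figures in this list can be cross-checked against each other. The short calculation below is a rough sketch using only the numbers quoted above; the "shares sold" value is simply proceeds divided by the IPO price and ignores fees or any overallotment, which the source does not describe.

```python
# Figures quoted in the list above
ipo_price = 185.00      # USD per share
close_price = 311.07    # first-day close, USD
proceeds = 5.55e9       # USD raised in the offering
range_high = 125.0      # top of the initially marketed range, USD

first_day_gain = close_price / ipo_price - 1
shares_sold = proceeds / ipo_price               # implied, not a reported figure
premium_over_range_top = ipo_price / range_high - 1


def implied_shares(market_cap: float, price: float) -> float:
    """Share count implied by a market capitalization at a given price."""
    return market_cap / price


print(f"First-day gain: {first_day_gain:.1%}")
print(f"Implied shares sold: {shares_sold / 1e6:.1f} million")
print(f"IPO price vs. top of initial range: +{premium_over_range_top:.0%}")
print(f"Shares implied by $67B at the close: {implied_shares(67e9, close_price) / 1e6:.0f} million")
```

The 68% gain and the $5.55 billion raise are consistent with a $185 price and roughly 30 million shares sold. Applying the same helper to the $95 billion valuation quoted elsewhere in this post gives about 305 million shares rather than about 215 million, so the two valuation figures presumably count shares differently; the sources do not say how.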

What You Should Know:

Business Model: Cerebras specializes in massive, wafer-scale chips designed to speed up AI model training and inference.
Key Customers: The company has recently diversified its revenue through massive infrastructure and deployment deals with major players like Amazon and OpenAI.
Valuation: The stock has drawn intense institutional demand, heavily oversubscribed ahead of its launch, though some analysts caution that it is trading at a significant premium based on current revenue. [1, 2, 3, 4, 5]

Cerebras IPO debuts at $95B valuation Chamath Palihapitiya

On May 14, Cerebras opened on the Nasdaq at $350 a share, peaked at $386, and closed at $311, valuing the AI chipmaker at ~$95 billion on day one. The company priced its IPO at $185 the night before, after raising its price range twice, and raised $5.55 billion. It is the largest US tech IPO since Snowflake’s $3.8 billion debut in 2020. The stock ended the week at around $278.

Google I/O 2026 Releases and Cerebras’ $95B IPO w/ Andrew Feldman | EP #256 - YouTube
@Moonshots podcast

The video features a deep dive into Cerebras Systems and their industry-defining work in AI hardware, led by co-founder and CEO Andrew Feldman. Here are the key points regarding the company's trajectory and technology:

Record-Breaking IPO: The company recently achieved a major milestone with a $95 billion IPO 0:43 - 0:46, 1:33:00 - 1:33:45.
Wafer-Scale Computing: Cerebras distinguishes itself from standard GPU manufacturers by pioneering wafer-scale computing. Instead of using individual chips, they utilize an entire silicon wafer to accelerate AI training and inference, aiming to solve performance bottlenecks in large-scale model development 1:33:25 - 1:34:34, 1:54:24 - 1:55:30.
Focus on Inference: While the company is well-known for training capabilities, they have aggressively pivoted to meet the massive demand for fast inference. Andrew Feldman emphasizes that for AI to be truly useful, it must be capable of high-speed, cost-effective inference 1:40:20 - 1:41:23.
Hardware Advantages: The company’s architecture involves significant innovations in lithography, cooling, power delivery, and compiler design. Because they use a massive, singular engine, they had to solve unique problems regarding fault tolerance—specifically the ability to shut down a core and route around flaws—which makes their technology exceptionally resilient 1:54:53 - 1:55:30, 2:04:19 - 2:05:03.
Performance Benchmarks: Andrew Feldman noted that their systems can run trillion-parameter models with significantly higher token-per-second output compared to standard high-end GPU clusters 1:43:55 - 1:44:23.
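
The idea of shutting down a flawed core and routing around it can be pictured with a toy example. This is a deliberately simplified illustration and not Cerebras's actual mechanism, which the video summary does not detail: the function name, the one-dimensional core numbering, and the spare-core counts are all assumptions made for the sketch.

```python
def build_core_map(physical_cores: int, defective: set[int], logical_cores: int) -> list[int]:
    """Map each logical core index to a healthy physical core, skipping defective ones."""
    healthy = [core for core in range(physical_cores) if core not in defective]
    if len(healthy) < logical_cores:
        raise ValueError("not enough healthy cores to cover the logical array")
    return healthy[:logical_cores]


# Hypothetical strip of 10 physical cores with 2 spares; cores 2 and 7 are flawed
core_map = build_core_map(physical_cores=10, defective={2, 7}, logical_cores=8)
print(core_map)  # [0, 1, 3, 4, 5, 6, 8, 9]
```

Software sees eight contiguous logical cores, while the flawed physical cores are never used. Provisioning spares is what lets a large piece of silicon tolerate manufacturing defects in this toy model.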

Google AI in Chrome: Gemini Nano

Google Chrome integrates its lightweight, on-device AI model, Gemini Nano, into the browser. Running directly on your device, it powers features like text summarization, content rewriting, and scam detection without sending your data to external servers. [1, 2, 3]

How it works & controversies:

Silent Downloads: Chrome will silently download a ~4GB weights.bin file into your user profile directory when it detects your device meets the hardware and storage requirements.
Privacy and Speed: Because the AI operates client-side, your data remains secure within your device, and it can function offline once downloaded.
Controversy: The background download occurs without explicit user consent, which has sparked privacy and storage concerns among users. [1, 2, 3, 4, 5]

How to Disable and Remove Gemini Nano:
If you want to free up the storage space or opt out of on-device AI, you can easily disable it in your browser settings:

Click the three dots in the top-right corner of Chrome.
Go to Settings.
Select the System tab.
Toggle On-device AI to the Off position. [1, 2, 3]

Disabling this feature stops the model from running and triggers Chrome to remove the model files from your computer. [1, 2]
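
To check whether the model file is present, or gone after disabling the feature, one option is to search the Chrome profile directory for weights.bin. The helper below is a minimal sketch: the file name comes from the description above, but the exact location inside the profile is not given in the source, so the directory to scan is left as a parameter you supply.

```python
from pathlib import Path


def find_model_files(root: str, name: str = "weights.bin") -> list[tuple[Path, float]]:
    """Return (path, size in GB) for every file called `name` under `root`."""
    results = []
    for path in Path(root).rglob(name):
        if path.is_file():
            results.append((path, path.stat().st_size / 1e9))
    return results


# Usage (the path is a placeholder; point it at your own Chrome user data directory):
# for path, size_gb in find_model_files("/path/to/chrome/user-data"):
#     print(f"{path}  {size_gb:.2f} GB")
```

An empty result after toggling the setting off would suggest Chrome has removed the file, although removal may not be immediate.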

You can read more about Google's Built-In AI APIs and features directly on the Google for Developers AI on Chrome documentation.

Artificial Intelligence in Chrome | AI on Chrome | Chrome for Developers

With built-in AI, your browser provides and manages foundation and expert models. In Chrome, that includes Gemini Nano.

SpaceX’s $2T Case, Nvidia’s Shock Selloff, America Turns on AI, Trump Pulls AI Order, Bond Crisis? - YouTube @all-in podcast

In the video, the inclusion of Gemini Nano in the Chrome browser is discussed between (9:20) and (12:28).

The Discovery: It is noted that approximately two weeks prior to the recording, Google quietly included the Gemini Nano model in the Chrome browser without explicitly notifying users (9:20 - 9:46).
Technical Details: The model, which is roughly 4 gigabytes in size, is installed locally on the computer and handles tasks such as proofreading, spelling, and autocomplete (9:32 - 9:41).
Discussion on Privacy: The panel discusses the user reaction, noting that many people were surprised or "shocked" by the background download. While some raised privacy concerns, the hosts generally agree that Google was not acting with malicious intent, describing it as a "speed error" in communication rather than the behavior of a "bad actor" (11:22 - 12:28).

Google I/O 2026 Releases and Cerebras’ $95B IPO w/ Andrew Feldman | EP #256 - YouTube @Moonshots podcast

The video provides an extensive recap of Google I/O 2026, highlighting a massive shift toward "agentic" AI across the Google ecosystem. The participants note that Google successfully "disrupted the disruptors," pivoting from being perceived as "cooked" to re-establishing leadership with full-stack AI integration (0:00 - 12:44).

Key Highlights from Google I/O 2026:

Gemini Omni & 3.5 Flash: Google introduced Gemini Omni for real-time multimodal interaction (video, text, audio) and Gemini 3.5 Flash, emphasizing superior throughput and speed (12:44 - 13:30, 18:41 - 20:00).
Agentic Operating Systems: The event showcased Antigravity 2.0, a dedicated desktop application designed to orchestrate multiple AI agents in parallel to perform complex tasks like software development (33:00 - 34:26).
Gemini Spark: Positioned as an always-on personal agent, Spark handles background tasks like email drafting, RSVP tracking, and financial monitoring, utilizing dedicated virtual machines to save user time (39:50 - 41:30).
AI Search & Shopping: Google debuted an AI-powered search mode that changes the "shape of the rectangle" to provide interactive, agent-driven results, alongside a Universal Cart for cross-merchant shopping (48:30 - 55:10).
Science & Innovation: Gemini for Science was introduced to accelerate research, helping scientists generate hypotheses and simulate complex systems (1:15:00 - 1:18:43).

Note on Gemini Nano & Chrome: While the video focuses on the broader "agentic" transition at I/O, external context clarifies the role of Gemini Nano in the browser. Unlike the large, cloud-based Gemini models discussed in the video, Gemini Nano is an on-device model integrated directly into the Chrome browser (starting with Chrome 138). Its primary benefits include:

Privacy: Processes sensitive data locally, ensuring it never leaves the user's machine.
Performance: Eliminates network latency for tasks like text summarization, content rewriting, and language detection.
Security: Provides local, real-time scanning to detect phishing and scam sites.
Developer APIs: Enables web developers to build intelligent, chat-like features directly into their web applications using local compute.