DraganSr: Running AI locally: Ollama in Docker container

Monday, May 04, 2026

Running AI locally: Ollama in Docker container

Ollama is not "free only" anymore, has paid versions.
But the installer is not digitally signed, and is HUGE, 2 GB.
So Windows 11 rejects it for security reasons.

Download Ollama on Windows

Pricing · Ollama

Ollama can be run in Docker using its official Docker image.

Running Ollama in a container provides a clean, isolated environment and simplifies deployment across different systems. [1, 2, 3, 4]

CPU-only:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Nvidia GPU support (Requires NVIDIA Container Toolkit):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

AMD GPU support:

docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

ollama/ollama - Docker Image

1.4 GB

ollama/ollama: Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models. @GitHub

The Ollama Docker container appears significantly larger than the native Windows installation primarily due to a space-for-time tradeoff and the nature of how Docker reports image sizes.

1. Uncompressed Run-Time Payload

In recent versions (0.3.11+), Ollama developers changed how the Docker image is built to prioritize startup speed.

Windows/Linux Native: The installer carries compressed LLM runner executables and libraries that are extracted to a temporary directory at runtime. This saves disk space but adds a 5–20 second delay during every startup.
Docker Container: To make container startup nearly instantaneous (less than a second), these runners and libraries are stored uncompressed and ready-to-execute within the container's root filesystem. [1]

2. Reported vs. Compressed Size

When you view image sizes in Docker, it displays the uncompressed size of the entire filesystem. [1]

The actual "wire size" (what you download from Docker Hub) is often much smaller because the layers are compressed during transit.
The native Windows installer size reflects the compressed package, whereas the Docker "Size" column reflects the fully expanded environment. [1]

Summary of Differences

Feature [1, 2, 3, 4, 5]	Windows Native	Docker Container
Startup	Slower (extracts files at boot)	Near-instant (pre-extracted)
Isolation	Shared with host system	Fully isolated
Reported Size	Smaller (compressed)	Larger (uncompressed filesystem)
Dependencies	Uses host libraries	Bundled in image layers

gemma4
Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

How to Install Ollama on Windows 11 | Use Ollama for Running AI Models Locally (2026) - YouTube

How to run Ollama on Docker - YouTube

Pros and Cons using containerized Ollama vs. local setup for Generative AI Applications | by Alain Airom (Ayrom) | Medium

docker run -d \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama

docker run -d \
--gpus=all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama

# and with the flag as...
--gpus=all Passes all available GPUs to the container (this one requires the NVIDIA Container Toolkit)

docker exec -it ollama ollama run llama3

Monday, May 04, 2026

Running AI locally: Ollama in Docker container

No comments: