Monday, May 04, 2026

Running AI locally: Ollama in a Docker container

Ollama is no longer free-only; it now has paid versions.
The Windows installer is also not digitally signed and is huge, about 2 GB,
so Windows 11 rejects it for security reasons.

Download Ollama on Windows



Ollama can be run in Docker using its official Docker image, ollama/ollama.
Running Ollama in a container provides a clean, isolated environment and simplifies deployment across different systems. [1, 2, 3, 4]

CPU-only:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama


NVIDIA GPU support (requires the NVIDIA Container Toolkit):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

AMD GPU support:

docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
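The three variants above differ only in the GPU flags and the image tag, so they can be folded into one small helper. This is a minimal sketch: the function name ollama_docker_cmd and the backend labels cpu, nvidia and amd are made up for illustration, and the function only prints the command rather than running it.

```shell
# Hypothetical helper: print the docker run command for a given backend.
# The labels cpu / nvidia / amd are names chosen for this sketch.
ollama_docker_cmd() {
  backend="${1:-cpu}"
  # Options shared by all three variants: named volume, published port, container name.
  common="-v ollama:/root/.ollama -p 11434:11434 --name ollama"
  case "$backend" in
    cpu)    echo "docker run -d $common ollama/ollama" ;;
    nvidia) echo "docker run -d --gpus=all $common ollama/ollama" ;;
    amd)    echo "docker run -d --device /dev/kfd --device /dev/dri $common ollama/ollama:rocm" ;;
    *)      echo "unknown backend: $backend" >&2; return 1 ;;
  esac
}
```

Calling ollama_docker_cmd nvidia prints the NVIDIA command; once it looks right, it can be copied into the terminal and run.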



1.4 GB




The Ollama Docker image appears significantly larger than the native Windows installation, primarily because of a space-for-time tradeoff and the way Docker reports image sizes.
1. Uncompressed Run-Time Payload
In recent versions (0.3.11+), Ollama developers changed how the Docker image is built to prioritize startup speed.
  • Windows/Linux Native: The installer carries compressed LLM runner executables and libraries that are extracted to a temporary directory at runtime. This saves disk space but adds a 5–20 second delay during every startup.
  • Docker Container: To make container startup nearly instantaneous (less than a second), these runners and libraries are stored uncompressed and ready-to-execute within the container's root filesystem. [1]
2. Reported vs. Compressed Size
When you view image sizes in Docker, it displays the uncompressed size of the entire filesystem. [1]
  • The actual "wire size" (what you download from Docker Hub) is often much smaller because the layers are compressed during transit.
  • The native Windows installer size reflects the compressed package, whereas the Docker "Size" column reflects the fully expanded environment. [1]
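To look at the reported size yourself, one rough approach is to read the image size in bytes and convert it to gigabytes. The helper name bytes_to_gb is made up for this sketch and assumes decimal units (1 GB = 10^9 bytes); the Docker commands in the comments are for orientation only, and what they report can depend on your Docker setup.

```shell
# Hypothetical helper: convert a byte count to decimal gigabytes (1 GB = 10^9 bytes).
bytes_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000000 }'
}

# Usage, assuming the image has already been pulled:
#   bytes_to_gb "$(docker image inspect --format '{{.Size}}' ollama/ollama)"
# Per-layer download sizes are listed in the registry manifest:
#   docker manifest inspect -v ollama/ollama
```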

Summary of Differences [1, 2, 3, 4, 5]

Feature | Windows Native | Docker Container
Startup | Slower (extracts files at boot) | Near-instant (pre-extracted)
Isolation | Shared with host system | Fully isolated
Reported Size | Smaller (compressed) | Larger (uncompressed filesystem)
Dependencies | Uses host libraries | Bundled in image layers


gemma4
Gemma 4 models are designed to deliver frontier-level performance at each size. They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.










docker run -d \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama


docker run -d \
--gpus=all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama

The --gpus=all flag passes all available GPUs to the container (this one requires the NVIDIA Container Toolkit).


Run a model inside the running container (it is downloaded on first use):

docker exec -it ollama ollama run llama3
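Because port 11434 is published to the host, the same model can also be queried over Ollama's HTTP API instead of the interactive prompt. A minimal sketch, assuming the container is running and the model has already been pulled; the helper names ollama_payload and ollama_ask are made up here, and the payload builder does no JSON escaping, so it only suits simple prompts without quotes or backslashes.

```shell
# Hypothetical helper: build the JSON body for the /api/generate endpoint.
# Assumes model and prompt contain no characters that need JSON escaping.
ollama_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Hypothetical helper: send a prompt to the server published on port 11434.
ollama_ask() {
  curl -s http://localhost:11434/api/generate -d "$(ollama_payload "$1" "$2")"
}
```

For example, ollama_ask llama3 "Why is the sky blue?" should return a single JSON object, since "stream" is set to false.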

