Open WebUI is a self-hosted, ChatGPT-style interface for running large language models locally. It connects to Ollama (a local LLM runtime) and provides a polished chat interface for models like Llama 3, Mistral, Phi-3, and others. Running it on a NAS means your AI assistant is always available on your local network, requires no API key or subscription, and keeps all data local. This guide covers deploying the Open WebUI + Ollama Docker stack on a NAS, model selection for NAS hardware, and the GPU passthrough configuration that significantly improves inference speed on capable hardware.
In short: Deploy Ollama and Open WebUI as Docker containers on your NAS, pull a model (start with Llama 3.2 3B or Phi-3 Mini for resource-constrained NAS hardware), and access the chat interface at port 3000. LLM inference on CPU is slow. Expect 2-8 tokens/second on a Celeron NAS. GPU passthrough on QNAP PCIe models or dedicated GPU hardware dramatically improves speed.
Hardware Reality: What to Expect from NAS LLMs
NAS hardware is not designed for LLM inference. Setting realistic expectations:
- Intel Celeron N5095 (TS-464, DS423+): CPU-only inference at 2-5 tokens/second for a 3B parameter model. Usable for non-time-critical tasks. Not suitable for interactive conversation with large (7B+) models
- AMD Ryzen R1600 (DS923+, TS-473A): Slightly faster at ~5-8 tokens/second for 3B models. The integrated GPU can assist with some models
- QNAP with PCIe GPU (TS-473A + NVIDIA GPU): Adding a dedicated GPU via PCIe (NVIDIA RTX 3060/4060 in a compatible half-height format) enables GPU inference. 40-80+ tokens/second for 7B models. This is the correct hardware approach for real-time LLM use on NAS hardware
For casual, non-real-time use (asking questions and waiting 30-60 seconds for a full response), CPU-only inference on a Celeron NAS is functional. For interactive use, a GPU or dedicated inference hardware is needed.
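To translate tokens/second into wall-clock wait time, a rough back-of-envelope helps. The figures below are assumptions drawn from the ranges above, not benchmarks of any specific unit:

```shell
# Rough response-time estimate for CPU-only inference on a Celeron NAS.
# TOKENS and RATE are illustrative assumptions, not measured values.
TOKENS=400        # a typical full paragraph-style answer
RATE=4            # tokens/second, mid-range for a 3B model on a Celeron
echo "$((TOKENS / RATE)) seconds"   # ~100 seconds for a complete answer
```

The same 400-token answer at 40 tokens/second (GPU inference) arrives in about 10 seconds, which is why the GPU route matters for interactive use.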
Step 1: Deploy Ollama and Open WebUI
Create a Docker Compose file at /volume1/docker/openwebui/docker-compose.yml (Synology) or /share/docker/openwebui/docker-compose.yml (QNAP):
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ./ollama:/root/.ollama
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - ollama
    ports:
      - 3000:8080
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - ./open-webui-data:/app/backend/data
    restart: unless-stopped

Deploy with docker compose up -d. The first startup downloads the Open WebUI image (~1.5GB) and starts both services. Access Open WebUI at http://[NAS-IP]:3000 and create an admin account on first access.
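A few quick sanity checks after deployment confirm both containers came up correctly. These commands assume the container name ollama from the Compose file above:

```shell
# Verify the stack is healthy after `docker compose up -d`.
docker compose ps                     # both services should show "running"
docker exec ollama ollama --version   # confirms the Ollama binary responds
docker exec ollama ollama list        # lists pulled models (empty on first run)
```

If `ollama list` returns without error, Open WebUI at port 3000 should be able to reach the Ollama backend.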
Step 2: Pull a Model
After Open WebUI loads, pull a model from Ollama's library. Model selection depends on your NAS RAM:
- 4GB RAM available for Ollama: Use Phi-3 Mini (3.8B parameters, ~2.3GB) or Llama 3.2 3B (~2GB). These are the smallest capable models
- 8GB RAM available: Use Llama 3.2 3B or Mistral 7B (~4.1GB). Mistral 7B is significantly more capable than 3B models
- 16GB RAM available: Use Llama 3.1 8B (~4.7GB) or Mistral 7B. More comfortable headroom
To pull a model in Open WebUI: Admin Settings → Models → pull a model from the Ollama library. Enter the model name (e.g. llama3.2:3b) and click Pull. The model downloads from Ollama's registry; sizes range from ~2GB for small models to 40GB+ for 70B-class models. The first pull may take 20-60 minutes depending on model size and internet speed.
Alternatively, pull from the Ollama container CLI: docker exec -it ollama ollama pull llama3.2:3b
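Once the pull completes, a one-off prompt from the CLI confirms the model loads and answers (the model name matches the pull example above; any pulled model works):

```shell
# Quick smoke test: run a single prompt against the pulled model.
docker exec -it ollama ollama run llama3.2:3b "Say hello in five words."
# Add --verbose to print timing stats, including eval rate in tokens/second:
docker exec -it ollama ollama run llama3.2:3b --verbose "Say hello in five words."
```

The eval rate reported by --verbose is the practical way to check whether your NAS lands in the 2-5 or 5-8 tokens/second range described earlier.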
Step 3: GPU Passthrough (QNAP PCIe Models)
QNAP NAS models with PCIe slots (TS-473A, TS-673A) can host a GPU card for hardware-accelerated inference. NVIDIA consumer GPUs (RTX 3060/4060 in low-profile form factor) work with Ollama's CUDA backend.
To enable GPU passthrough in the Compose file, modify the Ollama service:
ollama:
  image: ollama/ollama:latest
  container_name: ollama
  volumes:
    - ./ollama:/root/.ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
  restart: unless-stopped

This requires the NVIDIA Container Toolkit installed on the NAS host. On QNAP, this is available as a QTS package for supported GPU models. After configuration, verify GPU usage: docker exec -it ollama ollama ps. Running models should show GPU allocation.
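Two checks confirm the passthrough actually worked. In recent Ollama versions, ollama ps includes a processor column that shows whether a loaded model is on GPU or CPU:

```shell
# Confirm the container can see the card (requires NVIDIA Container Toolkit
# on the host; the container name matches the Compose file above).
docker exec ollama nvidia-smi   # should list the RTX card and driver version
# With a model loaded, check where it is running:
docker exec ollama ollama ps    # processor column should report GPU, not CPU
```

If nvidia-smi fails inside the container but works on the host, the Container Toolkit is not wired into Docker's runtime configuration.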
🇦🇺 Australian Users: Hardware Notes
Recommended hardware configurations for local LLM on NAS in Australia (March 2026):
- QNAP TS-473A (~$1,269 AUD) + NVIDIA RTX 3060 12GB: Best self-hosted LLM NAS platform in the current AU lineup. AMD Ryzen CPU, PCIe slot for the GPU, 8GB RAM expandable. The RTX 3060 12GB handles 7B-13B models at 40-80 tokens/second. Total cost ~$1,700-1,800 AUD
- Intel Celeron NAS (TS-464, DS423+), CPU only: Usable for 3B models at 2-5 tokens/second. Acceptable for summarisation tasks and non-interactive queries; not suitable for real-time conversation with capable models
If you want local LLM inference as a primary use case rather than an add-on, a dedicated mini-PC with integrated GPU (Intel Core Ultra or AMD Ryzen with strong integrated graphics) or a PC with a used NVIDIA card provides better price/performance than a NAS with GPU card.
See the best NAS for local LLM guide for a complete hardware comparison across AI workloads.
Related reading: our NAS buyer's guide and our NAS explainer.
Use our free NAS Sizing Wizard to get a personalised NAS recommendation.
Can I use Open WebUI with the OpenAI API instead of local models?
Yes. Open WebUI supports connecting to the OpenAI API as a backend alongside or instead of Ollama. Add your OpenAI API key under Admin Settings → Connections → OpenAI API. This lets you use GPT-4o, GPT-4 Turbo, and other OpenAI models through the same interface as local models. Useful if you want a unified chat interface for both local (private, free) and cloud (capable, paid) models depending on the task.
What is the difference between Open WebUI and ChatGPT?
Open WebUI is a self-hosted interface running models on your own hardware. ChatGPT uses OpenAI's cloud-hosted GPT models. The key differences: Open WebUI is private (data never leaves your network), free to run (no API costs once hardware is paid for), but limited by your hardware's inference speed. ChatGPT (and GPT-4) is significantly more capable than the open models available for local inference today, and responds in real-time. Local LLMs are best for private, offline, or cost-sensitive use cases; ChatGPT/Claude are better for capability-demanding tasks.
How much storage do LLM models take?
Model sizes: Phi-3 Mini (3.8B) ~2.3GB, Llama 3.2 3B ~2GB, Mistral 7B ~4.1GB, Llama 3.1 8B ~4.7GB, Llama 3.1 70B ~40GB. Models are stored in the Ollama volume mount on your NAS. For a selection of 3-4 models (one small, one medium), budget 10-15GB of NAS storage. Larger models (30B, 70B) require 20-40GB of storage and 24GB+ of RAM to load, which puts them beyond typical NAS hardware.
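A quick budget check for a small model library, using the approximate sizes quoted above:

```shell
# Storage budget for three small-to-medium models (approximate sizes in MB,
# taken from the figures above).
PHI3=2300; LLAMA3B=2000; MISTRAL=4100
echo "$(( (PHI3 + LLAMA3B + MISTRAL) / 1000 )) GB total"   # ~8 GB
# On the NAS itself, check actual usage of the Ollama volume, e.g.:
#   du -sh /volume1/docker/openwebui/ollama
```

Actual on-disk usage runs slightly higher because Ollama also stores model manifests and any partially downloaded layers.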
Is Ollama only for NAS?
No. Ollama runs on any Linux, macOS, or Windows machine. The NAS deployment is convenient because the NAS is always on and accessible on the local network. You can query your local LLM from any device in your home without leaving a PC running. But for best performance, running Ollama on a PC or Mac with a GPU is more capable than NAS hardware. Many homelab users run Ollama on their primary PC for performance and use the NAS for everything-always-on services like Nextcloud, Immich, and Home Assistant.
Can Open WebUI be accessed remotely?
Yes. Configure HTTPS via NGINX Proxy Manager or a Cloudflare Tunnel. Same approach as other self-hosted NAS services. Once accessible via HTTPS, you can query your local LLM from anywhere. Note that remote access routes your queries through your internet connection (sending text queries out, receiving responses in). For private documents you want to keep off-internet entirely, restrict to local network access only via VPN.
Curious which NAS hardware handles local AI inference and what to expect from each model? The best NAS for local LLM guide covers hardware requirements, model selection, and GPU options.
Best NAS for Local LLM →