Best NAS for Local LLM and AI Inference

Which NAS handles local LLM inference? Comparing Intel Celeron, AMD Ryzen, and PCIe GPU expansion options for running Ollama, Open WebUI, and AI-powered self-hosted apps like Immich.

This page contains affiliate links. If you purchase via our links we may earn a small commission, at no extra cost to you. Editorial independence policy.

Running large language models and AI workloads on a NAS is an emerging use case. Ollama, Open WebUI, and Immich's machine learning pipeline all run on NAS hardware, but performance varies dramatically between ARM, Intel Celeron, AMD Ryzen, and PCIe GPU-equipped models. This guide covers which NAS architectures are viable for local AI inference, what performance to expect from each, and the hardware sweet spots for different workloads: interactive LLM chat, photo face recognition, and always-on AI services.

In short: for photo AI (Immich, Nextcloud Recognize), any Intel x86 NAS with 8GB RAM works; the TS-464 or DS423+ is sufficient. For interactive local LLM chat, you need a GPU: QNAP's PCIe-equipped models (such as the TS-473A) with an NVIDIA card are the only viable NAS-based GPU inference platform. CPU-only LLM on a Celeron is functional for non-time-critical tasks at 2-5 tokens/second.

AI Workloads on NAS: Three Categories

NAS AI workloads fall into three distinct categories with different hardware requirements:

1. Background AI processing (photo recognition, semantic search): Immich face recognition, Nextcloud Recognize, Photoprism indexing. Runs in background, not time-sensitive. CPU-only on Intel Celeron is viable. Takes longer but the result is the same. ARM (TS-233, DS223) is very slow and practically unusable for ML-heavy features. Required: Intel/AMD x86 NAS, 8GB+ RAM.

2. CPU-only LLM inference (Ollama/Open WebUI, non-interactive): Asking questions and waiting 30-120 seconds for a complete response. Llama 3.2 3B, Phi-3 Mini. Functional on any x86 NAS with 8GB RAM. Required: x86 NAS, 8GB+ RAM, patience.

3. Interactive LLM (real-time conversation, 7B+ models): Real-time chat at 30+ tokens/second. Requires GPU inference. Only viable on NAS hardware with PCIe GPU expansion. Required: QNAP PCIe model + compatible NVIDIA GPU, 8GB+ VRAM on GPU.
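Not sure which category your hardware lands in? Ollama's /api/generate response includes token counts and timings, so you can measure throughput directly. A minimal Python sketch, assuming Ollama is already running on the NAS; the address and model name below are placeholders:

```python
import requests

# Assumes Ollama is reachable at the NAS IP on its default port (11434)
# and that the model below has already been pulled. Adjust both to taste.
OLLAMA_URL = "http://192.168.1.50:11434"   # hypothetical NAS address
MODEL = "llama3.2:3b"

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": MODEL,
        "prompt": "Explain RAID 5 in two sentences.",
        "stream": False,  # wait for the full response so we get final stats
    },
    timeout=600,  # CPU-only inference on a Celeron can be slow
)
resp.raise_for_status()
stats = resp.json()

# eval_count is the number of generated tokens; eval_duration is nanoseconds.
tokens_per_sec = stats["eval_count"] / stats["eval_duration"] * 1e9
print(f"{MODEL}: {tokens_per_sec:.1f} tokens/sec")
# Roughly: <1 tok/s -> background tasks only; 2-8 tok/s -> async queries;
# 30+ tok/s -> interactive chat territory.
```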

NAS Performance by CPU Architecture

| Workload | ARM (TS-233, DS223) | Intel Celeron (TS-464, DS423+) | AMD Ryzen (TS-473A, DS923+) | Intel/AMD + GPU (QNAP PCIe) |
| --- | --- | --- | --- | --- |
| Immich face recognition | Very slow (days for large libraries) | Adequate (hours) | Good (faster than Celeron) | Fast (with GPU passthrough) |
| LLM (3B model, tokens/sec) | Not recommended (<1 tok/s) | 2-5 tok/s | 4-8 tok/s | 40-100+ tok/s (GPU) |
| LLM (7B model, tokens/sec) | Not viable | 0.5-2 tok/s | 1-3 tok/s | 20-60+ tok/s (GPU) |
| Max viable model size | N/A | 3B-7B (slow) | 7B (marginally usable) | 13B-30B (with GPU VRAM) |
| RAM requirement | 4GB fixed | 8GB (16GB recommended) | 8GB+ (16GB preferred) | 8GB NAS + GPU VRAM |
| Best use case | File serving only | Background AI, slow LLM | Background AI, slow LLM | Real-time LLM, GPU ML |

Best NAS for Photo AI (Immich, Nextcloud)

For running Immich with face recognition and CLIP semantic search, or Nextcloud with the Recognize app:

Recommended: QNAP TS-464 (~$989) or Synology DS423+ (~$980)

Both have Intel Celeron N-series CPUs that handle background AI processing adequately. For a 10,000-photo library, initial ML indexing takes 4-12 hours; tolerable, since it runs in the background. Ongoing processing of new photos (daily uploads) takes minutes. The key requirement is RAM: run at least 8GB to avoid container memory pressure when the ML models load.

The Synology DS423+ ships with only 2GB of RAM; upgrade it with a third-party SO-DIMM to at least 8GB total before running Immich ML. The TS-464 ships with 8GB.

Best NAS for Local LLM (CPU Inference)

For running Ollama + Open WebUI on CPU (no GPU):

Recommended: QNAP TS-473A (~$1,269) or Synology DS923+ (~$1,269)

The AMD Ryzen V1500B (4 cores / 8 threads) in the TS-473A and the R1600 in the DS923+ offer the best CPU inference performance in NAS hardware, handling Llama 3.2 3B at 4-8 tokens/second. That's usable for asynchronous queries (send a question, do other things, come back to the response) but not for real-time conversation.

For CPU-only inference, the practical ceiling is 7B models. Anything larger (13B, 70B) is too slow on NAS CPUs for any useful workflow.
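To make the "ask now, read later" workflow concrete, here is a minimal sketch of the asynchronous pattern using a background thread. It illustrates the usage model rather than a production job queue; the NAS address and model name are placeholders:

```python
import threading
import requests

OLLAMA_URL = "http://192.168.1.50:11434"  # hypothetical NAS address
MODEL = "llama3.2:3b"

def ask(prompt: str, result: dict) -> None:
    """Run one blocking Ollama request; store the answer when it arrives."""
    r = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,
    )
    r.raise_for_status()
    result["answer"] = r.json()["response"]

result: dict = {}
worker = threading.Thread(
    target=ask,
    args=("Summarise the pros and cons of RAID 5 vs RAID 6.", result),
)
worker.start()

# ... go do something else while the NAS CPU grinds through the tokens ...

worker.join()  # come back whenever; at 4-8 tok/s this may take a minute
print(result["answer"])
```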

Best NAS for GPU-Accelerated LLM

For real-time LLM inference at 30+ tokens/second, you need GPU inference via a PCIe expansion slot:

Recommended: QNAP TS-473A (~$1,269) + NVIDIA GPU

The TS-473A has a PCIe Gen 3 x4 slot that accepts low-profile PCIe cards. GPU recommendations for LLM inference:

  • NVIDIA RTX 3060 12GB (~$400-550 AUD used/new): 12GB VRAM is the sweet spot for 7B-13B models. Fits in low-profile slot with an appropriate bracket
  • NVIDIA RTX 4060 8GB (~$500-600 AUD): Faster than RTX 3060 per FLOP, but 8GB VRAM limits model size. Suitable for 7B models only
  • NVIDIA RTX 4060 Ti 16GB (~$800-900 AUD): 16GB VRAM enables 13B models comfortably. Best balance for NAS LLM use if budget allows

With GPU inference, Llama 3.1 8B runs at 60-80 tokens/second on an RTX 3060. Interactive conversation speed. This transforms the local LLM experience from a curiosity to a daily-use tool.
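Streaming is what makes GPU-speed output feel conversational: tokens print as they are generated rather than arriving in one block. A minimal sketch against Ollama's streaming /api/chat endpoint (one JSON object per line); the NAS address and model name are placeholders:

```python
import json
import requests

OLLAMA_URL = "http://192.168.1.50:11434"  # hypothetical NAS address

with requests.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "What is ZFS scrubbing?"}],
        "stream": True,  # Ollama streams one JSON object per line
    },
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        if chunk.get("done"):
            break
        # Print tokens as they arrive -- at 60-80 tok/s this reads in real time.
        print(chunk["message"]["content"], end="", flush=True)
print()
```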

Note: the Synology DS923+ has no GPU-capable PCIe slot (its expansion slot accepts only Synology's 10GbE network module). GPU inference is QNAP-only for NAS hardware in the current AU market.

🇦🇺 Australian Buyers: Pricing Summary

Total build costs for local AI NAS in Australia (March 2026):

  • Photo AI only (Immich/Nextcloud ML): QNAP TS-464 ($989) + 2 drives (~$200) = ~$1,200
  • CPU LLM (slow but functional): QNAP TS-473A ($1,269) + 2 drives (~$200) = ~$1,470. Handles 3B-7B models at 4-8 tokens/second
  • GPU LLM (interactive speed): QNAP TS-473A ($1,269) + 2 drives (~$200) + NVIDIA RTX 3060 12GB (~$500) = ~$1,970. Handles 7B-13B models at 50-80 tokens/second

QNAP TS-473A is available at Scorptec, PLE, and Mwave in AU. NVIDIA GPUs are available from the same retailers as well as Umart and Computer Alliance. Verify GPU dimensions against the TS-473A's low-profile PCIe slot requirements before purchasing.

For those who want local LLM as a primary use case rather than a secondary service on a NAS, a dedicated mini-PC (Intel NUC, mini-ITX build with AMD Ryzen APU, or used workstation with NVIDIA GPU) provides better price/performance than a NAS + GPU approach, but lacks the NAS's always-on, low-power advantages.

Related reading: our NAS buyer's guide, our NAS vs cloud storage comparison, and our NAS explainer.

Free tools: NAS Sizing Wizard and AI Hardware Requirements Calculator. No signup required.

Can Synology NAS run local LLMs?

Yes. Ollama and Open WebUI run on Synology via Container Manager on any Intel or AMD model, but performance is CPU-only: Synology's consumer models have no GPU-capable PCIe expansion, so GPU acceleration is not available. The DS923+ (AMD Ryzen R1600) is the most capable Synology for CPU inference at approximately 4-7 tokens/second for 3B models. For interactive LLM use, a QNAP with a PCIe GPU is the better choice. For background AI tasks (Immich, Nextcloud ML), Synology is fully capable.

How much RAM does local LLM need?

Rule of thumb: model file size in GB ≈ RAM required in GB. Llama 3.2 3B (~2GB model) needs ~3GB RAM available. Mistral 7B (~4.1GB) needs ~5GB. Llama 3.1 8B (~4.7GB) needs ~6GB. Always have additional RAM headroom beyond the model size for the runtime. For GPU inference, VRAM is the constraint. Model must fit in GPU VRAM. An RTX 3060 with 12GB VRAM can hold Mistral 7B or Llama 3.1 8B comfortably with room for context.
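The rule of thumb is easy to encode. A small sketch that checks whether a quantized model file fits in the RAM you have free; the 20% overhead factor is our assumption for runtime and context headroom, not a fixed constant:

```python
def fits_in_ram(model_file_gb: float, free_ram_gb: float,
                overhead: float = 1.2) -> bool:
    """Rule of thumb: a model needs roughly its file size in RAM,
    plus headroom for the runtime and context (~20% assumed here)."""
    return model_file_gb * overhead <= free_ram_gb

for name, size_gb in [("Llama 3.2 3B", 2.0),
                      ("Mistral 7B", 4.1),
                      ("Llama 3.1 8B", 4.7)]:
    ok = fits_in_ram(size_gb, free_ram_gb=6.0)  # e.g. ~6GB free on an 8GB NAS
    print(f"{name}: {'fits' if ok else 'too big'}")
```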

Is local LLM on NAS private?

Yes. All inference runs on your hardware and nothing leaves your network (unless you configure remote access). Queries, responses, and the model weights are all local. This is the primary privacy advantage over ChatGPT and other cloud AI services, where your queries are processed on third-party servers. For sensitive business queries, legal documents, or personal data you don't want processed by a cloud provider, local LLM provides a private alternative.

What is the best model for NAS hardware?

For CPU-only inference on an 8GB NAS: Llama 3.2 3B or Phi-3 Mini. Both are capable instruction-following models that fit within 4GB RAM and run at 2-5 tokens/second on Celeron hardware. Phi-3 Mini is particularly efficient for its size. For GPU-accelerated inference (RTX 3060 12GB): Mistral 7B or Llama 3.1 8B offer dramatically better capability than 3B models and run at interactive speeds. Llama 3.1 70B requires roughly 40GB of VRAM even when quantized, so it is not viable on single-GPU NAS configurations.
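To see what is already pulled on a NAS and how large each model file is, Ollama's /api/tags endpoint lists installed models with their on-disk size. A short sketch (the NAS address is a placeholder):

```python
import requests

OLLAMA_URL = "http://192.168.1.50:11434"  # hypothetical NAS address

models = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()["models"]
for m in models:
    # "size" is the model file size on disk, in bytes -- a good first
    # approximation of the RAM (or VRAM) it will need when loaded.
    print(f"{m['name']}: {m['size'] / 1e9:.1f} GB")
```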

Can I run Immich and Ollama on the same NAS?

Yes, if the NAS has sufficient RAM. On a TS-464 with 8GB RAM, Immich (with ML disabled to save RAM) and Ollama (with a 3B model) coexist within 8GB. With 16GB RAM, you can run Immich with ML enabled plus Ollama with a 7B model comfortably. Running both ML-intensive workloads (Immich ML + Ollama) simultaneously on 8GB RAM will cause memory pressure. Schedule Immich ML processing during off-hours when Ollama is idle, or upgrade RAM to 16GB.
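One practical lever for coexistence: Ollama keeps a model loaded in RAM after each request (five minutes by default). Sending a request with keep_alive set to 0 unloads it immediately, freeing that RAM for Immich's ML jobs. A minimal sketch; the NAS address and model name are placeholders:

```python
import requests

OLLAMA_URL = "http://192.168.1.50:11434"  # hypothetical NAS address

# A generate request with keep_alive: 0 asks Ollama to unload the model now,
# rather than holding it in RAM for the default keep-alive window.
requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama3.2:3b", "keep_alive": 0},
    timeout=60,
).raise_for_status()
```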

Want to set up Open WebUI and Ollama on your NAS today? The Open WebUI setup guide covers the Docker deployment and model selection step by step.

Open WebUI Setup Guide →
Not sure your build is right? Get a PDF review of your planned NAS setup: drive compatibility, RAID selection, and backup gaps checked. $149 AUD, 3 business days.
Review My Build →