This AI hardware requirements calculator estimates the RAM, CPU, and storage needed to run a local large language model on your NAS or home server based on model size, use case, and concurrent users. Compares on-device inference cost against cloud API pricing in AUD.
Find out exactly how much RAM, CPU power, and storage you need to run local AI models on a NAS, then compare the 3-year cost against paying for GPT-4o or Claude in AUD. Enter your model size and use case below.
USD-priced cloud AI: GPT-4o and Claude are billed in USD. At current exchange rates (~$1 USD = ~$1.55 AUD), a moderate usage pattern costs $1,300-$2,000 AUD/year, and more if you're running team workflows. A one-time NAS hardware purchase typically breaks even within 12-18 months.
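The break-even claim works out as a simple payback calculation. A minimal sketch, assuming an illustrative AUD $1,950 NAS purchase against AUD $1,560/year of cloud spend (both hypothetical figures, not quotes):

```python
def breakeven_months(hardware_cost_aud: float, annual_cloud_cost_aud: float) -> float:
    """Months of cloud spend needed to equal the one-time hardware outlay."""
    return hardware_cost_aud / (annual_cloud_cost_aud / 12)

# Illustrative: ~$1,950 AUD NAS vs. $1,560 AUD/year of cloud usage.
print(breakeven_months(1950, 1560))  # 15.0 months
```

Plug in your own hardware quote and the calculator's annual cloud estimate to see where you land in the 12-18 month range.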
Privacy Act 2024 implications: When you query a cloud AI, your data leaves AU soil and enters US jurisdiction. Local inference on a NAS keeps everything (queries, documents, and outputs) on your premises. No data retention, no model training on your inputs.
NBN upload constraints: Cloud AI round-trips add latency for AU users. Local inference is bounded only by your NAS CPU/NPU speed, typically 2-30 tokens/second depending on model size and hardware.
Quantization reduces model precision from 32-bit floats to 4-bit integers (Q4), shrinking RAM requirements by over 80%. A 7B parameter model at full precision needs ~28 GB RAM. The same model at Q4_K_M quantization (the most common format for Ollama) needs ~4-5 GB. This calculator uses Q4_K_M figures throughout; it is the practical default for NAS hardware.
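The RAM figures follow from bytes-per-parameter arithmetic. A rough sketch, assuming Q4_K_M averages about 4.85 bits per weight (an approximation; the exact figure varies by model):

```python
def weight_ram_gb(params_billion: float, bits_per_param: float) -> float:
    """Raw weight memory in GB; budget another ~10-20% for KV cache and runtime."""
    # params (billions) x bits per param / 8 bits per byte = GB of weights
    return params_billion * bits_per_param / 8

print(weight_ram_gb(7, 32))    # 28.0 GB at full 32-bit precision
print(weight_ram_gb(7, 4.85))  # ~4.2 GB at Q4_K_M, before runtime overhead
```

The Q4 figure plus overhead lands in the ~4-5 GB range quoted above.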
Yes, via Docker/Container Station. Synology DS925+, QNAP TS-464, and UGREEN DXP4800 all support Docker and can run Ollama. For 3B-7B models the CPU is adequate for personal use, with responses taking 3-15 seconds depending on model size and NAS CPU. For faster inference, the UGREEN DXP4800 Plus's Intel N100 iGPU provides acceleration through llama.cpp's GPU offloading.
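Once the Ollama container is running, any machine on your LAN can query it over Ollama's REST API (port 11434, the `/api/generate` endpoint). A minimal sketch; the `nas.local` hostname and `llama3.2:3b` model tag are placeholder assumptions you'd swap for your own:

```python
import json
from urllib import request

OLLAMA_URL = "http://nas.local:11434"  # assumed hostname for your NAS

def build_payload(prompt: str, model: str = "llama3.2:3b") -> bytes:
    """JSON body for Ollama's /api/generate endpoint (streaming off for one reply)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask_ollama(prompt: str, model: str = "llama3.2:3b") -> str:
    """Send a prompt to the NAS and return the model's full response text."""
    req = request.Request(f"{OLLAMA_URL}/api/generate",
                          data=build_payload(prompt, model),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The 3-15 second response times quoted above apply to this kind of single-shot, non-streaming request.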
RAG (Retrieval-Augmented Generation) lets the AI search your private documents before answering. It requires running an embedding model alongside the main LLM, maintaining a vector database (like Chroma or Qdrant) in memory, and handling larger context windows as retrieved chunks are injected into each prompt. This adds 20-50% RAM overhead on top of the base model requirements.
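The overhead stacks multiplicatively on the base model footprint. A sketch using an illustrative mid-range 35% overhead from the 20-50% band above:

```python
def rag_ram_gb(base_model_gb: float, overhead_fraction: float = 0.35) -> float:
    """Total RAM for RAG: base LLM plus embedding model, vector DB, and larger context."""
    return base_model_gb * (1 + overhead_fraction)

# A Q4 7B model (~4.5 GB) with a mid-range 35% RAG overhead:
print(f"{rag_ram_gb(4.5):.1f} GB")
```

Adjust `overhead_fraction` toward 0.5 for large document collections or long context windows.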
An NPU (Neural Processing Unit) is a chip optimised for matrix multiplications, the core operation in AI inference. NPU performance is measured in TOPS (Tera Operations Per Second). As of 2026, most consumer NAS CPUs do not have a dedicated NPU. The UGREEN DXP4800 Plus uses an Intel N100 with integrated GPU (iGPU) which provides partial acceleration via llama.cpp. Dedicated NPU NAS models are emerging: QNAP's AI-series targets this space, but AU stock availability is limited.
Rarely. Fine-tuning requires full-precision weights (32-bit), optimizer states, and gradient storage, typically 4-6× the inference RAM. A 7B model fine-tune needs 40-60 GB RAM minimum, plus significant CPU time (days, not hours). NAS hardware is not designed for this workload. For fine-tuning, a dedicated GPU server or cloud training instance is the correct tool. This calculator includes fine-tuning as a reference; the RAM figures show why it's impractical on standard NAS hardware.
Costs are based on published API pricing as of March 2026: GPT-4o at USD $15/million input tokens + $60/million output tokens (blended ~$37.50/M), Claude Sonnet at USD $3/million input + $15/million output tokens (blended ~$9/M). Converted at AUD $1.55 per USD. Actual costs vary with your input/output ratio and any applicable volume discounts or subscription tiers. The "tokens/day" figures are estimates; your actual usage may differ significantly.
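The blended figures above assume a roughly 50/50 input/output split. A sketch of the arithmetic; the 60,000 tokens/day usage figure is an illustrative assumption, not a measured workload:

```python
AUD_PER_USD = 1.55

def blended_usd_per_m(input_usd: float, output_usd: float, output_share: float = 0.5) -> float:
    """Blended per-million-token price for a given output-token share."""
    return input_usd * (1 - output_share) + output_usd * output_share

def annual_cost_aud(tokens_per_day: int, blended_usd: float) -> float:
    """Yearly spend in AUD for a steady daily token volume."""
    return tokens_per_day * 365 / 1e6 * blended_usd * AUD_PER_USD

gpt4o = blended_usd_per_m(15, 60)  # 37.5 USD per million tokens
claude = blended_usd_per_m(3, 15)  # 9.0 USD per million tokens
print(round(annual_cost_aud(60_000, gpt4o)))  # ~1273 AUD/year at 60k tokens/day
```

Shift `output_share` toward 1.0 for generation-heavy workloads (drafting, code), or toward 0.0 for summarisation over large inputs.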
Image generation models (Stable Diffusion, FLUX) require a dedicated GPU; the compute requirements are an order of magnitude higher than for text LLMs. Most NAS units cannot run these workloads at any practical speed. A QNAP model with PCIe GPU passthrough (e.g. TVS-H874) paired with a consumer GPU (RTX 3060/4060) is the entry point. This calculator covers text LLMs only.