This AI hardware requirements calculator estimates the RAM, CPU, and storage needed to run a local large language model on your NAS or home server based on model size, use case, and concurrent users. It also compares on-device inference cost against cloud API pricing in AUD.
Find out exactly how much RAM, CPU, and processing power you need to run local AI models on a NAS - then compare the 3-year cost against paying for GPT-4o or Claude in AUD. Enter your model size and use case below.
At moderate use (500 queries/day), Claude or GPT-4o costs $1,300 to $2,000 AUD per year in API fees. A NAS capable of running a 7B parameter model typically costs $800 to $1,200 AUD upfront and pays for itself within 12 to 18 months. This calculator gives you the exact break-even figure for your specific usage pattern and AU electricity rate.
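The break-even arithmetic above can be sketched roughly as follows. This is a minimal illustration, not the calculator's actual formula: the function name, the 60 W average power draw, and the $0.35/kWh electricity rate are assumptions chosen for the example.

```python
def break_even_months(hardware_aud, api_aud_per_year, nas_watts=60.0, aud_per_kwh=0.35):
    """Months until avoided API fees cover the NAS purchase price.

    nas_watts and aud_per_kwh are illustrative assumptions, not measured values.
    """
    # Electricity used by a NAS running around the clock for an average month
    kwh_per_month = nas_watts / 1000 * 24 * 365 / 12
    power_aud_per_month = kwh_per_month * aud_per_kwh
    # Net saving: API fees avoided minus the cost of powering the NAS
    monthly_saving = api_aud_per_year / 12 - power_aud_per_month
    if monthly_saving <= 0:
        return None  # at these inputs the hardware never pays for itself
    return hardware_aud / monthly_saving


months = break_even_months(hardware_aud=1200, api_aud_per_year=1300)
print(f"Break-even after about {months:.1f} months")
```

Higher power draw or a lower API bill pushes the break-even point out, which is why your own electricity rate matters.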
Privacy is the second reason. When you query a cloud AI, your prompts, documents, and outputs leave Australia and enter US jurisdiction under US data laws. Local inference on a NAS keeps everything on-premises. For businesses handling client data, this is directly relevant to the Privacy Act 1988 (as amended in 2024) and any data residency obligations your contracts require.
NBN latency is the third factor. A cloud AI round-trip from Australia to US servers adds 150-300ms of network latency per response. Local inference is bounded only by your NAS CPU or NPU speed, not by your upload plan. On NBN 25 or NBN 50, cloud AI feels slower than advertised. Local inference on a capable NAS removes that constraint entirely.
The rule of thumb is simple: you need roughly 0.5 to 0.6 GB of RAM per billion parameters at Q4 quantization, plus 2 GB overhead for the operating system and inference engine. A 7B model needs 5-6 GB RAM. A 13B model needs 9-10 GB. A 70B model needs 38-42 GB, which puts it out of reach of most consumer NAS hardware and into dedicated server territory.
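As a rough sketch of that rule of thumb: the worked figures above (5-6 GB for 7B, 9-10 GB for 13B, 38-42 GB for 70B) are consistent with about 0.55 GB per billion parameters plus the 2 GB overhead. The 0.55 coefficient and the function name are assumptions inferred from those figures, not values published by the calculator.

```python
def estimate_ram_gb(params_billion, gb_per_billion=0.55, overhead_gb=2.0):
    """Approximate RAM for Q4 inference: weights plus OS and inference-engine overhead.

    gb_per_billion=0.55 is an assumed midpoint inferred from the article's examples.
    """
    return params_billion * gb_per_billion + overhead_gb


for size in (7, 13, 70):
    print(f"{size}B model: about {estimate_ram_gb(size):.1f} GB RAM")
```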
This calculator uses Q4_K_M quantization throughout. This is the practical default for home and SMB NAS hardware: it reduces RAM requirements by roughly 75% versus 16-bit weights, with a quality penalty most users cannot distinguish from the unquantized output. If you are running RAG (Retrieval-Augmented Generation), add 20-50% overhead on top of the base model RAM figure to account for the embedding model and vector database.
Image generation models (Stable Diffusion, FLUX) require a dedicated GPU. The compute requirements are an order of magnitude higher than text LLMs, and NAS hardware cannot run them at any practical speed. Fine-tuning also requires full-precision weights and optimizer states, typically 4 to 6 times the inference RAM, making it impractical on consumer NAS hardware. This calculator covers text LLM inference only.
USD-priced cloud AI: GPT-4o and Claude are billed in USD. At current exchange rates (~$1 USD = ~$1.55 AUD), a moderate usage pattern costs $1,300-$2,000 AUD/year, more if you're running team workflows. One-time NAS hardware typically breaks even within 12-18 months.
Privacy Act implications: When you query a cloud AI, your data leaves AU soil and enters US jurisdiction, which matters for obligations under the Privacy Act 1988 (as amended in 2024). Local inference on a NAS keeps all data (queries, documents, and outputs) within your premises. No data retention, no model training on your inputs.
NBN upload constraints: Cloud AI round-trips add latency for AU users. Local inference is bounded only by your NAS CPU/NPU speed, typically 2-30 tokens/second depending on model size and hardware.
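To see what the tokens/second range means in practice, generation time is roughly the reply length divided by throughput. The sketch below assumes a 150-token reply, which is an illustrative figure rather than one from the calculator, and ignores prompt-processing time.

```python
def response_seconds(output_tokens, tokens_per_second):
    """Time to generate a reply, ignoring prompt processing and model load time."""
    return output_tokens / tokens_per_second


# 150 tokens is an assumed typical reply length (roughly a short paragraph)
for tps in (2, 10, 30):
    print(f"{tps} tokens/s: about {response_seconds(150, tps):.0f} s for a 150-token reply")
```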
Quantization reduces the precision of model weights from 16- or 32-bit floats to roughly 4 bits each (Q4). Against the 16-bit weights most models ship in, that shrinks RAM requirements by roughly 75%. A 7B parameter model needs ~28 GB RAM at 32-bit precision, or ~14 GB at 16-bit. The same model at Q4_K_M quantization (the most common format for Ollama) needs ~4-5 GB. This calculator uses Q4_K_M figures throughout, the practical default for NAS hardware.
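The size arithmetic is simply parameters times bits per weight. A minimal sketch, with the format table and function name chosen for illustration:

```python
# Nominal bits stored per weight at each precision
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "Q8": 8, "Q4": 4}


def weights_gb(params_billion, bits_per_weight):
    """Size of the weights alone; billions of parameters times bytes each gives GB."""
    return params_billion * bits_per_weight / 8


for name, bits in BITS_PER_WEIGHT.items():
    print(f"7B at {name}: {weights_gb(7, bits):.1f} GB")
```

Plain 4-bit storage works out to 3.5 GB for a 7B model. Q4_K_M lands a little higher, in the 4-5 GB range quoted above, because it averages somewhat more than 4 bits per weight once scaling data and the tensors kept at higher precision are counted.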
Yes, via Docker/Container Station. Synology DS925+, QNAP TS-464, and UGREEN DXP4800 all support Docker and can run Ollama. For 3B-7B models, the CPU is adequate for personal use. Response times range from 3 to 15 seconds per response depending on model size and NAS CPU. For faster inference, the UGREEN DXP4800 Plus, with the iGPU in its Intel Pentium Gold 8505, provides acceleration through llama.cpp's GPU offloading.
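Once Ollama is running in a container, other machines on the network can query it over its HTTP API. The sketch below uses only the Python standard library and targets Ollama's `/api/generate` endpoint on its default port 11434. The host address and the `llama3.2:3b` model tag are assumptions: substitute your NAS address and whichever model you have pulled.

```python
import json
import urllib.request


def build_request(prompt, model="llama3.2:3b", host="http://localhost:11434"):
    """Build a non-streaming generate request for an Ollama server.

    The model tag and host are placeholders for illustration.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Summarise RAID 5 in one sentence.", host="http://<your-nas-ip>:11434")` returns the model's reply as a string, provided the container exposes port 11434 and the model is already downloaded.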
RAG (Retrieval-Augmented Generation) lets the AI search your private documents before answering. It requires running an embedding model alongside the main LLM, maintaining a vector database (like Chroma or Qdrant) in memory, and handling larger context windows as retrieved chunks are injected into each prompt. This adds 20-50% RAM overhead on top of the base model requirements.
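The retrieve-then-inject flow can be illustrated with a toy example. This sketch replaces the embedding model with a simple bag-of-words vector and the vector database with an in-memory list, so it shows the shape of the pipeline only; a real setup would use a proper embedding model and a store such as Chroma or Qdrant. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: word counts instead of a dense vector
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    # Rank stored chunks by similarity to the query and keep the top k
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, chunks):
    # Inject the retrieved chunks ahead of the question, as RAG does before calling the LLM
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "The backup job runs nightly at 2am to the offsite NAS.",
    "Invoices are due within 30 days of issue.",
]
print(build_prompt("When does the backup job run?", docs))
```

The extra RAM goes to the parts this toy version skips: the embedding model's weights, the vector index, and the longer prompts that result from injected context.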
An NPU (Neural Processing Unit) is a chip optimised for matrix multiplications, the core operation in AI inference. NPU performance is measured in TOPS (Tera Operations Per Second). As of 2026, most consumer NAS CPUs do not have a dedicated NPU. The UGREEN DXP4800 Plus uses an Intel Pentium Gold 8505 with an integrated GPU (iGPU), which provides partial acceleration via llama.cpp. Dedicated NPU NAS models are emerging: QNAP's AI-series targets this space, but AU stock availability is limited.
Rarely. Fine-tuning requires full-precision weights (32-bit), optimizer states, and gradient storage, typically 4-6× the inference RAM. A 7B model fine-tune needs 40-60 GB RAM minimum, plus significant CPU time (days, not hours). NAS hardware is not designed for this workload. For fine-tuning, a dedicated GPU server or cloud training instance is the correct tool. This calculator includes fine-tuning as a reference; the RAM figures show why it's impractical on standard NAS hardware.
Costs are based on published API pricing as of March 2026: GPT-4o at USD $15/million input tokens + $60/million output tokens (blended ~$37.50/M), Claude Sonnet at USD $3/million input + $15/million output tokens (blended ~$9/M). The blended figures assume an even split between input and output tokens. Converted at AUD $1.55 per USD. Actual costs vary with your input/output ratio and any applicable volume discounts or subscription tiers. The "token/day" figures are estimates; your actual usage may differ significantly.
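The blended rates and the AUD conversion can be reproduced with a few lines. This is a sketch of the arithmetic only: the 50/50 input/output split is what the quoted blended figures imply, and the 250,000 tokens/day volume (500 queries at an assumed 500 tokens each) is a hypothetical input, not a figure from the calculator.

```python
AUD_PER_USD = 1.55  # exchange rate quoted above


def blended_usd_per_million(input_price, output_price, output_share=0.5):
    """Weighted average price per million tokens for a given output share."""
    return input_price * (1 - output_share) + output_price * output_share


def annual_cost_aud(tokens_per_day, usd_per_million):
    """Yearly API spend in AUD for a steady daily token volume."""
    return tokens_per_day * 365 / 1_000_000 * usd_per_million * AUD_PER_USD


gpt4o = blended_usd_per_million(15, 60)   # 37.5
sonnet = blended_usd_per_million(3, 15)   # 9.0
print(f"Blended USD per million tokens: GPT-4o {gpt4o}, Claude Sonnet {sonnet}")

# Hypothetical volume: 500 queries/day at an assumed 500 tokens per query
print(f"Claude Sonnet at 250k tokens/day: ${annual_cost_aud(250_000, sonnet):,.0f} AUD/year")
```

Shifting `output_share` toward 1.0 shows how quickly costs rise for workloads that generate long replies from short prompts.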
Image generation models (Stable Diffusion, FLUX) require a dedicated GPU because the compute requirements are an order of magnitude higher than text LLMs. Most NAS units cannot run these workloads at any practical speed. A QNAP model with PCIe GPU passthrough (e.g. TVS-H874) paired with a consumer GPU (RTX 3060/4060) is the entry point. This calculator covers text LLMs only.