This AI hardware requirements calculator estimates the RAM, CPU, and storage needed to run a local large language model on your NAS or home server based on model size, use case, and concurrent users. Compares on-device inference cost against cloud API pricing in AUD.
Find out exactly how much RAM, CPU power, and storage you need to run local AI models on a NAS, then compare the 3-year cost against paying for GPT-4o or Claude in AUD. Enter your model size and use case below.
USD-priced cloud AI: GPT-4o and Claude are billed in USD. At current exchange rates (~$1 USD = ~$1.55 AUD), a moderate usage pattern costs $1,300-$2,000 AUD/year, and more if you're running team workflows. A one-time NAS hardware purchase typically breaks even within 12-18 months.
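The break-even claim works out as a simple payback calculation. A minimal sketch, assuming an illustrative AUD $1,950 NAS purchase against AUD $1,560/year of cloud spend (both hypothetical figures, not quotes):

```python
def breakeven_months(hardware_cost_aud: float, annual_cloud_cost_aud: float) -> float:
    """Months of cloud spend needed to equal the one-time hardware outlay."""
    return hardware_cost_aud / (annual_cloud_cost_aud / 12)

# Illustrative: ~$1,950 AUD NAS vs. $1,560 AUD/year of cloud usage.
print(breakeven_months(1950, 1560))  # 15.0 months
```

Plug in your own hardware quote and the calculator's annual cloud estimate to see where you land in the 12-18 month range.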
Privacy Act 2024 implications: When you query a cloud AI, your data leaves AU soil and enters US jurisdiction. Local inference on a NAS keeps everything (queries, documents, and outputs) on your premises. No data retention, no model training on your inputs.
NBN upload constraints: Cloud AI round-trips add latency for AU users. Local inference is bounded only by your NAS CPU/NPU speed, typically 2-30 tokens/second depending on model size and hardware.
Quantization reduces model precision from 32-bit floats to 4-bit integers (Q4), shrinking RAM requirements by over 80%. A 7B parameter model at full precision needs ~28 GB RAM. The same model at Q4_K_M quantization (the most common format for Ollama) needs ~4-5 GB. This calculator uses Q4_K_M figures throughout; it is the practical default for NAS hardware.
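The RAM figures follow from bytes-per-parameter arithmetic. A rough sketch, assuming Q4_K_M averages about 4.85 bits per weight (an approximation; the exact figure varies by model):

```python
def weight_ram_gb(params_billion: float, bits_per_param: float) -> float:
    """Raw weight memory in GB; budget another ~10-20% for KV cache and runtime."""
    # params (billions) x bits per param / 8 bits per byte = GB of weights
    return params_billion * bits_per_param / 8

print(weight_ram_gb(7, 32))    # 28.0 GB at full 32-bit precision
print(weight_ram_gb(7, 4.85))  # ~4.2 GB at Q4_K_M, before runtime overhead
```

The Q4 figure plus overhead lands in the ~4-5 GB range quoted above.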
Yes, via Docker/Container Station. Synology DS925+, QNAP TS-464, and UGREEN DXP4800 all support Docker and can run Ollama. For 3B-7B models the CPU is adequate for personal use, with responses taking 3-15 seconds depending on model size and NAS CPU. For faster inference, the UGREEN DXP4800 Plus's Intel N100 iGPU provides acceleration through llama.cpp's GPU offloading.
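Once the Ollama container is running, any machine on your LAN can query it over Ollama's REST API (port 11434, the `/api/generate` endpoint). A minimal sketch; the `nas.local` hostname and `llama3.2:3b` model tag are placeholder assumptions you'd swap for your own:

```python
import json
from urllib import request

OLLAMA_URL = "http://nas.local:11434"  # assumed hostname for your NAS

def build_payload(prompt: str, model: str = "llama3.2:3b") -> bytes:
    """JSON body for Ollama's /api/generate endpoint (streaming off for one reply)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask_ollama(prompt: str, model: str = "llama3.2:3b") -> str:
    """Send a prompt to the NAS and return the model's full response text."""
    req = request.Request(f"{OLLAMA_URL}/api/generate",
                          data=build_payload(prompt, model),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The 3-15 second response times quoted above apply to this kind of single-shot, non-streaming request.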
RAG (Retrieval-Augmented Generation) lets the AI search your private documents before answering. It requires running an embedding model alongside the main LLM, maintaining a vector database (like Chroma or Qdrant) in memory, and handling larger context windows as retrieved chunks are injected into each prompt. This adds 20-50% RAM overhead on top of the base model requirements.
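The overhead stacks multiplicatively on the base model footprint. A sketch using an illustrative mid-range 35% overhead from the 20-50% band above:

```python
def rag_ram_gb(base_model_gb: float, overhead_fraction: float = 0.35) -> float:
    """Total RAM for RAG: base LLM plus embedding model, vector DB, and larger context."""
    return base_model_gb * (1 + overhead_fraction)

# A Q4 7B model (~4.5 GB) with a mid-range 35% RAG overhead:
print(f"{rag_ram_gb(4.5):.1f} GB")
```

Adjust `overhead_fraction` toward 0.5 for large document collections or long context windows.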
An NPU (Neural Processing Unit) is a chip optimised for matrix multiplications, the core operation in AI inference. NPU performance is measured in TOPS (Tera Operations Per Second). As of 2026, most consumer NAS CPUs do not have a dedicated NPU. The UGREEN DXP4800 Plus uses an Intel N100 with integrated GPU (iGPU) which provides partial acceleration via llama.cpp. Dedicated NPU NAS models are emerging: QNAP's AI-series targets this space, but AU stock availability is limited.
Rarely. Fine-tuning requires full-precision weights (32-bit), optimizer states, and gradient storage, typically 4-6× the inference RAM. A 7B model fine-tune needs 40-60 GB RAM minimum, plus significant CPU time (days, not hours). NAS hardware is not designed for this workload. For fine-tuning, a dedicated GPU server or cloud training instance is the correct tool. This calculator includes fine-tuning as a reference; the RAM figures show why it's impractical on standard NAS hardware.
Costs are based on published API pricing as of March 2026: GPT-4o at USD $15/million input tokens + $60/million output tokens (blended ~$37.50/M), Claude Sonnet at USD $3/million input + $15/million output tokens (blended ~$9/M). Converted at AUD $1.55 per USD. Actual costs vary with your input/output ratio and any applicable volume discounts or subscription tiers. The "tokens/day" figures are estimates; your actual usage may differ significantly.
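The blended figures above assume a roughly 50/50 input/output split. A sketch of the arithmetic; the 60,000 tokens/day usage figure is an illustrative assumption, not a measured workload:

```python
AUD_PER_USD = 1.55

def blended_usd_per_m(input_usd: float, output_usd: float, output_share: float = 0.5) -> float:
    """Blended per-million-token price for a given output-token share."""
    return input_usd * (1 - output_share) + output_usd * output_share

def annual_cost_aud(tokens_per_day: int, blended_usd: float) -> float:
    """Yearly spend in AUD for a steady daily token volume."""
    return tokens_per_day * 365 / 1e6 * blended_usd * AUD_PER_USD

gpt4o = blended_usd_per_m(15, 60)  # 37.5 USD per million tokens
claude = blended_usd_per_m(3, 15)  # 9.0 USD per million tokens
print(round(annual_cost_aud(60_000, gpt4o)))  # ~1273 AUD/year at 60k tokens/day
```

Shift `output_share` toward 1.0 for generation-heavy workloads (drafting, code), or toward 0.0 for summarisation over large inputs.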
Image generation models (Stable Diffusion, FLUX) require a dedicated GPU; the compute requirements are an order of magnitude higher than for text LLMs. Most NAS units cannot run these workloads at any practical speed. A QNAP model with PCIe GPU passthrough (e.g. TVS-H874) paired with a consumer GPU (RTX 3060/4060) is the entry point. This calculator covers text LLMs only.