Setting Up a Local AI Home Server: Hardware, Ollama and Frontends

Running a local AI server at home takes about an hour to set up and costs between $0 (if you already have a NAS) and $1,200 for a dedicated mini-PC. This guide covers the full path from hardware choice to a working chat interface running on your network.

A local AI home server lets every device on your network access LLM inference without internet dependency, cloud subscriptions, or sending data off your premises. The setup involves three steps: choosing hardware with enough RAM for the model size you need, installing Ollama to handle model management and inference, and optionally adding a web frontend so you can chat from any device. Each step is well documented, and together they take under an hour on a modern NAS or mini-PC. The total cost ranges from near zero if you use hardware you already own to $1,200 or more for a purpose-built device.

In short: Start with the hardware you already have. If you have a NAS with 8GB RAM and Docker support, install Ollama via Docker and add Open WebUI. If you do not have suitable existing hardware, a mini-PC with 32GB RAM in the $900 to $1,200 range gives you the best experience for regular interactive use. Install Ollama, pull a model, and optionally add Open WebUI for a browser-based chat interface accessible from any device on your network.

Step 1: Choose Your Hardware

The hardware decision is the most consequential part of the setup. It determines which model sizes are practical and how fast inference runs. There are three viable approaches depending on what you already own and what you want to spend.

If you already have a NAS: Start there. NAS devices with Docker support (Synology DS225+, DS425+, DS925+, DS1525+, and equivalent QNAP models) can run Ollama in a container. A NAS with 8GB of RAM handles 7B parameter models at 1 to 4 tokens per second, which is usable for background tasks, document summarisation, and occasional conversational use. It is too slow for regular interactive conversation. See the guide on NAS vs mini-PC for AI for a performance breakdown at each hardware tier.

If you want a dedicated device: A mini-PC in the $900 to $1,200 range with a Core Ultra or Ryzen AI processor and 32GB of RAM is the current sweet spot for daily interactive AI use. It generates 15 to 25 tokens per second on 7B models and handles 13B models at comfortable conversational speed. See the mini-PC buying guide for current Australian price tiers and what to look for.

If you want both: Many home setups use a NAS for storage and file serving and a dedicated mini-PC as the AI inference endpoint. The NAS holds model files and serves them over the network; the mini-PC handles compute-intensive inference requests from every device in the household. This is the most capable configuration and the one that makes the best use of hardware you likely already have running.

Step 2: Install Ollama

Ollama is the inference engine that manages model files and handles requests. It exposes an HTTP API on port 11434 that other applications (including Open WebUI) connect to. Installing Ollama is straightforward on any supported platform.

On Linux (mini-PC or desktop):

curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama as a system service that starts automatically on boot. The service listens on localhost by default. To expose it to other devices on the network, set the OLLAMA_HOST environment variable for the service to 0.0.0.0:11434 and restart it.
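On a systemd-based Linux install, one way to apply that setting is a drop-in override for the service. The sketch below assumes the installer registered the service under the name ollama, so the drop-in directory is /etc/systemd/system/ollama.service.d. The function name write_ollama_override is made up for illustration, and the directory is a parameter so you can try it against a scratch folder first.

```shell
# Write a systemd drop-in that makes Ollama listen on all network interfaces.
# Assumption: the service is named "ollama".
write_ollama_override() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0:11434"\n' \
    > "$dropin_dir/override.conf"
}

# Usage (as root), then reload systemd and restart the service:
#   write_ollama_override /etc/systemd/system/ollama.service.d
#   systemctl daemon-reload && systemctl restart ollama
```

Once the service listens on 0.0.0.0, any device on your network can send it requests, so keep port 11434 closed to the internet at your router.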

On a NAS via Docker (Synology, QNAP, Ugreen):

In Container Manager or Container Station, pull the ollama/ollama image and run it with a persistent volume for model storage. The container needs to expose port 11434 and have a volume mounted at /root/.ollama so downloaded models persist across container restarts. Use the NAS-specific guides for Synology or QNAP for exact configuration steps, as the NAS Docker interfaces vary in how they handle host networking and GPU passthrough.
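If your NAS gives you SSH access to Docker, the same container can be started from the command line. This is a minimal CPU-only sketch, assuming a named volume called ollama and a container called ollama; the wrapper function name start_ollama_container is made up for illustration, and the restart policy is a choice rather than a requirement.

```shell
# Start Ollama in a container, publishing the API port and
# keeping downloaded models in a named volume.
start_ollama_container() {
  docker run -d \
    --name ollama \
    --restart unless-stopped \
    -p 11434:11434 \
    -v ollama:/root/.ollama \
    ollama/ollama
}

# Usage:
#   start_ollama_container
#   docker exec -it ollama ollama pull llama3.1:8b
```

The GUI fields in Container Manager and Container Station map onto the same three settings: the image name, the 11434 port mapping, and the volume mounted at /root/.ollama.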

On macOS:

Download the Ollama application from the Ollama website and install it; it starts automatically. On Apple Silicon, Ollama benefits from the unified memory architecture: the M-series chip's GPU shares the same memory pool as the CPU, so model weights do not need to be copied into separate video memory. This makes Apple Silicon Macs among the most efficient platforms for local inference for a given amount of RAM.

Step 3: Download a Model

With Ollama running, pull your first model. For most users starting out, Llama 3.1 8B is the recommended default:

ollama pull llama3.1:8b

This downloads approximately 4.7GB to your model directory. On an Australian NBN connection, this takes 5 to 20 minutes depending on your download speed. Once downloaded, the model is stored locally and does not need to be downloaded again. Subsequent pulls for the same model only download if a newer version is available.

To test the model from the command line immediately after pulling:

ollama run llama3.1:8b

This opens an interactive terminal session where you can type prompts and receive responses. It confirms that inference is working before adding a frontend layer. For a full breakdown of which models to download for different use cases, see the Ollama model guide. For a detailed explanation of the Q4/Q6/Q8 quantisation tags in model names, see the quantisation guide.
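Because frontends talk to Ollama over its HTTP API rather than the terminal, it is also worth checking the API directly with curl. The sketch below uses the /api/tags and /api/generate endpoints; the function names and the example host are placeholders, and the prompt is pasted into the JSON body unescaped, so keep double quotes and backslashes out of it.

```shell
# List the models Ollama has stored locally (GET /api/tags).
ollama_list_models() {
  host="${1:-localhost}"
  curl -s "http://$host:11434/api/tags"
}

# Send one non-streaming prompt (POST /api/generate).
# The prompt is inserted into the JSON as-is: no quotes or backslashes.
ollama_generate() {
  host="$1"; model="$2"; prompt="$3"
  curl -s "http://$host:11434/api/generate" \
    -d "{\"model\": \"$model\", \"prompt\": \"$prompt\", \"stream\": false}"
}

# Usage:
#   ollama_list_models
#   ollama_generate localhost llama3.1:8b "Why is the sky blue?"
```

Running the same calls from another machine, with the server's IP in place of localhost, confirms that Ollama is reachable across the network.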

Step 4: Add a Web Interface (Optional)

The command line is functional but not practical for daily use. Open WebUI is the most capable browser-based frontend for Ollama and runs in Docker alongside it. Once installed, any device on your network can access the chat interface through a browser without installing anything on those devices.

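Install Open WebUI with Docker. The command below assumes Ollama runs on the same machine as the container; if Ollama is on a different machine, add -e OLLAMA_BASE_URL=http://[ollama-host-IP]:11434 to point Open WebUI at it: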

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

After the container starts, open http://[your-device-IP]:3000 in any browser on your network. The first time you open it, you create an admin account. Additional users can be created with their own accounts and conversation history. Open WebUI connects to Ollama automatically via the host gateway address and shows all locally available models in a dropdown. See the full Open WebUI setup guide for configuration details and alternatives.
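For the split layout, where Open WebUI runs on one machine and Ollama on another, the container needs to be told where Ollama lives. The sketch below does that with the OLLAMA_BASE_URL environment variable; the function name start_open_webui and the example IP address are placeholders, and it assumes Ollama on the other machine is already listening on the network rather than only on localhost.

```shell
# Start Open WebUI and point it at an Ollama instance on another host.
# Host port 3000 maps to the port Open WebUI serves on inside the container (8080).
start_open_webui() {
  ollama_host="$1"
  docker run -d \
    -p 3000:8080 \
    -e OLLAMA_BASE_URL="http://$ollama_host:11434" \
    -v open-webui:/app/backend/data \
    --name open-webui \
    --restart always \
    ghcr.io/open-webui/open-webui:main
}

# Usage (replace with the address of the machine running Ollama):
#   start_open_webui 192.168.1.50
```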

What You Can Do With a Local AI Server

Once the server is running, every device on your network has access to a private AI assistant. The most common use cases for home setups are conversational AI for daily tasks, code assistance for developers working locally, document summarisation for reading long reports or contracts, and writing assistance for drafts and correspondence.

The Open WebUI frontend adds document upload for retrieval-augmented generation (RAG), letting you upload a PDF or text file and query it directly. This is useful for extracting information from long documents, comparing contracts, or searching a personal document archive using natural language. The RAG pipeline runs entirely on your hardware with no data leaving your network.

Running inference locally also removes the internet dependency entirely. Your local AI server keeps working during an NBN outage. Requests reach it in under 50 milliseconds over a local network connection, versus 200 to 300 milliseconds of network latency for cloud AI requests routed through Australian internet to US servers. Generation speed still depends on your hardware, but for interactive conversation and coding assistance, that latency difference is noticeable. See the guide on NBN upload speeds and local AI for a detailed comparison.

Power, Cost and Practical Considerations

A local AI server that runs 24/7 adds between $40 and $130 per year in Australian electricity depending on your state and hardware. A mini-PC at 25 watts continuous costs approximately $57 to $94 per year. A NAS already running for storage costs almost nothing extra for AI inference. A GPU-equipped workstation running continuously can cost $500 or more per year, which is the main reason GPU rigs should be powered on only during active use. See the full power cost breakdown by state for detailed numbers.
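The arithmetic behind these figures is watts, times 8,760 hours in a year, divided by 1,000 to get kilowatt-hours, times your tariff. A small sketch for plugging in your own numbers follows; the function name annual_power_cost is made up for illustration, and the $0.30 per kWh tariff in the usage line is an assumed example, not a figure from any particular retailer.

```shell
# Annual electricity cost of a device running 24/7.
# Arguments: average draw in watts, tariff in dollars per kWh.
annual_power_cost() {
  watts="$1"; rate="$2"
  awk -v w="$watts" -v r="$rate" \
    'BEGIN { printf "%.2f\n", w * 8760 / 1000 * r }'
}

# Usage:
#   annual_power_cost 25 0.30
```

A 25-watt device uses 219 kWh over a year, so at the assumed $0.30 per kWh the example prints 65.70, which sits inside the range quoted above.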

Model storage takes disk space. A 7B model at Q4 quantisation is approximately 4 to 5GB. Keeping three or four models ready requires 15 to 25GB of model storage. If your main device has limited storage, a NAS as a model file server is a practical solution: store the model files on the NAS and configure Ollama to load models from the network path or an SMB mount.
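Ollama reads its model directory from the OLLAMA_MODELS environment variable, so on a systemd-based Linux install the NAS mount can be configured with a service drop-in. This sketch assumes the service is named ollama and that the share is already mounted; the function name write_models_override and the path /mnt/nas/ollama-models are placeholders.

```shell
# Write a systemd drop-in that points Ollama at a different model directory.
# Assumptions: service named "ollama"; the NAS share is already mounted.
write_models_override() {
  dropin_dir="$1"; models_dir="$2"
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nEnvironment="OLLAMA_MODELS=%s"\n' "$models_dir" \
    > "$dropin_dir/models.conf"
}

# Usage (as root), then reload systemd and restart the service:
#   write_models_override /etc/systemd/system/ollama.service.d /mnt/nas/ollama-models
#   systemctl daemon-reload && systemctl restart ollama
```

The account the service runs as needs read and write permission on the mounted directory, and loading a model over the network will be slower than loading it from a local SSD.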

For users accessing the server from outside the home network (from a phone over mobile data, or from a remote office), most Australian ISPs use CGNAT which prevents direct inbound connections to home servers. Tailscale resolves this by creating an encrypted mesh network between your devices that works through CGNAT without any router configuration. See the CGNAT and remote access guide for details.
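As a rough sketch of the Tailscale route: install Tailscale on the server and on the remote device, sign both in to the same account, and browse to the server's Tailscale address instead of its home-network IP. The helper below prints that URL; the function name tailnet_webui_url is made up for illustration, and it assumes Tailscale is already connected on the server and that Open WebUI is published on port 3000 as in Step 4.

```shell
# Print the URL for reaching Open WebUI over Tailscale.
# Assumes "tailscale up" has already been run on this machine.
tailnet_webui_url() {
  ip="$(tailscale ip -4)" || return 1
  printf 'http://%s:3000\n' "$ip"
}

# Usage (on the server):
#   tailnet_webui_url
```

Open the printed address in a browser on any device signed in to the same Tailscale network.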

How long does it take to set up a local AI home server?

The core setup (installing Ollama and pulling a model) takes 15 to 30 minutes, including model download time on a typical NBN connection. Adding Open WebUI via Docker adds another 10 to 15 minutes. Configuring network access and user accounts in Open WebUI takes another 5 to 10 minutes. Total time from starting to having a working chat interface accessible from other devices in the house is typically under an hour on a device with Docker already installed. First-time Docker setup on a NAS adds 15 to 30 minutes depending on the NAS software version.

Do I need to keep the server on all the time for local AI to work?

No, only when you want to use it. Because inference runs on your own hardware rather than in the cloud, the server must be powered on to handle requests. If you turn off your mini-PC, no one in the house can query the AI until it is turned back on. Many users leave their AI server on during waking hours and turn it off overnight to save power. Some users configure the mini-PC to wake on LAN so it can be powered on remotely. If you are running Ollama on a NAS that is already on 24/7 for storage, AI inference is available continuously at almost no extra power cost. See the power cost guide for the annual electricity cost at each hardware tier and usage pattern.

Can my whole family use the same local AI server at the same time?

Yes. Open WebUI supports multiple user accounts with separate conversation histories. Ollama handles concurrent requests, though inference for multiple simultaneous users queues rather than parallelises. If two people submit prompts at exactly the same time, the second request waits for the first to complete. For households where family members tend to use AI at different times rather than simultaneously, a single mid-range mini-PC handles the load comfortably. Open WebUI on the server is accessible from any browser on the network without installing anything on the user's device.

Is a local AI server private? Does any data leave my network?

Yes, it is private, provided you are using local Ollama models. All inference runs on your hardware. Prompts, responses, conversation history, and uploaded documents never leave your network. Nothing is sent to cloud providers, AI companies, or any external server. Open WebUI stores all user data locally in its Docker volume on your device. The only time data leaves your network is if you configure Open WebUI to connect to an external API provider like OpenAI or Anthropic alongside local models. That connection is optional and you control whether it is set up.

What is the minimum hardware for a home AI server?

The absolute minimum is any Linux device with 6GB or more of RAM free for inference, Docker support, and an internet connection to pull the model. This could be a Synology NAS with 8GB RAM, an older mini-PC, or even a Raspberry Pi 5 with enough RAM (though inference on the Pi 5 is very slow). The practical minimum for a useful experience is 8GB of RAM available for inference and a modern processor. For interactive use, you want at least 10 to 15 tokens per second on a 7B model, which requires a modern mini-PC processor rather than NAS-grade hardware. See the RAM tier guide for which models run at each memory level.
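To see where an existing Linux device sits against these thresholds, you can read the kernel's estimate of available memory. This is a minimal sketch that parses the MemAvailable line of /proc/meminfo and converts it from KiB to GiB; the function name available_ram_gib is made up for illustration, and the optional file argument exists only so the parsing can be tried against a saved copy.

```shell
# Print available RAM in GiB, read from /proc/meminfo (values are in KiB).
available_ram_gib() {
  meminfo="${1:-/proc/meminfo}"
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "$meminfo"
}

# Usage:
#   available_ram_gib
```

Run it while the device is doing its normal work, since a NAS that is busy serving files will have less memory free for inference than its total RAM suggests.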

Choosing between a NAS and a dedicated mini-PC for local AI? The comparison guide covers performance, real AU costs, and which platform fits which use case.
