Building My All-AMD AI Workstation: Ryzen 9950X3D + Radeon R9700


A personal build log covering hardware selection, assembly, and getting local AI inference running on RDNA 4.

Why I Built This

I'd been running ML workloads on cloud GPUs for a while — RunPod, Vast.ai — and while they work, the costs add up fast when you're experimenting daily. I wanted something local, fast, and capable of running LLMs and audio transcription without a monthly bill. I also wanted to go all-in on AMD.

At the time, the Ryzen 9 9950X3D had just dropped, and AMD's Radeon AI PRO R9700 was shaping up to be something special: 32GB of VRAM on an RDNA 4 architecture. For local inference, VRAM is everything, since it determines which models you can load entirely on-device. With 32GB I can run a 70B model fully on-device at an aggressive quantization: at roughly 3.5 bits per weight, 70B parameters work out to about 30GB of weights, which just fits. The decision was easy.

The Parts List

Component     Part
------------  ---------------------------------------------
CPU           AMD Ryzen 9 9950X3D
Cooler        Thermalright Peerless Assassin 120 SE
Motherboard   MSI PRO X870E-P WIFI
RAM           64GB G.Skill Flare X5 DDR5-6000 CL36 (4×16GB)
Storage       Klevv CRAS C910 2TB PCIe 4.0 NVMe
GPU           ASRock Creator Radeon AI PRO R9700 32GB
Case          Phanteks XT PRO ULTRA
PSU           Corsair RM850e 850W

A few notes on the choices:

The 9950X3D is the top of AMD's Zen 5 lineup, and the 3D V-Cache makes a real difference for workloads with large working sets. For AI inference it handles the CPU-side preprocessing and pipeline orchestration without ever becoming a bottleneck.

The R9700 is the star here. The ASRock Creator variant comes with 32GB of GDDR6 and is aimed squarely at AI workloads. The RDNA 4 architecture (gfx1201) brings significant efficiency improvements over RDNA 3, and 32GB of VRAM puts it ahead of most consumer NVIDIA cards for pure model capacity.

64GB of DDR5-6000 in a four-DIMM, dual-channel configuration (AM5 has two memory channels, so this is two DIMMs per channel) gives the CPU plenty of headroom for anything running in system RAM, and the fast memory helps with bandwidth-sensitive tasks.

The Build

Assembly was straightforward for the most part. A few things worth noting:

The Peerless Assassin 120 SE is a beast of an air cooler for the price. Mounting it on the X870E board was simple, and it keeps the 9950X3D well within thermal limits even under sustained load.

The Phanteks XT PRO ULTRA is an excellent case with great airflow and tons of room. Cable management was clean thanks to the spacious rear chamber, and the R9700 — which is a large card — fit without any clearance issues.

One thing to plan for: the R9700 draws significant power through its PCIe power connectors, so make sure your PSU and cable configuration support it. The Corsair RM850e handled it without issue.

Setting Up Linux + ROCm

This is where things get interesting — and where most of the documentation gaps are.

I installed Ubuntu 24.04 LTS as the base OS. The goal was to get ROCm working with the R9700 (gfx1201), then layer Ollama and whisper.cpp on top.

Installing ROCm

ROCm support for RDNA 4 (gfx1201) is still relatively new as of this writing. I'm running ROCm 7.1.1. The install follows AMD's official instructions, but there are a few extra steps needed to get gfx1201 recognized correctly.
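
For reference, here's the rough shape of the install on Ubuntu 24.04 using AMD's amdgpu-install flow. Treat the version strings below as placeholders; the exact package filename tracks the ROCm release, so grab the current one from AMD's install docs:

# Download and install the amdgpu-install package for your ROCm release
# (placeholder filename; get the real URL from repo.radeon.com per AMD's docs).
wget https://repo.radeon.com/amdgpu-install/<rocm-version>/ubuntu/noble/amdgpu-install_<version>_all.deb
sudo apt install ./amdgpu-install_<version>_all.deb
# Install the ROCm userspace stack.
sudo amdgpu-install --usecase=rocm
# Grant your user GPU access, then log out and back in.
sudo usermod -aG render,video $USER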

After installing the ROCm stack, verify your GPU is detected:

rocminfo | grep gfx

You should see gfx1201 in the output. If it doesn't show up, you may need to set the override environment variable:

export HSA_OVERRIDE_GFX_VERSION=12.0.1

This tells ROCm to treat the hardware as a known gfx version. Add it to your .bashrc or /etc/environment to make it persistent.
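
A couple of one-liners for persisting it (note that /etc/environment takes plain KEY=value lines, without the export keyword):

# Per-user:
echo 'export HSA_OVERRIDE_GFX_VERSION=12.0.1' >> ~/.bashrc
# System-wide:
echo 'HSA_OVERRIDE_GFX_VERSION=12.0.1' | sudo tee -a /etc/environment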

Running Ollama with GPU Offload

Once ROCm is working, getting Ollama set up is quick. Install it via the official script, and it should auto-detect your ROCm installation and offload to the R9700. You can verify GPU usage with:

rocm-smi

Watch the memory and utilization columns while running a model — you should see VRAM usage climb as the model loads.
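
For reference, the whole flow is two commands. The install URL is Ollama's official script; the model tag is purely illustrative, so pick whatever quantization actually fits in 32GB from the Ollama library:

# Install Ollama via the official script.
curl -fsSL https://ollama.com/install.sh | sh
# Pull and run a model (illustrative tag; check the library for available quants).
ollama run llama3.1:70b-instruct-q3_K_S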

whisper.cpp with ROCm Acceleration

For audio transcription, I went with whisper.cpp compiled with HIP support (HIP is ROCm's CUDA-like programming interface). The build process requires specifying the target GPU architecture:

make GGML_HIP=1 AMDGPU_TARGETS=gfx1201

I'm running the large-v3 model, which gives excellent transcription accuracy. With the R9700 handling inference, transcription runs significantly faster than real-time.
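
For completeness, here's the end-to-end flow as I'd sketch it. Note that recent whisper.cpp trees favor CMake over the legacy Makefile, and the binary name has changed across versions (main vs. whisper-cli), so adjust to your checkout:

# Clone and build with HIP enabled, targeting gfx1201 (CMake path).
git clone https://github.com/ggerganov/whisper.cpp && cd whisper.cpp
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1201
cmake --build build -j
# Fetch the large-v3 model and transcribe a bundled sample.
./models/download-ggml-model.sh large-v3
./build/bin/whisper-cli -m models/ggml-large-v3.bin -f samples/jfk.wav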

The Idle GPU Bug (gfx1201 + ROCm 7.1.1)

Worth flagging for anyone else running this setup: there is a known idle GPU usage bug affecting gfx1201 under ROCm 7.1.1. Even when no workload is running, you may see non-zero GPU utilization reported by rocm-smi. This is a driver/ROCm-level issue, not a hardware problem. AMD is aware of it. Keep an eye on the ROCm GitHub for updates, and don't be alarmed if your idle stats look unusual.
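
To keep an eye on it, rocm-smi's usage flags under watch make the pattern easy to spot:

# Poll reported GPU and VRAM utilization every couple of seconds.
watch -n 2 'rocm-smi --showuse --showmemuse'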

Performance

The results have been great. With Ollama running a 70B quantized model entirely in the R9700's 32GB VRAM, inference is smooth and fast. The large-v3 whisper model transcribes audio well above real-time speed.

For my day-to-day use — running LLMs locally, transcribing recordings, experimenting with ML projects — this machine handles everything I throw at it without touching a cloud GPU.

What's Next

This machine is already earning its keep. Here's what I'm running on it right now:

Autoresearch on AMD — I'm replicating my overnight H100 experiment series on the R9700 using ROCm. Same autonomous agent loop, different hardware. Early results are showing interesting differences in how the agent explores the search space on RDNA 4 versus CUDA — more on that soon.

Voza — a push-to-talk voice-to-text app running entirely local. It chains whisper.cpp (compiled with ROCm) into Ollama for transcription and downstream processing. Full Wayland support, zero cloud dependency. The R9700 handles real-time transcription without breaking a sweat.

TurboQuant on gfx1201 — I'm testing KV cache quantization on RDNA 4 and benchmarking inference performance. As far as I can tell, these may be among the first Linux gfx1201 results out there. Planning to share the numbers with the community once I have a full dataset.

Tax Agent — a fully local tax preparation agent built on top of multiple Ollama models (DeepSeek, Qwen3, and others) all running on-device. The 32GB of VRAM lets me keep several models loaded simultaneously, which makes the multi-agent orchestration practical without any cloud calls.

Academic Benchmarks — as part of my Master's in Computer Science at ASU, I'm using this machine to run academic benchmarks on local models. Being able to evaluate and compare model performance on-device — without waiting for cloud GPU availability or racking up API costs — has been a huge advantage for coursework and research. Having 32GB of VRAM locally means I can iterate on experiments between classes instead of queuing jobs on shared infrastructure.
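
A practical note on the Tax Agent setup above: out of the box, Ollama may evict one model to load another, so keeping several resident at once means raising its concurrency limit. The env var name is current as of recent Ollama versions (check ollama serve --help on your install):

# Allow up to three models to stay loaded in VRAM simultaneously.
export OLLAMA_MAX_LOADED_MODELS=3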

If you're considering a similar build — especially if you're targeting local AI inference — I'd strongly recommend this combination. The R9700's 32GB VRAM is a genuine differentiator, and RDNA 4 + ROCm is only going to get better as driver support matures.

Next up in this series: deep dives into each of these projects — starting with the autoresearch ROCm vs CUDA comparison and the Voza build log.

Questions or ideas? Reach out — I'm always happy to talk about this stuff.
