REPOMIND
Open-source repo-scale coding agent on AMD MI300X.
Ingest a git repository (up to 256K tokens of context at FP8) on a single GPU and reason across the whole codebase with multi-step tool use.
Verified on a single MI300X (2026-05-05): 256K context · 31/31 concurrent users at 8K–64K · 200K needle-in-haystack 3/3 · 9/9 end-to-end repo questions correct · $4.12 total stress test cost · AITER FP8 attention backend regression filed for AMD review.
🎬 1-minute demo video · 📦 GitHub source (MIT) · 🏆 Lablab project page · 🐛 AMD Developer Forum thread #505
Why AMD MI300X — memory architecture
| Component | Verified on MI300X | NVIDIA H100 80 GB |
|---|---|---|
| Qwen3-Coder-Next-FP8 weights in VRAM | 77.29 GiB | fits |
| 256K KV cache @ FP8 (2,065,744 tokens) | 94.58 GiB available | cannot fit |
| Total peak utilization | 176 / 191.7 GiB (92%) | cannot accommodate (~143 GB > 80 GB) |
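The VRAM accounting above follows the standard per-token KV-cache formula: 2 (K and V) × layers × KV heads × head dim bytes per token at FP8 (1 byte per element). The sketch below shows that arithmetic with illustrative model dimensions — the layer/head numbers are assumptions for the example, not the published Qwen3-Coder-Next config.

```python
def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 1) -> float:
    """KV-cache size in GiB: 2 (K and V) * layers * kv_heads * head_dim
    bytes per token, with 1 byte/element at FP8."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 2**30

# Hypothetical dimensions, for illustration only:
print(kv_cache_gib(262_144, layers=48, kv_heads=8, head_dim=128))  # → 24.0
```

Plug in the real model config and context length to reproduce the table's numbers for any hardware target.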
This is a memory-architecture story. The MI300X's 192 GB of HBM3 leaves headroom for both the FP8 weights and the full 256K KV cache on a single card; by the same VRAM accounting, an 80 GB H100 cannot hold this configuration.
Demo backend
This Space serves a CPU mock for UI demonstration only — HF Spaces don't ship MI300X GPUs. The verified performance numbers above and in the Verified evidence tab come from a real MI300X stress test on AMD Developer Cloud (124 min, $4.12).
Backend right now: 🟡 CPU mock — HF Spaces ship CPU/T4 by default, not MI300X
To wire a real MI300X endpoint, set the Space secrets `VLLM_BASE_URL` and `MODEL_NAME=Qwen/Qwen3-Coder-Next-FP8`. For a live walkthrough on a hosted MI300X, contact razikovsardor1@gmail.com.
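To sketch what that wiring looks like: vLLM serves an OpenAI-compatible API, so the Space only needs to POST chat completions to the URL in VLLM_BASE_URL. The helper below is a hypothetical illustration (not REPOMIND's actual client code), using only the Python standard library.

```python
import json
import os
import urllib.request

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request from the
    Space secrets VLLM_BASE_URL and MODEL_NAME."""
    base_url = os.environ["VLLM_BASE_URL"].rstrip("/")
    payload = {
        "model": os.environ["MODEL_NAME"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a live endpoint):
#   with urllib.request.urlopen(build_chat_request("Summarize this repo")) as r:
#       print(json.load(r)["choices"][0]["message"]["content"])
```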
Paste any GitHub URL or owner/repo shorthand. REPOMIND clones it, parses the source files, and chunks them into priority-ranked sections (README first, then top-level symbols, then nested code, then tests).
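The priority ordering above can be sketched as a rank-then-trim pass. The tiers and the 4-chars-per-token estimate below are illustrative assumptions (the real chunker's heuristics aren't reproduced here), but the shape is the same: score every chunk, sort by tier, and keep chunks until the token budget is exhausted.

```python
from dataclasses import dataclass

# Lower tier number = higher priority, matching the README-first ordering.
TIER = {"readme": 0, "top_level_symbol": 1, "nested_code": 2, "test": 3}

@dataclass
class Chunk:
    kind: str   # one of TIER's keys
    text: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption,
    # not the model's real tokenizer).
    return max(1, len(text) // 4)

def trim_to_budget(chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Keep the highest-priority chunks that fit within the token budget."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: TIER[c.kind]):
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget_tokens:
            kept.append(chunk)
            used += cost
    return kept
```

With a 256K budget this is why a 408K-token repo still works: low-priority tiers (deeply nested code, tests) are dropped first, and the README and top-level symbols survive intact.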
Examples that work on a single MI300X: pallets/flask (408K tokens, trimmed into the 256K window by priority chunking) · pytorch/vision (1.3M tokens, trimmed to 180K of highest-priority content via the chunker) · this repo SRKRZ23/repomind (~68K tokens, fits whole).