REPOMIND
Open-source repo-scale coding agent on AMD MI300X.
Ingest a git repository (up to 256K tokens of context at FP8) on a single GPU and reason across the whole codebase with multi-step tool use.
Verified on a single MI300X (2026-05-05): 256K context · 31/31 concurrent users at 8K–64K · 200K needle-in-haystack 3/3 · 9/9 end-to-end repo questions correct · $4.12 total stress test cost · AITER FP8 attention backend regression filed for AMD review.
🎬 1-minute demo video · 📦 GitHub source (MIT) · 🏆 Lablab project page · 🐛 AMD Developer Forum thread #505
Why AMD MI300X — memory architecture
| Component | Verified on MI300X | NVIDIA H100 80 GB |
|---|---|---|
| Qwen3-Coder-Next-FP8 weights in VRAM | 77.29 GiB | fits |
| 256K KV cache @ FP8 (2,065,744 tokens) | 94.58 GiB available | cannot fit |
| Total peak utilization | 176 / 191.7 GiB (92%) | cannot accommodate (~143 GB > 80 GB) |
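The VRAM accounting above follows the standard per-token KV-cache formula: 2 (K and V) × layers × KV heads × head dim bytes per token at FP8 (1 byte per element). The sketch below shows that arithmetic with illustrative model dimensions — the layer/head numbers are assumptions for the example, not the published Qwen3-Coder-Next config.

```python
def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 1) -> float:
    """KV-cache size in GiB: 2 (K and V) * layers * kv_heads * head_dim
    bytes per token, with 1 byte/element at FP8."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 2**30

# Hypothetical dimensions, for illustration only:
print(kv_cache_gib(262_144, layers=48, kv_heads=8, head_dim=128))  # → 24.0
```

Plug in the real model config and context length to reproduce the table's numbers for any hardware target.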
This is a memory-architecture story. The MI300X's 192 GB of HBM3 leaves headroom for both the FP8 weights and the full 256K KV cache on a single card; by the same VRAM accounting, an 80 GB H100 cannot hold this configuration.
Demo backend
This Space serves a CPU mock for UI demonstration only — HF Spaces don't ship MI300X GPUs. The verified performance numbers above and in the Verified evidence tab come from a real MI300X stress test on AMD Developer Cloud (124 min, $4.12).
Backend right now: 🟡 CPU mock — HF Spaces ship CPU/T4 by default, not MI300X
To wire a real MI300X endpoint, set the Space secrets `VLLM_BASE_URL` and `MODEL_NAME=Qwen/Qwen3-Coder-Next-FP8`. For a live walkthrough on a hosted MI300X, contact razikovsardor1@gmail.com.
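To sketch what that wiring looks like: vLLM serves an OpenAI-compatible API, so the Space only needs to POST chat completions to the URL in VLLM_BASE_URL. The helper below is a hypothetical illustration (not REPOMIND's actual client code), using only the Python standard library.

```python
import json
import os
import urllib.request

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request from the
    Space secrets VLLM_BASE_URL and MODEL_NAME."""
    base_url = os.environ["VLLM_BASE_URL"].rstrip("/")
    payload = {
        "model": os.environ["MODEL_NAME"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a live endpoint):
#   with urllib.request.urlopen(build_chat_request("Summarize this repo")) as r:
#       print(json.load(r)["choices"][0]["message"]["content"])
```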
Paste any GitHub URL or owner/repo shorthand. REPOMIND clones it, parses the source files, and chunks them into priority-ranked sections (README first, then top-level symbols, then nested code, then tests).
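The priority ordering above can be sketched as a rank-then-trim pass. The tiers and the 4-chars-per-token estimate below are illustrative assumptions (the real chunker's heuristics aren't reproduced here), but the shape is the same: score every chunk, sort by tier, and keep chunks until the token budget is exhausted.

```python
from dataclasses import dataclass

# Lower tier number = higher priority, matching the README-first ordering.
TIER = {"readme": 0, "top_level_symbol": 1, "nested_code": 2, "test": 3}

@dataclass
class Chunk:
    kind: str   # one of TIER's keys
    text: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption,
    # not the model's real tokenizer).
    return max(1, len(text) // 4)

def trim_to_budget(chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Keep the highest-priority chunks that fit within the token budget."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: TIER[c.kind]):
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget_tokens:
            kept.append(chunk)
            used += cost
    return kept
```

With a 256K budget this is why a 408K-token repo still works: low-priority tiers (deeply nested code, tests) are dropped first, and the README and top-level symbols survive intact.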
Examples that work on a single MI300X: pallets/flask (408K tokens, trimmed into the 256K window by priority chunking) · pytorch/vision (1.3M tokens, trimmed to 180K of highest-priority content via the chunker) · this repo SRKRZ23/repomind (~68K tokens, fits whole).