REPOMIND

Open-source repo-scale coding agent on AMD MI300X.

Ingest a git repository (up to 256K tokens, FP8) on a single GPU and reason across the whole codebase with multi-step tool use.

Verified on a single MI300X (2026-05-05): 256K context · 31/31 concurrent users at 8K–64K · 200K needle-in-haystack 3/3 · 9/9 end-to-end repo questions correct · $4.12 total stress test cost · AITER FP8 attention backend regression filed for AMD review.

🎬 1-minute demo video · 📦 GitHub source (MIT) · 🏆 Lablab project page · 🐛 AMD Developer Forum thread #505

Why AMD MI300X — memory architecture

| Component | Verified on MI300X (192 GB) | NVIDIA H100 80 GB |
| --- | --- | --- |
| Qwen3-Coder-Next-FP8 weights in VRAM | 77.29 GiB | fits |
| 256K KV cache @ FP8 (2,065,744 tokens) | 94.58 GiB available | cannot fit |
| Total peak utilization | 176 / 191.7 GiB (92%) | cannot accommodate (~143 GB > 80 GB) |

This is a memory-architecture story: the MI300X's 192 GB gives a single card the headroom for this configuration, while an 80 GB H100 cannot accommodate it by VRAM accounting alone.
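A rough back-of-envelope for the KV-cache side of that accounting. The layer, head, and dimension values below are illustrative assumptions chosen to land near the measured figure, not numbers taken from the Qwen3-Coder-Next model card:

```python
# Back-of-envelope KV-cache sizing with FP8 (1 byte per element).
# Model dimensions here are ASSUMED for illustration only.
num_layers   = 48    # assumed
num_kv_heads = 4     # assumed (grouped-query attention)
head_dim     = 128   # assumed
fp8_bytes    = 1

# Both K and V are cached, hence the factor of 2.
bytes_per_token = 2 * fp8_bytes * num_layers * num_kv_heads * head_dim   # 49,152 B ≈ 48 KiB

kv_pool_tokens = 2_065_744          # token capacity from the table above
kv_pool_gib = bytes_per_token * kv_pool_tokens / 2**30
print(f"KV pool ≈ {kv_pool_gib:.1f} GiB")            # ≈ 94.6 GiB, close to the measured 94.58 GiB

weights_gib = 77.29                 # FP8 weights, from the table
print(f"weights + KV ≈ {weights_gib + kv_pool_gib:.0f} GiB")   # ≈ 172 GiB — far beyond 80 GB
```

Under those assumed dimensions the arithmetic reproduces the shape of the table: the weights alone nearly fill an H100, and the KV pool for a 256K-context, multi-user workload only fits alongside them on a 192 GB card.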

Demo backend

This Space serves a CPU mock for UI demonstration only — HF Spaces don't ship MI300X GPUs. The verified performance numbers above and in the Verified evidence tab come from a real MI300X stress test on AMD Developer Cloud (124 min, $4.12).

Backend right now: 🟡 CPU mock — HF Spaces ship CPU/T4 by default, not MI300X

To wire a real MI300X endpoint, set Space secrets VLLM_BASE_URL + MODEL_NAME=Qwen/Qwen3-Coder-Next-FP8. For a live walkthrough on a hosted MI300X, contact razikovsardor1@gmail.com.
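Under the hood the Space only needs an OpenAI-compatible endpoint. A minimal sketch of the wiring, assuming a vLLM server is already running on the MI300X and exposing the standard /v1 API (the environment-variable names match the secrets above; the optional VLLM_API_KEY and the example prompt are illustrative):

```python
import os
from openai import OpenAI  # vLLM exposes an OpenAI-compatible /v1 API

# Space secrets / environment variables described above
base_url = os.environ["VLLM_BASE_URL"]                      # e.g. http://<mi300x-host>:8000/v1
model    = os.environ.get("MODEL_NAME", "Qwen/Qwen3-Coder-Next-FP8")

client = OpenAI(base_url=base_url,
                api_key=os.environ.get("VLLM_API_KEY", "EMPTY"))  # vLLM accepts a dummy key

resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Where is request routing implemented in this repo?"}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```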

Paste any GitHub URL or owner/repo shorthand. REPOMIND clones it, parses the source files, and chunks them into priority-ranked sections (README first, then top-level symbols, then nested code, then tests).
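A minimal sketch of that priority-ranked chunking, working at file granularity rather than REPOMIND's symbol level; the suffix list, ranking heuristics, and token estimate are illustrative assumptions, not the project's actual implementation:

```python
from pathlib import Path

SOURCE_SUFFIXES = {".py", ".md", ".rst", ".toml"}   # illustrative subset

def priority(path: Path) -> int:
    """Lower is more important: README, then top-level files, then nested code, then tests."""
    name = path.name.lower()
    if name.startswith("readme"):
        return 0
    if "tests" in path.parts or name.startswith("test_"):
        return 3
    return 1 if len(path.parts) <= 2 else 2

def chunk_repo(root: str, budget_tokens: int = 256_000) -> list[tuple[Path, str]]:
    """Greedily pack the highest-priority files into the context budget."""
    files = [p for p in Path(root).rglob("*") if p.is_file() and p.suffix in SOURCE_SUFFIXES]
    files.sort(key=priority)
    picked, used = [], 0
    for f in files:
        text = f.read_text(errors="ignore")
        tokens = len(text) // 4          # rough 4-characters-per-token estimate
        if used + tokens > budget_tokens:
            continue                     # skip files that would blow the budget
        picked.append((f, text))
        used += tokens
    return picked
```

This is how repos larger than the context window (like the examples below) still yield a useful prompt: low-priority files simply fall out of the budget.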


Examples that work on a single MI300X: pallets/flask (408K tokens; priority chunking trims it to the 256K window) · pytorch/vision (1.3M tokens; trimmed to 180K of highest-priority content by the chunker) · this repo, SRKRZ23/repomind (~68K tokens; fits whole).