GASv2
Zero-shot semantic mapping from phone video to 3D scene graph, using four foundation models stitched together on pay-per-second GPUs. Ten invocations, seven crashes, one honest map. Everything we learned about Modal, VGGT, Grounding DINO, SAM 2, and the CUDA version that quietly changed under us.