Abstract: In computer vision, most data are captured in 2D formats, which limits spatial understanding in real-world applications. This limitation poses a challenge for fields such as architecture, construction, ...