Abstract: Addressing the critical challenge of spatiotemporal semantic disjunction caused by conventional bird’s-eye-view trajectory modeling methods in ego-vehicle-perspective road user behavior ...
Abstract: We built a spatial hybrid system that combines a personal computer (PC) and virtual reality (VR) for visual sensemaking, addressing limitations in both environments. Although VR offers ...
One of the principal challenges in building VLM-powered GUI agents is visual grounding, i.e., localizing the appropriate screen region for action execution based on both the visual content and the ...