Abstract: With the rapid progress of the computer, communication, and consumer electronics (3C) industry, robotic assembly tasks have become more important and challenging, such as the assembly of flexible ...
Abstract: The advent of vision-language pre-training techniques has driven substantial progress in the development of image captioning models. However, these models frequently produce generic ...