上一条:A Comprehensive Survey on Composed Image Retrieval
下一条:Exploiting the Social-Like Prior in Transformer for Visual Reasoning