Abstract: Multi-modal few-shot semantic segmentation (FSS) aims to perform dense prediction from multiple modality images including visible image, depth image, and thermal image with a few annotated ...