The abundance of unlabeled forest images on the web is a powerful yet untapped resource for training forestry vision models. Two key challenges limit the use of these unlabeled images: i) collecting the images and ii) obtaining the labels, as supervised learning remains the prevailing approach for model training. In this work, we address the first issue by providing a dataset of 110k forest images sourced from a repository of pictures taken by amateur photographers worldwide. To generate supplementary labels for supervised training, we propose a two-step approach. First, we train a network on a small labeled dataset to generate pseudo-labels on the much larger, unlabeled one. Then, we leverage the zero-shot segmentation capability of the Segment Anything Model to improve the quality of these pseudo-labels. Our experiments demonstrate that both the proposed dataset and the pseudo-labeling method increase the performance of a tree detector at no additional labeling cost. This performance increase is particularly significant in challenging scenarios, showing that training the model with better segmentation masks notably helps disentangle overlapping trees and detect odd-shaped ones. Depending on the burn-in model, the gains are either 3.3 APbb and 7.7 APseg or 1.6 APbb and 3.5 APseg percentage points. Code and dataset links are available at https://github.com/norlab-ulaval/PercepTreeV1.
Month: May
Year: 2024
Venue: 21st Conference on Robots and Vision
URL: https://crv.pubpub.org/pub/it4xxpil
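
The two-step pseudo-labeling approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not state: the segmenter is prompted with each predicted bounding box, predictions are filtered with a confidence threshold (`score_threshold`), and the box is recomputed from the refined mask. The names `Detection`, `refine_pseudo_labels`, `box_from_mask`, and `toy_segmenter` are hypothetical, and `toy_segmenter` is only a stand-in for a promptable model such as the Segment Anything Model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive
Mask = List[List[int]]           # binary mask stored as rows of 0/1


@dataclass
class Detection:
    """One pseudo-label produced by the burn-in detector (hypothetical type)."""
    box: Box
    score: float
    mask: Optional[Mask] = None


def box_from_mask(mask: Mask) -> Box:
    """Return the tight bounding box around the non-zero pixels of a mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        raise ValueError("empty mask")
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def refine_pseudo_labels(
    detections: Sequence[Detection],
    segment_from_box: Callable[[Box], Mask],
    score_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Detection]:
    """Step 2: replace each confident pseudo-label mask with a refined one."""
    refined = []
    for det in detections:
        if det.score < score_threshold:
            continue  # assumption: low-confidence pseudo-labels are discarded
        mask = segment_from_box(det.box)
        if not any(any(row) for row in mask):
            continue  # the segmenter returned nothing usable for this prompt
        refined.append(Detection(box_from_mask(mask), det.score, mask))
    return refined


def toy_segmenter(box: Box, height: int = 4, width: int = 6) -> Mask:
    """Stand-in segmenter: fills the box, shrunk by one pixel left and right."""
    x0, y0, x1, y1 = box
    return [
        [1 if (y0 <= y < y1 and x0 + 1 <= x < x1 - 1) else 0 for x in range(width)]
        for y in range(height)
    ]


# Step 1 would come from a detector trained on the small labeled set;
# here its output is hard-coded for illustration.
burn_in_output = [Detection((0, 0, 5, 4), 0.9), Detection((0, 0, 5, 4), 0.2)]
pseudo_labels = refine_pseudo_labels(burn_in_output, toy_segmenter)
# -> one detection, with box (1, 0, 4, 4)
```

In a real pipeline, `segment_from_box` would wrap a call to the actual segmentation model, and the refined detections would be added to the training set of the final tree detector.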