Supervised classification of satellite images relies on reference training data. The availability and quality of these data therefore strongly influence both the results and the course of the entire classification process.
In the Sentinel-2 Global Land Cover (S2GLC) project, Sentinel-2 images are classified using the Random Forest (RF) algorithm, trained on points selected from existing low-resolution land cover databases. This approach allows the classification to be performed in a highly automated manner, with little operator intervention. An alternative method for creating a training dataset has been developed so that the S2GLC classification can still be carried out when access to the required land cover databases is limited or their quality is low.
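As an illustration of drawing training points from a coarser land cover map, the following minimal sketch upsamples a low-resolution label grid to the image grid and picks random pixel positions per class. It is not the S2GLC implementation: the function name `sample_training_points`, the integer `scale` factor, the nearest-neighbour upsampling, and the purely random selection are all assumptions made for the example.

```python
import numpy as np

def sample_training_points(coarse_labels, scale, per_class, seed=0):
    """Draw up to `per_class` random pixel positions per class.

    coarse_labels: 2-D integer array, a low-resolution land cover map
                   (hypothetical input for this sketch).
    scale:         integer ratio between the coarse grid and the image grid.
    Returns a dict {class_id: array of (row, col) in image coordinates}.
    """
    rng = np.random.default_rng(seed)
    # Nearest-neighbour upsampling: each coarse cell covers scale x scale pixels.
    fine = np.repeat(np.repeat(coarse_labels, scale, axis=0), scale, axis=1)
    points = {}
    for cls in np.unique(fine):
        rows, cols = np.nonzero(fine == cls)
        n = min(per_class, rows.size)
        pick = rng.choice(rows.size, size=n, replace=False)
        points[int(cls)] = np.column_stack([rows[pick], cols[pick]])
    return points

# Toy 2 x 2 coarse map with three classes, upsampled by a factor of 10.
coarse = np.array([[1, 1],
                   [2, 3]])
pts = sample_training_points(coarse, scale=10, per_class=50)
```

The spectral values at the returned positions, together with their class labels, would then form the training set passed to a classifier such as Random Forest.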
The proposed method is a semi-automatic process initiated by an operator who, through visual interpretation, indicates only a few starting samples for the classes of interest. Using this limited set of initial training samples, hundreds or thousands of training samples with similar spectral characteristics are then automatically selected from the image. The resulting dataset can be used as an alternative source of training data for land cover classification on a much larger scale.
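The idea of growing a few operator-selected samples into a large training set can be sketched as follows. The abstract does not state how spectral similarity is measured, so this example assumes the Euclidean distance to the mean spectrum of each class's seed pixels, with a hypothetical threshold `max_distance`; the function name `expand_seeds` and the synthetic image are likewise illustrative only.

```python
import numpy as np

def expand_seeds(image, seeds, max_distance):
    """Label every pixel whose spectrum is close to a class's seed spectrum.

    image:        (rows, cols, bands) float array.
    seeds:        {class_id: [(row, col), ...]} chosen by the operator;
                  class ids must be non-zero integers.
    max_distance: assumed similarity threshold (Euclidean, spectral space).
    Returns a (rows, cols) integer map, 0 meaning "not selected".
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands)
    class_ids = sorted(seeds)
    # Mean spectrum of the seed pixels of each class.
    means = np.array([
        np.mean([image[r, c] for r, c in seeds[k]], axis=0)
        for k in class_ids
    ])
    # Distance from every pixel to every class mean: (n_pixels, n_classes).
    dist = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    close = dist.min(axis=1) <= max_distance
    labels = np.where(close, np.array(class_ids)[nearest], 0)
    return labels.reshape(rows, cols)

# Synthetic 4-band image: dark left half, bright right half, slight noise.
rng = np.random.default_rng(0)
image = np.empty((20, 20, 4))
image[:, :10] = 0.1
image[:, 10:] = 0.8
image += rng.normal(0.0, 0.01, size=image.shape)

# Two seed pixels per class stand in for the operator's visual interpretation.
seeds = {1: [(0, 0), (5, 2)], 2: [(0, 19), (7, 15)]}
labels = expand_seeds(image, seeds, max_distance=0.2)
```

In this toy case four seed pixels yield 400 labelled pixels; on a real scene the threshold controls the trade-off between the number of selected samples and their spectral purity.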
Compared with the traditional approach, in which all samples or training areas are indicated manually, the developed method is very effective and also allows data to be processed more rapidly. The semi-automatic
Gromny E., Lewiński S., Rybicki M., Malinowski R., Krupiński M., Nowakowski A., and Jenerowicz M. (2019), Creation of training dataset for Sentinel-2 land cover classification, Proc. SPIE 11176, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, 111763D (6 November 2019)