Estimation of dissolved organic carbon from inland waters using remote sensing data and machine learning

University essay from Lunds universitet/Institutionen för naturgeografi och ekosystemvetenskap

Abstract: This thesis presents the first attempt to estimate Dissolved Organic Carbon (DOC) in inland waters at a large spatial scale using satellite data and machine learning (ML) methods. Four ML approaches, namely Random Forest Regression (RFR), Support Vector Regression (SVR), Gaussian Process Regression (GPR), and a Multilayer Backpropagation Neural Network (MBPNN), were tested to retrieve DOC using a filtered version of the recently published open-source AquaSat dataset, which contains more than 16,000 samples across the continental US matched with satellite data from the Landsat 5, 7, and 8 missions. In this work, the AquaSat dataset was extended with environmental data from the ERA5-Land product. Including environmental data considerably improved the prediction of DOC for all algorithms, with GPR showing the best and most robust performance and moderate estimation errors (RMSE: 4.08 mg/L). Permutation feature importance analysis showed that the most important variables for the machine learning approaches were the visible green band among the Landsat bands and the monthly average air temperature among the ERA5-Land variables. The results demonstrate the predictive strength of advanced ML approaches such as GPR and MBPNN when faced with a complex learning task, and highlight the importance of considering environmental processes to explain DOC variations over large scales. While performance evaluation showed that DOC concentrations can be retrieved with adequate accuracy, algorithm development was challenged by the heterogeneous nature of large-scale open-source in situ data, issues related to atmospheric correction, and the low spatial and temporal resolution of the environmental predictors.
Although locally tuned models are likely to outperform the developed model in terms of accuracy, the model can address key issues of inland water remote sensing: it is a promising approach to overcome the lack of in situ measurements and to map large-scale trends of inland DOC dynamically across seasons and over long time periods. This research demonstrates how open-source, large-scale datasets like AquaSat, in combination with ML and remote sensing, can make research toward large-scale estimation of inland water DOC more realistic, while highlighting the remaining limitations and challenges of this approach.
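
The abstract reports model error as RMSE and ranks predictors by permutation feature importance. As a minimal sketch of how those two quantities are typically computed, the following Python uses only the standard library. It is not the thesis's implementation: the data are synthetic, and the feature names (green, air_temp, unrelated) and the fixed linear `toy_model` are hypothetical stand-ins for a trained regressor such as GPR.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (mg/L for DOC)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = rmse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return baseline, importances


# Synthetic stand-in data (assumption): three features per sample,
# ordered as [green, air_temp, unrelated]. The target depends on the
# first two only, plus a little Gaussian noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
y = [5.0 * g + 2.0 * t + data_rng.gauss(0, 0.1) for g, t, _ in X]


def toy_model(row):
    """Hypothetical 'trained' model: a fixed linear rule, not a real regressor."""
    green, air_temp, _unrelated = row
    return 5.0 * green + 2.0 * air_temp


baseline, importances = permutation_importance(toy_model, X, y)
for name, score in zip(["green", "air_temp", "unrelated"], importances):
    print(f"{name}: RMSE increase {score:.3f}")
```

A feature the model relies on heavily produces a large RMSE increase when shuffled, while a feature the model ignores produces none. With a real regressor, `predict` would wrap the fitted model's prediction call, and the importance would ideally be computed on held-out data.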
