Exploratory Assessment of Machine Learning, Geospatial Methods, and Data Evaluation Techniques for Modelling Cooking Fuels in Colombia

University essay from KTH/Skolan för industriell teknik och management (ITM)

Author: Federico Naranjo Hernandez; [2023]

Abstract: This thesis evaluates the application of machine learning and Geographic Information System (GIS) methods to the analysis of Colombia's cooking fuels landscape. The study aims to identify the socio-economic and geographical factors that affect cooking fuel choices and to assess the predictive power of various machine learning algorithms using associative data. Pearson correlation coefficients and feature importance analyses reveal significant socio-economic influences on clean cooking fuel adoption. Key determinants include utilities availability, garbage collection coverage, and urbanization level, with access to the gas network proving to be a critical factor in household energy decisions. Data gathered from various departments and socio-economic backgrounds form a comprehensive dataset that reveals regional differences and helps pinpoint areas needing more focused data collection. Data type conversions are employed to address computational constraints; they reduce file sizes and the training times of the machine learning models without substantially compromising accuracy. Classifiers, re-samplers, dimensionality reduction techniques, and dataset sizes are evaluated using balanced accuracy and training time. These analyses identify the best-performing combinations of classifiers and data processing techniques, weighing computational cost against predictive accuracy. Predictive models are then applied to the complete census data, forecasting cooking fuel types across departments. The results show a tendency towards traditional fuels in rural areas. Overall, the thesis provides a methodical approach to understanding and predicting cooking fuel usage, offering insights for policymakers and suggesting directions for future research to enhance model performance and address data imbalances.
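
The abstract names balanced accuracy as an evaluation metric and mentions data imbalances. As an illustration only, the sketch below computes balanced accuracy as the mean of per-class recall, which is the standard definition of the metric. It is not taken from the thesis: the function name `balanced_accuracy`, the labels "gas" and "firewood", and the toy predictions are all hypothetical, chosen to show why plain accuracy can look better than it should when one fuel type dominates.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data (not from the thesis): 8 households on gas, 2 on firewood.
y_true = ["gas"] * 8 + ["firewood"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["gas"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {score:.2f}")
```

In this toy case the always-"gas" model is right for 8 of 10 households, yet it never identifies a firewood household. Balanced accuracy averages the recall for "gas" (1.0) and for "firewood" (0.0), which penalizes the model for ignoring the minority class.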
