Classification Tree Based Algorithms in Studying Predictors for Long-Term Unemployment in Early Adulthood: An Exploratory Analysis Combining Supervised Machine Learning and Administrative Register Data

University essay from Linköpings universitet/Institutionen för ekonomisk och industriell utveckling; Linköpings universitet/Filosofiska fakulteten

Abstract: Unemployment at a young age is a negative life event that has been found to have scarring effects on later life outcomes, especially when it is long-term. Understanding the precursors of long-term unemployment in early adulthood is important for targeting policy interventions at critical junctures in the life course. Paths to unemployment are complex, and a comprehensive overview of the most important factors and mechanisms is difficult to obtain. This study proposes a data-driven, exploratory approach for studying individual- and family-level factors during ages 0-24 that predict long-term unemployment at ages 25-30. A supervised machine learning approach was applied to explore associations in longitudinal, individual-level administrative register data from a full birth cohort in Finland. The data comprise information about physical and social wellbeing, life course events, and demographics, including data on the cohort members' parents. Potential predictors were selected from the data based on theory and previous research and were used to train a model aimed at correctly classifying unemployed individuals. A CART (classification and regression tree) algorithm was used to build a classification tree that reveals important variables, their critical value ranges, and combinations of factors that together predict long-term unemployment. A random forest algorithm was used to build an ensemble of trees whose aggregated predictions are smoother and less prone to the overfitting of a single tree. The CART and random forest models were compared to assess how they perform in a research task predicting life outcomes. Both individual- and family-level factors were found to be predictive of the outcome. Combinations of variables such as a GPA below ~7.5, the individual's (ego's) low education level, a late start to work history, depressive disorders, and low parental education and income were found to be particularly predictive of unemployment.
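CART grows a tree by repeatedly choosing the binary split of a single predictor that most reduces node impurity (the Gini index is a common criterion), while a random forest fits many trees on bootstrap samples using random subsets of predictors and aggregates their votes. The sketch below illustrates both ideas in pure Python under simplifying assumptions: toy data with two hypothetical predictors (a GPA-like score and a parental-income index), and trees limited to a single split (stumps). All names and data are illustrative and not taken from the study, whose implementation is not described in the abstract.

```python
import random
from collections import Counter

# Hypothetical toy data (NOT the study's data): each row is
# (gpa_like_score, parental_income_index); label 1 = long-term unemployed.
ROWS = [(6.8, 0.2), (7.0, 0.9), (7.2, 0.4), (7.4, 0.1),
        (7.8, 0.3), (8.1, 0.8), (8.6, 0.5), (9.0, 0.7)]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p1^2 - p0^2."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows, labels, features):
    """CART-style search: return (feature, threshold, weighted Gini) of the
    binary split `x[feature] <= threshold` with the lowest child impurity.
    Returns (None, None, parent Gini) if no split improves on the parent."""
    n = len(labels)
    best = (None, None, gini(labels))
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


def fit_stump(rows, labels, features):
    """One-split tree: majority label on each side of the best split."""
    f, t, _ = best_split(rows, labels, features)
    majority = Counter(labels).most_common(1)[0][0]
    if f is None:
        return (None, None, majority, majority)
    left = [y for r, y in zip(rows, labels) if r[f] <= t]
    right = [y for r, y in zip(rows, labels) if r[f] > t]
    return (f, t, Counter(left).most_common(1)[0][0],
            Counter(right).most_common(1)[0][0])


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, m_features=1, seed=0):
    """Random-forest-style ensemble of stumps: each tree sees a bootstrap
    sample of the rows and a random subset of m_features predictors."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(p), m_features)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees (ties go to class 0)."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes > len(forest) else 0


if __name__ == "__main__":
    print(best_split(ROWS, LABELS, [0, 1]))  # splits on feature 0 near 7.6
    forest = fit_forest(ROWS, LABELS)
    print(predict_forest(forest, (7.1, 0.3)))
```

In a full random forest, trees are grown deeper and the predictor subset is redrawn at every split; with single-split stumps the two coincide, which keeps the sketch short. The vote over many bootstrap trees is what smooths out the idiosyncrasies of any one tree.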
CART models correctly classified up to 87% of the unemployed but misclassified 70% of the employed, with an overall accuracy of 45%. Testing CART model stability by checking for consistency across several tree models improved the robustness of the findings. The random forest correctly classified up to 59% of the unemployed while also correctly classifying 65% of the employed, and it produced robust results. Together, the two algorithms provided valuable insight into the factors contributing to unemployment. The study shows promise for classification-tree-based methods in studying the life course and life outcomes.
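The share of unemployed correctly classified (sensitivity), the share of employed correctly classified (specificity), and overall accuracy are all read off a confusion matrix. Overall accuracy is a class-share-weighted average of sensitivity and specificity, so a model can catch most unemployed individuals yet have low accuracy if it misclassifies many employed ones. The minimal sketch below uses hypothetical counts chosen only for illustration; they are not the study's results, and the function name is an assumption.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix
    counts, with long-term unemployed as the positive class."""
    sensitivity = tp / (tp + fn)  # share of unemployed correctly classified
    specificity = tn / (tn + fp)  # share of employed correctly classified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (NOT from the study): 100 unemployed, 300 employed.
sens, spec, acc = classification_rates(tp=85, fn=15, tn=90, fp=210)
prevalence = 100 / 400

# Accuracy equals the prevalence-weighted average of the two class rates.
assert abs(acc - (sens * prevalence + spec * (1 - prevalence))) < 1e-12
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# prints "sensitivity=0.85 specificity=0.30 accuracy=0.44"
```

Because the employed class dominates in this toy example, its low specificity pulls accuracy well below sensitivity, which is why the abstract reports per-class rates alongside overall accuracy.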
