An Approach on Learning Multivariate Regression Chain Graphs from Data

University essay from Databas och informationsteknik; Tekniska högskolan

Abstract: Modeling is vital for reasoning about and diagnosing complex systems, since the human mind has limited capacity and is not always objective. The chain graph (CG) class is a powerful and robust tool for modeling real-world applications. It is a type of probabilistic graphical model (PGM) and has multiple interpretations, each with a distinct Markov property. This thesis deals with the multivariate regression chain graph (MVR-CG) interpretation. The main goal of this thesis is to implement the MVR-PC-algorithm proposed by Sonntag and Peña in 2012 and to evaluate its results. This algorithm uses a constraint-based approach to learn an MVR-CG from data.

In this study the MVR-PC-algorithm is implemented and tested to check that the implementation is correct. For this purpose, it is run on several different independence models that can be perfectly represented by MVR-CGs. The learned CG and the independence model of the given probability distribution are then compared to ensure that they are in the same Markov equivalence class. Additionally, to check how accurately the algorithm learns an MVR-CG from data, a large number of samples is passed to the algorithm. The results are analyzed based on the number of nodes and the average number of adjacents per node. The accuracy of the algorithm is measured by the precision and recall of independencies and dependencies.

In general, the more samples the algorithm is given, the more accurate the learned MVR-CGs become. In addition, when the graph is sparse, the result becomes significantly more accurate. The number of nodes affects the results only slightly: if the average number of adjacents is fixed, increasing the number of nodes can lead to better results. On the other hand, if the number of nodes is fixed and the average number of adjacents increases, the effect is more considerable and the accuracy of the results declines dramatically. Moreover, the type of the random variables can affect the results. Given samples with discrete variables, the recall of independencies is higher and the precision of independencies is lower. Conversely, given samples with continuous variables, the recall of independencies is lower but the precision of independencies is higher.
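As an illustration of the accuracy measures named in the abstract, the following is a minimal sketch of how precision and recall of independencies and dependencies could be computed from two sets of statements. It is not the thesis implementation. The names `statement` and `precision_recall`, the toy variables A, B and C, and the convention of returning 1.0 when a set is empty are all assumptions made for this example, as is the choice to treat dependencies as the complement of the independencies within a fixed universe of tested statements.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A statement "X is independent of Y given Z" is stored as ({X, Y}, Z).
# Using frozensets makes the pair unordered and the statement hashable.
Statement = Tuple[FrozenSet[str], FrozenSet[str]]


def statement(x: str, y: str, given: Iterable[str] = ()) -> Statement:
    """Build a hashable (in)dependence statement (hypothetical helper)."""
    return (frozenset((x, y)), frozenset(given))


def precision_recall(true_set: Set[Statement],
                     learned_set: Set[Statement]) -> Tuple[float, float]:
    """Precision and recall of learned_set measured against true_set.

    Assumption: an empty denominator yields 1.0 (nothing claimed, or
    nothing to find); other conventions are possible.
    """
    hits = len(true_set & learned_set)
    precision = hits / len(learned_set) if learned_set else 1.0
    recall = hits / len(true_set) if true_set else 1.0
    return precision, recall


# Hypothetical toy data: the universe of statements that were checked.
universe = {
    statement("A", "B"),
    statement("A", "C"),
    statement("B", "C"),
    statement("A", "C", ["B"]),
}
# Independencies that hold in the true model versus in the learned graph.
true_indep = {statement("A", "B"), statement("A", "C", ["B"])}
learned_indep = {statement("B", "C"), statement("A", "C", ["B"])}

# Independencies are compared directly.
p_ind, r_ind = precision_recall(true_indep, learned_indep)
# Dependencies are taken as the complement within the universe.
p_dep, r_dep = precision_recall(universe - true_indep,
                                universe - learned_indep)

print(f"independencies: precision={p_ind:.2f}, recall={r_ind:.2f}")
print(f"dependencies:   precision={p_dep:.2f}, recall={r_dep:.2f}")
```

In this sketch, low precision of independencies means the learned graph claims independencies that do not hold in the true model, while low recall means it misses independencies that do hold.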
