Application of data warehousing and data mining in forecasting cancer diseases threats

University essay from Blekinge Tekniska Högskola/Avdelningen för programvarusystem

Abstract: Multidimensional analysis, trend analysis, summaries and drill-downs, the data warehousing methods of choice, provided a rich, valuable and detailed perspective on cancer threats in terms of virtually any dimension covered by the data. They made it possible to model cancer risk in terms of age, race, sex and survival chances, among others, and to identify the most dangerous and most frequently occurring cancers. They revealed how little survival chances and treatment efficiency have increased over the last 30 years and how little early diagnosis has improved, presented trends and changes in them as well as changes in cancer risk related to place of residence, and emphasized the importance of risk mitigation through screening and a healthy lifestyle. These methods also turned out to be easy to use, requiring less computer science knowledge than one might expect. With little support from IT staff, oncology professionals can readily benefit from vast data sets and the analytical power applied to them.

Data mining algorithms evaluated on melanoma of the skin data managed to extract what is already known in the domain. Therefore, when applied by oncology professionals to less generic data, data mining can be expected to have the potential to extend experts' knowledge. Neural networks, decision trees and clustering showed higher prediction accuracy than Naive Bayes classifiers and association rules, but it is advisable to merge results from many algorithms: findings of particular algorithms are often disjoint and, when combined, reveal more despite varying predictive performance.

Analysis of the caCORE system and a system integration experiment showed that building a large-scale oncological data system that integrates distributed data is extremely complex. Integrating with it requires considerable effort to understand its structures, prepare data mappings and implement integration procedures. Close cooperation between IT and oncology professionals is mandatory. Suggestions were made to simplify the generic caCORE data model (ontology) or split it into smaller parts, and to expose as much integration functionality as possible through web interfaces or encapsulated classes, to reduce the complexity of the process. With these changes, caCORE would be fully feasible and could be considered the future of applying data warehousing and data mining techniques in oncology, providing a distributed, common-model-compliant dataset and leveraging the power of the research community.
