Evaluating Data Quality for behavioural event data using semiotic theory: Analysing how data roles perceive Data Quality and how it is influenced by Data Quality awareness and experience

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Companies today handle and produce big data. To maximise the value of that data, they need to achieve high data quality (DQ) and be able to measure it. This study analyses whether a semiotic framework is suitable for assessing DQ for big data, specifically behavioural event data. The research also investigates how data roles perceive DQ and how DQ awareness and experience influence DQ perception. The case study is conducted within the media company Schibsted. The investigation is carried out by applying a semiotic framework to Schibsted’s data and by surveying data consumers, producers and brokers. The results indicate that a semiotic framework can be used for behavioural event data. However, the metrics should be easy to understand and the data should be sampled at the source. Moreover, the sample used in the survey should be equally distributed between data consumers, producers and brokers to minimise bias toward any one of the data roles. The results also show that data roles give more importance to the DQ criteria linked to their own role. The levels of DQ awareness and experience appear to have a slight influence on DQ perception, but the sample size is too small to confirm this. The research can be extended by applying a semiotic framework at different companies and in other use-case scenarios.
