Privacy-Preserving Big Data in an In-Memory Analytics Solution

University essay from Luleå/Department of Computer Science, Electrical and Space Engineering

Abstract: In the modern information society, technological advances make it possible to produce a high volume and a tremendous variety of data at any given time. Commercial organizations have been the first to embrace this change, and most now employ a wide range of information systems to support their work. As the number of systems increases, so does their usage, which results in more data being produced. Social networks are another phenomenon that contributes to the tremendous growth of data. This exceptional amount of data is referred to by a new term, “big data”. Several properties are associated with the term, but the most important are volume, velocity, variety and veracity. In the context of big data analytics, this means that large volumes and varieties of data from multiple sources are collected, cleansed, processed and analyzed to support decision making or problem solving. In some cases, these capabilities must be provided in real time. This is called real-time big data analytics: the analytical steps are performed in real time, which can be quite demanding in terms of implementation and operations. It also introduces new challenges in applying and maintaining security. One area of concern is how to preserve privacy when publishing data, especially in analytical scenarios in which a high degree of accuracy is required to make decisions. Privacy is critical because, if sensitive data fall into the wrong hands, the consequences could be serious. Thus, the purpose of this thesis is to study multiple models for privacy preservation in an in-memory based real-time big data analytics solution, and then to evaluate and analyze the outcome in order to propose one optimum model that supports the privacy requirements without compromising the analytical aspect of the solution.
The result shows that a newly developed model using the native capabilities of such an environment fulfills all the requirements, including the most important requirement of high data accuracy.
