Evaluation of Security in Hadoop

University essay from KTH/Kommunikationsnät

Abstract:

There are different ways to store and process large amounts of data. Hadoop is one of the most widely used platforms for storing huge amounts of data and processing them in parallel. When sensitive data is stored, security plays an important role in keeping it safe. Security was given little consideration when Hadoop was initially designed. Hadoop was first used to manage large amounts of public web data, so the confidentiality of the stored data was not an issue. Initially, users and services in Hadoop were not authenticated; because Hadoop is designed to run code on a distributed cluster of machines, without proper authentication anyone could submit code and it would be executed. Different projects have been started to improve the security of Hadoop. Two of these projects are Project Rhino and Project Sentry [1].

Project Rhino implements a splittable crypto codec to provide encryption for data stored in the Hadoop Distributed File System. It also develops centralized authentication by implementing Hadoop Single Sign-On, which spares users from authenticating repeatedly when they access the same services many times. From the authorization point of view, Project Rhino provides cell-based authorization for HBase [2].

Project Sentry provides fine-grained access control by supporting role-based authorization, to which different services can be bound in order to provide authorization for their users [3].

It is possible to combine the security enhancements made in Project Rhino and Project Sentry to further improve performance and provide better mechanisms to secure Hadoop.

In this thesis, the security of Hadoop version 1 and Hadoop version 2 is evaluated and different security enhancements are proposed, considering the security improvements made by the two aforementioned projects, Project Rhino and Project Sentry, in terms of encryption, authentication, and authorization. This thesis suggests some high-level security improvements to the centralized authentication system (Hadoop Single Sign-On) implementation made by Project Rhino.
