Multi-Tenant Apache Kafka for Hops : Kafka Topic-Based Multi-Tenancy and ACL- Based Authorization for Hops

University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

Abstract: Apache Kafka is a distributed, high throughput and fault-tolerant publish/subscribe messaging system in the Hadoop ecosystem. It is used as a distributed data streaming and processing platform. Kafka topics are the units of message feeds in the Kafka cluster. Kafka producer publishes messages into these topics and a Kafka consumer subscribes to topics to pull those messages. With the increased usage of Kafka in the data infrastructure of many companies, there are many Kafka clients that publish and consume messages to/from the Kafka topics. In fact, these client operations can be malicious. To mitigate this risk, clients must authenticate themselves and their operation must be authorized before they can access to a given topic. Nowadays, Kafka ships with a pluggable Authorizer interface to implement access control list (ACL) based authorization for client operation. Kafka users can implement the interface differently to satisfy their security requirements. SimpleACLAuthorizer is the out-of-box implementation of the interface and uses a Zookeeper for ACLs storage.HopsWorks, based on Hops a next generation Hadoop distribution, provides support for project-based multi-tenancy, where projects are fully isolated at the level of the Hadoop Filesystem and YARN. In this project, we added Kafka topicbased multi-tenancy in Hops projects. Kafka topic is created from inside Hops project and persisted both at the Zookeeper and the NDBCluster. Persisting a topic into a database enabled us for topic sharing across projects. ACLs are added to Kafka topics and are persisted only into the database. Client access to Kafka topics is authorized based on these ACLs. ACLs are added, updated, listed and/or removed from the HopsWorks WebUI. HopsACLAuthorizer, a Hops implementation of the Authorizer interface, authorizes Kafka client operations using the ACLs in the database. The Apache Avro schema registry for topics enabled the producer and consumer to better integrate by transferring a preestablished message format. The result of this project is the first Hadoop distribution that supports Kafka multi-tenancy.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)