Search This Blog

Thursday, October 12, 2017

Kafka - Intro, Laptop Lab Setup and Best Practices

In this blog, I will summarize the best practices which should be used while implementing Kafka.
Before going to best practices, lets understand what is Kafka. Kafka is publish-subscribe messaging rethought as a distributed commit log and is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
Here is the high level conceptual diagram of Kafka, where in you can see Kafka cluster of size 4 (4 number of brokers) managed by Apache Zookeeper which is serving multiple of Producers and Consumers. Messages are sent to Topics. Each topic can have multiple partitions for scaling. For  fault-tolerance we have to use replication factor, which ensures that messages are written in multiple partitions.


Kafka Laptop Lab Setup

To setup Kafka Laptop Lab, please install VMware Workstation, Create a Ubuntu VM, Download and unzip Kafka

wget http://apache.cs.utah.edu/kafka/0.11.0.1/kafka_2.11-0.11.0.1.tgz
tar -xvf kafka_2.11-0.11.0.1.tgz

-- Set environment parameters
vi .bashrc
--Add following 2 lines at the end of .bashrc file, save and close the file.
export KAFKA_HOME=/home/myadav/kafka_2.11-0.11.0.1;
export PATH=$PATH:$KAFKA_HOME/bin;
exit and open new terminal

-- Install JDK
sudo apt-get purge  openjdk-\*
sudo mkdir -p /usr/local/java
sudo apt-get install default-jre
which java
java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

--Start Zookeeper
cd $KAFKA_HOME/bin
zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties

--Start Kafka server
cd $KAFKA_HOME/bin
kafka-server-start.sh $KAFKA_HOME/config/server.properties

--Start Topic start and list
cd $KAFKA_HOME/bin
kafka-topics.sh --create --topic mytopic --zookeeper localhost:2181 --replication-factor 1 -partitions 1
kafka-topics.sh --list  --zookeeper localhost:2181
kafka-topics.sh --describe  --zookeeper localhost:2181

--Start Produce console
cd $KAFKA_HOME/bin
kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic

-- Start Consumer console
cd $KAFKA_HOME/bin
kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning

In this screenshot, you can see, I have started Zookeeper and Kafka in top 2 terminals, in middle terminal I have created topic, and bottom 2 terminals has Producer and Consumer Console. You can see the same messages in producer and consumers.




Best Practices for Enterprise implementation

Sharing best practices for Enterprise level Kafka Implementation
  1. Make sure that Zookeeper is on different server than Kafka Brokers.
  2. There should be minimum 3 to 5 zookeeper nodes in one zookeeper cluster
  3. Make sure that you are using latest java 1.8 with G1 collector 
  4. There should be minimum 4-5 Kafka brokers in Kafka cluster
  5. Make sure that there are sufficient / optimum partitions for each topic, higher the number of partitions more parallel consumers can be added , thus resulting in a higher throughput. More partitions can increase the latency.
  6. There should be minimum 2 replication factor for each topic for fault-tolerance, again more number of replication factors will have impact on performance  
  7. Make sure that you install and configure monitoring tools such as Kafka Manager
  8. If possible implement Kafka MirrorMaker for replication across data-centers for Disaster Recovery purpose 
  9. For Delivery Guarantees set appropriate value for Broker Acknowledgement (“acks”) 
  10. For exceptions / Broker Responds with error set proper values of Number of retries, retry.backoff.ms and max.in.flight.request.per.connection

I will keep appending this section on regular basis.

No comments: