In this blog, I will summarize the best practices to follow while implementing Kafka.
Before going to the best practices, let's understand what Kafka is. Kafka is publish-subscribe messaging rethought as a distributed commit log and is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
Here is the high-level conceptual diagram of Kafka, in which you can see a Kafka cluster of size 4 (4 brokers) managed by Apache ZooKeeper, serving multiple producers and consumers. Messages are sent to topics. Each topic can have multiple partitions for scaling. For fault tolerance we have to use a replication factor, which ensures that each partition's messages are copied to multiple brokers.
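To make the partition and replication ideas concrete, here is a sketch of how such a topic would be created on a multi-broker cluster (the topic name orders and the counts are only illustrative; the single-broker lab below uses 1 partition and replication factor 1):
-- Create a topic whose messages are spread over 3 partitions, each copied to 2 brokers
kafka-topics.sh --create --topic orders --zookeeper localhost:2181 --partitions 3 --replication-factor 2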
Kafka Laptop Lab Setup
To set up the Kafka laptop lab, install VMware Workstation, create an Ubuntu VM, then download and unzip Kafka:
wget http://apache.cs.utah.edu/kafka/0.11.0.1/kafka_2.11-0.11.0.1.tgz
tar -xvf kafka_2.11-0.11.0.1.tgz
-- Set environment parameters
vi .bashrc
-- Add the following 2 lines at the end of the .bashrc file, then save and close the file.
export KAFKA_HOME=/home/myadav/kafka_2.11-0.11.0.1;
export PATH=$PATH:$KAFKA_HOME/bin;
-- Exit and open a new terminal so the new environment variables take effect
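To confirm the variables were picked up (a sanity check, not part of the original walkthrough), you can run:
-- Verify the environment; alternatively run source ~/.bashrc in the current terminal
echo $KAFKA_HOME
which kafka-topics.sh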
-- Install Java
sudo apt-get purge openjdk-*
sudo mkdir -p /usr/local/java
sudo apt-get install default-jre
which java
java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
--Start Zookeeper
cd $KAFKA_HOME/bin
zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
--Start Kafka server
cd $KAFKA_HOME/bin
kafka-server-start.sh $KAFKA_HOME/config/server.properties
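Both start scripts keep the terminal attached; if you prefer to run them in the background, they accept a -daemon flag (a convenience rather than part of the lab):
zookeeper-server-start.sh -daemon $KAFKA_HOME/config/zookeeper.properties
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties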
-- Create, list, and describe the topic
cd $KAFKA_HOME/bin
kafka-topics.sh --create --topic mytopic --zookeeper localhost:2181 --replication-factor 1 --partitions 1
kafka-topics.sh --list --zookeeper localhost:2181
kafka-topics.sh --describe --zookeeper localhost:2181
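For a freshly created single-partition topic, the describe output should look roughly like the following (broker ids may differ on your machine):
Topic:mytopic	PartitionCount:1	ReplicationFactor:1	Configs:
	Topic: mytopic	Partition: 0	Leader: 0	Replicas: 0	Isr: 0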
-- Start Producer console
cd $KAFKA_HOME/bin
kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic
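Once the producer console is up you get a > prompt; every line you type is sent as one message to mytopic, for example:
>hello kafka
>this is a test message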
-- Start Consumer console
cd $KAFKA_HOME/bin
kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning
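This console consumer connects through ZooKeeper (the old consumer). Kafka 0.11 also ships the newer consumer, which connects to the broker directly; if you want to try it, the equivalent command is:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning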
In this screenshot you can see that I have started ZooKeeper and Kafka in the top 2 terminals, created the topic in the middle terminal, and opened the Producer and Consumer consoles in the bottom 2 terminals. You can see the same messages in both the producer and the consumer.
Best Practices for Enterprise Implementation
Here are the best practices I recommend for an enterprise-level Kafka implementation:
- Make sure that ZooKeeper runs on different servers than the Kafka brokers.
- There should be a minimum of 3 to 5 ZooKeeper nodes in one ZooKeeper cluster.
- Make sure that you are using the latest Java 1.8 with the G1 garbage collector.
- There should be a minimum of 4-5 Kafka brokers in the Kafka cluster.
- Make sure that there is a sufficient/optimum number of partitions for each topic; the higher the number of partitions, the more parallel consumers can be added, resulting in higher throughput. However, more partitions can also increase latency.
- There should be a replication factor of at least 2 for each topic for fault tolerance; again, a higher replication factor will have an impact on performance.
- Make sure that you install and configure monitoring tools such as Kafka Manager.
- If possible, implement Kafka MirrorMaker for replication across data centers for disaster-recovery purposes.
- For delivery guarantees, set an appropriate value for the broker acknowledgement setting ("acks"); see the configuration sketch after this list.
- For exceptions / broker error responses, set proper values for the number of retries, retry.backoff.ms, and max.in.flight.requests.per.connection; see the configuration sketch after this list.
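As an illustration of the last two points, here is a minimal producer configuration sketch. The property names are standard Kafka producer settings; the file name producer.properties and the values shown are assumptions that you should tune for your own workload and delivery-guarantee requirements.
-- producer.properties
acks=all
retries=3
retry.backoff.ms=500
max.in.flight.requests.per.connection=1
You can try these settings from the console producer by pointing it at the file:
kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic --producer.config producer.properties
Setting max.in.flight.requests.per.connection to 1 together with retries avoids message reordering when a send is retried, at some cost to throughput.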
I will keep appending to this section on a regular basis.