
Thursday, October 12, 2017

Kafka - Intro, Laptop Lab Setup and Best Practices

In this blog, I will summarize the best practices that should be followed while implementing Kafka.
Before going to the best practices, let's understand what Kafka is. Kafka is publish-subscribe messaging rethought as a distributed commit log, used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
Here is the high-level conceptual diagram of Kafka, showing a Kafka cluster of size 4 (4 brokers) managed by Apache Zookeeper and serving multiple producers and consumers. Messages are sent to topics, and each topic can have multiple partitions for scaling. For fault tolerance we use a replication factor, which ensures that each partition's messages are copied to multiple brokers.
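
As a quick illustration of partitions and replication (the topic name is hypothetical, and this assumes the lab setup below with Zookeeper on localhost:2181 plus a cluster of at least 2 brokers):

-- Create a topic with 3 partitions, each replicated to 2 brokers
kafka-topics.sh --create --topic demo-topic --zookeeper localhost:2181 --replication-factor 2 --partitions 3

Each of the 3 partitions then gets a leader on one broker and a copy on another, so the topic survives a single broker failure.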


Kafka Laptop Lab Setup

To set up the Kafka laptop lab, install VMware Workstation, create an Ubuntu VM, then download and unzip Kafka:

wget http://apache.cs.utah.edu/kafka/0.11.0.1/kafka_2.11-0.11.0.1.tgz
tar -xvf kafka_2.11-0.11.0.1.tgz

-- Set environment parameters
vi ~/.bashrc
-- Add the following 2 lines at the end of the .bashrc file, then save and close the file.
export KAFKA_HOME=/home/myadav/kafka_2.11-0.11.0.1
export PATH=$PATH:$KAFKA_HOME/bin
-- Reload the environment (or exit and open a new terminal)
source ~/.bashrc

-- Install JDK
sudo apt-get purge  openjdk-\*
sudo mkdir -p /usr/local/java
sudo apt-get install default-jre
which java
java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

--Start Zookeeper
cd $KAFKA_HOME/bin
zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties

--Start Kafka server
cd $KAFKA_HOME/bin
kafka-server-start.sh $KAFKA_HOME/config/server.properties
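
Both start scripts keep the terminal attached; to run them in the background instead, they also accept a -daemon flag:

zookeeper-server-start.sh -daemon $KAFKA_HOME/config/zookeeper.properties
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties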

-- Create, list and describe topics
cd $KAFKA_HOME/bin
kafka-topics.sh --create --topic mytopic --zookeeper localhost:2181 --replication-factor 1 --partitions 1
kafka-topics.sh --list  --zookeeper localhost:2181
kafka-topics.sh --describe --zookeeper localhost:2181 --topic mytopic
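
For the single-broker lab topic, the describe output should look roughly like this (the broker id and leader assignment may differ on your machine):

Topic:mytopic   PartitionCount:1    ReplicationFactor:1    Configs:
    Topic: mytopic  Partition: 0    Leader: 0   Replicas: 0 Isr: 0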

-- Start Producer console
cd $KAFKA_HOME/bin
kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic

-- Start Consumer console
cd $KAFKA_HOME/bin
kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning
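
Anything typed into the producer terminal should now appear in the consumer terminal. Messages can also be piped in non-interactively, for example:

echo "hello kafka" | kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic

Note: with 0.11 the --zookeeper flag starts the old consumer; the newer consumer can be selected by passing --bootstrap-server localhost:9092 instead.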

In this screenshot you can see that I have started Zookeeper and Kafka in the top 2 terminals, created the topic in the middle terminal, and the bottom 2 terminals have the producer and consumer consoles. You can see the same messages in both the producer and the consumer.




Best Practices for Enterprise implementation

Sharing best practices for an enterprise-level Kafka implementation:
  1. Make sure that Zookeeper runs on different servers than the Kafka brokers.
  2. There should be a minimum of 3 to 5 Zookeeper nodes in one Zookeeper cluster.
  3. Make sure that you are using the latest Java 1.8 with the G1 garbage collector.
  4. There should be a minimum of 4-5 Kafka brokers in the Kafka cluster.
  5. Make sure that there is a sufficient / optimum number of partitions for each topic: the higher the number of partitions, the more parallel consumers can be added, resulting in higher throughput. However, more partitions can also increase latency.
  6. There should be a replication factor of at least 2 for each topic for fault tolerance; again, a higher replication factor will have an impact on performance.
  7. Make sure that you install and configure monitoring tools such as Kafka Manager.
  8. If possible, implement Kafka MirrorMaker for replication across data centers for Disaster Recovery purposes.
  9. For delivery guarantees, set an appropriate value for the broker acknowledgement setting ("acks").
  10. For exceptions / broker-error responses, set proper values for the number of retries, retry.backoff.ms and max.in.flight.requests.per.connection (see the sketch after this list).
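
As a minimal sketch of points 9 and 10 (the file name and values here are illustrative assumptions; tune them for your workload), these producer settings can be kept in a properties file and passed to any producer, for example the console producer:

-- Write a hypothetical producer-reliability.properties file
cat > producer-reliability.properties <<'EOF'
# acks=all : wait for all in-sync replicas to acknowledge each message
acks=all
# Retry transient broker errors, pausing between attempts
retries=5
retry.backoff.ms=500
# Keep message ordering intact while retries are enabled
max.in.flight.requests.per.connection=1
EOF

kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic --producer.config producer-reliability.properties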

I will keep appending to this section on a regular basis.

Monday, October 9, 2017

vRA 7.3 Implementation Sample Project Plan

VMware vRealize Automation (vRA) is the IT automation tool of the modern Software-Defined Data Center. vRA enables IT automation through the creation and management of personalized infrastructure, applications and custom IT services (XaaS). This IT automation lets you deploy IT services rapidly across a multi-vendor, multi-cloud infrastructure.

In this blog, I am going to describe an overall vRA implementation project plan which can be used as a sample for any vRA implementation.

We need a variety of skills for this implementation, such as a cloud admin, OS admin, process expert, the monitoring tools team, a project manager, a technical manager, etc., and last but not least, a willingness from customer management to implement vRA!


Timelines mentioned in this sample project plan are indicative and may vary depending on complexity. For example, creating a handful of templates and 20-odd blueprints without any application or database may take less time; however, when we are considering provisioning applications and databases using vRA, we need to allow more time, including testing time.

The information gathering stage is very important; make sure that the customer understands the advantages, disadvantages, product features and limitations which need to be considered while designing the vRA solution.

Sunday, September 18, 2016

Some of my Ideas / Innovations in past

In this blog, I want to describe a few of my innovations / innovative ideas for which I was rewarded in my current and past organizations.

1) 32-bit number system (1991): During 11th standard / PUC, we had computer science as a professional subject, and we were learning various number systems such as binary, octal, decimal, and hexadecimal. I took it further and developed a 32-bit number system, and also provided methods to convert from 32-bit to hexadecimal, octal, decimal and binary systems. I didn't know how to publish it and take it forward at that point of time; however, it became quite popular in my class, and my professors recognized the work I did.


2) Clock as angular measuring equipment (1995): During the last year of my engineering, I was playing with my mechanical table clock when, accidentally, it fell down and got damaged. When we wound the key, it would unwind quickly, and all the hands would also move quickly. While I was thinking about what to do with this clock, an idea struck me: one full turn of the key gets amplified into a detailed reading on the dial, so why not use it as angular measuring equipment? I opened the clock, calculated the gear ratios and came up with the scale of the clock, i.e. how much angle 1 second corresponds to. Then I created a platform to mount the clock and another platform to hold the object whose angle we need to measure. I presented this project at SEARCH in Solapur, where I got the second prize.

3) Remote Monitoring System (2005): Remote Monitoring Service is a systems management solution that I designed around Enterprise Manager Grid Control technology, which means the majority of this solution was in and around 10g Grid Control. It proactively monitors all components of the IT infrastructure: databases, listeners, application servers, storage, CPU, memory, load balancers and so on; and nowadays, using plug-ins, even third-party software like IBM and Microsoft databases can be monitored. It immediately sends alerts and notifications to the relevant registered mail IDs, such as DBAs, Unix administrators and the helpdesk, or sometimes to managers as well for very critical errors, with a short message. It has in-built intelligence through "Fixit Jobs": for example, if a database goes down, we can proactively give instructions to restart the database. We had also written scripts to fix routine DBA issues; for example, if a tablespace runs out of space, it automatically adds a data file to that tablespace and informs the DBA of the action it has taken. It offers customizable warning and critical thresholds: different customers have different standards, for example, one customer may say 85% is our warning and 95% is our critical limit, while these limits may differ for another customer; this is achieved by setting different limits per customer, so it is customizable as per customer needs. And finally, it facilitates conformity to Service Level Agreements. This became an entry-level service and started generating huge revenue in the form of main services. For this innovative idea, I was awarded.

4) Question of the day (2009): We were running 24x7 monitoring operations, and we hardly used to get time for in-class training for my team. So I came up with an innovative idea: why can't they learn a small portion of the technology every day? For them, an understanding of OEM was a must, so I came up with a series of tasks and questions, sequenced them, and automated them to be sent to team members on a daily basis, so that they could try these activities / answer these practical questions during their spare time. For this innovative idea for training, I was also awarded.

5) Automated partial and full health checks (2013): automated health checks of the load test environment. Before starting any load test, we needed to perform a health check on the entire environment and restart services in sequence on all 150 servers. This used to be a manual and labor-intensive activity: we had to follow the sequence, and it needed too many handshakes between the various admin teams. We automated all the health checks as well as the restart of all services, including databases, middleware, portals, eBS, IDM etc., with the sequencing built in. This resulted in a saving of around $110K/year. For this, I was awarded the CIO Bottom Line award as well!

And here is my son's first invention at the age of 15!
http://yadavmukund.blogspot.com/2019/01/my-sons-first-innovation.html