Search This Blog

Wednesday, August 28, 2024

LLMaaS, LLMOps and Enterprise AI Platform!

In this blog, I will describe LLMaaS and LLMOps and the building block of the Enterprise AI Platform! 



LLMaaS represents the idea of providing access to powerful LLM services through an internet/intranet. It's similar to other "as-a-service" models (like SaaS, PaaS, etc.), where users can access and utilize LLMs without needing to manage the underlying infrastructure or complex development.

LLMOps focuses on the operational aspects of managing and deploying LLMs in production environments. It's essentially the application of DevOps principles to the lifecycle of LLMs

In the above block diagram, I have depicted comprehensive architecture for an enterprise-grade AI platform, designed to support various stages of development and deployment of AI models, especially focusing on Large Language Models (LLMs). It's broken down into several layers, starting with infrastructure and going up to the application level:

1. Infrastructure: This layer provides the foundation for the platform, encompassing physical resources like:  Compute: Servers for running workloads, including model training, inference, and other tasks. Storage: Storage systems for datasets, model checkpoints, and other essential data.  Network: Networking infrastructure connecting various components and enabling communication.  GPU: Specialized hardware (e.g., GPUs from NVIDIA) for accelerating AI model training and inference.

2. Private Cloud/Virtualization: This layer provides a virtualized environment that leverages virtualization technologies such as VMware, Nutanix, or Oracle's VirtualBox. It enables flexible resource allocation and isolation, making it easier to manage and scale the platform.

3. Private AI Platform: This layer represents the core of the AI platform, encompassing the following components: GPU Virtualization (Ex. Nvidia Enterprise AI or VMware BitFusion): This layer further enhances GPU utilization by providing a virtualized environment for accessing GPUs. It enables resource sharing, efficient utilization, and isolation for different AI workloads. VM (Virtual Machine): Virtual machines can be used for running specific workloads or specific components that require a dedicated environment. Kubernetes: A container orchestration system that manages and scales containerized workloads, enabling the deployment and management of microservices that make up the AI platform. Container: Containers are used to package and run applications and their dependencies in a consistent and isolated manner.

4. MLOps Platforms: These are platforms specifically designed for managing the machine learning lifecycle. They provide tools and services for: Run:ai: A platform for managing and scaling GPU resources, ensuring efficient utilization and performance during model training. Kubeflow: A platform built on Kubernetes that provides tools and infrastructure for managing the machine learning pipeline, including model training, deployment, and monitoring. DOMINO: A platform that provides collaborative tools for data scientists and machine learning engineers to manage and share their work, including data, code, and models.

5. Developer Tools and Catalog: This layer provides a set of tools and resources specifically designed for AI developers: Huggingface: A popular repository and platform for sharing pre-trained models, datasets, and code for natural language processing (NLP), enabling developers to leverage existing resources and accelerate their work. PyTorch: A widely used deep learning framework that provides tools and libraries for building and training AI models. RAY: A library and framework for building distributed applications, enabling scalable training and inference of AI models. Rasa: A framework for building conversational AI chatbots, providing tools for creating and deploying chatbots that interact with users naturally. VAC: A platform for managing and deploying AI models and applications, providing tools for monitoring and managing the lifecycle of models.

6. Experimentation Platform: This layer focuses on providing tools and infrastructure for experimenting with AI models: API Gateway: Provides a secure and managed way for developers to access the platform's services, ensuring controlled access and authentication. Micro Services: The platform is structured as a collection of independent, modular services that communicate through APIs, enabling flexibility, scalability, and isolation of functionalities. Kafka: A messaging system used for real-time data streaming and communication between different components of the platform. Redis: An in-memory data store used for caching frequently accessed data to improve performance. MongoDB, MySQL: Databases for storing metadata, experiment data, and other platform-related information. Monitors: Tools for monitoring the platform's health, performance, and other key metrics, providing insights into the overall system behavior.

7. App and Data Services: This layer represents the applications and services that leverage the AI platform's capabilities for end users. They interact with the applications through various interfaces, benefiting from the platform's capabilities.

The diagram highlights the modular and interconnected nature of a modern AI platform, showcasing how different components work together to support the entire AI development lifecycle. The platform's design emphasizes scalability, flexibility, and developer experience, enabling efficient experimentation, training, and deployment of AI models, especially large language models, for various applications.

Tuesday, February 13, 2024

GCP Overview

Similarly to the AWS summary, here is the GCP summary file

Update on 17 Oct 2024 

Cleared back to back 3 GCP certifications, one every month


1) Google Certified Professional Cloud Database Engineer (Oct-2024)
2) Google Certified Professional Machine Learning Engineer (May-2024)
3) Google Certified Professional Cloud Architect (Apr-2024)
4) Google Certified Associate Cloud Engineer (Mar-2024)

Check out my official badges on Credly!


Also, note that Google Cloud gives a free hoodie if you pass professional certification!


By winter (Dec-2024), I will collect a few more (Data and DevOps) to stay warm! :-)

Monday, October 16, 2023

Ace your AWS Certified Cloud Practitioner exam!

AWS Certified Cloud Practitioner is the first step towards your advanced AWS certifications!

So, in this blog, I am going to share how I cleared the AWS Certified Cloud Practitioner exam on my first attempt!

  • First of all, on a daily basis, you have to dedicate a couple of hours to self-study, I prefer late at night!
  • Create a free account in AWS / ACG and keep it ready
  • Understand the high-level domain/section and the number of questions/weightage for each domain/section.
  • Go through the recorded AWS training materials and videos.
  • Create a mind map of the topics. Here is the screenshot of my mind map.

AWS study Mind Map

You can download the PDF version here.

  • After each chapter/section, try it on AWS / ACG, which you created earlier 
  • After all the training is completed, attempt free sample/trial exams provided by AWS and other learning platforms.
  • Check where you made mistakes during the sample/trial exam, and take down error notes.
  • Review the mistakes before the next sample/trial test, and come up with a strategy.
  • Finally, schedule the AWS certification exam and attempt it with confidence!
  • Wish you all the best and let me know in the comments if this was helpful!

Check out my AWS Certificate

Also, please check out my other posts related to this subject

Tuesday, October 3, 2023

My Private AI setup on laptop - Step by step guide to install LLM on your local Windows Laptop!

Are you fascinated by LLMs like GPT by OpenAI, and at the same time wondering if you will compromise your privacy, IP leak, Security, or Model Inaccuracies? well, the answer is PrivateAI!

Check out how you can install and configure LLM like LLAMA 2 on your Windows laptop, and overcome the concerns listed in the above paragraph!

Before we start, I highly recommend all of you see my earlier blog which lists the difference between AI, ML, DL, and GenAI

Pre-requisites: Windows 11, minimum 10GB RAM, Nvidia GeForce MX570 GPU 

Manual Steps

1) Download and install Visual Studio Build Tools, use the defaults like the following screenshot.


2) Install Conda / Anaconda, Git, wget

3) Go to the Start menu and launch "Anaconda Powershell prompt" Run as Administrator 

> conda install PyTorch
> conda install cuda --channel nvidia/label/cuda-11.7.0b 

4) Visit the Meta AI website and register to download the model/s. After registration you will get an email, please don't delete this email.

> mkdir llama2
> cd llama2
> git clone https://github.com/facebookresearch/llama
run the download.sh using Git Bash
Provide the URL that you get from the email 
Choose 7B-chat
It will take a while to download the model (it took nearly 22 min for me)

5) Once the model is installed, you can use it programmatically and interact with it, for which you need some GUI.

Automated way: If you are finding it difficult to follow manual steps, you can try following steps
  • Using the command prompt, clone the repo "git clone https://github.com/oobabooga/text-generation-webui"
  • From the folder, double-click on install_windows.bat
  • This will install the required software on your laptop and start the UI http://127.0.0.1:7860/.
  • Create an account at https://huggingface.co/ using the same email address that you have used to download the LLAMA 2 from the Facebook website.
  • In the local UI, in the model tab search for meta-llama/Llama-2-7b and click download
  • For the download to start, your request should have been approved by Meta and you need to set HF_USR and HF_PASS environment variables in your local machine by generating a token in the Huggingface > Settings > Access Tokens.
  • Once downloaded, you can refresh, select the LLAMA, and load it (Please refer the following screenshot) 
  • Then navigate to the Chat tab and start using it without worrying about privacy, IP leaks, Security etc.
  • If you are facing issues in downloading Meta LLAMA, then try VMware/open-llama-7b-v2-open-instruct
  • Enjoy your Private AI LLM set! 

Local LLM in action! 

References: