{"id":1518,"date":"2024-11-15T09:54:57","date_gmt":"2024-11-15T09:54:57","guid":{"rendered":"https:\/\/nas01.tallpaul.net\/wordpress\/?p=1518"},"modified":"2025-06-24T16:39:00","modified_gmt":"2025-06-24T15:39:00","slug":"deploy-llm-on-power10-for-inferencing","status":"publish","type":"post","link":"https:\/\/nas01.tallpaul.net\/wordpress\/2024\/11\/deploy-llm-on-power10-for-inferencing\/","title":{"rendered":"Build &#038; Deploy LLM &#038; RAG Applications with OpenShift"},"content":{"rendered":"\n<p>Hands-on lab guide<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Build &amp; Deploy LLM &amp; RAG Applications with OpenShift\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/3NplORwm_Wo?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">27 minutes recorded live demonstration<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Table of Contents<\/h2>\n\n\n\n<p class=\"has-large-font-size\">1 <a href=\"#Introduction\">Introduction<\/a><\/p>\n\n\n\n<p>1.1 <a href=\"#About-this-hands-on-lab\">About this hands-on lab<\/a><\/p>\n\n\n\n<p class=\"has-large-font-size\">2 <a href=\"#Getting-Started\">Getting-Started<\/a><\/p>\n\n\n\n<p>2.1 <a href=\"#Assumptions\">Assumptions<\/a><\/p>\n\n\n\n<p>2.2 <a href=\"#Connect-to-the-Bastion-CLI\">Connect to the Bastion CLI<\/a><\/p>\n\n\n\n<p>2.3 <a href=\"#Login-to-OpenShift\">Login to OpenShift<\/a><\/p>\n\n\n\n<p>2.4 <a href=\"#Copy-login-command-for-CLI-use\">Copy login command for CLI use<\/a><\/p>\n\n\n\n<p class=\"has-large-font-size\">3 <a href=\"#Working-with-the-inference-runtime-llama.cpp\">Working with 
the inference runtime llama.cpp<\/a><\/p>\n\n\n\n<p>3.1 <a href=\"#Building-an-inference-runtime-container-with-llama.cpp-library\">Building an inference runtime container with llama.cpp library<\/a><\/p>\n\n\n\n<p>3.2 <a href=\"#Pushing-the-built-image-to-OpenShift-on-a-new-project\">Pushing the built image to OpenShift on a new project<\/a><\/p>\n\n\n\n<p>3.3 <a href=\"#Deploy-the-runtime-using-Mistral-model-on-the-namespace\">Deploy the runtime using Mistral model on the namespace<\/a><\/p>\n\n\n\n<p class=\"has-medium-font-size\">3.4 <a href=\"#Create-a-namespace-and-deploy-the-pre-built-Mistral-Model-container\">Create a namespace and deploy the pre-built Mistral Model container<\/a><\/p>\n\n\n\n<p>3.5 <a href=\"#Build-and-deploy-llama.cpp-with-IBM-Granite-3-LLM-within-OpenShift\">Build and deploy llama.cpp with IBM Granite 3 LLM within OpenShift<\/a><\/p>\n\n\n\n<p class=\"has-large-font-size\">4 <a href=\"#Opening-the-Inference-runtime-UI\">Opening the Inference runtime UI<\/a><\/p>\n\n\n\n<p class=\"has-large-font-size\">5 <a href=\"#Retrieval-Augmented-Generation\">Retrieval-Augmented Generation<\/a><\/p>\n\n\n\n<p>5.1 <a href=\"#Deploying-a-Vector-Database\">Deploying a Vector Database<\/a><\/p>\n\n\n\n<p>5.2 <a href=\"#Querying-the-RAG-model\">Querying the RAG model<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Introduction\">1 Introduction<\/h2>\n\n\n\n<p>In this lab you&#8217;ll deploy a pre-trained Large Language Model on OpenShift. It will make use of unique Power10 features such as the Vector Scalar Extension (VSX) as well as the newly introduced Matrix Math Accelerator (MMA) engines.<\/p>\n\n\n\n<p>This lab makes use of the following technologies.<\/p>\n\n\n\n<p><strong>IBM Power10<\/strong> is a high-performance microprocessor designed for IBM Power servers. 
It features advanced architecture and technology innovations, such as a high-bandwidth cache hierarchy, improved memory subsystem, and support for<br>accelerated machine learning workloads. Power10 is designed to deliver high performance, scalability, and energy efficiency for enterprise and cloud computing applications.<\/p>\n\n\n\n<p><strong>IBM Power MMA (Matrix Math Accelerator)<\/strong> technology is a hardware-based solution designed to accelerate machine learning workloads on IBM Power servers. It includes specialized AI processors and software optimizations to improve the performance of machine learning tasks, such as training and inference, on these systems.<\/p>\n\n\n\n<p><strong>Vector Scalar Extension (VSX)<\/strong> is a set of instructions and architecture extensions for IBM POWER processors that enables efficient processing of vector (array-like) data types. VSX allows for faster and more efficient execution of tasks that involve large amounts of data, such as scientific computing, data analysis, and machine learning workloads.<\/p>\n\n\n\n<p><strong>A large language model (LLM) <\/strong>is an artificial intelligence model trained on a vast amount of text data, enabling it to generate human-like text based on the input it receives. These models can be used for various natural language processing tasks such as translation, summarization, and conversation.<\/p>\n\n\n\n<p><strong>RAG stands for Retrieval Augmented Generation<\/strong>. It&#8217;s a method that combines the power of retrieval systems with language models to generate more accurate and relevant responses. In this approach, the model first retrieves relevant information from a large external knowledge source (like a database or the internet), and then uses this information to generate a response.<\/p>\n\n\n\n<p><strong>llama.cpp<\/strong> is an open-source software library written mostly in C++ that performs inference on various large language models such as Llama. 
It is co-developed alongside the GGML project, a general-purpose tensor library.<\/p>\n\n\n\n<p>Command-line tools are included with the library, alongside a server with a simple web interface.<\/p>\n\n\n\n<p><strong>MinIO <\/strong>is an open-source, high-performance object storage server compatible with the Amazon S3 API. It&#8217;s designed to provide reliable and scalable storage for cloud-native applications, big data, and AI workloads. MinIO is a community-driven project supported by MinIO Inc., a company founded by the original creators of MinIO.<\/p>\n\n\n\n<p><strong>MinIO <\/strong>offers several key features:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>High Performance<\/strong>: MinIO is optimized for high throughput and low latency, making it suitable for large-scale object storage deployments.<\/li>\n\n\n\n<li><strong>Scalability<\/strong>: It can handle petabytes of data and thousands of concurrent users, allowing you to scale your storage needs as required.<\/li>\n\n\n\n<li><strong>Reliability<\/strong>: MinIO ensures data durability and availability with built-in replication and erasure coding capabilities.<\/li>\n\n\n\n<li><strong>Security<\/strong>: It supports encryption at rest and in transit, ensuring the confidentiality and integrity of your data.<\/li>\n\n\n\n<li><strong>Integration<\/strong>: MinIO is compatible with the Amazon S3 API, making it easy to migrate existing applications and workflows to MinIO.<\/li>\n<\/ol>\n\n\n\n<p><strong>MinIO <\/strong>is widely used in various industries, including cloud computing, big data, AI, and machine learning, due to its high performance, scalability, reliability, and security features. 
It&#8217;s an excellent choice for organizations looking for a robust, open-source object storage solution.<\/p>\n\n\n\n<p><strong>Milvus <\/strong>is an open-source vector database built for AI applications, particularly those involving machine learning models that generate high-dimensional vectors for similarity search and clustering. It&#8217;s designed to handle large-scale vector data with low latency and high throughput, making it ideal for applications like recommendation systems, image and video search, and natural language processing.<\/p>\n\n\n\n<p><strong>Milvus <\/strong>offers several key features:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Vector Search<\/strong>: Milvus enables fast and efficient vector similarity search, allowing you to find similar vectors in large datasets<br>quickly.<\/li>\n\n\n\n<li><strong>Scalability<\/strong>: It can handle petabytes of data and thousands of concurrent queries, ensuring your AI applications can scale as needed.<\/li>\n\n\n\n<li><strong>Flexibility<\/strong>: Milvus supports various vector types, including embeddings, features, and distances, making it suitable for a wide range of use cases.<\/li>\n\n\n\n<li><strong>Integration<\/strong>: Milvus is compatible with popular machine learning frameworks like TensorFlow, PyTorch, and scikit-learn, allowing you to integrate it seamlessly into your existing workflows.<\/li>\n\n\n\n<li><strong>Real-time Analytics<\/strong>: Milvus supports real-time vector search, enabling you to process and analyze data in near real-time.<\/li>\n<\/ol>\n\n\n\n<p><strong>Milvus <\/strong>is used in various industries, including e-commerce, gaming, and entertainment, to power AI applications that rely on vector similarity search and clustering. 
It&#8217;s an excellent choice for organizations looking for a scalable, flexible, and high-performance vector database for their AI applications.<\/p>\n\n\n\n<p><strong>Streamlit <\/strong>is an open-source Python library for creating interactive web applications for data science, machine learning, and machine learning model deployment. It enables data scientists and developers to build user-friendly interfaces for their models, making it easier to share and deploy AI solutions with non-technical users.<\/p>\n\n\n\n<p><strong>Streamlit <\/strong>offers several key features:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Interactive Web Apps<\/strong>: Streamlit allows you to create interactive web applications that display your data, visualizations, and machine learning models in a user-friendly interface.<\/li>\n\n\n\n<li><strong>Easy to Use<\/strong>: Streamlit has a simple and intuitive API, making it easy to build and deploy web applications without requiring extensive web development knowledge.<\/li>\n\n\n\n<li><strong>Real-time Updates<\/strong>: Streamlit supports real-time updates, allowing users to see changes to your data or models instantly.<\/li>\n\n\n\n<li><strong>Sharing and Deployment<\/strong>: Streamlit makes it easy to share your applications with others by generating a URL that can be accessed by anyone. 
You can also deploy your applications as web apps using platforms like Heroku, AWS, or Google Cloud.<\/li>\n\n\n\n<li><strong>Integration<\/strong>: Streamlit is compatible with popular data science libraries like Pandas, NumPy, and Scikit-learn, making it easy to integrate machine learning models into your web applications.<\/li>\n<\/ol>\n\n\n\n<p><strong>Streamlit <\/strong>is used in various industries, including finance, healthcare, and marketing, to build user-friendly interfaces for AI solutions.<br>It&#8217;s an excellent choice for data scientists and developers looking to share and deploy their AI models with non-technical users.<\/p>\n\n\n\n<p>Etcd is a distributed key-value store that provides a reliable way to store and manage configuration data, service discovery, and remote procedure calls (RPCs) in distributed systems. It&#8217;s widely used in container orchestration platforms like Kubernetes, Docker Swarm, and Mesos to manage the configuration and state of these systems.<\/p>\n\n\n\n<p><strong>Etcd <\/strong>offers several key features:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Distributed Key-Value Store<\/strong>: Etcd is a distributed key-value store that stores data across multiple nodes, ensuring high availability and fault tolerance.<\/li>\n\n\n\n<li><strong>Reliability<\/strong>: Etcd provides strong consistency guarantees and ensures that data is never lost, even in the event of node failures.<\/li>\n\n\n\n<li><strong>Scalability<\/strong>: Etcd can handle large-scale deployments with thousands of nodes and petabytes of data.<\/li>\n\n\n\n<li><strong>Security<\/strong>: Etcd supports encryption at rest and in transit, ensuring the confidentiality and integrity of your data.<\/li>\n\n\n\n<li><strong>Integration<\/strong>: Etcd is compatible with various programming languages and platforms, making it easy to integrate into existing systems.<\/li>\n<\/ol>\n\n\n\n<p><strong>Etcd <\/strong>is used in various industries, including 
cloud computing, big data, and AI, to manage the configuration and state of distributed systems. It&#8217;s an excellent choice for organizations looking for a reliable, scalable, and secure key-value store for their distributed systems.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"About-this-hands-on-lab\">1.1 About this hands-on lab<\/h3>\n\n\n\n<p>The first part of the lab in Section 2 focuses on how to access and interact with the lab and describes the following steps:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Connecting to the SSH console for the bastion<\/li>\n\n\n\n<li>Connecting to the OpenShift Web GUI interface on a browser<\/li>\n\n\n\n<li>Connecting to the OpenShift Command Line Interface (CLI) via PuTTY<\/li>\n<\/ul>\n\n\n\n<p>The second part of the lab in Sections 3 and 4 focuses on these topics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Building the llama.cpp container from scratch<\/li>\n\n\n\n<li>Deploying the llama.cpp container within a project in OpenShift<\/li>\n<\/ul>\n\n\n\n<p>The third part of the lab in Section 5 demonstrates the use of Retrieval-Augmented Generation (RAG) and focuses on these topics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Deploying a vector database<\/li>\n\n\n\n<li>Querying the AI model on additional data provided via RAG<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Power 10 MMA, LLM &amp; RAG Demo\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/tQm2r73RgxY?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Quick 13-minute silent step-by-step demo walk-through 
<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Getting-Started\">2 Getting Started<\/h2>\n\n\n\n<p><strong>Please Note<\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code><strong>Commands that you should execute are displayed in bold blue text. Left-click within this area to copy the command to your clipboard.<\/strong><\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Example output from commands that have been executed is displayed as white text on a black background.<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Bullet items are required actions<\/li>\n<\/ul>\n\n\n\n<p>Standard paragraphs are for informational purposes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Assumptions\">2.1 Assumptions<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>You have access to an OpenShift Container Platform (OCP) environment running on IBM Power10 \n<ul class=\"wp-block-list\">\n<li>Refer to my colleague&#8217;s blog, &#8220;<a rel=\"noreferrer noopener\" href=\"https:\/\/community.ibm.com\/community\/user\/powerdeveloper\/blogs\/sebastian-lehrig\/2024\/03\/26\/sizing-for-ai\" target=\"_blank\"><strong>Sizing and configuring an LPAR for AI workloads<\/strong><\/a>&#8221;<\/li>\n\n\n\n<li><strong>Make a note of the OCP Console URL<\/strong><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>You have access to a RHEL-based Bastion LPAR\n<ul class=\"wp-block-list\">\n<li><strong>Make a note of the Bastion IP address<\/strong><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>The OCP user for this lab is \u201ccecuser\u201d; if yours differs, substitute your own OCP user name for \u201ccecuser\u201d\n<ul class=\"wp-block-list\">\n<li><strong>Make a note of the OCP user name, if not using \u201ccecuser\u201d<\/strong><\/li>\n\n\n\n<li><strong>Make a note of the \u201ccecuser\u201d 
password<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Connect-to-the-Bastion-CLI\">2.2 Connect to the Bastion CLI<\/h3>\n\n\n\n<p>Now we will go over the steps to connect to the CLI for the environment.<\/p>\n\n\n\n<p>I typically use the Putty application, but you are free to use your favourite terminal.<\/p>\n\n\n\n<p>Putty is available for download&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.putty.org\/\" target=\"_blank\">HERE<\/a><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Install Putty from the above link if desired.<\/li>\n\n\n\n<li>Open the Putty application.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"894\" height=\"367\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-icon.jpg\" alt=\"\" class=\"wp-image-1418\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-icon.jpg 894w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-icon-300x123.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-icon-768x315.jpg 768w\" sizes=\"auto, (max-width: 894px) 100vw, 894px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Fill the hostname with the IP address of your Bastion. 
This may be found in your assigned Project Kit if using IBM TechZone resources.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"573\" height=\"566\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty.jpg\" alt=\"\" class=\"wp-image-1419\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty.jpg 573w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-300x296.jpg 300w\" sizes=\"auto, (max-width: 573px) 100vw, 573px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Press Open<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"554\" height=\"546\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-Open.jpg\" alt=\"\" class=\"wp-image-1420\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-Open.jpg 554w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-Open-300x296.jpg 300w\" sizes=\"auto, (max-width: 554px) 100vw, 554px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Click Accept<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"810\" height=\"607\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-Accept.jpg\" alt=\"\" class=\"wp-image-1421\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-Accept.jpg 810w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-Accept-300x225.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-Accept-768x576.jpg 768w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You will see a \u201clogin as:\u201d prompt, type cecuser 
and press enter:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"948\" height=\"473\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-login.jpg\" alt=\"\" class=\"wp-image-1424\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-login.jpg 948w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-login-300x150.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Putty-login-768x383.jpg 768w\" sizes=\"auto, (max-width: 948px) 100vw, 948px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enter the password for the \u201ccecuser\u201d user. This will be the same for both the CLI and for the GUI.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Login-to-OpenShift\">2.3 Login to OpenShift<\/h3>\n\n\n\n<p>To login to the OpenShift environment from the command line, find the&nbsp;<strong><em>oc login<\/em><\/strong>&nbsp;command from your OpenShift GUI.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Point your browser to your OpenShift web console<\/li>\n\n\n\n<li>Accept the certificate warning if certificates have not been configured correctly on demo equipment.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"672\" height=\"541\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Warning.jpg\" alt=\"\" class=\"wp-image-1426\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Warning.jpg 672w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Warning-300x242.jpg 300w\" sizes=\"auto, (max-width: 672px) 100vw, 672px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Click on the htpasswd option:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img 
loading=\"lazy\" decoding=\"async\" width=\"441\" height=\"218\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-1.jpg\" alt=\"\" class=\"wp-image-1397\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-1.jpg 441w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-1-300x148.jpg 300w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Add your user and password contained on the step 1 and Click login.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"250\" height=\"206\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-2-1.jpg\" alt=\"\" class=\"wp-image-1399\"\/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Familiarize yourself with the navigation for approximately 10 minutes if it\u2019s your first time. 
You can easily switch between Developer and Administrator views using the menu option located at the top left corner.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"481\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-1024x481.jpg\" alt=\"\" class=\"wp-image-1400\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-1024x481.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-300x141.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-768x360.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-1536x721.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-2048x961.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Copy-login-command-for-CLI-use\">2.4 Copy login command for CLI use<\/h3>\n\n\n\n<p>If you need to log in to the CLI again for any reason, you can find the login command on the main OpenShift web console page.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>At the top right, click the cecuser drop-down, then click \u201cCopy login command\u201d<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"478\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Copy-Login-Cmd-1024x478.jpg\" alt=\"\" class=\"wp-image-1402\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Copy-Login-Cmd-1024x478.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Copy-Login-Cmd-300x140.jpg 300w, 
https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Copy-Login-Cmd-768x359.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Copy-Login-Cmd-1536x718.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Copy-Login-Cmd-2048x957.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Once again, click on the htpasswd option:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"441\" height=\"218\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-1.jpg\" alt=\"\" class=\"wp-image-1397\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-1.jpg 441w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-1-300x148.jpg 300w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enter the user name and password you noted in section 2.1 and click Log in.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"250\" height=\"206\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Login-2-1.jpg\" alt=\"\" class=\"wp-image-1399\"\/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Click on Display Token at the top left<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"155\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/display-token-1024x155.jpg\" alt=\"\" class=\"wp-image-1404\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/display-token-1024x155.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/display-token-300x45.jpg 300w, 
https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/display-token-768x116.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/display-token-1536x232.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/display-token-2048x310.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can use the oc login command whenever your authorization has expired. You may need to use the API token for logging in to the registry.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"242\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Token-1024x242.jpg\" alt=\"\" class=\"wp-image-1405\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Token-1024x242.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Token-300x71.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Token-768x181.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Token-1536x362.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/Token-2048x483.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>As cecuser, copy and paste the oc login command from the web page into your PuTTY session.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc login --token=&#91;Add your own token] --server=&#91;Add your own server]<\/code><\/pre>\n\n\n\n<p>For example:<\/p>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>oc login --token=sha256~8HzJyuecqujfeCXsaDnAeUUJ9VMsLafr-cJk5yn8tGk 
--server=https:\/\/api.p1289.cecc.ihost.com:6443\n\nLogged into \"https:\/\/api.p1289.cecc.ihost.com:6443\" as \"cecuser\" using the token provided.\n\nYou have access to 71 projects, the list has been suppressed. You can list all projects with 'oc projects'\n\nUsing project \"default\".<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Working-with-the-inference-runtime-llama.cpp\">3 Working with the inference runtime llama.cpp<\/h2>\n\n\n\n<p>Do not confuse llama.cpp with the Llama models that Facebook made available. You can run Llama models in llama.cpp, but you can also run other models such as Mistral and Granite.<\/p>\n\n\n\n<p>Pre-built containers are also available for use with this lab. If you want to build the llama.cpp inference runtime container from scratch, work through sections 3.1 to 3.3 <strong>(this requires additional memory in your bastion LPAR and takes about 10 minutes of extra lab time)<\/strong>. If you want to avoid that and use a pre-built container instead, skip sections 3.1 to 3.3 and go to section 3.4.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">The easiest way to get up and running quickly is typically to head straight to <a href=\"#Build-and-deploy-llama.cpp-with-IBM-Granite-3-LLM-within-OpenShift\">Section 3.5<\/a>, where you initiate the build and deployment from the bastion, but run it within OpenShift.<\/p>\n\n\n\n<p class=\"has-large-font-size\"><strong>TechZone and TechXchange users should move directly to <a href=\"#Build-and-deploy-llama.cpp-with-IBM-Granite-3-LLM-within-OpenShift\">Section 3.5<\/a>, as you will not have enough memory on your Bastion to compile the source and build the containers.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Building-an-inference-runtime-container-with-llama.cpp-library\">3.1 Building an inference runtime container with the llama.cpp library<\/h3>\n\n\n\n<p>By following this chapter, you can build the runtime container from scratch using 
the Dockerfile below. This Dockerfile is automatically downloaded to your system when you perform the git clone command below.<\/p>\n\n\n\n<p>Please note that this exercise requires significant memory to complete. If you do not have enough memory on your bastion, you can skip directly to section 3.4, where you are directed to build and deploy the application within OpenShift.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">Skip directly to <a href=\"#Build-and-deploy-llama.cpp-with-IBM-Granite-3-LLM-within-OpenShift\">Section 3.5<\/a> for the easiest way to build and deploy the llama.cpp application from within OpenShift. TechZone and TechXchange users should move directly to <a href=\"#Build-and-deploy-llama.cpp-with-IBM-Granite-3-LLM-within-OpenShift\">Section 3.5<\/a>, as you will not have enough memory on your Bastion to compile the source and build the containers.<\/p>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>FROM registry.access.redhat.com\/ubi9\/ubi as builder\n\n#################################################\n<strong># Creating a compiler environment for the build <\/strong>\n#################################################\nRUN dnf update -y  &amp;&amp; dnf -y groupinstall 'Development Tools' &amp;&amp; dnf install -y \\\n  cmake git ninja-build \\\n  &amp;&amp; dnf clean all\n\n####################################################\n<strong># Downloading and compiling OpenBLAS for the build #<\/strong>\n####################################################<strong>\n<\/strong>RUN git clone --recursive https:\/\/github.com\/DanielCasali\/OpenBLAS.git &amp;&amp; cd OpenBLAS &amp;&amp; \\\n    make -j$(nproc --all) TARGET=POWER10 DYNAMIC_ARCH=1 &amp;&amp; \\\n    make PREFIX=\/opt\/OpenBLAS install &amp;&amp; \\\n    cd 
\/\n\n############################################################\n<strong># Downloading and compiling llama.cpp using the OpenBLAS Library we just compiled:<\/strong>\n############################################################<strong>\n<\/strong>RUN git clone https:\/\/github.com\/DanielCasali\/llama.cpp.git &amp;&amp; cd llama.cpp &amp;&amp; sed -i \"s\/powerpc64le\/native -mvsx -mtune=native -D__POWER10_VECTOR__\/g\" ggml\/src\/CMakeLists.txt &amp;&amp; \\\n    mkdir build; \\\n    cd build; \\\n    cmake -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS=\/opt\/OpenBLAS\/include -G Ninja ..; \\\n    cmake --build . --config Release\n\nCMD bash\n########################################################\n<strong># Copying the built executable and libraries needed for the runtime on a simple ubi9 so it is a small container<\/strong>\n########################################################<strong>\n<\/strong>FROM registry.access.redhat.com\/ubi9\/ubi\n\nCOPY --from=builder --chmod=755 \/llama.cpp\/build\/bin\/llama-server \/usr\/local\/bin\nCOPY --from=builder --chmod=644 \/llama.cpp\/build\/src\/libllama.so \/llama.cpp\/build\/src\/libllama.so\nCOPY --from=builder --chmod=644 \/llama.cpp\/build\/ggml\/src\/libggml.so \/llama.cpp\/build\/ggml\/src\/libggml.so\n\nENTRYPOINT &#91; \"\/usr\/local\/bin\/llama-server\", \"--host\", \"0.0.0.0\"]<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Using the putty console you opened on step 2.3, you need to install git to clone the project that has the assets to help us through the Lab. 
Note that git may already be pre-installed in some of the lab environments.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>sudo dnf -y install git<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>After git is installed, clone the project from GitHub:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>git clone https:\/\/github.com\/DanielCasali\/mma-ai.git<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Cloning into 'mma-ai'...\nremote: Enumerating objects: 8, done.\nremote: Counting objects: 100% (8\/8), done.\nremote: Compressing objects: 100% (7\/7), done.\nremote: Total 8 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)\nReceiving objects: 100% (8\/8), 6.26 KiB | 6.26 MiB\/s, done.\nResolving deltas: 100% (1\/1), done.\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enter the mma-ai\/llama-runtime project:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>cd mma-ai\/llama-runtime\/<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build the runtime container using the Dockerfile shown at the beginning of this section. The image will be pushed to the OpenShift internal registry, so the command inside \u201c$( )\u201d retrieves the internal registry hostname so we can tag the image there:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>podman build . 
--tag $(oc get routes -A |grep image-registry|awk '{print $3}')\/ai\/llama-runtime-ubi:latest<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>&#91;1\/2] STEP 1\/5: FROM registry.access.redhat.com\/ubi9\/ubi AS builder\n&#91;1\/2] STEP 2\/5: RUN dnf update -y  &amp;&amp; dnf -y groupinstall 'Development Tools' &amp;&amp; dnf install -y   cmake git ninja-build   &amp;&amp; dnf clean all\nUpdating Subscription Management repositories.\nUnable to read consumer identity\nsubscription-manager is operating in container mode.\n.\n.\n.\n&#91;2\/2] STEP 1\/5: FROM registry.access.redhat.com\/ubi9\/ubi\n&#91;2\/2] STEP 2\/5: COPY --from=builder --chmod=755 \/llama.cpp\/build\/bin\/llama-server \/usr\/local\/bin\n--&gt; 2066dbbb20a8\n&#91;2\/2] STEP 3\/5: COPY --from=builder --chmod=644 \/llama.cpp\/build\/src\/libllama.so \/llama.cpp\/build\/src\/libllama.so\n--&gt; 06e978aead69\n&#91;2\/2] STEP 4\/5: COPY --from=builder --chmod=644 \/llama.cpp\/build\/ggml\/src\/libggml.so \/llama.cpp\/build\/ggml\/src\/libggml.so\n--&gt; 896f86c2f229\n&#91;2\/2] STEP 5\/5: ENTRYPOINT &#91; \"\/usr\/local\/bin\/llama-server\", \"--host\", \"0.0.0.0\"]\n&#91;2\/2] COMMIT default-route-openshift-image-registry.apps.p1325.cecc.ihost.com\/ai\/llama-runtime-ubi:latest\n--&gt; 1f99895367aa\nSuccessfully tagged default-route-openshift-image-registry.apps.p1325.cecc.ihost.com\/ai\/llama-runtime-ubi:latest\n<\/code><\/pre>\n\n\n\n<p>The previous step takes some time to download and compile OpenBLAS and llama.cpp.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Pushing-the-built-image-to-OpenShift-on-a-new-project\">3.2 Pushing the built image to OpenShift on a new project<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create the ai project where you will run the container:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc 
new-project ai<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Now using project \"ai\" on server \"https:\/\/api.p1325.cecc.ihost.com:6443\".\n\nYou can add applications to this project with the 'new-app' command. For example, try:\n\n    oc new-app rails-postgresql-example\n\nto build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:\n\n    kubectl create deployment hello-node --image=registry.k8s.io\/e2e-test-images\/agnhost:2.43 -- \/agnhost serve-hostname\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>On the CLI, run the <em>podman login<\/em> command. The command within \u201c$( )\u201d retrieves the OpenShift internal registry host. Type the Username as <strong>cecuser<\/strong>.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>podman login $(oc get routes -A |grep image-registry|awk '{print $3}') --tls-verify=false<\/code><\/pre>\n\n\n\n<p>Using the OpenShift Token tab you left open in Firefox, copy the string directly below \u201c<strong>Your API token is<\/strong>\u201d.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1008\" height=\"282\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/API-Token.jpg\" alt=\"\" class=\"wp-image-1520\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/API-Token.jpg 1008w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/API-Token-300x84.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/API-Token-768x215.jpg 768w\" sizes=\"auto, (max-width: 1008px) 100vw, 1008px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Paste the API token into the Password field and press 
enter.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Login Succeeded!<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use podman push to upload the container to the internal registry:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>podman push $(oc get routes -A |grep image-registry|awk '{print $3}')\/ai\/llama-runtime-ubi:latest \\\n--tls-verify=false\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Getting image source signatures\nCopying blob 28a720baac6f skipped: already exists\nCopying blob 8f522959366b skipped: already exists\nCopying blob 8c672c500f73 skipped: already exists\nCopying blob fe14819e82ca skipped: already exists\nCopying config a85918e36b done   |\nWriting manifest to image destination\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Deploy-the-runtime-using-Mistral-model-on-the-namespace\">3.3 Deploy the runtime using Mistral model on the namespace<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply the deployment, which runs the container you just built and downloads the Mistral model from Hugging Face:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc apply -f mistral-deploy.yaml<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>deployment.apps\/llama-cpp-server created<\/code><\/pre>\n\n\n\n<p>The above yaml file is a Kubernetes deployment configuration for an application named &#8220;llama-cpp-server&#8221;. It specifies that there should be one replica of the application running. 
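<\/p>\n\n\n\n<p>As a rough sketch of that deployment (abridged; the image references, model URL, and container arguments shown here are illustrative placeholders, and mistral-deploy.yaml in the cloned repository is the authoritative file), the structure looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>apiVersion: apps\/v1\nkind: Deployment\nmetadata:\n  name: llama-cpp-server\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: llama-cpp-server\n  template:\n    metadata:\n      labels:\n        app: llama-cpp-server\n    spec:\n      initContainers:\n      - name: fetch-model-data          # downloads the .gguf model before the app starts\n        image: registry.access.redhat.com\/ubi9\/ubi   # illustrative\n        command: &#91;\"sh\", \"-c\", \"curl -L -o \/models\/model.gguf $MODEL_URL\"]  # illustrative\n        volumeMounts:\n        - name: llama-models\n          mountPath: \/models\n      containers:\n      - name: llama-cpp                 # main inference container\n        image: llama-runtime-ubi:latest # illustrative\n        args: &#91;\"-m\", \"\/models\/model.gguf\"]  # illustrative\n        ports:\n        - containerPort: 8080\n        volumeMounts:\n        - name: llama-models\n          mountPath: \/models\n      volumes:\n      - name: llama-models\n        emptyDir: {}<\/code><\/pre>\n\n\n\n<p>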
The application consists of two containers: &#8220;fetch-model-data&#8221; and &#8220;llama-cpp&#8221;.<\/p>\n\n\n\n<p>The &#8220;fetch-model-data&#8221; container is an init container that fetches a model file from a specified URL and saves it to a volume named &#8220;llama-models&#8221;. It runs to completion before the main application container starts.<\/p>\n\n\n\n<p>The &#8220;llama-cpp&#8221; container is the main application container. It uses the fetched model file and runs with specific arguments and resource limits. It listens on port 8080 for HTTP requests. The container has readiness and liveness probes to check its status.<\/p>\n\n\n\n<p>The deployment also specifies an emptyDir volume named &#8220;llama-models&#8221; for the containers to use.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply the service and the route to access the content of the llama.cpp runtime:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc create -f llama-svc.yaml<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>service\/llama-service created<\/code><\/pre>\n\n\n\n<p>The above yaml file defines a Kubernetes Service named &#8220;llama-service&#8221; with the following specifications:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>It belongs to the &#8220;app&#8221; namespace and is labeled as &#8220;llama-service&#8221;.<\/li>\n\n\n\n<li>The service type is &#8220;ClusterIP&#8221;, meaning it is only accessible within the cluster.<\/li>\n\n\n\n<li>It exposes port 8080 (TCP) from the selected pods.<\/li>\n\n\n\n<li>The service selector matches pods with the label &#8220;app: llama-cpp-server&#8221;.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc create -f llama-route.yaml<\/code><\/pre>\n\n\n\n<pre 
class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>route.route.openshift.io\/llama-cpp created<\/code><\/pre>\n\n\n\n<p>The above yaml file is a Route configuration for OpenShift. It creates a route named &#8220;llama-cpp&#8221; that directs traffic to the &#8220;llama-service&#8221; service, using the &#8220;llama-cpp-server&#8221; target port. The route does not use TLS (tls: null). The application associated with this route is labeled as &#8220;app: llama-service&#8221;.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can verify the runtime&#8217;s readiness by checking the pod status and its progress, as shown below.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc get pods<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>NAME                                READY   STATUS    RESTARTS   AGE\nllama-cpp-server-664bddbbcc-9fmf8   0\/1     Init:0\/1  0          1m3s\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can repeat the command <strong>oc get pod<\/strong> until you see the READY <strong>1\/1<\/strong> (in <strong><em>bold italics<\/em><\/strong> below). 
It takes about 6 to 8 minutes to complete.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc get pods<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>NAME                                READY   STATUS    RESTARTS   AGE\nllama-cpp-server-664bddbbcc-9fmf8   <strong><em>1\/1 <\/em><\/strong>    <strong><em>Running   <\/em><\/strong>0          5m43s\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use this time to read more about the llama.cpp runtime we are using: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Llama.cpp\">https:\/\/en.wikipedia.org\/wiki\/Llama.cpp<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Create-a-namespace-and-deploy-the-pre-built-Mistral-Model-container\">3.4 Create a namespace and deploy the pre-built Mistral Model container<\/h3>\n\n\n\n<p>If you created the Inference library llama.cpp container from scratch by executing the steps in sections 3.1 to 3.3, skip this and go to <a href=\"#Opening-the-Inference-runtime-UI\">section 4<\/a>.<\/p>\n\n\n\n<p>The steps below guide you through using the pre-built llama.cpp container.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Using the putty console you opened in step 2.3, install git to clone the project with the assets to help us through the Lab. 
Note that some lab environments might already have git installed.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>sudo dnf -y install git<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>git clone https:\/\/github.com\/DanielCasali\/mma-ai.git<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Cloning into 'mma-ai'...\nremote: Enumerating objects: 8, done.\nremote: Counting objects: 100% (8\/8), done.\nremote: Compressing objects: 100% (7\/7), done.\nremote: Total 8 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)\nReceiving objects: 100% (8\/8), 6.26 KiB | 6.26 MiB\/s, done.\nResolving deltas: 100% (1\/1), done.\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enter the mma-ai\/llama-runtime project:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>cd mma-ai\/llama-runtime\/<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create the ai project where we will run the container:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc new-project ai<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Now using project \"ai\" on server \"https:\/\/api.p1325.cecc.ihost.com:6443\".\n\nYou can add applications to this project with the 'new-app' command. For example, try:\n\n    oc new-app rails-postgresql-example\n\nto build a new example application in Ruby. 
Or use kubectl to deploy a simple Kubernetes application:\n\n    kubectl create deployment hello-node --image=registry.k8s.io\/e2e-test-images\/agnhost:2.43 -- \/agnhost serve-hostname\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply the deployment for the pre-built container runtime, which pulls the Mistral model from Hugging Face:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc apply -f mistral-deploy-ready.yaml<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>deployment.apps\/llama-cpp-server created<\/code><\/pre>\n\n\n\n<p>The above describes a Kubernetes Deployment named &#8220;llama-cpp-server&#8221;. It has one replica and uses the latest version of the specified image for the container. The container is named &#8220;llama-cpp&#8221; and runs a command to load a model file from a volume. The container exposes port 8080 for HTTP traffic.<\/p>\n\n\n\n<p>The deployment includes an init container named &#8220;fetch-model-data&#8221; that fetches the model file from a URL if it doesn&#8217;t already exist in the specified volume.<\/p>\n\n\n\n<p>The container has readiness and liveness probes configured to check its status. The readiness probe waits 30 seconds before checking if the container is ready, while the liveness probe checks every 10 seconds. 
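<\/p>\n\n\n\n<p>Based on that description, the probe section of the &#8220;llama-cpp&#8221; container can be sketched as follows (only the path, port, and timing values called out above come from the source; the exact field layout is a minimal illustration):<\/p>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>readinessProbe:\n  httpGet:\n    path: \/\n    port: 8080\n  initialDelaySeconds: 30\nlivenessProbe:\n  httpGet:\n    path: \/\n    port: 8080\n  periodSeconds: 10<\/code><\/pre>\n\n\n\n<p>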
Both probes use an HTTP GET request to the root path of the container.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply the service and the route:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc apply  -f llama-svc.yaml<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>service\/llama-service created<\/code><\/pre>\n\n\n\n<p>The above YAML file defines a Kubernetes Service named &#8220;llama-service&#8221; with the type &#8220;ClusterIP&#8221;. It listens on port 8080 using TCP protocol and forwards traffic to pods selected by the service, which are those labeled as &#8220;app: llama-cpp-server&#8221;.<\/p>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc apply -f llama-route.yaml<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>route.route.openshift.io\/llama-cpp created<\/code><\/pre>\n\n\n\n<p>The above YAML file defines a Route resource in OpenShift, which is a way to expose a Service to the internet. 
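<\/p>\n\n\n\n<p>A minimal sketch of such a Route, assembled from the characteristics described in this section (the field layout is illustrative; llama-route.yaml in the cloned repository is the authoritative file):<\/p>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>apiVersion: route.openshift.io\/v1\nkind: Route\nmetadata:\n  name: llama-cpp\n  labels:\n    app: llama-service\nspec:\n  to:\n    kind: Service\n    name: llama-service   # the ClusterIP service defined above\n  port:\n    targetPort: llama-cpp-server\n  tls: null               # no TLS (insecure)<\/code><\/pre>\n\n\n\n<p>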
The Route has the following characteristics:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>It&#8217;s named &#8220;llama-cpp&#8221;.<\/li>\n\n\n\n<li>It belongs to the application &#8220;llama-service&#8221;.<\/li>\n\n\n\n<li>It points to the Service named &#8220;llama-service&#8221;.<\/li>\n\n\n\n<li>It doesn&#8217;t use TLS (insecure).<\/li>\n\n\n\n<li>It uses port 80 (or the port specified by &#8220;targetPort&#8221; in the Service) and forwards traffic to the &#8220;llama-cpp-server&#8221; target.<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can verify the status of the runtime readiness by checking the pod status and its progress, as shown below.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc get pods<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>NAME                                READY   STATUS    RESTARTS   AGE\nllama-cpp-server-664bddbbcc-9fmf8   0\/1     Init:0\/1  0          1m3s\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can repeat the command <strong>oc get pod<\/strong> until you see the READY <strong>1\/1<\/strong> (in <strong><em>bold italics<\/em><\/strong>). 
It takes about 6 to 8 minutes to complete.<\/li>\n\n\n\n<li>Use this time to read more about the llama.cpp runtime we are using: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Llama.cpp\">https:\/\/en.wikipedia.org\/wiki\/Llama.cpp<\/a><\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc get pods<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>NAME                                READY   STATUS    RESTARTS   AGE\nllama-cpp-server-664bddbbcc-9fmf8   <strong><em>1\/1     Running<\/em><\/strong>   0          5m43s\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Build-and-deploy-llama.cpp-with-IBM-Granite-3-LLM-within-OpenShift\">3.5 Build and deploy llama.cpp with LLM and OpenShift<\/h3>\n\n\n\n<p>If you already deployed the Inference library llama.cpp container by following sections 3.1 to 3.3 or section 3.4, skip this section and go directly to <a href=\"#Opening-the-Inference-runtime-UI\">section 4<\/a> to access the llama.cpp application.<\/p>\n\n\n\n<p>The steps below guide you to <strong>build and deploy<\/strong> the llama.cpp application within OpenShift.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Using the PuTTY console you opened in step 2.3, install git to clone the project that has the assets to help us through the Lab. 
Note some lab environments might already have git installed on them.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>sudo dnf -y install git<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Updating Subscription Management repositories.\nLast metadata expiration check: 3:30:55 ago on Wed 13 Nov 2024 01:51:13 AM EST.\nDependencies resolved.\n================================================================================\n Package        Arch    Version         Repository                         Size\n================================================================================\nInstalling:\n git            ppc64le 2.43.5-1.el9_4  rhel-9-for-ppc64le-appstream-rpms  54 k\nInstalling dependencies:\n emacs-filesystem\n                noarch  1:27.2-10.el9_4 rhel-9-for-ppc64le-appstream-rpms 9.3 k\n git-core       ppc64le 2.43.5-1.el9_4  rhel-9-for-ppc64le-appstream-rpms 4.8 M\n git-core-doc   noarch  2.43.5-1.el9_4  rhel-9-for-ppc64le-appstream-rpms 2.9 M\n perl-DynaLoader\n                ppc64le 1.47-481.el9    rhel-9-for-ppc64le-appstream-rpms  26 k\n perl-Error     noarch  1:0.17029-7.el9 rhel-9-for-ppc64le-appstream-rpms  46 k\n perl-File-Find noarch  1.37-481.el9    rhel-9-for-ppc64le-appstream-rpms  26 k\n perl-Git       noarch  2.43.5-1.el9_4  rhel-9-for-ppc64le-appstream-rpms  39 k\n perl-TermReadKey\n                ppc64le 2.38-11.el9     rhel-9-for-ppc64le-appstream-rpms  41 k\n\nTransaction Summary\n================================================================================\nInstall  9 Packages\n\nTotal download size: 8.0 M\nInstalled size: 41 M\nDownloading Packages:\n(1\/9): perl-DynaLoader-1.47-481.el9.ppc64le.rpm 257 kB\/s |  26 kB     00:00\n(2\/9): perl-Error-0.17029-7.el9.noarch.rpm      414 kB\/s |  46 kB     00:00\n(3\/9): perl-TermReadKey-2.38-11.el9.ppc64le.rpm 
336 kB\/s |  41 kB     00:00\n(4\/9): perl-File-Find-1.37-481.el9.noarch.rpm   487 kB\/s |  26 kB     00:00\n(5\/9): git-2.43.5-1.el9_4.ppc64le.rpm           867 kB\/s |  54 kB     00:00\n(6\/9): perl-Git-2.43.5-1.el9_4.noarch.rpm       591 kB\/s |  39 kB     00:00\n(7\/9): git-core-doc-2.43.5-1.el9_4.noarch.rpm    26 MB\/s | 2.9 MB     00:00\n(8\/9): git-core-2.43.5-1.el9_4.ppc64le.rpm       28 MB\/s | 4.8 MB     00:00\n(9\/9): emacs-filesystem-27.2-10.el9_4.noarch.rp 177 kB\/s | 9.3 kB     00:00\n--------------------------------------------------------------------------------\nTotal                                            27 MB\/s | 8.0 MB     00:00\nRunning transaction check\nTransaction check succeeded.\nRunning transaction test\nTransaction test succeeded.\nRunning transaction\n  Preparing        :                                                        1\/1\n  Installing       : git-core-2.43.5-1.el9_4.ppc64le                        1\/9\n  Installing       : git-core-doc-2.43.5-1.el9_4.noarch                     2\/9\n  Installing       : emacs-filesystem-1:27.2-10.el9_4.noarch                3\/9\n  Installing       : perl-File-Find-1.37-481.el9.noarch                     4\/9\n  Installing       : perl-DynaLoader-1.47-481.el9.ppc64le                   5\/9\n  Installing       : perl-TermReadKey-2.38-11.el9.ppc64le                   6\/9\n  Installing       : perl-Error-1:0.17029-7.el9.noarch                      7\/9\n  Installing       : git-2.43.5-1.el9_4.ppc64le                             8\/9\n  Installing       : perl-Git-2.43.5-1.el9_4.noarch                         9\/9\n  Running scriptlet: perl-Git-2.43.5-1.el9_4.noarch                         9\/9\n  Verifying        : perl-Error-1:0.17029-7.el9.noarch                      1\/9\n  Verifying        : perl-TermReadKey-2.38-11.el9.ppc64le                   2\/9\n  Verifying        : perl-DynaLoader-1.47-481.el9.ppc64le                   3\/9\n  Verifying        : 
perl-File-Find-1.37-481.el9.noarch                     4\/9\n  Verifying        : git-2.43.5-1.el9_4.ppc64le                             5\/9\n  Verifying        : git-core-2.43.5-1.el9_4.ppc64le                        6\/9\n  Verifying        : git-core-doc-2.43.5-1.el9_4.noarch                     7\/9\n  Verifying        : perl-Git-2.43.5-1.el9_4.noarch                         8\/9\n  Verifying        : emacs-filesystem-1:27.2-10.el9_4.noarch                9\/9\nInstalled products updated.\n\nInstalled:\n  emacs-filesystem-1:27.2-10.el9_4.noarch   git-2.43.5-1.el9_4.ppc64le\n  git-core-2.43.5-1.el9_4.ppc64le           git-core-doc-2.43.5-1.el9_4.noarch\n  perl-DynaLoader-1.47-481.el9.ppc64le      perl-Error-1:0.17029-7.el9.noarch\n  perl-File-Find-1.37-481.el9.noarch        perl-Git-2.43.5-1.el9_4.noarch\n  perl-TermReadKey-2.38-11.el9.ppc64le\n\nComplete!<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>git clone https:\/\/github.com\/DanielCasali\/mma-ai.git<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Cloning into 'mma-ai'...\nremote: Enumerating objects: 186, done.\nremote: Counting objects: 100% (16\/16), done.\nremote: Compressing objects: 100% (14\/14), done.\nremote: Total 186 (delta 2), reused 16 (delta 2), pack-reused 170 (from 1)\nReceiving objects: 100% (186\/186), 159.22 MiB | 51.42 MiB\/s, done.\nResolving deltas: 100% (81\/81), done.\nUpdating files: 100% (58\/58), done.\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enter the mma-ai\/llama-runtime project:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>cd mma-ai\/llama-runtime\/<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create the ai project where we will run the container:<\/li>\n<\/ul>\n\n\n\n<pre 
class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc new-project ai<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Now using project \"ai\" on server \"https:\/\/api.p1325.cecc.ihost.com:6443\".\n\nYou can add applications to this project with the 'new-app' command. For example, try:\n\n    oc new-app rails-postgresql-example\n\nto build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:\n\n    kubectl create deployment hello-node --image=registry.k8s.io\/e2e-test-images\/agnhost:2.43 -- \/agnhost serve-hostname\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply the manifest that builds the runtime container within OpenShift and pulls the Mistral model from Hugging Face:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc apply -f .\/build-deploy-mistral.yaml<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>deployment.apps\/mma-ai created\nbuildconfig.build.openshift.io\/mma-ai created\nimagestream.image.openshift.io\/mma-ai created\nservice\/mma-ai created\nservice\/llama-service created\nroute.route.openshift.io\/mma-ai created\n<\/code><\/pre>\n\n\n\n<p>The above yaml file describes a Kubernetes Deployment named &#8220;mma-ai&#8221; with the following characteristics:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>It runs one replica of the container.<\/li>\n\n\n\n<li>The container uses the OpenShift image registry.<\/li>\n\n\n\n<li>The container uses the mistral-7b-instruct-v0.3.Q4_K_M.gguf LLM and exposes port 8080.<\/li>\n\n\n\n<li>The container has a health check with an HTTP GET request to the root path (&#8220;\/&#8221;).<\/li>\n\n\n\n<li>The application also includes a BuildConfig for Docker strategy 
named &#8220;mma-ai&#8221; that uses the source code from a Git repository and builds the Docker image tagged as &#8220;mma-ai:latest&#8221;.<\/li>\n\n\n\n<li>There is an ImageStream named &#8220;mma-ai&#8221; that looks up the local image.<\/li>\n\n\n\n<li>The application has two services, one internal named &#8220;mma-ai&#8221; and another named &#8220;llama-service,&#8221; both of which expose port 8080 internally.<\/li>\n\n\n\n<li>There is a Route named &#8220;mma-ai&#8221; that exposes the internal service externally with HTTPS termination and insecureEdgeTerminationPolicy set to &#8220;Redirect&#8221;.<\/li>\n<\/ol>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can verify the status of the runtime readiness by checking the pod status. You can check its progress as shown below.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc get pods<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>NAME                      READY   STATUS     RESTARTS   AGE\nmma-ai-1-build            1\/1     Running    0          15s\nmma-ai-699d775754-wlgp4   0\/1     Init:0\/1   0          15s\n\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can repeat the command <strong>oc get pod<\/strong> until you see the READY <strong>1\/1<\/strong> (in <strong><em>bold italics<\/em><\/strong>). 
It takes about 4 to 8 minutes to complete.<\/li>\n\n\n\n<li>Use this time to read more about the llama.cpp runtime we are using: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Llama.cpp\">https:\/\/en.wikipedia.org\/wiki\/Llama.cpp<\/a><\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>watch oc get pods<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>Every 2.0s: oc get pods                  p1280-bastion: Wed Nov 13 06:09:26 2024\n\nNAME                      READY   STATUS      RESTARTS   AGE\nmma-ai-1-build            0\/1     Completed   0          4m41s\nmma-ai-699d775754-wlgp4   1\/1     <strong>Running     <\/strong>0          4m41s\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Press CTRL-C to exit if you used the watch command to monitor the pod build.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Opening-the-Inference-runtime-UI\">4 Opening the Inference runtime UI<\/h2>\n\n\n\n<p>Now that the Inference runtime is deployed, we can open and work with it from the OpenShift Graphical User Interface.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>From the GUI click on the \u201cAdministrator\u201d drop-down:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"481\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-1024x481.jpg\" alt=\"\" class=\"wp-image-1400\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-1024x481.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-300x141.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-768x360.jpg 768w, 
https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-1536x721.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/10\/OCP-Admin-Dev-2048x961.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Toggle to the Developer view by clicking in \u201cDeveloper\u201d:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1008\" height=\"322\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Developer.jpg\" alt=\"\" class=\"wp-image-1522\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Developer.jpg 1008w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Developer-300x96.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Developer-768x245.jpg 768w\" sizes=\"auto, (max-width: 1008px) 100vw, 1008px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Click skip tour:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"788\" height=\"271\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/skip-tour.jpg\" alt=\"\" class=\"wp-image-1523\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/skip-tour.jpg 788w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/skip-tour-300x103.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/skip-tour-768x264.jpg 768w\" sizes=\"auto, (max-width: 788px) 100vw, 788px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Click on the AI project:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1008\" height=\"287\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/AI-Project.jpg\" 
alt=\"\" class=\"wp-image-1524\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/AI-Project.jpg 1008w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/AI-Project-300x85.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/AI-Project-768x219.jpg 768w\" sizes=\"auto, (max-width: 1008px) 100vw, 1008px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Click \u201cTopology\u201d:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1008\" height=\"311\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Topology.jpg\" alt=\"\" class=\"wp-image-1525\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Topology.jpg 1008w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Topology-300x93.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Topology-768x237.jpg 768w\" sizes=\"auto, (max-width: 1008px) 100vw, 1008px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Click the button to open the llama-cpp-server UI:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1008\" height=\"314\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/llama-cpp-server.jpg\" alt=\"\" class=\"wp-image-1526\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/llama-cpp-server.jpg 1008w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/llama-cpp-server-300x93.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/llama-cpp-server-768x239.jpg 768w\" sizes=\"auto, (max-width: 1008px) 100vw, 1008px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Test your Inferencing by querying the inferencing runtime at the \u201cType a message, (Shift + Enter to add a new 
line\u201d box)<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"529\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/what-is-llama.cpp_-1024x529.jpg\" alt=\"\" class=\"wp-image-1996\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/what-is-llama.cpp_-1024x529.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/what-is-llama.cpp_-300x155.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/what-is-llama.cpp_-768x397.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/what-is-llama.cpp_-1536x794.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/what-is-llama.cpp_.jpg 1913w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Remember, this AI lab has no access to the system or to the Internet, so it does not know what day it is, the time of day, or what the weather is like. If you ask something like this, it will generate hallucinations, which can be amusing to read.<\/p>\n\n\n\n<p>One way to experiment with the prompt is to ask the model which languages it can translate between, and then ask it to translate a sentence.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You can also experiment with prompt engineering by asking the model the same question from different viewpoints. For example, you may tell the model to pretend it\u2019s an Italian chef (tell it in the System Message box) and ask it what the best dish in the world is, as shown below. Then you can tell it to pretend it\u2019s a French chef and ask it the same question. 
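<\/li>\n<\/ul>\n\n\n\n<p>The same experiment can be run programmatically: the llama.cpp server also exposes an OpenAI-compatible chat API, where the System Message box corresponds to a <code>system<\/code>-role message. The sketch below only builds the request body; the route URL is a placeholder for your own mma-ai route, and no request is actually sent.<\/p>

```python
import json

# Placeholder URL: substitute the HTTPS route exposed for the mma-ai service.
ROUTE_URL = "https://<your-mma-ai-route>/v1/chat/completions"

# The "System Message" box in the UI becomes a system-role message here.
payload = {
    "messages": [
        {"role": "system", "content": "You are an Italian Chef"},
        {"role": "user", "content": "What is the best dish in the world?"},
    ],
    "temperature": 0.7,
}

body = json.dumps(payload)  # POST this JSON to ROUTE_URL to get a completion
```

<p>Posting <code>body<\/code> to the route (with curl or Python\u2019s <code>urllib.request<\/code>) should return an answer that reflects the system prompt, just as in the UI.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>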
If you don\u2019t see the System prompt anymore in the GUI, just refresh the URL in the browser.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Select the &#8220;settings&#8221; gear to get to the System Message box<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"530\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-1024x530.jpg\" alt=\"\" class=\"wp-image-1997\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-1024x530.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-300x155.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-768x398.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-1536x795.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings.jpg 1910w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Add your prompt, for example, &#8220;You are a British Chef&#8221;, and click save.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"830\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/prompt-british-chef.jpg\" alt=\"\" class=\"wp-image-1998\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/prompt-british-chef.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/prompt-british-chef-278x300.jpg 278w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start a new conversation, for the new setting to take effect, ask your question and review the answer<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image 
size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"542\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/Best-Dish-British-1-1024x542.jpg\" alt=\"\" class=\"wp-image-2000\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/Best-Dish-British-1-1024x542.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/Best-Dish-British-1-300x159.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/Best-Dish-British-1-768x407.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/Best-Dish-British-1.jpg 1190w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">British Chef<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Change the prompt to a French Chef, by selecting the Settings gear, update and  save the new prompt. <\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"530\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-1024x530.jpg\" alt=\"\" class=\"wp-image-1997\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-1024x530.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-300x155.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-768x398.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings-1536x795.jpg 1536w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/llama.cpp-settings.jpg 1910w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Settings gear to change the prompt<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Save the new 
prompt<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"504\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-Prompt.jpg\" alt=\"\" class=\"wp-image-2002\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-Prompt.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-Prompt-300x197.jpg 300w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><figcaption class=\"wp-element-caption\">Set prompt to be French Chef<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start a new conversation for the new prompt to take effect<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"541\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/New-Conversation-1024x541.jpg\" alt=\"\" class=\"wp-image-2001\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/New-Conversation-1024x541.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/New-Conversation-300x158.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/New-Conversation-768x406.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/New-Conversation.jpg 1193w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Start new conversation<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ask the question and review the reply, using the new prompt.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"541\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-dish-1024x541.jpg\" alt=\"\" class=\"wp-image-2003\" 
srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-dish-1024x541.jpg 1024w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-dish-300x159.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-dish-768x406.jpg 768w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2025\/03\/French-dish.jpg 1190w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">French Dish<\/figcaption><\/figure>\n\n\n\n<p>Example questions:<\/p>\n\n\n\n<p>Prompt<\/p>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>You are an Italian Chef<\/code><\/pre>\n\n\n\n<p>Question<\/p>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>What is the best dish in the world?<\/code><\/pre>\n\n\n\n<p>Prompt<\/p>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>You are a British Chef<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>What is the best dish in the world?<\/code><\/pre>\n\n\n\n<p>Prompt<\/p>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>You are a computer architect<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>Why should I run AI workloads close to data source?<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>Why should I run AI workloads on IBM Power10?<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color 
has-contrast-background-color has-text-color has-background\"><code>What is IBM Power MMA technology?<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>What is Vector Scalar Extension (VSX)?<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Retrieval-Augmented-Generation\">5 Retrieval Augmented Generation<\/h2>\n\n\n\n<p>Retrieval Augmented Generation (RAG) is an approach for using Generative AI with additional or confidential data without spending money on expensive GPUs to re-train or fine-tune a model. RAG allows a pre-trained model to leverage new or updated data provided to it on the fly, generate answers based primarily on that data, and report where the source for each answer was found.&nbsp;<\/p>\n\n\n\n<p>In the following sections, you will deploy a vector database and then use a Python application that queries the vector database to perform the Retrieval Augmented Generation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Deploying-a-Vector-Database\">5.1 Deploying a Vector Database<\/h3>\n\n\n\n<p>We will use the Milvus vector database as the foundation for this part of the lab.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Change to the parent directory, which contains the milvus manifests:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>cd ..\/<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Apply all YAML files in the milvus directory:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc apply -f milvus\/<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>deployment.apps\/etcd-deployment created\nservice\/etcd-service created\ndeployment.apps\/milvus-deployment 
created\nservice\/milvus-service created\nconfigmap\/milvus-config created\npersistentvolumeclaim\/minio-pvc created\ndeployment.apps\/minio-deployment created\nservice\/minio-service created\nroute.route.openshift.io\/minio-console created\n<\/code><\/pre>\n\n\n\n<p>This creates some more elements in your OpenShift AI project as shown by the figure below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"483\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Circles.jpg\" alt=\"\" class=\"wp-image-1530\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Circles.jpg 940w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Circles-300x154.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Circles-768x395.jpg 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Next, apply the streamlit definition:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc apply -f streamlit<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-base-color has-contrast-background-color has-text-color has-background\"><code>pod\/streamlit created\nroute.route.openshift.io\/streamlit created\nservice\/streamlit created\n<\/code><\/pre>\n\n\n\n<p>This creates the Streamlit service as shown in the figure below.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"947\" height=\"491\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Streamit.jpg\" alt=\"\" class=\"wp-image-1531\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Streamit.jpg 947w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Streamit-300x156.jpg 300w, 
https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Streamit-768x398.jpg 768w\" sizes=\"auto, (max-width: 947px) 100vw, 947px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You may then click on the Streamlit service endpoint to open its interface, as demonstrated in the figure below.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1008\" height=\"525\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/streamit-topology.jpg\" alt=\"\" class=\"wp-image-1532\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/streamit-topology.jpg 1008w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/streamit-topology-300x156.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/streamit-topology-768x400.jpg 768w\" sizes=\"auto, (max-width: 1008px) 100vw, 1008px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The Web GUI for the RAG model opens and loads the model (this takes about 3 minutes, as it is building the vector database); after that, you can interact with a prompt similar to what\u2019s shown below:<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"672\" height=\"493\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/RAG-PDF.jpg\" alt=\"\" class=\"wp-image-1533\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/RAG-PDF.jpg 672w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/RAG-PDF-300x220.jpg 300w\" sizes=\"auto, (max-width: 672px) 100vw, 672px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">5.2 Querying the RAG model<\/h3>\n\n\n\n<p>At this point, the AI model is ready to be queried and has been loaded with additional data that it was not trained on. 
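<\/p>\n\n\n\n<p>Conceptually, each RAG query runs a retrieval step first: the question is embedded as a vector, the closest document chunks are fetched from the vector database, and those chunks are prepended to the prompt. The sketch below reduces that step to pure Python, with toy three-dimensional \u201cembeddings\u201d standing in for the real embedding model and for Milvus; it is an illustration, not the lab\u2019s actual Streamlit code.<\/p>

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document chunks with made-up 3-dimensional embeddings.
chunks = {
    "Grandpa's letter calls Sarah 'my little starfish'.": [0.9, 0.1, 0.0],
    "The lighthouse keeper lit the lamp every night.": [0.1, 0.8, 0.2],
}
question_vec = [0.85, 0.15, 0.05]  # pretend embedding of the user's question

# Retrieve the most similar chunk and prepend it to the prompt as context.
best_chunk = max(chunks, key=lambda text: cosine(question_vec, chunks[text]))
prompt = f"Answer using this context:\n{best_chunk}\n\nQuestion: Who is the starfish?"
```

<p>In the lab, Milvus performs this similarity search at scale, and the assembled prompt is what gets sent to the llama.cpp runtime.<\/p>\n\n\n\n<p>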
Specifically, it was loaded with a PDF story, which you can find below:<\/p>\n\n\n\n<div data-wp-interactive=\"core\/file\" class=\"wp-block-file\"><object data-wp-bind--hidden=\"!state.hasPdfPreview\" hidden class=\"wp-block-file__embed\" data=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/The_Forgotten_Lighthouse_Book.pdf\" type=\"application\/pdf\" style=\"width:100%;height:950px\" aria-label=\"Embed of The_Forgotten_Lighthouse_Book.\"><\/object><a id=\"wp-block-file--media-e3a13df8-f878-41fd-a19d-1f18295b3fb6\" href=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/The_Forgotten_Lighthouse_Book.pdf\">The_Forgotten_Lighthouse_Book<\/a><a href=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/The_Forgotten_Lighthouse_Book.pdf\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-e3a13df8-f878-41fd-a19d-1f18295b3fb6\">Download<\/a><\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>One interesting question is to ask the model who the starfish is and how it knows about it. To answer that question, the model has to \u201cread\u201d the book (i.e., it was given the PDF to load as additional input data) and inference on the information from it. As you can see in the figure above, Grandpa writes a letter to Sarah and calls her \u201cmy little starfish\u201d. 
So the AI model should answer that the starfish is Sarah and that it knows this from the letter Grandpa writes to her.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>Who is the starfish and how do you know it?<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"884\" height=\"200\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Enter-question.jpg\" alt=\"\" class=\"wp-image-1535\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Enter-question.jpg 884w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Enter-question-300x68.jpg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Enter-question-768x174.jpg 768w\" sizes=\"auto, (max-width: 884px) 100vw, 884px\" \/><\/figure>\n\n\n\n<p>Other questions that you may ask the model about the Forgotten Lighthouse book are available below:<\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" href=\"https:\/\/raw.githubusercontent.com\/DanielCasali\/mma-ai\/main\/datasource\/The_Forgotten_Lighthouse_Question.pdf\" target=\"_blank\">https:\/\/raw.githubusercontent.com\/DanielCasali\/mma-ai\/main\/datasource\/The_Forgotten_Lighthouse_Question.pdf<\/a>. 
<\/p>\n\n\n\n<div data-wp-interactive=\"core\/file\" class=\"wp-block-file\"><object data-wp-bind--hidden=\"!state.hasPdfPreview\" hidden class=\"wp-block-file__embed\" data=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/The_Forgotten_Lighthouse_Question.pdf\" type=\"application\/pdf\" style=\"width:100%;height:953px\" aria-label=\"Embed of The_Forgotten_Lighthouse_Question.\"><\/object><a id=\"wp-block-file--media-f7dd1269-7e59-44db-9843-8facab755551\" href=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/The_Forgotten_Lighthouse_Question.pdf\">The_Forgotten_Lighthouse_Question<\/a><a href=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/The_Forgotten_Lighthouse_Question.pdf\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-f7dd1269-7e59-44db-9843-8facab755551\">Download<\/a><\/div>\n\n\n\n<p>The AI model has not been trained on any of that book\u2019s information, so all the answers it provides use RAG, demonstrating how a pre-trained AI model can be extended to inference on data it was never trained on. 
In real life, this means that you can run a pre-trained AI model on your own data without ever sending that data off premises to train the model, and without needing expensive GPU accelerators to achieve it!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Next Steps<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Please let me know how you get on with this tutorial.<\/li>\n\n\n\n<li>Contact me for help deploying OpenShift and AI on IBM Power.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Clean up<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Remove Git:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>sudo dnf remove git<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Remove the local repository:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>cd; rm -rdf mma-ai<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Remove the project:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-vivid-cyan-blue-color has-contrast-background-color has-text-color has-background\"><code>oc delete project ai<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-social-links aligncenter has-huge-icon-size has-icon-color is-style-default is-layout-flex wp-block-social-links-is-layout-flex\"><li style=\"color: #ffffff; \" class=\"wp-social-link wp-social-link-wordpress  wp-block-social-link\"><a rel=\"noopener nofollow\" target=\"_blank\" href=\"https:\/\/nas01.tallpaul.net\/wordpress\/\" class=\"wp-block-social-link-anchor\"><svg width=\"24\" height=\"24\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M12.158,12.786L9.46,20.625c0.806,0.237,1.657,0.366,2.54,0.366c1.047,0,2.051-0.181,2.986-0.51 c-0.024-0.038-0.046-0.079-0.065-0.124L12.158,12.786z 
M3.009,12c0,3.559,2.068,6.634,5.067,8.092L3.788,8.341 C3.289,9.459,3.009,10.696,3.009,12z M18.069,11.546c0-1.112-0.399-1.881-0.741-2.48c-0.456-0.741-0.883-1.368-0.883-2.109 c0-0.826,0.627-1.596,1.51-1.596c0.04,0,0.078,0.005,0.116,0.007C16.472,3.904,14.34,3.009,12,3.009 c-3.141,0-5.904,1.612-7.512,4.052c0.211,0.007,0.41,0.011,0.579,0.011c0.94,0,2.396-0.114,2.396-0.114 C7.947,6.93,8.004,7.642,7.52,7.699c0,0-0.487,0.057-1.029,0.085l3.274,9.739l1.968-5.901l-1.401-3.838 C9.848,7.756,9.389,7.699,9.389,7.699C8.904,7.67,8.961,6.93,9.446,6.958c0,0,1.484,0.114,2.368,0.114 c0.94,0,2.397-0.114,2.397-0.114c0.485-0.028,0.542,0.684,0.057,0.741c0,0-0.488,0.057-1.029,0.085l3.249,9.665l0.897-2.996 C17.841,13.284,18.069,12.316,18.069,11.546z M19.889,7.686c0.039,0.286,0.06,0.593,0.06,0.924c0,0.912-0.171,1.938-0.684,3.22 l-2.746,7.94c2.673-1.558,4.47-4.454,4.47-7.771C20.991,10.436,20.591,8.967,19.889,7.686z M12,22C6.486,22,2,17.514,2,12 C2,6.486,6.486,2,12,2c5.514,0,10,4.486,10,10C22,17.514,17.514,22,12,22z\"><\/path><\/svg><span class=\"wp-block-social-link-label screen-reader-text\">WordPress<\/span><\/a><\/li>\n\n<li style=\"color: #ffffff; \" class=\"wp-social-link wp-social-link-mail  wp-block-social-link\"><a rel=\"noopener nofollow\" target=\"_blank\" href=\"mailto:paulchapman@uk.ibm.com\" class=\"wp-block-social-link-anchor\"><svg width=\"24\" height=\"24\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M19,5H5c-1.1,0-2,.9-2,2v10c0,1.1.9,2,2,2h14c1.1,0,2-.9,2-2V7c0-1.1-.9-2-2-2zm.5,12c0,.3-.2.5-.5.5H5c-.3,0-.5-.2-.5-.5V9.8l7.5,5.6,7.5-5.6V17zm0-9.1L12,13.6,4.5,7.9V7c0-.3.2-.5.5-.5h14c.3,0,.5.2.5.5v.9z\"><\/path><\/svg><span class=\"wp-block-social-link-label screen-reader-text\">Mail<\/span><\/a><\/li>\n\n<li style=\"color: #ffffff; \" class=\"wp-social-link wp-social-link-youtube  wp-block-social-link\"><a rel=\"noopener nofollow\" target=\"_blank\" 
href=\"https:\/\/www.youtube.com\/@paulchapman1280\/videos\" class=\"wp-block-social-link-anchor\"><svg width=\"24\" height=\"24\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M21.8,8.001c0,0-0.195-1.378-0.795-1.985c-0.76-0.797-1.613-0.801-2.004-0.847c-2.799-0.202-6.997-0.202-6.997-0.202 h-0.009c0,0-4.198,0-6.997,0.202C4.608,5.216,3.756,5.22,2.995,6.016C2.395,6.623,2.2,8.001,2.2,8.001S2,9.62,2,11.238v1.517 c0,1.618,0.2,3.237,0.2,3.237s0.195,1.378,0.795,1.985c0.761,0.797,1.76,0.771,2.205,0.855c1.6,0.153,6.8,0.201,6.8,0.201 s4.203-0.006,7.001-0.209c0.391-0.047,1.243-0.051,2.004-0.847c0.6-0.607,0.795-1.985,0.795-1.985s0.2-1.618,0.2-3.237v-1.517 C22,9.62,21.8,8.001,21.8,8.001z M9.935,14.594l-0.001-5.62l5.404,2.82L9.935,14.594z\"><\/path><\/svg><span class=\"wp-block-social-link-label screen-reader-text\">YouTube<\/span><\/a><\/li>\n\n<li style=\"color: #ffffff; \" class=\"wp-social-link wp-social-link-linkedin  wp-block-social-link\"><a rel=\"noopener nofollow\" target=\"_blank\" href=\"https:\/\/www.linkedin.com\/in\/chapmanp\/\" class=\"wp-block-social-link-anchor\"><svg width=\"24\" height=\"24\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M19.7,3H4.3C3.582,3,3,3.582,3,4.3v15.4C3,20.418,3.582,21,4.3,21h15.4c0.718,0,1.3-0.582,1.3-1.3V4.3 C21,3.582,20.418,3,19.7,3z M8.339,18.338H5.667v-8.59h2.672V18.338z M7.004,8.574c-0.857,0-1.549-0.694-1.549-1.548 c0-0.855,0.691-1.548,1.549-1.548c0.854,0,1.547,0.694,1.547,1.548C8.551,7.881,7.858,8.574,7.004,8.574z M18.339,18.338h-2.669 v-4.177c0-0.996-0.017-2.278-1.387-2.278c-1.389,0-1.601,1.086-1.601,2.206v4.249h-2.667v-8.59h2.559v1.174h0.037 c0.356-0.675,1.227-1.387,2.526-1.387c2.703,0,3.203,1.779,3.203,4.092V18.338z\"><\/path><\/svg><span class=\"wp-block-social-link-label screen-reader-text\">LinkedIn<\/span><\/a><\/li><\/ul>\n\n\n\n<h2 
class=\"wp-block-heading\">Credit<a href=\"https:\/\/www.linkedin.com\/in\/dancasali\/overlay\/about-this-profile\/\"><\/a><\/h2>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"115\" height=\"115\" data-id=\"956\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/09\/Daniel.jpg\" alt=\"\" class=\"wp-image-956\"\/><figcaption class=\"wp-element-caption\">Daniel Casali<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"499\" height=\"499\" data-id=\"1601\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Jason.jpeg\" alt=\"\" class=\"wp-image-1601\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Jason.jpeg 499w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Jason-300x300.jpeg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Jason-150x150.jpeg 150w\" sizes=\"auto, (max-width: 499px) 100vw, 499px\" \/><figcaption class=\"wp-element-caption\">Jason Liu<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"431\" height=\"431\" data-id=\"1604\" src=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Rodrigo.jpeg\" alt=\"\" class=\"wp-image-1604\" srcset=\"https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Rodrigo.jpeg 431w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Rodrigo-300x300.jpeg 300w, https:\/\/nas01.tallpaul.net\/wordpress\/wp-content\/uploads\/2024\/11\/Rodrigo-150x150.jpeg 150w\" sizes=\"auto, (max-width: 431px) 100vw, 431px\" \/><figcaption class=\"wp-element-caption\">Rodrigo 
Ceron<\/figcaption><\/figure>\n<\/figure>\n\n\n\n<p>\u2022 <a rel=\"noreferrer noopener\" href=\"https:\/\/www.linkedin.com\/in\/dancasali\/\" target=\"_blank\"><strong>Daniel de Souza Casali<\/strong><\/a><\/p>\n\n\n\n<p>\u2022 <a rel=\"noreferrer noopener\" href=\"https:\/\/www.linkedin.com\/in\/xinliujason\/\" target=\"_blank\"><strong>Jason Liu<\/strong><\/a><\/p>\n\n\n\n<p>\u2022 <a rel=\"noreferrer noopener\" href=\"https:\/\/www.linkedin.com\/in\/rceron\/overlay\/about-this-profile\/\" target=\"_blank\"><strong>Rodrigo Ceron<\/strong><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this lab you&#8217;ll use a pre-trained Large Language Model and deploy it on OpenShift. It will make use of the unique Power10 features such as the Vector Scalar Extension (VSX) as well as the newly introduced Matrix Math Accelerator (MMA) 
engines.<\/p>\n","protected":false},"author":1,"featured_media":1542,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[127,82,83,125,3,5,13,6,7],"tags":[],"class_list":["post-1518","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-open-source","category-ai","category-conference","category-guided-hands-on-lab","category-ibm","category-open-source","category-openshift","category-power-systems","category-red-hat"],"_links":{"self":[{"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/posts\/1518","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/comments?post=1518"}],"version-history":[{"count":0,"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/posts\/1518\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/media\/1542"}],"wp:attachment":[{"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/media?parent=1518"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/categories?post=1518"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nas01.tallpaul.net\/wordpress\/wp-json\/wp\/v2\/tags?post=1518"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}