Review the LLM Application
15 minutes

In the final step of the workshop, we’ll deploy an application to our OpenShift cluster that uses the instruct and embeddings models.
What is LangChain?
Like most applications that interact with LLMs, our application is written in Python. It also uses LangChain, which is an open-source orchestration framework that simplifies the development of applications powered by LLMs.
Application Overview
Connect to the LLMs
Our application starts by connecting to two LLMs that we’ll be using:
- meta/llama-3.2-1b-instruct: used for responding to user prompts
- nvidia/llama-3.2-nv-embedqa-1b-v2: used to calculate embeddings
Why are there two models? Here’s a helpful analogy:
- The Embedding model is the “Librarian” (it helps find the right books),
- The Instruct model is the “Writer” (it reads the books and writes the answer).
Define the Prompt Template
The application then defines a prompt template that will be used in interactions
with the meta/llama-3.2-1b-instruct LLM:
Note how we’re explicitly instructing the LLM to simply say it doesn’t know when the answer isn’t in the provided context, which helps minimize hallucinations. There’s also a placeholder for us to provide context that the LLM can use to answer the question.
Connect to the Vector Database
The application then connects to the vector database that was pre-populated with NVIDIA data sheet documents:
Define the Chain
The application uses LCEL (LangChain Expression Language) to define the chain.
The | (pipe) symbol works like an assembly line; the output of one step becomes
the input for the next.
Let’s break this down step-by-step:
- Step 1: The Input Map {…}: We are preparing the ingredients for our prompt.
- context: We turn our vector store into a retriever. This acts like a search engine that finds the most relevant snippets from our NVIDIA data sheets based on the user’s question.
- question: We use RunnablePassthrough() to ensure the user’s original question is passed directly into the prompt.
- Note: These keys (context and question) map directly to the {context} and {question} placeholders we defined in our prompt template earlier.
- Step 2: The prompt: This is the instruction manual. It takes the context and the question and formats them using the prompt template (e.g., “Answer the question using only the context…”).
- Step 3: The llm: This is the “Engine” (in our case, meta/llama-3.2-1b-instruct). It reads the formatted prompt and generates a response.
- Step 4: The StrOutputParser(): By default, chat models return message objects rather than plain text. This “cleaner” ensures we get back a simple, readable string.
Invoke the Chain
Finally, the application invokes the chain by passing the end user’s question in as input:
This is the “Start” button. You drop the end user’s question into the beginning of the pipeline, and it flows through the retriever, the prompt, and the LLM until the answer comes out the other side.