How to Build an Enterprise Chatbot Using Qdrant and Llama 2
The entire code for this blog post can be found in the GitHub repo here: https://github.com/vardhanam/enterprise_chatbot_qdrant/tree/main
Originally posted on Medium: https://medium.com/@vardhanam.daga/how-to-build-an-enterprise-chatbot-using-qdrant-and-llama-2-d2af666942a4
—
The integration of enterprise chatbots with access to internal company data represents a pivotal advancement in streamlining business operations and enhancing employee productivity. By leveraging AI to navigate and retrieve information from vast internal databases, these chatbots can significantly reduce the time employees spend searching for information, whether it’s sales figures, inventory levels, or project status updates.
This immediate access to relevant data not only accelerates decision-making processes but also fosters a more agile and informed workforce capable of responding quickly to changing business conditions. Moreover, by automating routine inquiries and tasks, such chatbots allow employees to focus on more strategic and creative tasks, thereby boosting overall productivity and innovation within the company.
In an era where data is a critical asset, enterprise chatbots that can efficiently mine and manage this information are becoming an essential tool for companies aiming to maintain a competitive edge.
How Are Enterprise Chatbots Designed?
AI chatbots are built using a sophisticated combination of large language models (LLMs) and vector databases, harnessing the power of advanced artificial intelligence to understand and respond to user queries with high accuracy.
LLMs, such as GPT, Llama 2, and Mistral, are trained on vast datasets to comprehend and generate human-like text, enabling chatbots to process natural language queries and engage in conversations that feel intuitive to users.
To enhance their responsiveness and relevance, these chatbots utilize vector databases, which efficiently store and retrieve high-dimensional data vectors representing text. This setup allows for the quick matching of user queries with the most relevant information or responses by measuring the similarity between vectors.
Together, LLMs and vector databases form the backbone of AI chatbots, enabling them to deliver fast, accurate, and contextually aware interactions, transforming how businesses and consumers communicate.
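To make that retrieval step concrete, here is a minimal sketch of similarity search, using toy NumPy vectors in place of real embeddings (the numbers are made up purely for illustration):

import numpy as np

# Toy 4-dimensional "embeddings" for three stored documents
doc_vectors = np.array([
    [0.1, 0.9, 0.2, 0.4],   # e.g. a sales report
    [0.8, 0.1, 0.7, 0.3],   # e.g. an inventory log
    [0.2, 0.8, 0.3, 0.5],   # e.g. a project update
])
query_vector = np.array([0.15, 0.85, 0.25, 0.45])

# Cosine similarity: dot product of the query with each document,
# divided by the product of their L2 norms
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)

# The highest-scoring document is the closest match to the query
print(int(np.argmax(scores)), scores)

A vector database performs the same nearest-neighbor lookup, but over millions of vectors, using approximate indexes instead of a brute-force scan.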
Key Tools Used for Our Enterprise Chatbot
In this blog, we shall design a Streamlit-based enterprise chatbot powered by Llama 2 and Qdrant. Qdrant is an open-source vector database optimized for similarity search in high-dimensional data, supporting real-time updates and advanced filtering for dynamic AI applications.
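To give a feel for Qdrant before we wire it into the app, here is a minimal, self-contained sketch using the qdrant-client library; the collection name, vector size, and payloads are arbitrary examples, not part of the app below:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# An in-memory instance for experimentation (no server required)
client = QdrantClient(":memory:")

client.create_collection(
    collection_name="demo",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Insert a couple of toy vectors with text payloads
client.upsert(
    collection_name="demo",
    points=[
        PointStruct(id=1, vector=[0.1, 0.9, 0.2, 0.4], payload={"text": "sales report"}),
        PointStruct(id=2, vector=[0.8, 0.1, 0.7, 0.3], payload={"text": "inventory log"}),
    ],
)

# Nearest-neighbor search for a query vector
hits = client.search(collection_name="demo", query_vector=[0.15, 0.85, 0.25, 0.45], limit=1)
print(hits[0].payload)

In the app itself, LangChain's Qdrant wrapper handles these calls for us.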
Moreover, using Streamlit’s authentication modules, we shall bake a user authentication widget into our UI, so that only verified users from within the enterprise can access the chatbot. This keeps the app secure from unauthorized users and ensures that sensitive company information remains confidential.
Step-by-Step Implementation of the Code
Here’s the entire code for the enterprise chatbot. You can paste it into a file (e.g., app.py) and run it with the command streamlit run app.py.
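Before running it, make sure the dependencies are installed. A plausible set (exact package versions may vary with your setup) would be:

pip install streamlit streamlit-authenticator langchain langchain-community transformers torch accelerate bitsandbytes sentence-transformers qdrant-client pypdf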
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline
)
import transformers
import torch
import streamlit as st
from langchain.llms import HuggingFacePipeline
import os
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Qdrant
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import streamlit_authenticator as stauth
import yaml
from yaml.loader import SafeLoader

# Directory where uploaded PDF documents are stored
UPLOAD_DIR = '/home/vardhanam/enterprise_chatbot/uploaded_pdfs'
def save_uploaded_file(uploaded_file):
    try:
        # Create the upload directory if it doesn't exist
        os.makedirs(UPLOAD_DIR, exist_ok=True)
        # Save the file to disk
        with open(os.path.join(UPLOAD_DIR, uploaded_file.name), 'wb') as f:
            f.write(uploaded_file.getbuffer())
        return True
    except Exception as e:
        # If there's an error, print the exception
        print(e)
        return False

def generate_response(query):
    return chain.invoke(query)
@st.cache_resource
def load_llm():
    # Loading the Llama-2 model
    model_name = 'NousResearch/Llama-2-7b-chat-hf'

    model_config = transformers.AutoConfig.from_pretrained(
        model_name,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    # Activate 4-bit precision base model loading
    use_4bit = True
    # Compute dtype for 4-bit base models
    bnb_4bit_compute_dtype = "float16"
    # Quantization type (fp4 or nf4)
    bnb_4bit_quant_type = "nf4"
    # Activate nested quantization for 4-bit base models (double quantization)
    use_nested_quant = False

    # Set up quantization config
    compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=use_4bit,
        bnb_4bit_quant_type=bnb_4bit_quant_type,
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=use_nested_quant,
    )

    # Check GPU compatibility with bfloat16
    if compute_dtype == torch.float16 and use_4bit:
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:
            print("=" * 80)
            print("Your GPU supports bfloat16: accelerate training with bf16=True")
            print("=" * 80)

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
    )

    # Building an LLM text-generation pipeline
    text_generation_pipeline = pipeline(
        model=model,
        tokenizer=tokenizer,
        task="text-generation",
        temperature=0.2,
        repetition_penalty=1.1,
        return_full_text=True,
        max_new_tokens=1000,
    )

    llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

    return llm
@st.cache_resource()
def process_document(folder_name):
    global text_splitter
    # Split documents into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=20,
        length_function=len,
        is_separator_regex=False,
    )

    loader = DirectoryLoader(folder_name, loader_cls=PyPDFLoader)
    docs = loader.load_and_split(text_splitter=text_splitter)

    # Loading the embeddings model
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )

    global qdrant_vectorstore
    qdrant_vectorstore = Qdrant.from_documents(
        docs,
        embeddings,
        location=":memory:",
        collection_name="depp_heard_transcripts",
    )

    qdrant_retriever = qdrant_vectorstore.as_retriever()

    template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
    prompt = ChatPromptTemplate.from_template(template)

    global chain
    chain = (
        {"context": qdrant_retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain
with st.spinner("Loading llm"):
    llm = load_llm()

with st.spinner("Creating Vector DB"):
    chain = process_document(UPLOAD_DIR)

with open('/home/vardhanam/enterprise_chatbot/config.yaml') as file:
    config = yaml.load(file, Loader=SafeLoader)

authenticator = stauth.Authenticate(
    config['credentials'],
    config['cookie']['name'],
    config['cookie']['key'],
    config['cookie']['expiry_days'],
    config['preauthorized']
)

authenticator.login()

if st.session_state["authentication_status"]:
    authenticator.logout()
    st.write(f'Welcome *{st.session_state["name"]}*')

    # Streamlit app starts here
    st.title('Document Processing App')

    with st.form("Upload Form", clear_on_submit=True):
        # Use st.file_uploader to upload multiple files
        uploaded_files = st.file_uploader("Upload Document PDF files:", type='pdf', accept_multiple_files=True)
        submitted = st.form_submit_button("Submit")

    if submitted:
        # If files were uploaded, iterate over the list of uploaded files
        if uploaded_files is not None:
            for uploaded_file in uploaded_files:
                # Save each uploaded file to disk
                if save_uploaded_file(uploaded_file):
                    st.success(f"'{uploaded_file.name}' saved successfully!")
                else:
                    st.error(f"Failed to save '{uploaded_file.name}'")

            with st.spinner("Refreshing Vector DB"):
                # Clear the cache so the new documents get indexed
                process_document.clear()
                chain = process_document(UPLOAD_DIR)

            uploaded_files = None

    # Initialize chat history
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # Accept user input
    if prompt := st.chat_input("What would you like to know?"):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        # Display user message in chat message container
        with st.chat_message("user"):
            st.markdown(prompt)

        with st.chat_message("assistant"):
            with st.spinner("Analyzing Query"):
                response = generate_response(prompt)
                st.markdown(response)
        st.session_state.messages.append({"role": "assistant", "content": response})

elif st.session_state["authentication_status"] is False:
    st.error('Username/password is incorrect')
elif st.session_state["authentication_status"] is None:
    st.warning('Please enter your username and password')
Here’s what the config.yaml file (where you store the usernames and passwords of authorized users) will look like. You can tweak it for your use case.
credentials:
  usernames:
    vardhanam:
      email: vardhanam@superteams.ai
      name: Vardhanam Daga
      password: vardhanam # To be replaced with hashed password
    soum:
      email: soum@superteams.ai
      name: Soum Paul
      password: soum # To be replaced with hashed password
    debasri:
      email: debasri@superteams.ai
      name: Debasri Rakshit
      password: debasri # To be replaced with hashed password
    akriti:
      email: akriti@superteams.ai
      name: Akriti Upadhyay
      password: akriti # To be replaced with hashed password
cookie:
  expiry_days: 30
  key: random_signature_key # Must be string
  name: random_cookie_name
preauthorized:
  emails:
    - melsby@gmail.com
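Note that the password fields above are plain-text placeholders; streamlit-authenticator expects hashed passwords. Here is a minimal sketch for generating them, assuming a version of the library that exposes the Hasher utility:

import streamlit_authenticator as stauth

# Hash the plain-text passwords, then paste the results into config.yaml
hashed_passwords = stauth.Hasher(['vardhanam', 'soum', 'debasri', 'akriti']).generate()
print(hashed_passwords)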
Let’s break down the code of our app in a step-by-step format.
1. Import Libraries: Essential libraries and modules from transformers, torch, streamlit, streamlit_authenticator, and yaml are imported, alongside components for document loading, text splitting, vector storage, embedding, and prompt management from the langchain library.
2. Upload Directory Setup: A variable UPLOAD_DIR is defined to specify the directory where uploaded PDF documents are stored.
3. File Upload Function: The save_uploaded_file function saves uploaded files to the specified directory.
4. Generate Response Function: generate_response invokes the processing chain to generate responses to user queries.
5. Load Language Model with Caching: The @st.cache_resource decorator is applied to the load_llm function, which loads the Llama 2 model and configures it for efficiency with 4-bit quantization (see the memory sketch after this list). The decorator caches the loaded model, reducing load times for subsequent invocations.
6. Document Processing with Caching: Similarly, @st.cache_resource is used for the process_document function, which performs document loading, splitting, embedding, and updating of the Qdrant vector store. Caching the results of this computationally intensive process improves the application’s responsiveness.
7. Streamlit Authentication: Utilizes streamlit_authenticator to set up a secure login mechanism based on credentials stored in the config.yaml file.
8. Streamlit Interface Setup: The user interface is created using Streamlit, starting with login verification. If authentication is successful, the user is greeted and presented with a form for uploading PDF files.
9. File Processing: Upon file submission, uploaded files are saved, and the document processing chain is refreshed to include the new data.
10. Chat Interface: A simple chat interface allows the user to submit queries, which are processed by the generate_response function, and responses are displayed in the chat.
11. Session Management: The application manages user sessions to handle authentication state and chat history, ensuring a seamless user experience and secure access control.
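As a back-of-the-envelope check on why the 4-bit quantization in step 5 matters, here is a sketch of the memory arithmetic (weights only, ignoring activation and framework overhead):

# Approximate weight memory for a 7B-parameter model
params = 7e9
print(f"fp16:  {params * 2 / 1e9:.1f} GB")    # 2 bytes per parameter
print(f"4-bit: {params * 0.5 / 1e9:.1f} GB")  # 0.5 bytes per parameter

Roughly 14 GB shrinks to about 3.5 GB, which is what makes Llama 2 7B practical on a single consumer GPU.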
Screenshots from Our App
The login screen
File uploading section
The chat interface
(I uploaded some legal documents; the files are available in the GitHub repo.)
Closing Words
Voilà! We have reached the end of this blog post. You are now ready to build your very own enterprise chatbot and deploy it for the employees in your company. Let me know if you have any questions in the comments below. I hope you enjoyed reading this blog as much as I enjoyed working on it.