构建本地 RAG 应用

先决条件

本指南假设您熟悉以下概念

诸如 llama.cpp、Ollama 和 llamafile 等项目的流行凸显了在本地运行 LLM 的重要性。

本指南将展示如何通过一个提供商 Ollama 在本地（例如，在您的笔记本电脑上）运行 LLaMA 3.1，使用本地嵌入和本地 LLM。但是，您可以设置和交换其他本地提供商，例如 LlamaCPP（如果您愿意）。

注意：本指南使用聊天模型包装器，它负责为您正在使用的特定本地模型格式化输入提示。但是，如果您使用文本输入/文本输出 LLM 包装器直接提示本地模型，则可能需要使用针对您的特定模型的提示。这通常需要包含特殊标记。这是 LLaMA 2 的示例。

设置

首先，我们需要设置 Ollama。

说明在其 GitHub 仓库中提供详细信息，我们在此处总结一下

下载并运行其桌面应用
从命令行，从此选项列表中获取模型。对于本指南，您需要
- 一个通用模型，例如 llama3.1:8b，您可以使用类似 ollama pull llama3.1:8b 的命令获取
- 一个文本嵌入模型，例如 nomic-embed-text，您可以使用类似 ollama pull nomic-embed-text 的命令获取
当应用运行时，所有模型都会自动在 localhost:11434 上提供服务
请注意，您的模型选择将取决于您的硬件能力

接下来，安装本地嵌入、向量存储和推理所需的包。

# Document loading, retrieval methods and text splitting
%pip install -qU langchain langchain_community

# Local vector store via Chroma
%pip install -qU langchain_chroma

# Local inference and embeddings via Ollama
%pip install -qU langchain_ollama

# Web Loader
%pip install -qU beautifulsoup4

您还可以查看此页面以获取可用嵌入模型的完整列表。

文档加载

现在让我们加载并拆分一个示例文档。

我们将使用Lilian Weng关于代理的博文作为示例。

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

API 参考：WebBaseLoader | RecursiveCharacterTextSplitter

接下来，以下步骤将初始化您的向量存储。我们使用nomic-embed-text，但您也可以探索其他提供商或选项。

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

local_embeddings = OllamaEmbeddings(model="nomic-embed-text")

vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)

API 参考：OllamaEmbeddings

现在我们拥有了一个可用的向量存储！测试相似性搜索是否正常工作。

question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)

docs[0]

Document(metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.')

接下来，设置模型。我们在这里使用带有llama3.1:8b的 Ollama，但您可以探索其他提供商或根据您的硬件设置选择模型选项。

from langchain_ollama import ChatOllama

model = ChatOllama(
    model="llama3.1:8b",
)

API 参考：ChatOllama

测试它以确保您已正确设置所有内容。

response_message = model.invoke(
    "Simulate a rap battle between Stephen Colbert and John Oliver"
)

print(response_message.content)

**The scene is set: a packed arena, the crowd on their feet. In the blue corner, we have Stephen Colbert, aka "The O'Reilly Factor" himself. In the red corner, the challenger, John Oliver. The judges are announced as Tina Fey, Larry Wilmore, and Patton Oswalt. The crowd roars as the two opponents face off.**

**Stephen Colbert (aka "The Truth with a Twist"):**
Yo, I'm the king of satire, the one they all fear
My show's on late, but my jokes are clear
I skewer the politicians, with precision and might
They tremble at my wit, day and night

**John Oliver:**
Hold up, Stevie boy, you may have had your time
But I'm the new kid on the block, with a different prime
Time to wake up from that 90s coma, son
My show's got bite, and my facts are never done

**Stephen Colbert:**
Oh, so you think you're the one, with the "Last Week" crown
But your jokes are stale, like the ones I wore down
I'm the master of absurdity, the lord of the spin
You're just a British import, trying to fit in

**John Oliver:**
Stevie, my friend, you may have been the first
But I've got the skill and the wit, that's never blurred
My show's not afraid, to take on the fray
I'm the one who'll make you think, come what may

**Stephen Colbert:**
Well, it's time for a showdown, like two old friends
Let's see whose satire reigns supreme, till the very end
But I've got a secret, that might just seal your fate
My humor's contagious, and it's already too late!

**John Oliver:**
Bring it on, Stevie! I'm ready for you
I'll take on your jokes, and show them what to do
My sarcasm's sharp, like a scalpel in the night
You're just a relic of the past, without a fight

**The judges deliberate, weighing the rhymes and the flow. Finally, they announce their decision:**

Tina Fey: I've got to go with John Oliver. His jokes were sharper, and his delivery was smoother.

Larry Wilmore: Agreed! But Stephen Colbert's still got that old-school charm.

Patton Oswalt: You know what? It's a tie. Both of them brought the heat!

**The crowd goes wild as both opponents take a bow. The rap battle may be over, but the satire war is just beginning...

在链中使用

我们可以通过传入检索到的文档和简单的提示来创建具有任一模型的摘要链。

它使用提供的输入键值格式化提示模板，并将格式化的字符串传递给指定的模型。

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Summarize the main themes in these retrieved docs: {docs}"
)


# Convert loaded documents into strings by concatenating their content
# and ignoring metadata
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = {"docs": format_docs} | prompt | model | StrOutputParser()

question = "What are the approaches to Task Decomposition?"

docs = vectorstore.similarity_search(question)

chain.invoke(docs)

API 参考：StrOutputParser | ChatPromptTemplate

'The main themes in these documents are:\n\n1. **Task Decomposition**: The process of breaking down complex tasks into smaller, manageable subgoals is crucial for efficient task handling.\n2. **Autonomous Agent System**: A system powered by Large Language Models (LLMs) that can perform planning, reflection, and refinement to improve the quality of final results.\n3. **Challenges in Planning and Decomposition**:\n\t* Long-term planning and task decomposition are challenging for LLMs.\n\t* Adjusting plans when faced with unexpected errors is difficult for LLMs.\n\t* Humans learn from trial and error, making them more robust than LLMs in certain situations.\n\nOverall, the documents highlight the importance of task decomposition and planning in autonomous agent systems powered by LLMs, as well as the challenges that still need to be addressed.'

问答

您还可以使用您的本地模型和向量存储执行问答。这是一个使用简单字符串提示的示例。

from langchain_core.runnables import RunnablePassthrough

RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

<context>
{context}
</context>

Answer the following question:

{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

chain = (
    RunnablePassthrough.assign(context=lambda input: format_docs(input["context"]))
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "What are the approaches to Task Decomposition?"

docs = vectorstore.similarity_search(question)

# Run
chain.invoke({"context": docs, "question": question})

API 参考：RunnablePassthrough

'Task decomposition can be done through (1) simple prompting using LLM, (2) task-specific instructions, or (3) human inputs. This approach helps break down large tasks into smaller, manageable subgoals for efficient handling of complex tasks. It enables agents to plan ahead and improve the quality of final results through reflection and refinement.'

带检索的问答

最后，您可以根据用户问题自动从我们的向量存储中检索文档，而不是手动传入文档。

retriever = vectorstore.as_retriever()

qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "What are the approaches to Task Decomposition?"

qa_chain.invoke(question)

'Task decomposition can be done through (1) simple prompting in Large Language Models (LLM), (2) using task-specific instructions, or (3) with human inputs. This process involves breaking down large tasks into smaller, manageable subgoals for efficient handling of complex tasks.'

后续步骤

您现在已经了解了如何使用所有本地组件构建 RAG 应用程序。RAG 是一个非常深入的话题，您可能对以下讨论和演示其他技术的指南感兴趣。

视频：使用 LLaMA 3 构建可靠的、完全本地的 RAG 代理，了解使用本地模型进行 RAG 的代理方法。
视频：使用开源的本地 LLM 从头开始构建纠正性 RAG。
关于检索的概念指南，概述了您可以应用以提高性能的各种检索技术。
关于 RAG 的操作指南，更深入地了解 RAG 周围的不同细节。
如何在本地运行模型，了解设置不同提供商的不同方法。

构建本地 RAG 应用

设置

文档加载

在链中使用

问答

带检索的问答

后续步骤

此页面是否有帮助？

您还可以留下详细的反馈在 GitHub 上.

设置​

文档加载​

在链中使用​

问答​

带检索的问答​

后续步骤​

此页面是否有帮助？

您还可以留下详细的反馈 在 GitHub 上.

设置

文档加载

在链中使用

问答

带检索的问答

后续步骤

您还可以留下详细的反馈在 GitHub 上.