Build a Retrieval Augmented Generation (RAG) App
One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots. These are applications that can answer questions about specific source information. They use a technique known as Retrieval Augmented Generation, or RAG.
This tutorial will show how to build a simple Q&A application over a text data source. Along the way we'll go over a typical Q&A architecture and highlight additional resources for more advanced Q&A techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly helpful as our application grows in complexity.
If you're already familiar with basic retrieval, you might also be interested in this overview of different retrieval techniques.
What is RAG?
RAG is a technique for augmenting LLM knowledge with additional data.
LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data available up to the specific point in time they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the model's knowledge with the specific information it needs. The process of fetching the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).
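Conceptually, the flow is just "retrieve, then insert into the prompt". Here is a hypothetical sketch of that idea (retrieve_relevant_docs and llm are placeholders, not real LangChain APIs):
def answer_with_rag(question: str) -> str:
    docs = retrieve_relevant_docs(question)   # placeholder: fetch text relevant to the question
    context = "\n\n".join(docs)               # insert the retrieved text into the prompt
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                        # placeholder: any text-in/text-out model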
LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally.
**Note**: Here we focus on Q&A for unstructured data. If you are interested in RAG over structured data, check out our tutorial on doing question answering over SQL data.
Concepts
A typical RAG application has two main components:
- Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.
- Retrieval and generation: the actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, then passes it to the model.
The most common full sequence from raw data to answer looks like:
Indexing
- Load: First we need to load our data. This is done with Document Loaders.
- Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, since large chunks are harder to search over and won't fit in a model's finite context window.
- Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.
Retrieval and generation
- Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
- Generate: A ChatModel / LLM produces an answer using a prompt that includes both the question and the retrieved data.
Setup
Jupyter Notebook
This guide (and most of the other guides in the documentation) uses Jupyter notebooks and assumes the reader does as well. Jupyter notebooks are perfect for learning how to work with LLM systems because oftentimes things can go wrong (unexpected output, the API is down, etc.), and going through guides in an interactive environment is a great way to better understand them.
This and other tutorials are perhaps most conveniently run in a Jupyter notebook. See here for instructions on how to install.
Installation
This tutorial requires these langchain dependencies:
- Pip
- Conda
%pip install --quiet --upgrade langchain langchain-community langchain-chroma
conda install langchain langchain-community langchain-chroma -c conda-forge
For more details, see our Installation guide.
LangSmith
Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.
After you sign up at the link above, make sure to set your environment variables to start logging traces:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."
Or, if in a notebook, you can set them with:
import getpass
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
Preview
In this guide we'll build an app that answers questions about the content of a website. The specific website we will use is the LLM Powered Autonomous Agents blog post by Lilian Weng, which allows us to ask questions about the contents of the post.
We can create a simple indexing pipeline and RAG chain to do this in ~20 lines of code:
- OpenAI
- Anthropic
- Azure
- Cohere
- NVIDIA
- FireworksAI
- Groq
- MistralAI
- TogetherAI
pip install -qU langchain-openai
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
pip install -qU langchain-anthropic
import getpass
import os
os.environ["ANTHROPIC_API_KEY"] = getpass.getpass()
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")
pip install -qU langchain-openai
import getpass
import os
os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass()
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
pip install -qU langchain-google-vertexai
import getpass
import os
os.environ["GOOGLE_API_KEY"] = getpass.getpass()
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(model="gemini-1.5-flash")
pip install -qU langchain-cohere
import getpass
import os
os.environ["COHERE_API_KEY"] = getpass.getpass()
from langchain_cohere import ChatCohere
llm = ChatCohere(model="command-r-plus")
pip install -qU langchain-nvidia-ai-endpoints
import getpass
import os
os.environ["NVIDIA_API_KEY"] = getpass.getpass()
from langchain_nvidia_ai_endpoints import ChatNVIDIA
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
pip install -qU langchain-fireworks
import getpass
import os
os.environ["FIREWORKS_API_KEY"] = getpass.getpass()
from langchain_fireworks import ChatFireworks
llm = ChatFireworks(model="accounts/fireworks/models/llama-v3p1-70b-instruct")
pip install -qU langchain-groq
import getpass
import os
os.environ["GROQ_API_KEY"] = getpass.getpass()
from langchain_groq import ChatGroq
llm = ChatGroq(model="llama3-8b-8192")
pip install -qU langchain-mistralai
import getpass
import os
os.environ["MISTRAL_API_KEY"] = getpass.getpass()
from langchain_mistralai import ChatMistralAI
llm = ChatMistralAI(model="mistral-large-latest")
pip install -qU langchain-openai
import getpass
import os
os.environ["TOGETHER_API_KEY"] = getpass.getpass()
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="https://api.together.xyz/v1",
api_key=os.environ["TOGETHER_API_KEY"],
model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
bs_kwargs=dict(
parse_only=bs4.SoupStrainer(
class_=("post-content", "post-title", "post-header")
)
),
)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
rag_chain.invoke("What is Task Decomposition?")
'Task Decomposition is a process where a complex task is broken down into smaller, simpler steps or subtasks. This technique is utilized to enhance model performance on complex tasks by making them more manageable. It can be done by using language models with simple prompting, task-specific instructions, or with human inputs.'
# cleanup
vectorstore.delete_collection()
Check out the LangSmith trace.
Detailed walkthrough
Let's go through the above code step-by-step to really understand what's going on.
1. Indexing: Load
We need to first load the blog post contents. We can use DocumentLoaders for this, which are objects that load in data from a source and return a list of Documents. A Document is an object with some page_content (str) and metadata (dict).
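As a quick illustration of the shape of a Document (a minimal sketch, separate from this tutorial's pipeline):
from langchain_core.documents import Document

# A Document pairs page_content (str) with metadata (dict).
doc = Document(
    page_content="Hello, world!",
    metadata={"source": "https://example.com"},
)
print(doc.page_content, doc.metadata)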
In this case we'll use the WebBaseLoader, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it to text. We can customize the HTML -> text parsing by passing parameters to the BeautifulSoup parser via bs_kwargs (see the BeautifulSoup docs). In this case only HTML tags with class "post-content", "post-title", or "post-header" are relevant, so we'll remove all others.
import bs4
from langchain_community.document_loaders import WebBaseLoader
# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()
len(docs[0].page_content)
43131
print(docs[0].page_content[:500])
LLM Powered Autonomous Agents
Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng
Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In
Go deeper
DocumentLoader: Object that loads data from a source as a list of Documents.
2. Indexing: Split
Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs.
To handle this we'll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant parts of the blog post at run time.
In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.
We set add_start_index=True so that the character index at which each split Document starts within the initial Document is preserved as metadata attribute "start_index".
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)
len(all_splits)
66
len(all_splits[0].page_content)
969
all_splits[10].metadata
{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
'start_index': 7056}
Go deeper
TextSplitter: Object that splits a list of Documents into smaller chunks. Subclass of DocumentTransformers.
- Learn more about splitting text with different methods by reading the how-to docs
- Code (py or js)
- Scientific papers
- Interface: API reference for the base interface.
DocumentTransformer: Object that performs a transformation on a list of Document objects.
3. Indexing: Store
Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of "similarity" search to identify the stored splits whose embeddings are most similar to our query embedding. The simplest similarity measure is cosine similarity: we measure the cosine of the angle between each pair of embeddings (which are high-dimensional vectors).
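For intuition: cosine similarity is the dot product of two vectors divided by the product of their norms. A self-contained sketch on toy 3-dimensional vectors (real embeddings typically have hundreds or thousands of dimensions):
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.1, 0.9, 0.2])   # toy "query" embedding
doc_a = np.array([0.2, 0.8, 0.1])   # points in a similar direction
doc_b = np.array([0.9, 0.1, 0.0])   # points in a different direction
print(cosine_similarity(query, doc_a))  # ~0.99: very similar
print(cosine_similarity(query, doc_b))  # ~0.21: much less similar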
We can embed and store all of our document splits in a single command using the Chroma vector store and OpenAIEmbeddings model.
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
Go deeper
Embeddings: Wrapper around a text embedding model, used for converting text to embeddings.
VectorStore: Wrapper around a vector database, used for storing and querying embeddings.
This completes the Indexing portion of the pipeline. At this point we have a query-able vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question.
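Before wiring up the chain, we can sanity-check the store directly. A quick sketch (similarity_search embeds the query and returns the k most similar Documents):
# Quick sanity check on the vector store built above.
results = vectorstore.similarity_search("What is Task Decomposition?", k=2)
for doc in results:
    print(doc.metadata.get("start_index"), doc.page_content[:100])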
4. Retrieval and Generation: Retrieve
Now let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.
First we need to define our logic for searching over documents. LangChain defines a Retriever interface, which wraps an index that can return relevant Documents given a string query.
The most common type of Retriever is the VectorStoreRetriever, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any VectorStore can easily be turned into a Retriever with VectorStore.as_retriever():
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")
len(retrieved_docs)
6
print(retrieved_docs[0].page_content)
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
Go deeper
Vector stores are commonly used for retrieval, but there are other ways to do retrieval, too.
Retriever: An object that returns Documents given a text query
- Docs: Further documentation on the interface and built-in retrieval techniques, some of which include:
  - MultiQueryRetriever generates variants of the input question to improve retrieval hit rate.
  - MultiVectorRetriever instead generates variants of the embeddings, also in order to improve retrieval hit rate.
  - Maximal marginal relevance selects for relevance and diversity among the retrieved documents to avoid passing in duplicate context (see the sketch after this list).
  - Documents can be filtered during vector store retrieval using metadata filters, such as with a Self Query Retriever.
- Integrations: Integrations with retrieval services.
- Interface: API reference for the base interface.
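For example, switching the retriever above to maximal marginal relevance is a one-line change (a sketch using the standard as_retriever options):
# MMR: fetch 20 candidates, then pick 6 that balance relevance and diversity.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 6, "fetch_k": 20},
)
mmr_docs = mmr_retriever.invoke("What are the approaches to Task Decomposition?")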
5. Retrieval and Generation: Generate
Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.
We'll use the gpt-4o-mini OpenAI chat model, but any LangChain LLM or ChatModel could be substituted in.
- OpenAI
- Anthropic
- Azure
- Cohere
- NVIDIA
- FireworksAI
- Groq
- MistralAI
- TogetherAI
pip install -qU langchain-openai
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
pip install -qU langchain-anthropic
import getpass
import os
os.environ["ANTHROPIC_API_KEY"] = getpass.getpass()
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-3-sonnet-20240229", temperature=0.2, max_tokens=1024)
pip install -qU langchain-openai
import getpass
import os
os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass()
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
pip install -qU langchain-google-vertexai
import getpass
import os
os.environ["GOOGLE_API_KEY"] = getpass.getpass()
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(model="gemini-1.5-flash")
pip install -qU langchain-cohere
import getpass
import os
os.environ["COHERE_API_KEY"] = getpass.getpass()
from langchain_cohere import ChatCohere
llm = ChatCohere(model="command-r-plus")
pip install -qU langchain-nvidia-ai-endpoints
import getpass
import os
os.environ["NVIDIA_API_KEY"] = getpass.getpass()
from langchain_nvidia_ai_endpoints import ChatNVIDIA
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
pip install -qU langchain-fireworks
import getpass
import os
os.environ["FIREWORKS_API_KEY"] = getpass.getpass()
from langchain_fireworks import ChatFireworks
llm = ChatFireworks(model="accounts/fireworks/models/llama-v3p1-70b-instruct")
pip install -qU langchain-groq
import getpass
import os
os.environ["GROQ_API_KEY"] = getpass.getpass()
from langchain_groq import ChatGroq
llm = ChatGroq(model="llama3-8b-8192")
pip install -qU langchain-mistralai
import getpass
import os
os.environ["MISTRAL_API_KEY"] = getpass.getpass()
from langchain_mistralai import ChatMistralAI
llm = ChatMistralAI(model="mistral-large-latest")
pip install -qU langchain-openai
import getpass
import os
os.environ["TOGETHER_API_KEY"] = getpass.getpass()
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="https://api.together.xyz/v1",
api_key=os.environ["TOGETHER_API_KEY"],
model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)
We'll use a prompt for RAG that is checked into the LangChain prompt hub (here).
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")
example_messages = prompt.invoke(
{"context": "filler context", "question": "filler question"}
).to_messages()
example_messages
[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]
print(example_messages[0].content)
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question
Context: filler context
Answer:
We'll use the LCEL Runnable protocol to define the chain, allowing us to:
- pipe together components and functions in a transparent way
- automatically trace our chain in LangSmith
- get streaming, async, and batched calling out of the box.
Here is the implementation:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
for chunk in rag_chain.stream("What is Task Decomposition?"):
print(chunk, end="", flush=True)
Task Decomposition is a process where a complex task is broken down into smaller, more manageable steps or parts. This is often done using techniques like "Chain of Thought" or "Tree of Thoughts", which instruct a model to "think step by step" and transform large tasks into multiple simple tasks. Task decomposition can be prompted in a model, guided by task-specific instructions, or influenced by human inputs.
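The same chain supports batched and async invocation with no extra work. For instance (a sketch using the rag_chain defined above):
# Batch several questions in one call; answers come back in input order.
answers = rag_chain.batch(
    ["What is Task Decomposition?", "What is Chain of Thought prompting?"]
)

# Or asynchronously, e.g. from a notebook cell:
# answer = await rag_chain.ainvoke("What is Task Decomposition?")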
Let's dissect the LCEL to understand what's going on.
First: each of these components (retriever, prompt, llm, etc.) are instances of Runnable. This means that they implement the same methods, such as sync and async .invoke, .stream, or .batch, which makes them easier to connect together. They can be connected into a RunnableSequence (another Runnable) via the | operator.
LangChain will automatically cast certain objects to runnables when met with the | operator. Here, format_docs is cast to a RunnableLambda, and the dict with "context" and "question" is cast to a RunnableParallel. The details are less important than the bigger point, which is that each object here is a Runnable.
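To see this coercion machinery in isolation, here is a tiny self-contained sketch (no LLM involved):
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

# Plain functions wrapped as RunnableLambdas; `|` chains them into a RunnableSequence.
double = RunnableLambda(lambda x: x * 2)
plus_one = RunnableLambda(lambda x: x + 1)
print((double | plus_one).invoke(3))  # 7

# A dict of runnables acts as a RunnableParallel: every key is computed from the same input.
parallel = RunnableParallel({"doubled": double, "original": RunnablePassthrough()})
print(parallel.invoke(3))  # {'doubled': 6, 'original': 3}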
Let's trace how the input question flows through the above runnables.
As we've seen above, the input to prompt is expected to be a dict with keys "context" and "question". So the first element of this chain builds runnables that will calculate both of these from the input question:
- retriever | format_docs passes the question through the retriever, generating Document objects, and then to format_docs to generate strings;
- RunnablePassthrough() passes through the input question unchanged.
That is, if you constructed
chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
)
then chain.invoke(question) would build a formatted prompt, ready for inference. (Note: when developing with LCEL, it can be practical to test with sub-chains like this.)
The last steps of the chain are llm, which runs the inference, and StrOutputParser(), which just plucks the string content out of the LLM's output message.
You can analyze the individual steps of this chain via its LangSmith trace.
Built-in chains
If preferred, LangChain includes convenience functions that implement the above LCEL. We compose two functions:
- create_stuff_documents_chain specifies how retrieved context is fed into a prompt and LLM. In this case we will "stuff" the contents into the prompt, i.e., we will include all retrieved context without any summarization or other processing. It largely implements our rag_chain above, with input keys context and input: it generates an answer using the retrieved context and query.
- create_retrieval_chain adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer. It has input key input, and includes input, context, and answer in its output.
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
system_prompt = (
"You are an assistant for question-answering tasks. "
"Use the following pieces of retrieved context to answer "
"the question. If you don't know the answer, say that you "
"don't know. Use three sentences maximum and keep the "
"answer concise."
"\n\n"
"{context}"
)
prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
("human", "{input}"),
]
)
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
response = rag_chain.invoke({"input": "What is Task Decomposition?"})
print(response["answer"])
Task Decomposition is a process in which complex tasks are broken down into smaller and simpler steps. Techniques like Chain of Thought (CoT) and Tree of Thoughts are used to enhance model performance on these tasks. The CoT method instructs the model to think step by step, decomposing hard tasks into manageable ones, while Tree of Thoughts extends CoT by exploring multiple reasoning possibilities at each step, creating a tree structure of thoughts.
Returning sources
Often in Q&A applications it's important to show users the sources that were used to generate the answer. LangChain's built-in create_retrieval_chain will propagate retrieved source documents through to the output in the "context" key:
for document in response["context"]:
print(document)
print()
page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 1585}
page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 2192}
page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'start_index': 29630}
Go deeper
Choosing a model
ChatModel: An LLM-backed chat model. Takes in a sequence of messages and returns a message.
LLM: A text-in-text-out LLM. Takes in a string and returns a string.
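The interface difference is easy to see in isolation. A minimal sketch, assuming the llm variable above is a chat model and an OPENAI_API_KEY is set:
# ChatModel: a sequence of messages in, a message object out.
message = llm.invoke([("human", "Say hello in one word.")])
print(message.content)

# String-in/string-out LLM (completion-style wrapper), for comparison:
# from langchain_openai import OpenAI
# print(OpenAI().invoke("Say hello in one word."))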
See here for a guide on RAG with locally-running models.
Customizing the prompt
As shown above, we can load prompts (e.g., this RAG prompt) from the prompt hub. The prompt can also be easily customized:
from langchain_core.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| custom_rag_prompt
| llm
| StrOutputParser()
)
rag_chain.invoke("What is Task Decomposition?")
'Task decomposition is the process of breaking down a complex task into smaller, more manageable parts. Techniques like Chain of Thought (CoT) and Tree of Thoughts allow an agent to "think step by step" and explore multiple reasoning possibilities, respectively. This process can be executed by a Language Model with simple prompts, task-specific instructions, or human inputs. Thanks for asking!'
Next steps
We've covered the steps to build a basic Q&A app over data.
There are plenty of features, integrations, and extensions to explore in each of the above sections. Along with the Go deeper sources mentioned above, good next steps include:
- Return sources: Learn how to return source documents
- Streaming: Learn how to stream outputs and intermediate steps
- Add chat history: Learn how to add chat history to your app
- Retrieval conceptual guide: A high-level overview of specific retrieval techniques
- Build a local RAG application: Create an app similar to the one above using all local components