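Build a Retrieval Augmented Generation (RAG) App: Part 2

In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some form of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.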

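This is the second part of a multi-part tutorial:

Part 1 introduces RAG and walks through a minimal implementation.
Part 2 (this guide) extends the implementation to accommodate conversation-style interactions and multi-step retrieval processes.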

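Here we focus on adding logic for incorporating historical messages. This involves the management of a chat history.

We will cover two approaches:

Chains, in which we execute at most one retrieval step;
Agents, in which we give an LLM discretion to execute multiple retrieval steps.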

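Note

The methods presented here leverage tool-calling capabilities in modern chat models. For a table of models that support tool calling, see the chat model integrations page of the LangChain documentation.

For the external knowledge source, we will use the same LLM Powered Autonomous Agents blog post by Lilian Weng from Part 1 of the RAG tutorial.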

Setup

Components

We will need to select three components from LangChain's suite of integrations.

Select chat model

pip install -qU "langchain[openai]"

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

Select embeddings model

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Select vector store

pip install -qU langchain-core

from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

Dependencies

In addition, we will use the following packages:

%%capture --no-stderr
%pip install --upgrade --quiet langgraph langchain-community beautifulsoup4

LangSmith

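Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.

Note that LangSmith is not required, but it is helpful. If you do want to use LangSmith, sign up for an account and then set your environment variables to start logging traces: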

os.environ["LANGSMITH_TRACING"] = "true"
if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

Chains

Let's first revisit the vector store we built in Part 1, which indexes an LLM Powered Autonomous Agents blog post by Lilian Weng.

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

API Reference: hub | WebBaseLoader | Document | RecursiveCharacterTextSplitter

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

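In Part 1 of the RAG tutorial, we represented the user input, retrieved context, and generated answer as separate keys in the state. Conversational experiences can be naturally represented using a sequence of messages. In addition to messages from the user and assistant, retrieved documents and other artifacts can be incorporated into a message sequence via tool messages. This motivates us to represent the state of our RAG application using a sequence of messages. Specifically, we will have:

User input as a HumanMessage;
Vector store query as an AIMessage with tool calls;
Retrieved documents as a ToolMessage;
Final response as an AIMessage.

This model for state is so versatile that LangGraph offers a built-in version for convenience: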

from langgraph.graph import MessagesState, StateGraph

graph_builder = StateGraph(MessagesState)

API Reference: StateGraph
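To make the message-sequence state concrete, the following is a minimal sketch of one RAG turn written with plain Python dictionaries. It does not use LangChain or LangGraph: the dictionary keys and the `append_messages` helper are assumptions chosen for illustration, and the message contents are placeholders. It only shows the order of the four message kinds listed above and the idea that each step appends to the history rather than overwriting it.

```python
# Illustrative only: plain dictionaries stand in for LangChain message objects.

def append_messages(state: dict, update: list) -> dict:
    """Return a new state whose message list has the update appended."""
    return {"messages": state["messages"] + update}


state = {"messages": []}

# 1. User input
state = append_messages(
    state, [{"type": "human", "content": "What is Task Decomposition?"}]
)

# 2. Vector store query, expressed as a tool call on an AI message
state = append_messages(
    state,
    [
        {
            "type": "ai",
            "content": "",
            "tool_calls": [
                {"name": "retrieve", "args": {"query": "Task Decomposition"}, "id": "call_1"}
            ],
        }
    ],
)

# 3. Retrieved documents, linked back to the tool call by its ID
state = append_messages(
    state,
    [{"type": "tool", "tool_call_id": "call_1", "content": "(retrieved text)"}],
)

# 4. Final response
state = append_messages(
    state, [{"type": "ai", "content": "(final answer)", "tool_calls": []}]
)

message_types = [m["type"] for m in state["messages"]]
print(message_types)
```

Because every step only appends, the full history stays available to later steps, which is what the rest of this guide relies on.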

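Leveraging tool calling to interact with a retrieval step has another benefit: the query for retrieval is generated by our model. This is especially important in a conversational setting, where user queries may require contextualization based on the chat history. For instance, consider the following exchange:

H: "What is Task Decomposition?"

AI: "Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model."

H: "What are common ways of doing it?"

In this scenario, a model could generate a query such as "common approaches to task decomposition". Tool calling facilitates this naturally. As described in the query analysis section of the RAG tutorial, this allows a model to rewrite user queries into more effective search queries. It also supports direct responses that do not involve a retrieval step (e.g., in response to a generic greeting from the user).

Let's turn our retrieval step into a tool: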

from langchain_core.tools import tool


@tool(response_format="content_and_artifact")
def retrieve(query: str):
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

API Reference: tool

See the LangChain how-to guide on creating tools for more detail.
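The retrieve tool above returns two things: a string the model will read, and the raw documents kept as an artifact. The sketch below imitates only that return shape in plain Python. `Doc` and `retrieve_sketch` are hypothetical stand-ins, not LangChain classes, and no vector store is queried.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def retrieve_sketch(docs: list) -> tuple:
    """Return (content for the model, artifact kept for downstream code)."""
    serialized = "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )
    return serialized, docs


docs = [
    Doc("first chunk", {"source": "blog"}),
    Doc("second chunk", {"source": "blog"}),
]
content, artifact = retrieve_sketch(docs)
print(content)
```

Keeping the raw documents alongside the serialized text means downstream code can still read metadata without parsing the string sent to the model.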

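Our graph will consist of three nodes:

A node that fields the user input, either generating a query for the retriever or responding directly;
A node for the retriever tool that executes the retrieval step;
A node that generates the final response using the retrieved context.

We build them below. Note that we leverage another pre-built LangGraph component, ToolNode, which executes the tool and adds the result as a ToolMessage to the state.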

from langchain_core.messages import SystemMessage
from langgraph.prebuilt import ToolNode


# Step 1: Generate an AIMessage that may include a tool-call to be sent.
def query_or_respond(state: MessagesState):
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    # MessagesState appends messages to state instead of overwriting
    return {"messages": [response]}


# Step 2: Execute the retrieval.
tools = ToolNode([retrieve])


# Step 3: Generate a response using the retrieved content.
def generate(state: MessagesState):
    """Generate answer."""
    # Get generated ToolMessages
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    # Format into prompt
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    # Run
    response = llm.invoke(prompt)
    return {"messages": [response]}

API Reference: SystemMessage | ToolNode
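The message selection inside generate can be tried in isolation. The sketch below reproduces the same two filters on a hypothetical `Msg` dataclass that exposes only the attributes the node reads (`type`, `content`, `tool_calls`); it is not a LangChain message class, and the two-turn history is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Hypothetical stand-in exposing the attributes the generate step reads."""

    type: str
    content: str
    tool_calls: list = field(default_factory=list)


def split_for_prompt(messages: list) -> tuple:
    """Return (trailing tool messages in order, conversation messages to keep)."""
    # Walk backwards and collect tool messages until a non-tool message appears.
    recent_tool_messages = []
    for message in reversed(messages):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    # Keep human/system messages and AI messages that carry no tool calls.
    conversation = [
        m
        for m in messages
        if m.type in ("human", "system") or (m.type == "ai" and not m.tool_calls)
    ]
    return tool_messages, conversation


history = [
    Msg("human", "What is Task Decomposition?"),
    Msg("ai", "", tool_calls=[{"name": "retrieve"}]),
    Msg("tool", "first-turn context"),
    Msg("ai", "It breaks tasks into steps."),
    Msg("human", "What are common ways of doing it?"),
    Msg("ai", "", tool_calls=[{"name": "retrieve"}]),
    Msg("tool", "second-turn context"),
]
tool_messages, conversation = split_for_prompt(history)
print([m.content for m in tool_messages])
print([m.content for m in conversation])
```

Only the context retrieved for the current turn reaches the system prompt, while earlier tool calls and tool results are dropped from the conversation passed to the model.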

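Finally, we compile our application into a single graph object. In this case, we are just connecting the steps into a sequence. We also allow the first query_or_respond step to "short-circuit" and respond directly to the user if it does not generate a tool call. This allows our application to support conversational experiences, e.g., responding to generic greetings that may not require a retrieval step: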

from langgraph.graph import END
from langgraph.prebuilt import ToolNode, tools_condition

graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

API Reference: ToolNode | tools_condition
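The "short-circuit" decision can be pictured as a small routing function. The sketch below is a simplified stand-in written with plain dictionaries, not the library's tools_condition implementation; `route_after_query` and the `END_SKETCH` sentinel are names invented for this example.

```python
END_SKETCH = "__end__"  # hypothetical sentinel used only in this sketch


def route_after_query(messages: list) -> str:
    """Route to the tool node if the latest AI message requested a tool call."""
    last = messages[-1]
    if last.get("type") == "ai" and last.get("tool_calls"):
        return "tools"
    return END_SKETCH


greeting_reply = [
    {"type": "human", "content": "Hello"},
    {"type": "ai", "content": "Hello! How can I assist you today?", "tool_calls": []},
]
search_request = [
    {"type": "human", "content": "What is Task Decomposition?"},
    {
        "type": "ai",
        "content": "",
        "tool_calls": [{"name": "retrieve", "args": {"query": "Task Decomposition"}}],
    },
]
print(route_after_query(greeting_reply))
print(route_after_query(search_request))
```

A greeting ends the run immediately, while a message carrying a tool call is sent on to the retrieval node.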

from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))

Let's test our application.

Note that it responds appropriately to messages that do not require an additional retrieval step:

input_message = "Hello"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

Hello
================================== Ai Message ==================================

Hello! How can I assist you today?

When executing a search, we can stream the steps to observe the query generation, retrieval, and answer generation:

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_dLjB3rkMoxZZxwUGXi33UBeh)
 Call ID: call_dLjB3rkMoxZZxwUGXi33UBeh
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
================================== Ai Message ==================================

Task Decomposition is the process of breaking down a complicated task into smaller, manageable steps. It often involves techniques like Chain of Thought (CoT), which encourages models to think step by step, enhancing performance on complex tasks. This approach allows for a clearer understanding of the task and aids in structuring the problem-solving process.

Check out the LangSmith trace for this run.

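Stateful management of chat history

Note

This section of the tutorial previously used the RunnableWithMessageHistory abstraction. You can access that version of the documentation in the v0.2 docs.

As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.

If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan to deprecate this functionality in the near future, as it works for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.

For more details, see How to migrate to LangGraph Memory.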

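In production, a Q&A application will usually persist the chat history to a database and be able to read and update it appropriately.

LangGraph implements a built-in persistence layer, making it well suited to chat applications that support multiple conversational turns.

To manage multiple conversational turns and threads, all we have to do is specify a checkpointer when compiling our application. Because the nodes in our graph append messages to the state, we retain a consistent chat history across invocations.

LangGraph comes with a simple in-memory checkpointer, which we use below. See the LangGraph persistence documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

For a detailed walkthrough of how to manage message history, head to the How to add message history (memory) guide.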

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

# Specify an ID for the thread
config = {"configurable": {"thread_id": "abc123"}}

API Reference: MemorySaver
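As a rough mental model of what a thread ID buys us, the sketch below keeps one message list per thread in an ordinary dictionary. It is not how LangGraph's checkpointer is implemented; `InMemoryThreads` and `run_turn` are hypothetical names, and the reply is a placeholder where a real application would run the graph.

```python
from collections import defaultdict


class InMemoryThreads:
    """Hypothetical store keeping one message list per thread ID."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def run_turn(self, thread_id: str, user_text: str) -> list:
        """Append a user message and a placeholder reply to the thread's history."""
        history = self._threads[thread_id]
        history.append({"type": "human", "content": user_text})
        turn = (len(history) + 1) // 2
        # A real application would call the graph here; this reply is a placeholder.
        history.append({"type": "ai", "content": f"(reply to turn {turn})"})
        return list(history)


store = InMemoryThreads()
store.run_turn("abc123", "What is Task Decomposition?")
second = store.run_turn("abc123", "Can you look up some common ways of doing it?")
other = store.run_turn("def234", "Hello")
print(len(second), len(other))
```

The second call on thread "abc123" sees the first exchange, while thread "def234" starts from an empty history. That is the behavior the config with thread_id selects in the examples that follow.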

We can now invoke the application as before:

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_JZb6GLD812bW2mQsJ5EJQDnN)
 Call ID: call_JZb6GLD812bW2mQsJ5EJQDnN
  Args:
    query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
================================== Ai Message ==================================

Task Decomposition is a technique used to break down complicated tasks into smaller, manageable steps. It involves using methods like Chain of Thought (CoT) prompting, which encourages the model to think step by step, enhancing performance on complex tasks. This process helps to clarify the model's reasoning and makes it easier to tackle difficult problems.

input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()

================================ Human Message =================================

Can you look up some common ways of doing it?
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_kjRI4Y5cJOiB73yvd7dmb6ux)
 Call ID: call_kjRI4Y5cJOiB73yvd7dmb6ux
  Args:
    query: common methods of task decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================

Common ways of performing Task Decomposition include: (1) using Large Language Models (LLMs) with simple prompts like "Steps for XYZ" or "What are the subgoals for achieving XYZ?", (2) employing task-specific instructions such as "Write a story outline" for specific tasks, and (3) incorporating human inputs to guide the decomposition process.

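Note that the query generated by the model for the second question incorporates the conversational context.

The LangSmith trace is particularly informative here, as we can see exactly which messages are visible to our chat model at each step.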

Agents

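Agents leverage the reasoning capabilities of LLMs to make decisions during execution. Using agents allows you to offload additional discretion over the retrieval process. Although their behavior is less predictable than the "chain" above, they are able to execute multiple retrieval steps in service of a query, or iterate on a single search.

Below we assemble a minimal RAG agent. Using LangGraph's pre-built ReAct agent constructor, we can do this in one line.

Tip

Check out LangGraph's Agentic RAG tutorial for more advanced formulations.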

from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)

API Reference: create_react_agent

Let's inspect the graph:

display(Image(agent_executor.get_graph().draw_mermaid_png()))

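The key difference from our earlier implementation is that instead of a final generation step that ends the run, here the tool invocation loops back to the original LLM call. The model can then either answer the question using the retrieved context, or generate another tool call to obtain more information.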
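The loop described here can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the code behind create_react_agent: `run_agent_loop`, `scripted_model`, and `fake_retrieve` are hypothetical, the "model" is a scripted function rather than an LLM, and the `max_steps` cap is added for this sketch.

```python
from typing import Callable


def run_agent_loop(
    model: Callable,
    tools: dict,
    user_text: str,
    max_steps: int = 5,
) -> list:
    """Alternate model and tool calls until the model answers without tool calls."""
    messages = [{"type": "human", "content": user_text}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            break  # No tool call requested: this reply is the final answer.
        for call in reply["tool_calls"]:
            result = tools[call["name"]](call["args"]["query"])
            messages.append({"type": "tool", "name": call["name"], "content": result})
    return messages


def scripted_model(messages: list) -> dict:
    """Hypothetical model: issues two searches in turn, then answers."""
    searches_done = sum(1 for m in messages if m["type"] == "tool")
    queries = [
        "standard method for Task Decomposition",
        "common extensions of that method",
    ]
    if searches_done < len(queries):
        call = {"name": "retrieve", "args": {"query": queries[searches_done]}}
        return {"type": "ai", "content": "", "tool_calls": [call]}
    return {"type": "ai", "content": "final answer", "tool_calls": []}


def fake_retrieve(query: str) -> str:
    """Hypothetical retrieval function returning a placeholder string."""
    return f"context for: {query}"


transcript = run_agent_loop(scripted_model, {"retrieve": fake_retrieve}, "question")
print([m["type"] for m in transcript])
```

Each tool result is fed back to the model, which decides whether to search again or answer. The step cap guards against a model that never stops requesting tools.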

Let's test this out. We construct a question that would typically require an iterative sequence of retrieval steps to answer:

config = {"configurable": {"thread_id": "def234"}}

input_message = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()

================================ Human Message =================================

What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_Y3YaIzL71B83Cjqa8d2G0O8N)
 Call ID: call_Y3YaIzL71B83Cjqa8d2G0O8N
  Args:
    query: standard method for Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================
Tool Calls:
  retrieve (call_2JntP1x4XQMWwgVpYurE12ff)
 Call ID: call_2JntP1x4XQMWwgVpYurE12ff
  Args:
    query: common extensions of Task Decomposition methods
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================

The standard method for task decomposition involves using techniques such as Chain of Thought (CoT), where a model is instructed to "think step by step" to break down complex tasks into smaller, more manageable components. This approach enhances model performance by allowing for more thorough reasoning and planning. Task decomposition can be accomplished through various means, including:

1. Simple prompting (e.g., asking for steps to achieve a goal).
2. Task-specific instructions (e.g., asking for a story outline).
3. Human inputs to guide the decomposition process.

### Common Extensions of Task Decomposition Methods:

1. **Tree of Thoughts**: This extension builds on CoT by not only decomposing the problem into thought steps but also generating multiple thoughts at each step, creating a tree structure. The search process can employ breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier or through majority voting.

These extensions aim to enhance reasoning capabilities and improve the effectiveness of task decomposition in various contexts.

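Note that the agent:

Generates a query to search for a standard method for task decomposition;
On receiving the answer, generates a second query to search for common extensions of it;
Having received all necessary context, answers the question.

We can see the full sequence of steps, along with latency and other metadata, in the LangSmith trace.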

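Next steps

We have covered the steps to build a basic conversational Q&A application:

We used chains to build a predictable application that generates at most one query per user input;
We used agents to build an application that can iterate on a sequence of queries.

To explore different types of retrievers and retrieval strategies, visit the retrievers section of the how-to guides.

For a detailed walkthrough of LangChain's conversation memory abstractions, visit the How to add message history (memory) guide.

To learn more about agents, check out the conceptual guide and the LangGraph agent architectures page.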
