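Migrating from RefineDocumentsChain
RefineDocumentsChain implements a strategy for analyzing long texts. The strategy is as follows:
- Split a text into smaller documents;
- Apply a process to the first document;
- Refine or update the result based on the next document;
- Repeat through the sequence of documents until finished.
A common process applied in this context is summarization, in which a running summary is modified as we proceed through chunks of a long text. This is particularly useful for texts that are large compared to the context window of a given LLM.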
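The refine strategy can be sketched in plain Python, independent of any LLM library. This is a minimal illustration rather than the library's implementation: `refine`, `first`, and `update` are hypothetical names, and the two callables stand in for model calls.

```python
from typing import Callable, List


def refine(
    documents: List[str],
    initial_step: Callable[[str], str],
    refine_step: Callable[[str, str], str],
) -> str:
    """Apply initial_step to the first document, then fold in the rest."""
    if not documents:
        raise ValueError("need at least one document")
    result = initial_step(documents[0])
    for doc in documents[1:]:
        # Each later document updates the running result.
        result = refine_step(result, doc)
    return result


# Hypothetical stand-ins for LLM calls: build a running ';'-joined digest.
def first(doc: str) -> str:
    return doc.upper()


def update(existing: str, doc: str) -> str:
    return f"{existing}; {doc.upper()}"


summary = refine(["apples are red", "bananas are yellow"], first, update)
print(summary)
```

With a real model, `first` would issue the initial summarization prompt and `update` would issue the refinement prompt.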
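A LangGraph implementation brings several advantages to this problem:
- RefineDocumentsChain refines the summary in a loop, whereas a LangGraph implementation lets you step through the execution to monitor or steer it if needed.
- The LangGraph implementation supports streaming of both execution steps and individual tokens.
- Because it is assembled from modular components, it is also simple to extend or modify (e.g., to incorporate tool calling or other behavior).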
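To illustrate what stepping through the execution buys you, here is a rough sketch that exposes each intermediate result to the caller through a generator. It does not use LangGraph; `refine_steps` and the lambda stand-ins are hypothetical and only meant to show the idea of caller-side monitoring and control.

```python
from typing import Callable, Iterator, List


def refine_steps(
    documents: List[str],
    initial_step: Callable[[str], str],
    refine_step: Callable[[str, str], str],
) -> Iterator[str]:
    """Yield the running result after each document is processed."""
    result = initial_step(documents[0])
    yield result
    for doc in documents[1:]:
        result = refine_step(result, doc)
        yield result


# Hypothetical stand-ins for the model calls: concatenate the inputs.
steps = refine_steps(
    ["a", "b", "c", "d"],
    initial_step=lambda doc: doc,
    refine_step=lambda existing, doc: existing + doc,
)

seen = []
for running in steps:
    seen.append(running)  # monitor every intermediate result
    if len(running) >= 3:  # caller-side control: stop early
        break
print(seen)
```

Because the caller drives the iteration, it can log, inspect, or halt between steps, which a loop hidden inside a class does not allow.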
Below we walk through both RefineDocumentsChain and a corresponding LangGraph implementation on a simple example.
Select a chat model:
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
from langchain_core.documents import Document
documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]
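API Reference: Document
Below we show an implementation with RefineDocumentsChain. We define the prompt templates for the initial summarization and successive refinements, instantiate separate LLMChain objects for these two purposes, and instantiate RefineDocumentsChain with these components.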
from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI
# This controls how each document will be formatted. Specifically,
# it will be passed to `format_document` - see that function for more
# details.
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
# The prompt here should take as an input variable the
# `document_variable_name`
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_llm_chain = LLMChain(llm=llm, prompt=summarize_prompt)
initial_response_name = "existing_answer"
# The prompt here should take as an input variable the
# `document_variable_name` as well as `initial_response_name`
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
{context}
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_llm_chain = LLMChain(llm=llm, prompt=refine_prompt)
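chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
)
result = chain.invoke(documents)
result["output_text"]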
'Apples are typically red in color, blueberries are blue, and bananas are yellow.'
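The LangSmith trace is composed of three LLM calls: one for the initial summary, and two more that update that summary. The process completes when we update the summary with content from the final document.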
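To see why three documents lead to exactly three calls, the following sketch records the prompt issued at each step. It uses plain string formatting and a stub in place of the chat model; `prompts_for` and `fake_llm` are hypothetical helpers, and the template text follows the prompts above.

```python
SUMMARIZE = "Write a concise summary of the following: {context}"
REFINE = (
    "Produce a final summary.\n"
    "Existing summary up to this point:\n{existing_answer}\n"
    "New context:\n{context}\n"
    "Given the new context, refine the original summary."
)


def prompts_for(contents, llm_fn):
    """Return the prompts issued in order, one per model call."""
    issued = []
    prompt = SUMMARIZE.format(context=contents[0])
    issued.append(prompt)
    answer = llm_fn(prompt)
    for content in contents[1:]:
        prompt = REFINE.format(existing_answer=answer, context=content)
        issued.append(prompt)
        answer = llm_fn(prompt)
    return issued


# Hypothetical stub standing in for the chat model.
def fake_llm(prompt: str) -> str:
    return f"<summary of {len(prompt)} chars>"


calls = prompts_for(
    ["Apples are red", "Blueberries are blue", "Bananas are yellow"],
    fake_llm,
)
print(len(calls))
```

In general, n documents produce one initial call plus n - 1 refinement calls.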
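Below we show a LangGraph implementation of this process:
- We use the same two templates as before.
- We generate a simple chain for the initial summary that plucks out the first document, formats it into a prompt, and runs inference with our LLM.
- We generate a second chain, refine_summary_chain, that operates on each successive document, refining the summary.
We will need to install langgraph: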
pip install -qU langgraph
import operator
from typing import List, Literal, TypedDict
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Initial summary
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_summary_chain = summarize_prompt | llm | StrOutputParser()
# Refining the summary with new docs
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
{context}
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_summary_chain = refine_prompt | llm | StrOutputParser()
# For LangGraph, we define the state of the graph to hold the document
# contents, the index of the next document to process, and the running summary.
class State(TypedDict):
    contents: List[str]
    index: int
    summary: str
# We define functions for each node, including a node that generates
# the initial summary:
async def generate_initial_summary(state: State, config: RunnableConfig):
    summary = await initial_summary_chain.ainvoke(
        {"context": state["contents"][0]},
        config,
    )
    return {"summary": summary, "index": 1}
# And a node that refines the summary based on the next document
async def refine_summary(state: State, config: RunnableConfig):
    content = state["contents"][state["index"]]
    summary = await refine_summary_chain.ainvoke(
        {"existing_answer": state["summary"], "context": content},
        config,
    )
    return {"summary": summary, "index": state["index"] + 1}
# Here we implement logic to either exit the application or refine
# the summary.
def should_refine(state: State) -> Literal["refine_summary", END]:
    if state["index"] >= len(state["contents"]):
        return END
    return "refine_summary"
graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)
graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()
from IPython.display import Image
Image(app.get_graph().draw_mermaid_png())
async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]},
    stream_mode="values",
):
    if summary := step.get("summary"):
        print(summary)
Apples are typically red in color.
Apples are typically red in color, while blueberries are blue.
Apples are typically red in color, blueberries are blue, and bananas are yellow.
In the LangSmith trace we again recover three LLM calls, performing the same functions as before.
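The routing that yields those three calls can be traced without LangGraph or a model. The sketch below reuses the index-based check from should_refine and records which node would run at each step. The `END` sentinel string and the `run` helper are assumptions made for this illustration only; the real graph uses the END constant imported from langgraph.

```python
from typing import Dict, List

END = "__end__"  # hypothetical sentinel for this sketch only


def should_refine(state: Dict) -> str:
    """Route to another refinement while unprocessed documents remain."""
    if state["index"] >= len(state["contents"]):
        return END
    return "refine_summary"


def run(contents: List[str]) -> List[str]:
    """Trace node visits using the same index-based routing."""
    # State as it looks right after the initial-summary node has run.
    state = {"contents": contents, "index": 1, "summary": contents[0]}
    visited = ["generate_initial_summary"]
    while should_refine(state) != END:
        # Stand-in for the model call that refines the summary.
        state["summary"] += " + " + state["contents"][state["index"]]
        state["index"] += 1
        visited.append("refine_summary")
    return visited


trace = run(["doc0", "doc1", "doc2"])
print(trace)
```

With a single document the conditional edge routes straight to the end, so only the initial summary node runs.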
async for event in app.astream_events(
    {"contents": [doc.page_content for doc in documents]}, version="v2"
):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            print(content, end="|")
    elif kind == "on_chat_model_end":
        print("\n\n")
Ap|ples| are| characterized| by| their| red| color|.|
Ap|ples| are| characterized| by| their| red| color|,| while| blueberries| are| known| for| their| blue| hue|.|
Ap|ples| are| characterized| by| their| red| color|,| blueberries| are| known| for| their| blue| hue|,| and| bananas| are| recognized| for| their| yellow| color|.|
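The event filtering that produces the `|`-delimited output above can be exercised offline. In this sketch, `Chunk` and the hand-written event dictionaries are hypothetical stand-ins that mimic only the fields the loop reads (`event`, `data`, `chunk.content`); real events carry more fields.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed message chunk."""

    content: str


def render(events: Iterable[Dict]) -> str:
    """Join streamed tokens with '|' and break the line when a model call ends."""
    parts: List[str] = []
    for event in events:
        kind = event["event"]
        if kind == "on_chat_model_stream":
            content = event["data"]["chunk"].content
            if content:  # skip empty chunks
                parts.append(content + "|")
        elif kind == "on_chat_model_end":
            parts.append("\n")
        # Any other event kind is ignored.
    return "".join(parts)


events = [
    {"event": "on_chat_model_stream", "data": {"chunk": Chunk("")}},
    {"event": "on_chat_model_stream", "data": {"chunk": Chunk("Ap")}},
    {"event": "on_chat_model_stream", "data": {"chunk": Chunk("ples")}},
    {"event": "on_chat_model_end", "data": {}},
    {"event": "on_chain_end", "data": {}},
]
rendered = render(events)
print(rendered)
```

Filtering on the event kind is what lets the same loop surface tokens from intermediate steps as well as from the final refinement.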
See this tutorial for more LLM-based summarization strategies.
Check out the LangGraph documentation for details on building with LangGraph.