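Migrating from MapRerankDocumentsChain
MapRerankDocumentsChain implements a strategy for analyzing long texts. The strategy is as follows:
- Split the text into smaller documents;
- Map a process to the set of documents, where the process includes generating a score;
- Rank the results by score and return the maximum.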
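The three steps above can be sketched in plain Python, without any LLM. This is only an illustration of the strategy, not the chain's actual implementation: `split_text`, `toy_scorer`, and `map_rerank` are hypothetical names, and the keyword-based scorer stands in for a model call that would return an answer together with a confidence score.

```python
from typing import Callable, List, Tuple


def split_text(text: str, chunk_size: int) -> List[str]:
    """Step 1: split a text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)
    ]


def toy_scorer(keyword: str, chunk: str) -> Tuple[str, int]:
    """Hypothetical stand-in for an LLM call: returns (answer, score)."""
    score = 10 if keyword.lower() in chunk.lower() else 1
    return chunk, score


def map_rerank(
    chunks: List[str], scorer: Callable[[str], Tuple[str, int]]
) -> Tuple[str, int]:
    """Steps 2 and 3: score every chunk, then return the top-scoring result."""
    results = [scorer(chunk) for chunk in chunks]
    return max(results, key=lambda result: result[1])


chunks = split_text("Alice has blue eyes Bob has brown eyes", chunk_size=4)
best = map_rerank(chunks, lambda chunk: toy_scorer("Bob", chunk))
```

In a real pipeline the scorer would be a model call, so the quality of the ranking depends on how well the model calibrates its own scores.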
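A LangGraph implementation allows tool calling and other features to be incorporated into this workflow. Below, we go through both MapRerankDocumentsChain and a corresponding LangGraph implementation on a simple example for illustrative purposes.
Let's go through an example where we analyze a set of documents. We'll use the following 3 documents: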
from langchain_core.documents import Document

documents = [
    Document(page_content="Alice has blue eyes", metadata={"title": "book_chapter_2"}),
    Document(page_content="Bob has brown eyes", metadata={"title": "book_chapter_1"}),
    Document(
        page_content="Charlie has green eyes", metadata={"title": "book_chapter_3"}
    ),
]
API Reference: Document
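Below we show an implementation with MapRerankDocumentsChain. We define the prompt template for a question-answering task and instantiate an LLMChain object for this purpose. We define how documents are formatted into the prompt and ensure that the keys used across the various prompts are consistent.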
from langchain.chains import LLMChain, MapRerankDocumentsChain
from langchain.output_parsers.regex import RegexParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

document_variable_name = "context"
llm = OpenAI()
# The prompt here should take as an input variable the
# `document_variable_name`
# The actual prompt will need to be a lot more complex, this is just
# an example.
prompt_template = (
    "What color are Bob's eyes? "
    "Output both your answer and a score (1-10) of how confident "
    "you are in the format: <Answer>\nScore: <Score>.\n\n"
    "Provide no other commentary.\n\n"
    "Context: {context}"
)
# Parse the completion into an answer and a score.
output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)
# The template, input variable, and parser are the ones defined above.
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context"],
    output_parser=output_parser,
)
llm_chain = LLMChain(llm=llm, prompt=prompt)
# The rank and answer keys must match the parser's output keys.
chain = MapRerankDocumentsChain(
    llm_chain=llm_chain,
    document_variable_name=document_variable_name,
    rank_key="score",
    answer_key="answer",
)

response = chain.invoke(documents)
/langchain/libs/langchain/langchain/chains/llm.py:369: UserWarning: The apply_and_parse method is deprecated, instead pass an output parser directly to LLMChain.
Inspecting the LangSmith trace for the above run, we can see three LLM calls, one for each document, and that the scoring mechanism mitigated hallucinations.
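To make the parsing and ranking step concrete, here is a minimal sketch that applies the same regular expression with Python's standard `re` module. It is not the RegexParser implementation: `parse_answer` and `pick_best` are hypothetical helpers, and the raw completions are invented sample strings rather than output captured from the run above.

```python
import re
from typing import Dict, List

# Same pattern as in the prompt setup: answer on one line, score on the next.
PATTERN = re.compile(r"(.*?)\nScore: (.*)")


def parse_answer(text: str) -> Dict[str, str]:
    """Extract the answer and score from one raw completion."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError(f"Could not parse output: {text!r}")
    return {"answer": match.group(1), "score": match.group(2)}


def pick_best(outputs: List[str]) -> Dict[str, str]:
    """Parse every completion and return the one with the highest score."""
    parsed = [parse_answer(output) for output in outputs]
    # Tolerate surrounding whitespace or a trailing period after the number.
    return max(parsed, key=lambda p: int(p["score"].strip().rstrip(".")))


# Invented sample completions, one per document.
raw_outputs = [
    "Unknown\nScore: 1",
    "Brown\nScore: 10",
    "Unknown\nScore: 2",
]
best = pick_best(raw_outputs)
```

A completion that does not follow the requested format raises an error here, which is the kind of fragility that structured output is meant to avoid.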
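Below we show a LangGraph implementation of this process. Note that our template is simplified, as we delegate the formatting instructions to the chat model's tool-calling features via the .with_structured_output method.
Here we follow a basic map-reduce workflow to execute the LLM calls in parallel.
We will need to install langgraph: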
pip install -qU langgraph
import operator
from typing import Annotated, List, TypedDict

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph


class AnswerWithScore(TypedDict):
    answer: str
    score: Annotated[int, ..., "Score from 1-10."]


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt_template = "What color are Bob's eyes?\n\n" "Context: {context}"
prompt = ChatPromptTemplate.from_template(prompt_template)

# The below chain formats context from a document into a prompt, then
# generates a response structured according to the AnswerWithScore schema.
map_chain = prompt | llm.with_structured_output(AnswerWithScore)
# Below we define the components that will make up the graph


# This will be the overall state of the graph.
# It will contain the input document contents, corresponding
# answers with scores, and a final answer.
class State(TypedDict):
    contents: List[str]
    answers_with_scores: Annotated[list, operator.add]
    answer: str


# This will be the state of the node that we will "map" all
# documents to in order to generate answers with scores
class MapState(TypedDict):
    content: str


# Here we define the logic to map out over the documents
# We will use this as an edge in the graph
def map_analyses(state: State):
    # We will return a list of `Send` objects
    # Each `Send` object consists of the name of a node in the graph
    # as well as the state to send to that node
    return [
        Send("generate_analysis", {"content": content}) for content in state["contents"]
    ]


# Here we generate an answer with score, given a document
async def generate_analysis(state: MapState):
    response = await map_chain.ainvoke(state["content"])
    return {"answers_with_scores": [response]}


# Here we will select the top answer
def pick_top_ranked(state: State):
    ranked_answers = sorted(
        state["answers_with_scores"], key=lambda x: -int(x["score"])
    )
    return {"answer": ranked_answers[0]}
# Construct the graph: here we put everything together to construct our graph
graph = StateGraph(State)
graph.add_node("generate_analysis", generate_analysis)
graph.add_node("pick_top_ranked", pick_top_ranked)
graph.add_conditional_edges(START, map_analyses, ["generate_analysis"])
graph.add_edge("generate_analysis", "pick_top_ranked")
graph.add_edge("pick_top_ranked", END)
app = graph.compile()

# Optionally render the compiled graph (for example, in a notebook)
from IPython.display import Image

Image(app.get_graph().draw_mermaid_png())

result = await app.ainvoke({"contents": [doc.page_content for doc in documents]})
result["answer"]
{'answer': 'Bob has brown eyes.', 'score': 10}
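Inspecting the LangSmith trace for the above run, we can see three LLM calls as before. Using the model's tool-calling features has also enabled us to remove the parsing step.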
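The fan-out and fan-in behavior can be illustrated without LangGraph or a model. The sketch below is an assumption-laden imitation, not LangGraph's internals: `fake_generate_analysis` is a hypothetical stand-in for the model call, `asyncio.gather` plays the role of the parallel branches, and `operator.add` merges each branch's partial update in the way the reducer annotation on `answers_with_scores` suggests.

```python
import asyncio
import operator
from typing import Any, Dict, List


async def fake_generate_analysis(content: str) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical stand-in for the model call; returns a partial state update."""
    score = 10 if "Bob" in content else 1
    return {"answers_with_scores": [{"answer": content, "score": score}]}


async def fan_out_and_reduce(contents: List[str]) -> List[Dict[str, Any]]:
    """Run one branch per document concurrently, then merge the updates."""
    updates = await asyncio.gather(*(fake_generate_analysis(c) for c in contents))
    merged: List[Dict[str, Any]] = []
    for update in updates:
        # List concatenation: each branch appends rather than overwrites.
        merged = operator.add(merged, update["answers_with_scores"])
    return merged


def pick_top(answers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Sort by descending score and return the first entry."""
    return sorted(answers, key=lambda x: -int(x["score"]))[0]


contents = ["Alice has blue eyes", "Bob has brown eyes", "Charlie has green eyes"]
merged = asyncio.run(fan_out_and_reduce(contents))
top = pick_top(merged)
```

Because every branch returns a one-element list, concatenation preserves all three answers for the ranking step; without a reducer, concurrent writes to the same key would have no defined way to combine.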
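See these how-to guides for more on question-answering tasks with RAG.
Check out the LangGraph documentation for details on building with LangGraph, including this guide on the details of map-reduce in LangGraph.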