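How to trim messages
All models have finite context windows, meaning there is a limit to how many tokens they can take as input. If you have very long messages, or a chain/agent that accumulates a long message history, you need to manage the length of the messages you pass to the model.
trim_messages can be used to reduce the size of a chat history to a specified token count or a specified message count.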
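If the trimmed chat history is passed directly back into a chat model, it should satisfy the following properties:
- The resulting chat history should be valid. Usually this means that the following properties hold:
  - The chat history starts with either (1) a HumanMessage or (2) a SystemMessage followed by a HumanMessage.
  - The chat history ends with either a HumanMessage or a ToolMessage.
  - A ToolMessage can only appear after an AIMessage that involved a tool call.
  This can be achieved by setting start_on="human" and end_on=("human", "tool").
- It includes recent messages and drops old messages from the chat history. This can be achieved by setting strategy="last".
- Usually, the new chat history should include the SystemMessage if one was present in the original chat history, since the SystemMessage contains special instructions for the chat model. If present, the SystemMessage is almost always the first message in the history. This can be achieved by setting include_system=True.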
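The validity properties above can be expressed as a small check. The following is a minimal sketch that works on plain role strings rather than real message objects; is_valid_history is a hypothetical helper, not part of langchain_core, and it treats every "ai" message as a possible tool caller, which is looser than the real rule.

```python
from typing import List


def is_valid_history(roles: List[str]) -> bool:
    """Check the start, end, and tool-ordering properties on role names.

    Roles are plain strings ("system", "human", "ai", "tool") standing in
    for SystemMessage, HumanMessage, AIMessage, and ToolMessage.
    """
    if not roles:
        return False
    # Skip an optional leading system message, then require a human message.
    body = roles[1:] if roles[0] == "system" else roles
    if not body or body[0] != "human":
        return False
    # The history must end on a human or tool message.
    if body[-1] not in ("human", "tool"):
        return False
    # A tool message must follow an AI message, or another tool message
    # that answers the same AI tool call.
    for prev, cur in zip(body, body[1:]):
        if cur == "tool" and prev not in ("ai", "tool"):
            return False
    return True


print(is_valid_history(["system", "human", "ai", "human"]))
print(is_valid_history(["ai", "human"]))
```

A check like this can be handy in tests to confirm that a trimming configuration never produces a history the model would reject.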
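Trimming based on token count
Here we trim the chat history based on token count. The trimmed chat history will be a valid chat history that includes the SystemMessage.
To keep the most recent messages, we set strategy="last". We also set include_system=True to include the SystemMessage, and start_on="human" to make sure the resulting chat history is valid.
This is a good default configuration when using trim_messages based on token count. Remember to adjust token_counter and max_tokens for your use case.
Note that for token_counter we can pass in either a function (more on that below) or a language model (since language models have a message token counting method). Passing in a model makes sense when you are trimming messages to fit the context window of that specific model.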
pip install -qU langchain-openai
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
    trim_messages,
)
from langchain_core.messages.utils import count_tokens_approximately

messages = [
    SystemMessage("you're a good assistant, you always respond with a joke."),
    HumanMessage("i wonder why it's called langchain"),
    AIMessage(
        'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ),
    HumanMessage("and who is harrison chasing anyways"),
    AIMessage(
        "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
    ),
    HumanMessage("what do you call a speechless parrot"),
]
trim_messages(
    messages,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    # Remember to adjust based on your model
    # or else pass a custom token_counter
    token_counter=count_tokens_approximately,
    # Remember to adjust based on the desired conversation
    # length
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    allow_partial=False,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
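To make the interaction between strategy="last", include_system=True, and start_on="human" more concrete, here is a simplified model of the behavior. It is a sketch only and not the library's implementation: messages are (role, content) tuples, trim_last and word_count are hypothetical names, and tokens are approximated as whitespace-separated words.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); a stand-in for real message objects


def word_count(messages: List[Message]) -> int:
    """Hypothetical counter: one "token" per whitespace-separated word."""
    return sum(len(content.split()) for _, content in messages)


def trim_last(
    messages: List[Message],
    max_tokens: int,
    token_counter: Callable[[List[Message]], int] = word_count,
) -> List[Message]:
    """Keep the system message plus the most recent messages within budget."""
    system = [messages[0]] if messages and messages[0][0] == "system" else []
    rest = messages[len(system):]
    # The system message is always kept, so it consumes part of the budget.
    budget = max_tokens - token_counter(system)

    # Grow the kept suffix from the newest message backwards while it fits.
    kept: List[Message] = []
    for message in reversed(rest):
        if token_counter([message] + kept) > budget:
            break
        kept.insert(0, message)

    # Emulate start_on="human": drop leading messages until a human one.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


history = [
    ("system", "answer with a joke"),
    ("human", "why is it called langchain"),
    ("ai", "because wordrope did not have the same ring"),
    ("human", "what do you call a speechless parrot"),
]
trimmed = trim_last(history, max_tokens=20)
print(trimmed)
```

In this sketch the AI reply fits within the budget but is dropped anyway, because a history that starts with an AI message after the system message would not be valid. The same effect explains why the output above contains only the SystemMessage and the final HumanMessage.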
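Trimming based on message count
Alternatively, we can trim the chat history based on message count by setting token_counter=len. In this case, each message counts as a single token, and max_tokens controls the maximum number of messages.
This is a good default configuration when using trim_messages based on message count. Remember to adjust max_tokens for your use case.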
trim_messages(
    messages,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    token_counter=len,
    # When token_counter=len, each message
    # will be counted as a single token.
    # Remember to adjust for your use case
    max_tokens=5,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='and who is harrison chasing anyways', additional_kwargs={}, response_metadata={}),
AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
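The reason len works as a counter is that the counter is called with a list of messages and only needs to return an integer. The sketch below illustrates this with plain strings; keep_suffix_within and char_count are hypothetical helpers, and the sketch assumes the counter never decreases when messages are added.

```python
from typing import Callable, List, Sequence


def keep_suffix_within(
    items: Sequence[str],
    limit: int,
    counter: Callable[[List[str]], int],
) -> List[str]:
    """Return the longest suffix of items whose counter value is <= limit."""
    for start in range(len(items) + 1):
        suffix = list(items[start:])
        if counter(suffix) <= limit:
            return suffix
    return []


def char_count(msgs: List[str]) -> int:
    """Hypothetical size-based counter: total number of characters."""
    return sum(len(m) for m in msgs)


chat = ["hi", "hello there", "how are you", "fine"]

# With len as the counter, the limit is a number of messages.
by_messages = keep_suffix_within(chat, 2, len)
# With a size-based counter, the same limit mechanism bounds content size.
by_chars = keep_suffix_within(chat, 10, char_count)
print(by_messages, by_chars)
```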
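Advanced usage
You can use trim_messages as a building block to create more complex processing logic.
If we want to allow splitting up the contents of a message, we can specify allow_partial=True: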
trim_messages(
    messages,
    max_tokens=56,
    strategy="last",
    token_counter=count_tokens_approximately,
    include_system=True,
    allow_partial=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
AIMessage(content="\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
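The partial AIMessage in the output above begins with a newline, which suggests the content was cut at a line boundary. The following sketch shows one way such a cut could work, assuming content is split on newlines and tokens are approximated as whitespace-separated words. tail_that_fits is a hypothetical helper and is not how the library is necessarily implemented.

```python
from typing import List


def tail_that_fits(content: str, budget: int) -> str:
    """Keep the trailing newline-delimited chunks of content within budget.

    Assumption: one "token" per whitespace-separated word.
    """
    chunks = content.splitlines(keepends=True)
    kept: List[str] = []
    used = 0
    # Walk from the end so the most recent text is preferred.
    for chunk in reversed(chunks):
        cost = len(chunk.split())
        if used + cost > budget:
            break
        kept.insert(0, chunk)
        used += cost
    return "".join(kept)


text = "Hmmm let me think.\n\nWhy, he's probably chasing the last coffee!"
partial_text = tail_that_fits(text, budget=8)
print(repr(partial_text))
```

Here the first line does not fit in the remaining budget, so only the blank line and the final sentence survive.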
By default, the SystemMessage is not included, so you can drop it either by setting include_system=False or by omitting the include_system argument.
trim_messages(
    messages,
    max_tokens=45,
    strategy="last",
    token_counter=count_tokens_approximately,
)
[AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
We can perform the reverse operation, keeping the first max_tokens instead of the last, by specifying strategy="first":
trim_messages(
    messages,
    max_tokens=45,
    strategy="first",
    token_counter=count_tokens_approximately,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]
Using ChatModel as a token counter
You can pass a ChatModel as the token counter. This will use ChatModel.get_num_tokens_from_messages. Let's demonstrate how to use it with OpenAI.
from langchain_openai import ChatOpenAI

trim_messages(
    messages,
    max_tokens=45,
    strategy="first",
    token_counter=ChatOpenAI(model="gpt-4o"),
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content="i wonder why it's called langchain", additional_kwargs={}, response_metadata={})]
Writing a custom token counter
We can write a custom token counter function that takes a list of messages and returns an int.
pip install -qU tiktoken
from typing import List

import tiktoken
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)


def str_token_counter(text: str) -> int:
    enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text))


def tiktoken_counter(messages: List[BaseMessage]) -> int:
    """Approximately reproduce https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

    For simplicity only supports str Message.contents.
    """
    num_tokens = 3  # every reply is primed with <|start|>assistant<|message|>
    tokens_per_message = 3
    tokens_per_name = 1
    for msg in messages:
        # Map each message class to the role name used when counting.
        if isinstance(msg, HumanMessage):
            role = "user"
        elif isinstance(msg, AIMessage):
            role = "assistant"
        elif isinstance(msg, ToolMessage):
            role = "tool"
        elif isinstance(msg, SystemMessage):
            role = "system"
        else:
            raise ValueError(f"Unsupported messages type {msg.__class__}")
        num_tokens += (
            tokens_per_message
            + str_token_counter(role)
            + str_token_counter(msg.content)
        )
        if msg.name:
            num_tokens += tokens_per_name + str_token_counter(msg.name)
    return num_tokens
trim_messages(
    messages,
    token_counter=tiktoken_counter,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    # Remember to adjust based on the desired conversation
    # length
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
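When an exact tokenizer is not available, a rough heuristic counter with the same shape (a list of messages in, an int out) may be enough. The following is a sketch under stated assumptions: approx_char_counter is a hypothetical name, the 4 characters per token and 3 tokens of per-message overhead are illustrative guesses, and SimpleNamespace objects stand in for real message objects that expose a content attribute.

```python
import math
from types import SimpleNamespace
from typing import Any, List


def approx_char_counter(
    messages: List[Any],
    chars_per_token: float = 4.0,
    overhead_per_message: int = 3,
) -> int:
    """Estimate tokens from character counts; not a real tokenizer."""
    total = 0
    for message in messages:
        # Only the content attribute is read, so any object exposing it works.
        content_tokens = math.ceil(len(str(message.content)) / chars_per_token)
        total += overhead_per_message + content_tokens
    return total


# Stand-ins for message objects, used only to exercise the counter.
fake_messages = [
    SimpleNamespace(content="abcdefgh"),
    SimpleNamespace(content="hi"),
]
estimate = approx_char_counter(fake_messages)
print(estimate)
```

An estimate like this can overcount or undercount for a given model, so it is safer to leave some headroom in max_tokens when relying on it.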
Chaining
trim_messages can be used imperatively (as above) or declaratively, which makes it easy to compose with other components in a chain.
llm = ChatOpenAI(model="gpt-4o")

# Notice we don't pass in messages. This creates
# a RunnableLambda that takes messages as input
trimmer = trim_messages(
    token_counter=llm,
    # Keep the last <= n_count tokens of the messages.
    strategy="last",
    # Remember to adjust based on the desired conversation
    # length
    max_tokens=45,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    start_on="human",
    # Most chat models expect that chat history ends with either:
    # (1) a HumanMessage or
    # (2) a ToolMessage
    end_on=("human", "tool"),
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
)

chain = trimmer | llm
chain.invoke(messages)
AIMessage(content='A "polly-no-wanna-cracker"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 32, 'total_tokens': 43, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_90d33c15d4', 'finish_reason': 'stop', 'logprobs': None}, id='run-b1f8b63b-6bc2-4df4-b3b9-dfc4e3e675fe-0', usage_metadata={'input_tokens': 32, 'output_tokens': 11, 'total_tokens': 43, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
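Looking at the LangSmith trace, we can see that the messages are trimmed before they are passed to the model: https://smith.langchain.com/public/65af12c4-c24d-4824-90f0-6547566e59bb/r
Looking at just the trimmer, we can see that it is a Runnable object that can be invoked like all Runnables.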
trimmer.invoke(messages)
[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='what do you call a speechless parrot', additional_kwargs={}, response_metadata={})]
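The pattern of returning a reusable callable when the messages argument is omitted can be sketched in plain Python. This is an illustration of the general idea only: keep_last is a hypothetical function, and it returns a functools.partial, whereas trim_messages returns a Runnable.

```python
from functools import partial
from typing import Callable, List, Optional, Union


def keep_last(
    messages: Optional[List[str]] = None,
    *,
    max_messages: int,
) -> Union[List[str], Callable[[List[str]], List[str]]]:
    """Trim immediately when given messages, else return a configured trimmer."""
    if messages is None:
        # Declarative use: bind the configuration now, supply messages later.
        return partial(keep_last, max_messages=max_messages)
    # Imperative use: trim right away.
    return messages[-max_messages:] if max_messages > 0 else []


# Declarative: configure once, then reuse like any other callable step.
simple_trimmer = keep_last(max_messages=2)
print(simple_trimmer(["a", "b", "c"]))

# Imperative: pass the messages directly.
print(keep_last(["a", "b", "c"], max_messages=1))
```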
Using with ChatMessageHistory
Trimming messages is especially useful when working with chat histories, which can grow arbitrarily long.
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_history = InMemoryChatMessageHistory(messages=messages[:-1])


def dummy_get_session_history(session_id):
    if session_id != "1":
        return InMemoryChatMessageHistory()
    return chat_history


trimmer = trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=llm,
    # Usually, we want to keep the SystemMessage
    # if it's present in the original history.
    # The SystemMessage has special instructions for the model.
    include_system=True,
    # Most chat models expect that chat history starts with either:
    # (1) a HumanMessage or
    # (2) a SystemMessage followed by a HumanMessage
    # start_on="human" makes sure we produce a valid chat history
    start_on="human",
)

chain = trimmer | llm
chain_with_history = RunnableWithMessageHistory(chain, dummy_get_session_history)
chain_with_history.invoke(
    [HumanMessage("what do you call a speechless parrot")],
    config={"configurable": {"session_id": "1"}},
)
AIMessage(content='A "polygon"!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 32, 'total_tokens': 36, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_c17d3befe7', 'finish_reason': 'stop', 'logprobs': None}, id='run-71d9fce6-bb0c-4bb3-acc8-d5eaee6ae7bc-0', usage_metadata={'input_tokens': 32, 'output_tokens': 4, 'total_tokens': 36})
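Looking at the LangSmith trace, we can see that we retrieve all of the messages, but before they are passed to the model they are trimmed down to just the system message and the last human message: https://smith.langchain.com/public/17dd700b-9994-44ca-930c-116e00997315/r
API reference
For a complete description of all arguments, see the API reference: https://python.langchain.ac.cn/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html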