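How to create a custom chat model class

Prerequisites

This guide assumes familiarity with the following concepts:

Chat models

In this guide, we'll learn how to create a custom chat model using LangChain abstractions.

Wrapping your LLM with the standard BaseChatModel interface lets you use it in existing LangChain programs with minimal code changes.

As a bonus, your LLM automatically becomes a LangChain Runnable and benefits from some optimizations out of the box (e.g., batching via a thread pool), async support, the astream_events API, and more.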
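To make the thread-pool batching mentioned above concrete, here is a minimal standard-library sketch of the idea. It is not LangChain's implementation: `echo_invoke` and `batch` are hypothetical stand-ins for a blocking model call and a batching helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def echo_invoke(prompt: str) -> str:
    """Hypothetical stand-in for a single blocking model call."""
    return prompt[:3]


def batch(prompts: List[str], max_workers: int = 4) -> List[str]:
    """Run one call per prompt on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(echo_invoke, prompts))


print(batch(["hello", "goodbye"]))  # ['hel', 'goo']
```

Threads help here because a real model call typically spends most of its time waiting on I/O.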

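Inputs and outputs

First, we need to talk about **messages**, which are the inputs and outputs of chat models.

Messages

Chat models take messages as input and return a message as output.

LangChain has a few built-in message types:

Message type	Description
`SystemMessage`	Used for priming AI behavior, usually passed in as the first of a sequence of input messages.
`HumanMessage`	Represents a message from a person interacting with the chat model.
`AIMessage`	Represents a message from the chat model. This can be either text or a request to invoke a tool.
`FunctionMessage` / `ToolMessage`	Message for passing the results of a tool invocation back to the model.
`AIMessageChunk` / `HumanMessageChunk` / ...	Chunk variant of each type of message.

Note

ToolMessage and FunctionMessage closely follow OpenAI's function and tool roles.

This is a rapidly developing field, and as more models add function calling capabilities, expect additions to this schema.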

from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    FunctionMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)

Streaming variant

All the chat messages have a streaming variant that contains Chunk in the name.

from langchain_core.messages import (
    AIMessageChunk,
    FunctionMessageChunk,
    HumanMessageChunk,
    SystemMessageChunk,
    ToolMessageChunk,
)

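API Reference: AIMessageChunk | FunctionMessageChunk | HumanMessageChunk | SystemMessageChunk | ToolMessageChunk

These chunks are used when streaming output from chat models, and they all define an additive property: adding two chunks yields a single chunk with the combined content.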

AIMessageChunk(content="Hello") + AIMessageChunk(content=" World!")

AIMessageChunk(content='Hello World!')
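The additive behavior can be pictured with a small standard-library sketch. `TextChunk` and `merge` are hypothetical names, not LangChain types; the sketch only illustrates how defining `__add__` lets a consumer fold a stream of chunks into one message.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass(frozen=True)
class TextChunk:
    """Hypothetical chunk type holding a piece of streamed text."""

    content: str

    def __add__(self, other: "TextChunk") -> "TextChunk":
        # Adding two chunks concatenates their content into a new chunk.
        return TextChunk(content=self.content + other.content)


def merge(chunks: List[TextChunk]) -> TextChunk:
    """Fold a non-empty list of chunks into a single chunk."""
    return reduce(lambda left, right: left + right, chunks)


merged = merge([TextChunk("Hello"), TextChunk(" World!")])
print(merged.content)  # Hello World!
```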

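Base chat model

Let's implement a chat model that echoes back the first n characters of the last message in the prompt.

To do so, we will inherit from BaseChatModel and implement the following:

Method/Property	Description	Required/Optional
`_generate`	Used to generate a chat result from a prompt.	Required
`_llm_type` (property)	Uniquely identifies the type of the model. Used for logging.	Required
`_identifying_params` (property)	Represents the model parameterization for tracing purposes.	Optional
`_stream`	Used to implement streaming.	Optional
`_agenerate`	Used to implement a native async method.	Optional
`_astream`	Used to implement the async version of `_stream`.	Optional

Tip

If _stream is implemented, the default _astream implementation uses run_in_executor to launch the synchronous _stream in a separate thread; otherwise, it falls back to _agenerate.

You can use this trick to reuse the _stream implementation, but if you are able to write natively async code, that is the better solution since it runs with less overhead.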
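The tip above can be illustrated with a standard-library sketch of the general technique: driving a synchronous generator from a worker thread with `run_in_executor`. This is an illustration under assumptions, not LangChain's code; `sync_stream`, `async_stream`, and `collect` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Iterator, List

_DONE = object()  # sentinel returned by next() when the iterator is exhausted


def sync_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a synchronous streaming implementation."""
    for char in text:
        yield char


async def async_stream(text: str) -> AsyncIterator[str]:
    """Expose the synchronous generator as an async generator."""
    loop = asyncio.get_running_loop()
    iterator = sync_stream(text)
    while True:
        # Each next() call runs on the default thread pool, so a slow
        # synchronous step does not block the event loop.
        item = await loop.run_in_executor(None, next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


async def collect(text: str) -> List[str]:
    """Gather every streamed item into a list."""
    return [chunk async for chunk in async_stream(text)]


print(asyncio.run(collect("cat")))  # ['c', 'a', 't']
```

Every item costs a hop to a worker thread and back, which is the overhead a native async implementation avoids.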

Implementation

from typing import Any, Dict, Iterator, List, Optional

from langchain_core.callbacks import (
    CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
    AIMessage,
    AIMessageChunk,
    BaseMessage,
)
from langchain_core.messages.ai import UsageMetadata
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
from pydantic import Field


class ChatParrotLink(BaseChatModel):
    """A custom chat model that echoes the first `parrot_buffer_length` characters
    of the input.

    When contributing an implementation to LangChain, carefully document
    the model, including the initialization parameters. Include
    an example of how to initialize the model and any relevant
    links to the underlying model's documentation or API.

    Example:

        .. code-block:: python

            model = ChatParrotLink(parrot_buffer_length=2, model="bird-brain-001")
            result = model.invoke([HumanMessage(content="hello")])
            result = model.batch([[HumanMessage(content="hello")],
                                 [HumanMessage(content="world")]])
    """

    model_name: str = Field(alias="model")
    """The name of the model"""
    parrot_buffer_length: int
    """The number of characters from the last message of the prompt to be echoed."""
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    timeout: Optional[int] = None
    stop: Optional[List[str]] = None
    max_retries: int = 2

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Override the _generate method to implement the chat model logic.

        This can be a call to an API, a call to a local model, or any other
        implementation that generates a response to the input prompt.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        # Replace this with actual logic to generate a response from a list
        # of messages.
        last_message = messages[-1]
        tokens = last_message.content[: self.parrot_buffer_length]
        ct_input_tokens = sum(len(message.content) for message in messages)
        ct_output_tokens = len(tokens)
        message = AIMessage(
            content=tokens,
            additional_kwargs={},  # Used to add additional payload to the message
            response_metadata={  # Use for response metadata
                "time_in_seconds": 3,
                "model_name": self.model_name,
            },
            usage_metadata={
                "input_tokens": ct_input_tokens,
                "output_tokens": ct_output_tokens,
                "total_tokens": ct_input_tokens + ct_output_tokens,
            },
        )
        ##

        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])

    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        """Stream the output of the model.

        This method should be implemented if the model can generate output
        in a streaming fashion. If the model does not support streaming,
        do not implement it. In that case streaming requests will be automatically
        handled by the _generate method.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        last_message = messages[-1]
        tokens = str(last_message.content[: self.parrot_buffer_length])
        ct_input_tokens = sum(len(message.content) for message in messages)

        for token in tokens:
            usage_metadata = UsageMetadata(
                {
                    "input_tokens": ct_input_tokens,
                    "output_tokens": 1,
                    "total_tokens": ct_input_tokens + 1,
                }
            )
            ct_input_tokens = 0
            chunk = ChatGenerationChunk(
                message=AIMessageChunk(content=token, usage_metadata=usage_metadata)
            )

            if run_manager:
                # This is optional in newer versions of LangChain
                # The on_llm_new_token will be called automatically
                run_manager.on_llm_new_token(token, chunk=chunk)

            yield chunk

        # Let's add some other information (e.g., response metadata)
        chunk = ChatGenerationChunk(
            message=AIMessageChunk(
                content="",
                response_metadata={"time_in_sec": 3, "model_name": self.model_name},
            )
        )
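        if run_manager:
            # This is optional in newer versions of LangChain
            # The on_llm_new_token will be called automatically
            # Pass the chunk's own (empty) content: the loop variable `token`
            # is undefined when the echoed text is empty.
            run_manager.on_llm_new_token("", chunk=chunk)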
        yield chunk

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model."""
        return "echoing-chat-model-advanced"

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters.

        This information is used by the LangChain callback system for
        tracing purposes, which makes it possible to monitor LLMs.
        """
        return {
            # The model name allows users to specify custom token counting
            # rules in LLM monitoring applications (e.g., in LangSmith users
            # can provide per token pricing for their model and monitor
            # costs for the given LLM.)
            "model_name": self.model_name,
        }

Let's test it 🧪

The chat model implements the standard Runnable interface of LangChain, which many of the LangChain abstractions support.

model = ChatParrotLink(parrot_buffer_length=3, model="my_custom_model")

model.invoke(
    [
        HumanMessage(content="hello!"),
        AIMessage(content="Hi there human!"),
        HumanMessage(content="Meow!"),
    ]
)

AIMessage(content='Meo', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-cf11aeb6-8ab6-43d7-8c68-c1ef89b6d78e-0', usage_metadata={'input_tokens': 26, 'output_tokens': 3, 'total_tokens': 29})

model.invoke("hello")

AIMessage(content='hel', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-618e5ed4-d611-4083-8cf1-c270726be8d9-0', usage_metadata={'input_tokens': 5, 'output_tokens': 3, 'total_tokens': 8})

model.batch(["hello", "goodbye"])

[AIMessage(content='hel', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-eea4ed7d-d750-48dc-90c0-7acca1ff388f-0', usage_metadata={'input_tokens': 5, 'output_tokens': 3, 'total_tokens': 8}),
 AIMessage(content='goo', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-07cfc5c1-3c62-485f-b1e0-3d46e1547287-0', usage_metadata={'input_tokens': 7, 'output_tokens': 3, 'total_tokens': 10})]

for chunk in model.stream("cat"):
    print(chunk.content, end="|")

c|a|t||

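The model above does not define _astream, so astream falls back to the default implementation, which runs the synchronous _stream in a separate thread. If a model implements neither _stream nor _astream, the output is not streamed incrementally.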

async for chunk in model.astream("cat"):
    print(chunk.content, end="|")

c|a|t||

Let's try the astream events API, which also helps double-check that all the callbacks were implemented.

async for event in model.astream_events("cat", version="v1"):
    print(event)

{'event': 'on_chat_model_start', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'name': 'ChatParrotLink', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}, 'parent_ids': []}
{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='c', additional_kwargs={}, response_metadata={}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 3, 'output_tokens': 1, 'total_tokens': 4})}, 'parent_ids': []}
{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='a', additional_kwargs={}, response_metadata={}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 0, 'output_tokens': 1, 'total_tokens': 1})}, 'parent_ids': []}
{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='t', additional_kwargs={}, response_metadata={}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 0, 'output_tokens': 1, 'total_tokens': 1})}, 'parent_ids': []}
{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='', additional_kwargs={}, response_metadata={'time_in_sec': 3}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a')}, 'parent_ids': []}
{'event': 'on_chat_model_end', 'name': 'ChatParrotLink', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'data': {'output': AIMessageChunk(content='cat', additional_kwargs={}, response_metadata={'time_in_sec': 3}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 3, 'output_tokens': 3, 'total_tokens': 6})}, 'parent_ids': []}
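The final on_chat_model_end event above carries the sum of the per-chunk usage metadata. The arithmetic can be checked with a short standard-library sketch; `sum_usage` is a hypothetical helper, and the values are copied from the events above.

```python
from typing import Dict, List

# Usage metadata of the three content chunks, copied from the events above.
chunk_usage: List[Dict[str, int]] = [
    {"input_tokens": 3, "output_tokens": 1, "total_tokens": 4},
    {"input_tokens": 0, "output_tokens": 1, "total_tokens": 1},
    {"input_tokens": 0, "output_tokens": 1, "total_tokens": 1},
]


def sum_usage(usages: List[Dict[str, int]]) -> Dict[str, int]:
    """Add up token counts key by key."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in usages:
        for key in totals:
            totals[key] += usage[key]
    return totals


print(sum_usage(chunk_usage))
# {'input_tokens': 3, 'output_tokens': 3, 'total_tokens': 6}
```

This is why _stream reports the input tokens on the first chunk only: counting them on every chunk would inflate the total.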

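Contributing

We appreciate all chat model integration contributions.

Here's a checklist to help make sure your contribution gets added to LangChain:

Documentation

The model contains docstrings for all initialization arguments, as these will be surfaced in the API reference.
The class docstring for the model contains a link to the model API if the model is powered by a service.

Tests

Add unit or integration tests for the overridden methods. Verify that invoke, ainvoke, batch, and stream work if you've overridden the corresponding code.

Streaming (if you're implementing it)

Implement the _stream method to get streaming working.

Stop token behavior

Stop tokens should be respected.
Stop tokens should be INCLUDED as part of the response.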
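One way to read the two stop token items above is sketched below with the standard library only. `truncate_at_stop` is a hypothetical helper, and cutting at the stop sequence that starts earliest in the text is an assumption, not documented LangChain behavior.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut `text` right after the earliest stop sequence, keeping that sequence."""
    if not stop:
        return text
    best_start = len(text) + 1
    best_end = None
    for token in stop:
        if not token:
            continue  # ignore empty stop sequences
        start = text.find(token)
        if start != -1 and start < best_start:
            best_start, best_end = start, start + len(token)
    # No stop sequence found: return the text unchanged.
    return text if best_end is None else text[:best_end]


print(truncate_at_stop("Hello world. Bye", ["."]))  # Hello world.
```

Keeping the stop sequence in the output lets downstream code see why generation ended.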

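Secret API keys

If your model connects to an API, it will likely accept API keys as part of its initialization. Use Pydantic's SecretStr type for secrets so they don't get accidentally printed out when someone prints the model.

Identifying params

Include a model_name in the identifying params.

Optimizations

Consider providing native async support to reduce the overhead from the model.

Provide a native async implementation of _agenerate (used by ainvoke).
Provide a native async implementation of _astream (used by astream).

Next steps

You've now learned how to create your own custom chat models.

Next, check out the other how-to guides on chat models in this section, like how to get a model to return structured output or how to track chat model token usage.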
