How to pass multimodal data directly to models
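Here we demonstrate how to pass multimodal input directly to models. We currently expect all input to be passed in the same format that OpenAI expects. For other model providers that support multimodal input, we have added logic inside the class to convert to the expected format.
In this example we will ask a model to describe an image.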
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o")
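The most widely supported way to pass in an image is as a base64-encoded string embedded in a data URL. This should work for most model integrations.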
import base64
import httpx
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
response = model.invoke([message])
print(response.content)
The weather in the image appears to be clear and pleasant. The sky is mostly blue with scattered, light clouds, suggesting a sunny day with minimal cloud cover. There is no indication of rain or strong winds, and the overall scene looks bright and calm. The lush green grass and clear visibility further indicate good weather conditions.
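The encoding step above can be wrapped in a small helper so the same logic is reusable for local files or other image types. The following is a minimal sketch; the function names `to_data_url` and `image_block_from_bytes` and the `mime_type` parameter are illustrative assumptions, not part of LangChain.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL.

    `mime_type` is supplied by the caller (hypothetical parameter);
    it should match the actual image format.
    """
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_block_from_bytes(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build an OpenAI-style image_url content block from raw bytes."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_url(image_bytes, mime_type)},
    }
```

With the objects defined earlier, `image_block_from_bytes(httpx.get(image_url).content)` would produce the same content block that is written out by hand in the message above.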
We can also pass the image URL directly in a content block of type "image_url". Note that only some model providers support this.
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.content)
The weather in the image appears to be clear and sunny. The sky is mostly blue with a few scattered clouds, suggesting good visibility and a likely pleasant temperature. The bright sunlight is casting distinct shadows on the grass and vegetation, indicating it is likely daytime, possibly late morning or early afternoon. The overall ambiance suggests a warm and inviting day, suitable for outdoor activities.
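Because not every provider accepts remote URLs, one possible pattern is to try the URL first and fall back to inline base64 data if the call fails. The sketch below is only an illustration: `describe_with_fallback`, `invoke`, and `fetch` are hypothetical names, and it assumes that an unsupported URL surfaces as an exception, which may not hold for every provider.

```python
import base64
from typing import Any, Callable


def describe_with_fallback(
    invoke: Callable[[list], Any],
    fetch: Callable[[str], bytes],
    text: str,
    image_url: str,
    mime_type: str = "image/jpeg",
) -> Any:
    """Try passing the image by URL; on any error, retry with inline base64 data.

    `invoke` takes a content list and returns the model response, and
    `fetch` downloads the image bytes. Both are supplied by the caller.
    """
    text_block = {"type": "text", "text": text}
    url_block = {"type": "image_url", "image_url": {"url": image_url}}
    try:
        return invoke([text_block, url_block])
    except Exception:
        # Assumption: a provider that cannot read remote URLs raises here.
        encoded = base64.b64encode(fetch(image_url)).decode("utf-8")
        data_url = f"data:{mime_type};base64,{encoded}"
        data_block = {"type": "image_url", "image_url": {"url": data_url}}
        return invoke([text_block, data_block])
```

With the objects from this guide, `invoke` could be `lambda content: model.invoke([HumanMessage(content=content)])` and `fetch` could be `lambda url: httpx.get(url).content`. Catching every exception is deliberately broad for the sketch; real code would likely narrow it to the error a given integration raises.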
We can also pass in multiple images.
message = HumanMessage(
    content=[
        {"type": "text", "text": "are these two images the same?"},
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.content)
Yes, the two images are the same. They both depict a wooden boardwalk extending through a grassy field under a blue sky with light clouds. The scenery, lighting, and composition are identical.
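When the number of images varies, the content list can be built programmatically rather than written out by hand. This is a small sketch; `build_multi_image_content` is a hypothetical helper name, and it only assembles plain dictionaries in the same shape used above.

```python
from typing import Dict, List


def build_multi_image_content(text: str, image_urls: List[str]) -> List[Dict]:
    """Return a content list: one text block followed by one image block per URL."""
    content: List[Dict] = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content
```

The result can be passed as `HumanMessage(content=build_multi_image_content("are these two images the same?", [image_url, image_url]))`.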
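Tool calls
Some multimodal models also support tool calling. To call tools with such a model, bind the tools to it in the usual way, then invoke the model with content blocks of the desired type (for example, blocks containing image data).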
from typing import Literal
from langchain_core.tools import tool

@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
    """Describe the weather"""
    pass

model_with_tools = model.bind_tools([weather_tool])
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model_with_tools.invoke([message])
print(response.tool_calls)
[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_BSX4oq4SKnLlp2WlzDhToHBr'}]
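The tool calls come back as a list of dictionaries with `name`, `args`, and `id` keys, as the output above shows. A minimal sketch of reading the structured result follows; `extract_weather` is a hypothetical helper, and the sample list simply copies the output printed above.

```python
from typing import Dict, List


def extract_weather(tool_calls: List[Dict]) -> List[str]:
    """Collect the 'weather' argument from every weather_tool call."""
    return [
        call["args"]["weather"]
        for call in tool_calls
        if call.get("name") == "weather_tool" and "weather" in call.get("args", {})
    ]


# Sample copied from the output shown above.
sample_tool_calls = [
    {"name": "weather_tool", "args": {"weather": "sunny"}, "id": "call_BSX4oq4SKnLlp2WlzDhToHBr"}
]
print(extract_weather(sample_tool_calls))  # prints ['sunny']
```

In practice the argument would be `response.tool_calls` rather than the hard-coded sample.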