如何直接将多模态数据传递给模型
这里我们演示如何将多模态输入直接传递给模型。我们目前期望所有输入都以与OpenAI期望的格式相同的方式传递。对于支持多模态输入的其他模型提供商,我们在类内部添加了逻辑以转换为预期的格式。
在这个例子中,我们将要求一个模型描述一张图片。
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o")
API 参考:HumanMessage | ChatOpenAI
最常用的传递图像的方式是将其作为字节字符串传递。这应该适用于大多数模型集成。
import base64
import httpx
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
)
response = model.invoke([message])
print(response.content)
The weather in the image appears to be clear and pleasant. The sky is mostly blue with scattered, light clouds, suggesting a sunny day with minimal cloud cover. There is no indication of rain or strong winds, and the overall scene looks bright and calm. The lush green grass and clear visibility further indicate good weather conditions.
我们可以将图像URL直接输入类型为“image_url”的内容块。请注意,只有一些模型提供商支持此功能。
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{"type": "image_url", "image_url": {"url": image_url}},
],
)
response = model.invoke([message])
print(response.content)
The weather in the image appears to be clear and sunny. The sky is mostly blue with a few scattered clouds, suggesting good visibility and a likely pleasant temperature. The bright sunlight is casting distinct shadows on the grass and vegetation, indicating it is likely daytime, possibly late morning or early afternoon. The overall ambiance suggests a warm and inviting day, suitable for outdoor activities.
我们还可以传入多张图片。
message = HumanMessage(
content=[
{"type": "text", "text": "are these two images the same?"},
{"type": "image_url", "image_url": {"url": image_url}},
{"type": "image_url", "image_url": {"url": image_url}},
],
)
response = model.invoke([message])
print(response.content)
Yes, the two images are the same. They both depict a wooden boardwalk extending through a grassy field under a blue sky with light clouds. The scenery, lighting, and composition are identical.
工具调用
一些多模态模型也支持工具调用功能。要使用此类模型调用工具,只需按照通常的方式将工具绑定到它们,并使用所需类型(例如,包含图像数据)的内容块调用模型。
from typing import Literal
from langchain_core.tools import tool
@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
"""Describe the weather"""
pass
model_with_tools = model.bind_tools([weather_tool])
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{"type": "image_url", "image_url": {"url": image_url}},
],
)
response = model_with_tools.invoke([message])
print(response.tool_calls)
API 参考:tool
[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_BSX4oq4SKnLlp2WlzDhToHBr'}]