
ChatGoogleGenerativeAI

Access Google's generative AI models, including the Gemini family, directly via the Gemini API, or experiment rapidly in Google AI Studio. The langchain-google-genai package provides the LangChain integration for these models. It is often the best starting point for individual developers.

For information on the latest models, their features, context windows, etc., head to the Google AI docs. All examples use the gemini-2.0-flash model. Gemini 2.5 Pro and 2.5 Flash are available as gemini-2.5-pro-preview-03-25 and gemini-2.5-flash-preview-04-17. All model IDs can be found in the Gemini API docs.

Integration details

| Class | Package | Local | Serializable | JS support | Package downloads | Latest version |
| --- | --- | --- | --- | --- | --- | --- |
| ChatGoogleGenerativeAI | langchain-google-genai | ❌ | beta | ✅ | PyPI - Downloads | PyPI - Version |

Model features

| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |

Setup

To access Google AI models you'll need to create a Google account, get a Google AI API key, and install the langchain-google-genai integration package.

1. Installation

%pip install -U langchain-google-genai

2. Credentials

Head to https://ai.google.dev/gemini-api/docs/api-key (or via Google AI Studio) to generate a Google AI API key.

Chat models

Use the ChatGoogleGenerativeAI class to interact with Google's chat models. See the API reference for full details.

import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")

To enable automated tracing of your model calls, set your LangSmith API key:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

Now we can instantiate our model object and generate chat completions:

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-3b28d4b8-8a62-4e6c-ad4e-b53e6e825749-0', usage_metadata={'input_tokens': 20, 'output_tokens': 7, 'total_tokens': 27, 'input_token_details': {'cache_read': 0}})
print(ai_msg.content)
J'adore la programmation.

Chaining

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
{
"input_language": "English",
"output_language": "German",
"input": "I love programming.",
}
)
API Reference: ChatPromptTemplate
AIMessage(content='Ich liebe Programmieren.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-e5561c6b-2beb-4411-9210-4796b576a7cd-0', usage_metadata={'input_tokens': 15, 'output_tokens': 7, 'total_tokens': 22, 'input_token_details': {'cache_read': 0}})

Multimodal usage

Gemini models can accept multimodal inputs (text, images, audio, video) and, for some models, generate multimodal outputs as well.

Image input

Provide image inputs along with text using a HumanMessage with a list content format. The gemini-2.0-flash model can handle images.

import base64

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Example using a public URL (remains the same)
message_url = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Describe the image at the URL.",
        },
        {"type": "image_url", "image_url": "https://picsum.photos/seed/picsum/200/300"},
    ]
)
result_url = llm.invoke([message_url])
print(f"Response for URL image: {result_url.content}")

# Example using a local image file encoded in base64
image_file_path = "/Users/philschmid/projects/google-gemini/langchain/docs/static/img/agents_vs_chains.png"

with open(image_file_path, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

message_local = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the local image."},
        {"type": "image_url", "image_url": f"data:image/png;base64,{encoded_image}"},
    ]
)
result_local = llm.invoke([message_local])
print(f"Response for local image: {result_local.content}")

Other supported image_url formats

  • A Google Cloud Storage URI (gs://...). Ensure the service account has access.
  • A PIL Image object (the library handles encoding).
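
A minimal sketch of both forms, assuming a GCS object you can read and a local PNG (the bucket, object, and file names below are hypothetical):

from PIL import Image as PILImage

from langchain_core.messages import HumanMessage

# Hypothetical GCS object; the calling service account needs read access.
gcs_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the image."},
        {"type": "image_url", "image_url": "gs://my-bucket/my-image.png"},
    ]
)

# Hypothetical local file; the library base64-encodes the PIL image for you.
pil_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the image."},
        {"type": "image_url", "image_url": PILImage.open("local_image.png")},
    ]
)

# result = llm.invoke([gcs_message])  # or [pil_message]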

Audio input

Provide audio file inputs along with text. Use a model like gemini-2.0-flash.

import base64

from langchain_core.messages import HumanMessage

# Ensure you have an audio file named 'example_audio.mp3' or provide the correct path.
audio_file_path = "example_audio.mp3"
audio_mime_type = "audio/mpeg"


with open(audio_file_path, "rb") as audio_file:
    encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "media",
            "data": encoded_audio,  # Use base64 string directly
            "mime_type": audio_mime_type,
        },
    ]
)
response = llm.invoke([message])  # Requires the audio file at audio_file_path
print(f"Response for audio: {response.content}")
API Reference: HumanMessage

Video input

Provide video file inputs along with text. Use a model like gemini-2.0-flash.

import base64

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Ensure you have a video file named 'example_video.mp4' or provide the correct path.
video_file_path = "example_video.mp4"
video_mime_type = "video/mp4"


with open(video_file_path, "rb") as video_file:
    encoded_video = base64.b64encode(video_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the first few frames of the video."},
        {
            "type": "media",
            "data": encoded_video,  # Use base64 string directly
            "mime_type": video_mime_type,
        },
    ]
)
response = llm.invoke([message])  # Requires the video file at video_file_path
print(f"Response for video: {response.content}")

Image generation (multimodal output)

Gemini can generate text and images inline via the gemini-2.0-flash-preview-image-generation model (image generation is experimental). You need to specify the desired response_modalities.

import base64

from IPython.display import Image, display
from langchain_core.messages import AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="models/gemini-2.0-flash-preview-image-generation")

message = {
    "role": "user",
    "content": "Generate a photorealistic image of a cuddly cat wearing a hat.",
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)


def _get_image_base64(response: AIMessage) -> str:
    image_block = next(
        block
        for block in response.content
        if isinstance(block, dict) and block.get("image_url")
    )
    return image_block["image_url"].get("url").split(",")[-1]


image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

Image-and-text to image

You can iterate on an image in a multi-turn conversation, as shown here:

next_message = {
    "role": "user",
    "content": "Can you take the same image and make the cat black?",
}

response = llm.invoke(
    [message, response, next_message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)

image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

You can also represent an input image and query in a single message by encoding the base64 data in the data URI scheme:

message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Can you make this cat orange?",
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"},
        },
    ],
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)

image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

You can also use LangGraph to manage the conversation history for you, as shown in this tutorial.
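
As a rough sketch of that pattern (assuming langgraph is installed; the thread_id and prompts below are hypothetical), a checkpointer stores and replays each thread's message history so you don't pass it manually:

from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

from langchain_google_genai import ChatGoogleGenerativeAI

# An in-memory checkpointer keyed by thread_id keeps the running message history.
chat = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
agent = create_react_agent(chat, tools=[], checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "demo-session"}}  # hypothetical session id

agent.invoke({"messages": [("human", "Hi, I'm Bob.")]}, config)
reply = agent.invoke({"messages": [("human", "What's my name?")]}, config)
print(reply["messages"][-1].content)  # the second turn sees the first turn's history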

Tool calling

You can equip the model with tools to call.

from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the tool
@tool(description="Get the current weather in a given location")
def get_weather(location: str) -> str:
    return "It's sunny."


# Initialize the model and bind the tool
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
llm_with_tools = llm.bind_tools([get_weather])

# Invoke the model with a query that should trigger the tool
query = "What's the weather in San Francisco?"
ai_msg = llm_with_tools.invoke(query)

# Check the tool calls in the response
print(ai_msg.tool_calls)

# Example tool call message would be needed here if you were actually running the tool
from langchain_core.messages import ToolMessage

tool_message = ToolMessage(
    content=get_weather.invoke(ai_msg.tool_calls[0]["args"]),
    tool_call_id=ai_msg.tool_calls[0]["id"],
)
llm_with_tools.invoke([ai_msg, tool_message])  # Example of passing tool result back
[{'name': 'get_weather', 'args': {'location': 'San Francisco'}, 'id': 'a6248087-74c5-4b7c-9250-f335e642927c', 'type': 'tool_call'}]
AIMessage(content="OK. It's sunny in San Francisco.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-ac5bb52c-e244-4c72-9fbc-fb2a9cd7a72e-0', usage_metadata={'input_tokens': 29, 'output_tokens': 11, 'total_tokens': 40, 'input_token_details': {'cache_read': 0}})

Structured output

Force the model to respond with a specific structure using Pydantic models.

from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the desired structure
class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The person's name")
    height_m: float = Field(..., description="The person's height in meters")


# Initialize the model
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
structured_llm = llm.with_structured_output(Person)

# Invoke the model with a query asking for structured information
result = structured_llm.invoke(
    "Who was the 16th president of the USA, and how tall was he in meters?"
)
print(result)
name='Abraham Lincoln' height_m=1.93
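
with_structured_output also accepts other schema types; below is a sketch using a TypedDict instead of a Pydantic model (assumption: the call behaves like the Pydantic path above, except the result is a plain dict):

from typing_extensions import Annotated, TypedDict


class PersonDict(TypedDict):
    """Information about a person."""

    # Annotated[type, default, description] conveys field descriptions to the model.
    name: Annotated[str, ..., "The person's name"]
    height_m: Annotated[float, ..., "The person's height in meters"]


structured_llm_dict = llm.with_structured_output(PersonDict)
result = structured_llm_dict.invoke(
    "Who was the 16th president of the USA, and how tall was he in meters?"
)
print(result)  # a dict like {'name': ..., 'height_m': ...}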

Token usage tracking

Access token usage information from the response metadata.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

result = llm.invoke("Explain the concept of prompt engineering in one sentence.")

print(result.content)
print("\nUsage Metadata:")
print(result.usage_metadata)
Prompt engineering is the art and science of crafting effective text prompts to elicit desired and accurate responses from large language models.

Usage Metadata:
{'input_tokens': 10, 'output_tokens': 24, 'total_tokens': 34, 'input_token_details': {'cache_read': 0}}
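
To aggregate usage across multiple calls, recent langchain-core versions expose a context-manager callback; a sketch assuming your installed version exports get_usage_metadata_callback:

from langchain_core.callbacks import get_usage_metadata_callback

# Collects usage_metadata from every call made inside the block, keyed by model name.
with get_usage_metadata_callback() as cb:
    llm.invoke("What is 1+1?")
    llm.invoke("What is 2+2?")

print(cb.usage_metadata)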

Built-in tools

Google Gemini supports a variety of built-in tools (Google Search, code execution), which can be bound to the model in the usual way.

from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "When is the next total solar eclipse in US?",
    tools=[GenAITool(google_search={})],
)

print(resp.content)
The next total solar eclipse visible in the United States will occur on August 23, 2044. However, the path of totality will only pass through Montana, North Dakota, and South Dakota.

For a total solar eclipse that crosses a significant portion of the continental U.S., you'll have to wait until August 12, 2045. This eclipse will start in California and end in Florida.
from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "What is 2*2, use python",
    tools=[GenAITool(code_execution={})],
)

for c in resp.content:
    if isinstance(c, dict):
        if c["type"] == "code_execution_result":
            print(f"Code execution result: {c['code_execution_result']}")
        elif c["type"] == "executable_code":
            print(f"Executable code: {c['executable_code']}")
    else:
        print(c)
Executable code: print(2*2)

Code execution result: 4

2*2 is 4.
/Users/philschmid/projects/google-gemini/langchain/.venv/lib/python3.9/site-packages/langchain_google_genai/chat_models.py:580: UserWarning:
⚠️ Warning: Output may vary each run.
- 'executable_code': Always present.
- 'execution_result' & 'image_url': May be absent for some queries.

Validate before using in production.

warnings.warn(

Native async

Use asynchronous methods for non-blocking calls.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")


async def run_async_calls():
    # Async invoke
    result_ainvoke = await llm.ainvoke("Why is the sky blue?")
    print("Async Invoke Result:", result_ainvoke.content[:50] + "...")

    # Async stream
    print("\nAsync Stream Result:")
    async for chunk in llm.astream(
        "Write a short poem about asynchronous programming."
    ):
        print(chunk.content, end="", flush=True)
    print("\n")

    # Async batch
    results_abatch = await llm.abatch(["What is 1+1?", "What is 2+2?"])
    print("Async Batch Results:", [res.content for res in results_abatch])


await run_async_calls()
Async Invoke Result: The sky is blue due to a phenomenon called **Rayle...

Async Stream Result:
The thread is free, it does not wait,
For answers slow, or tasks of fate.
A promise made, a future bright,
It moves ahead, with all its might.

A callback waits, a signal sent,
When data's read, or job is spent.
Non-blocking code, a graceful dance,
Responsive apps, a fleeting glance.

Async Batch Results: ['1 + 1 = 2', '2 + 2 = 4']

Safety settings

Gemini models have default safety settings that can be overridden. If you are receiving lots of "Safety Warnings" from your models, you can try tweaking the model's safety_settings attribute. For example, to turn off safety blocking for dangerous content, you can construct your LLM as follows:

from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)

See Google's safety setting types for an enumeration of the available categories and thresholds.
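
Both HarmCategory and HarmBlockThreshold are Python enums, so a quick way to inspect what your installed version exposes is to list their members (a minimal sketch; the exact members depend on the library version):

from langchain_google_genai import HarmBlockThreshold, HarmCategory

# Print every category and threshold name available in the installed package.
print([category.name for category in HarmCategory])
print([threshold.name for threshold in HarmBlockThreshold])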

API reference

For detailed documentation of all ChatGoogleGenerativeAI features and configurations, head to the API reference: https://python.langchain.ac.cn/api_reference/google_genai/chat_models/langchain_google_genai.chat_models.ChatGoogleGenerativeAI.html