
ChatGoogleGenerativeAI

Access Google's generative AI models, including the Gemini family, directly via the Gemini API, or experiment rapidly in Google AI Studio. The langchain-google-genai package provides the LangChain integration for these models. It is often the best starting point for individual developers.

For information on the latest models, their features, context windows, etc., head to the Google AI docs. All examples use the gemini-2.0-flash model. Gemini 2.5 Pro and 2.5 Flash are available as gemini-2.5-pro-preview-03-25 and gemini-2.5-flash-preview-04-17. All model IDs can be found in the Gemini API docs.

Integration details

| Class | Package | Local | Serializable | JS support | Package downloads | Latest version |
| --- | --- | --- | --- | --- | --- | --- |
| ChatGoogleGenerativeAI | langchain-google-genai | ❌ | beta | ✅ | PyPI - Downloads | PyPI - Version |

Model features

| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |

Setup

To access Google AI models you'll need to create a Google account, get a Google AI API key, and install the langchain-google-genai integration package.

1. Installation

%pip install -U langchain-google-genai

2. Credentials

Head to https://ai.google.dev/gemini-api/docs/api-key (or via Google AI Studio) to generate a Google AI API key.

Chat models

Use the ChatGoogleGenerativeAI class to interact with Google's chat models. See the API reference for full details.

import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")

To enable automated tracing of your model calls, set your LangSmith API key:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

Now we can instantiate our model object and generate chat completions:

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-3b28d4b8-8a62-4e6c-ad4e-b53e6e825749-0', usage_metadata={'input_tokens': 20, 'output_tokens': 7, 'total_tokens': 27, 'input_token_details': {'cache_read': 0}})
print(ai_msg.content)
J'adore la programmation.

Chaining

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
{
"input_language": "English",
"output_language": "German",
"input": "I love programming.",
}
)
API Reference: ChatPromptTemplate
AIMessage(content='Ich liebe Programmieren.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-e5561c6b-2beb-4411-9210-4796b576a7cd-0', usage_metadata={'input_tokens': 15, 'output_tokens': 7, 'total_tokens': 22, 'input_token_details': {'cache_read': 0}})

Multimodal usage

Gemini models can accept multimodal inputs (text, images, audio, video) and, for some models, generate multimodal outputs as well.

Image input

Provide image inputs along with text using a HumanMessage with a list content format. The gemini-2.0-flash model can handle images.

import base64

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Example using a public URL (remains the same)
message_url = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Describe the image at the URL.",
        },
        {"type": "image_url", "image_url": "https://picsum.photos/seed/picsum/200/300"},
    ]
)
result_url = llm.invoke([message_url])
print(f"Response for URL image: {result_url.content}")

# Example using a local image file encoded in base64
image_file_path = "/Users/philschmid/projects/google-gemini/langchain/docs/static/img/agents_vs_chains.png"

with open(image_file_path, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

message_local = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the local image."},
        {"type": "image_url", "image_url": f"data:image/png;base64,{encoded_image}"},
    ]
)
result_local = llm.invoke([message_local])
print(f"Response for local image: {result_local.content}")

Other supported image_url formats

  • A Google Cloud Storage URI (gs://...). Ensure the service account has access.
  • A PIL Image object (the library handles encoding).
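
A minimal sketch of both forms, assuming a GCS object you can read and a local PNG (the bucket, object, and file names below are hypothetical):

from PIL import Image as PILImage

from langchain_core.messages import HumanMessage

# Hypothetical GCS object; the calling service account needs read access.
gcs_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the image."},
        {"type": "image_url", "image_url": "gs://my-bucket/my-image.png"},
    ]
)

# Hypothetical local file; the library base64-encodes the PIL image for you.
pil_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the image."},
        {"type": "image_url", "image_url": PILImage.open("local_image.png")},
    ]
)

# result = llm.invoke([gcs_message])  # or [pil_message]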

Audio input

Provide audio file inputs along with text. Use a model like gemini-2.0-flash.

import base64

from langchain_core.messages import HumanMessage

# Ensure you have an audio file named 'example_audio.mp3' or provide the correct path.
audio_file_path = "example_audio.mp3"
audio_mime_type = "audio/mpeg"


with open(audio_file_path, "rb") as audio_file:
    encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "media",
            "data": encoded_audio,  # Use base64 string directly
            "mime_type": audio_mime_type,
        },
    ]
)
response = llm.invoke([message])  # Requires the audio file at audio_file_path
print(f"Response for audio: {response.content}")
API Reference: HumanMessage

Video input

Provide video file inputs along with text. Use a model like gemini-2.0-flash.

import base64

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Ensure you have a video file named 'example_video.mp4' or provide the correct path.
video_file_path = "example_video.mp4"
video_mime_type = "video/mp4"


with open(video_file_path, "rb") as video_file:
    encoded_video = base64.b64encode(video_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the first few frames of the video."},
        {
            "type": "media",
            "data": encoded_video,  # Use base64 string directly
            "mime_type": video_mime_type,
        },
    ]
)
response = llm.invoke([message])  # Requires the video file at video_file_path
print(f"Response for video: {response.content}")

Image generation (multimodal output)

Gemini can generate text and images inline via the gemini-2.0-flash-preview-image-generation model (image generation is experimental). You need to specify the desired response_modalities.

import base64

from IPython.display import Image, display
from langchain_core.messages import AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="models/gemini-2.0-flash-preview-image-generation")

message = {
    "role": "user",
    "content": "Generate a photorealistic image of a cuddly cat wearing a hat.",
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)


def _get_image_base64(response: AIMessage) -> str:
    image_block = next(
        block
        for block in response.content
        if isinstance(block, dict) and block.get("image_url")
    )
    return image_block["image_url"].get("url").split(",")[-1]


image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

Image-and-text to image

You can iterate on an image in a multi-turn conversation, as shown here:

next_message = {
    "role": "user",
    "content": "Can you take the same image and make the cat black?",
}

response = llm.invoke(
    [message, response, next_message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)

image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

You can also represent an input image and query in a single message by encoding the base64 data in the data URI scheme:

message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Can you make this cat orange?",
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"},
        },
    ],
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)

image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

You can also use LangGraph to manage the conversation history for you, as shown in this tutorial.
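
As a rough sketch of that pattern (assuming langgraph is installed; the thread_id and prompts below are hypothetical), a checkpointer stores and replays each thread's message history so you don't pass it manually:

from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

from langchain_google_genai import ChatGoogleGenerativeAI

# An in-memory checkpointer keyed by thread_id keeps the running message history.
chat = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
agent = create_react_agent(chat, tools=[], checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "demo-session"}}  # hypothetical session id

agent.invoke({"messages": [("human", "Hi, I'm Bob.")]}, config)
reply = agent.invoke({"messages": [("human", "What's my name?")]}, config)
print(reply["messages"][-1].content)  # the second turn sees the first turn's history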

Tool calling

You can equip the model with tools to call.

from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the tool
@tool(description="Get the current weather in a given location")
def get_weather(location: str) -> str:
    return "It's sunny."


# Initialize the model and bind the tool
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
llm_with_tools = llm.bind_tools([get_weather])

# Invoke the model with a query that should trigger the tool
query = "What's the weather in San Francisco?"
ai_msg = llm_with_tools.invoke(query)

# Check the tool calls in the response
print(ai_msg.tool_calls)

# Example tool call message would be needed here if you were actually running the tool
from langchain_core.messages import ToolMessage

tool_message = ToolMessage(
    content=get_weather.invoke(ai_msg.tool_calls[0]["args"]),
    tool_call_id=ai_msg.tool_calls[0]["id"],
)
llm_with_tools.invoke([ai_msg, tool_message])  # Example of passing tool result back
[{'name': 'get_weather', 'args': {'location': 'San Francisco'}, 'id': 'a6248087-74c5-4b7c-9250-f335e642927c', 'type': 'tool_call'}]
AIMessage(content="OK. It's sunny in San Francisco.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-ac5bb52c-e244-4c72-9fbc-fb2a9cd7a72e-0', usage_metadata={'input_tokens': 29, 'output_tokens': 11, 'total_tokens': 40, 'input_token_details': {'cache_read': 0}})

Structured output

Force the model to respond with a specific structure using Pydantic models.

from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the desired structure
class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The person's name")
    height_m: float = Field(..., description="The person's height in meters")


# Initialize the model
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
structured_llm = llm.with_structured_output(Person)

# Invoke the model with a query asking for structured information
result = structured_llm.invoke(
    "Who was the 16th president of the USA, and how tall was he in meters?"
)
print(result)
name='Abraham Lincoln' height_m=1.93
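
with_structured_output also accepts other schema types; below is a sketch using a TypedDict instead of a Pydantic model (assumption: the call behaves like the Pydantic path above, except the result is a plain dict):

from typing_extensions import Annotated, TypedDict


class PersonDict(TypedDict):
    """Information about a person."""

    # Annotated[type, default, description] conveys field descriptions to the model.
    name: Annotated[str, ..., "The person's name"]
    height_m: Annotated[float, ..., "The person's height in meters"]


structured_llm_dict = llm.with_structured_output(PersonDict)
result = structured_llm_dict.invoke(
    "Who was the 16th president of the USA, and how tall was he in meters?"
)
print(result)  # a dict like {'name': ..., 'height_m': ...}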

Token usage tracking

Access token usage information from the response metadata.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

result = llm.invoke("Explain the concept of prompt engineering in one sentence.")

print(result.content)
print("\nUsage Metadata:")
print(result.usage_metadata)
Prompt engineering is the art and science of crafting effective text prompts to elicit desired and accurate responses from large language models.

Usage Metadata:
{'input_tokens': 10, 'output_tokens': 24, 'total_tokens': 34, 'input_token_details': {'cache_read': 0}}
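
To aggregate usage across multiple calls, recent langchain-core versions expose a context-manager callback; a sketch assuming your installed version exports get_usage_metadata_callback:

from langchain_core.callbacks import get_usage_metadata_callback

# Collects usage_metadata from every call made inside the block, keyed by model name.
with get_usage_metadata_callback() as cb:
    llm.invoke("What is 1+1?")
    llm.invoke("What is 2+2?")

print(cb.usage_metadata)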

Built-in tools

Google Gemini supports a variety of built-in tools (Google Search, code execution), which can be bound to the model in the usual way.

from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "When is the next total solar eclipse in US?",
    tools=[GenAITool(google_search={})],
)

print(resp.content)
The next total solar eclipse visible in the United States will occur on August 23, 2044. However, the path of totality will only pass through Montana, North Dakota, and South Dakota.

For a total solar eclipse that crosses a significant portion of the continental U.S., you'll have to wait until August 12, 2045. This eclipse will start in California and end in Florida.
from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "What is 2*2, use python",
    tools=[GenAITool(code_execution={})],
)

for c in resp.content:
    if isinstance(c, dict):
        if c["type"] == "code_execution_result":
            print(f"Code execution result: {c['code_execution_result']}")
        elif c["type"] == "executable_code":
            print(f"Executable code: {c['executable_code']}")
    else:
        print(c)
Executable code: print(2*2)

Code execution result: 4

2*2 is 4.
/Users/philschmid/projects/google-gemini/langchain/.venv/lib/python3.9/site-packages/langchain_google_genai/chat_models.py:580: UserWarning:
⚠️ Warning: Output may vary each run.
- 'executable_code': Always present.
- 'execution_result' & 'image_url': May be absent for some queries.

Validate before using in production.

warnings.warn(

Native async

Use asynchronous methods for non-blocking calls.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")


async def run_async_calls():
    # Async invoke
    result_ainvoke = await llm.ainvoke("Why is the sky blue?")
    print("Async Invoke Result:", result_ainvoke.content[:50] + "...")

    # Async stream
    print("\nAsync Stream Result:")
    async for chunk in llm.astream(
        "Write a short poem about asynchronous programming."
    ):
        print(chunk.content, end="", flush=True)
    print("\n")

    # Async batch
    results_abatch = await llm.abatch(["What is 1+1?", "What is 2+2?"])
    print("Async Batch Results:", [res.content for res in results_abatch])


await run_async_calls()
Async Invoke Result: The sky is blue due to a phenomenon called **Rayle...

Async Stream Result:
The thread is free, it does not wait,
For answers slow, or tasks of fate.
A promise made, a future bright,
It moves ahead, with all its might.

A callback waits, a signal sent,
When data's read, or job is spent.
Non-blocking code, a graceful dance,
Responsive apps, a fleeting glance.

Async Batch Results: ['1 + 1 = 2', '2 + 2 = 4']

Safety settings

Gemini models have default safety settings that can be overridden. If you are receiving lots of "Safety Warnings" from your models, you can try tweaking the model's safety_settings attribute. For example, to turn off safety blocking for dangerous content, you can construct your LLM as follows:

from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)

See Google's safety setting types for an enumeration of the available categories and thresholds.
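
Both HarmCategory and HarmBlockThreshold are Python enums, so a quick way to inspect what your installed version exposes is to list their members (a minimal sketch; the exact members depend on the library version):

from langchain_google_genai import HarmBlockThreshold, HarmCategory

# Print every category and threshold name available in the installed package.
print([category.name for category in HarmCategory])
print([threshold.name for threshold in HarmBlockThreshold])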

API reference

For detailed documentation of all ChatGoogleGenerativeAI features and configurations, head to the API reference: https://python.langchain.ac.cn/api_reference/google_genai/chat_models/langchain_google_genai.chat_models.ChatGoogleGenerativeAI.html