Google Generative AI Embeddings

Connect to Google's generative AI embeddings service using the GoogleGenerativeAIEmbeddings class, found in the langchain-google-genai package.

Installation

%pip install --upgrade --quiet  langchain-google-genai

Credentials

import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
    # `import getpass` imports the module, so call getpass.getpass()
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Provide your Google API key here")

Usage

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query("hello, world!")
vector[:5]
[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]

Batch

You can also embed multiple strings at once for a processing speedup:

vectors = embeddings.embed_documents(
    [
        "Today is Monday",
        "Today is Tuesday",
        "Today is April Fools day",
    ]
)
len(vectors), len(vectors[0])
(3, 768)
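If you have many texts, you may want to split them into fixed-size chunks before calling embed_documents, for example to keep each request small. The sketch below is a hypothetical helper, `embed_in_batches`, that is not part of langchain-google-genai; it accepts any callable that takes a list of strings and returns a list of vectors (such as `embeddings.embed_documents`). The default batch size of 100 is an assumption, not a documented API limit.

```python
from typing import Callable, List, Sequence


# Hypothetical helper, not part of langchain-google-genai.
def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,  # assumed chunk size, not a documented limit
) -> List[List[float]]:
    """Embed `texts` in chunks of `batch_size`, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start : start + batch_size])
        vectors.extend(embed_fn(chunk))
    return vectors


# Offline stand-in embedder: maps each text to [len(text), 1.0].
def fake_embed(batch: List[str]) -> List[List[float]]:
    return [[float(len(t)), 1.0] for t in batch]


texts = ["Today is Monday", "Today is Tuesday", "Today is April Fools day"]
result = embed_in_batches(texts, fake_embed, batch_size=2)
print(len(result), len(result[0]))  # 3 2
```

With a configured model you would pass the bound method instead, e.g. `embed_in_batches(texts, embeddings.embed_documents)`.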

Task type

GoogleGenerativeAIEmbeddings optionally supports a task_type, which currently must be one of:

  • task_type_unspecified
  • retrieval_query
  • retrieval_document
  • semantic_similarity
  • classification
  • clustering

By default, we use retrieval_document in the embed_documents method and retrieval_query in the embed_query method. If you provide a task type, we use that task type for all methods.
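The defaulting rule above can be modeled in a few lines. This is a hypothetical sketch of the behavior as described on this page, not the library's source: `resolve_task_type` and the lookup tables are illustrative names, and the real class may validate or normalize task types differently.

```python
from typing import Optional

# Task types listed above.
VALID_TASK_TYPES = {
    "task_type_unspecified",
    "retrieval_query",
    "retrieval_document",
    "semantic_similarity",
    "classification",
    "clustering",
}

# Per-method defaults described above.
DEFAULT_TASK_TYPES = {
    "embed_query": "retrieval_query",
    "embed_documents": "retrieval_document",
}


def resolve_task_type(method: str, task_type: Optional[str] = None) -> str:
    """Return the task type `method` would use (hypothetical model of the rule)."""
    if task_type is not None:
        # An explicit task type overrides the per-method default for every method.
        if task_type not in VALID_TASK_TYPES:
            raise ValueError(f"unsupported task_type: {task_type!r}")
        return task_type
    # No explicit task type: fall back to the method's default.
    return DEFAULT_TASK_TYPES[method]


print(resolve_task_type("embed_query"))  # retrieval_query
print(resolve_task_type("embed_query", "retrieval_document"))  # retrieval_document
```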

%pip install --upgrade --quiet  matplotlib scikit-learn
query_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_query"
)
doc_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_document"
)

All of these will be embedded with the 'retrieval_query' task set:

query_vecs = [query_embeddings.embed_query(q) for q in [query, query_2, answer_1]]

All of these will be embedded with the 'retrieval_document' task set. Even though embed_query is called, the explicitly provided task_type applies to all methods:

doc_vecs = [doc_embeddings.embed_query(q) for q in [query, query_2, answer_1]]

In retrieval, relative distance matters. In the image above, you can see the difference in similarity scores between the "relevant doc" and the "similar doc"; in the latter case, there is a stronger delta between the similar query and the relevant doc.
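Because relative distances are what matter, a common check is to rank candidate documents by cosine similarity to a query vector. The sketch below uses only the standard library; `cosine_similarity` and `rank_by_similarity` are illustrative helpers, and the three-dimensional toy vectors are made up for illustration, not real embedding-001 output (which, as shown above, has 768 dimensions).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy vectors (made up for illustration, not model output).
query_vec = [1.0, 0.0, 0.0]
docs = {
    "relevant doc": [0.9, 0.1, 0.0],
    "similar doc": [0.5, 0.5, 0.0],
    "unrelated doc": [0.0, 0.0, 1.0],
}
for name, score in rank_by_similarity(query_vec, docs):
    print(f"{name}: {score:.3f}")
```

With a configured model, the same helpers could compare, for example, a vector from `query_embeddings.embed_query(...)` against vectors from `doc_embeddings.embed_documents(...)`.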

Additional configuration

You can pass the following parameters to GoogleGenerativeAIEmbeddings in order to customize the SDK's behavior:

  • client_options: Client Options to pass to the Google API Client, such as a custom client_options["api_endpoint"]
  • transport: The transport method to use, such as rest, grpc, or grpc_asyncio.
