Google Generative AI Embeddings
Connect to Google's generative AI embeddings service using the GoogleGenerativeAIEmbeddings class, found in the langchain-google-genai package.
Installation
%pip install --upgrade --quiet langchain-google-genai
Credentials
import getpass
import os

# Prompt for the key only if it is not already set in the environment
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Provide your Google API key here")
Usage
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query("hello, world!")
vector[:5]
[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]
Batch
You can also embed multiple strings at once for a processing speedup.
vectors = embeddings.embed_documents(
    [
        "Today is Monday",
        "Today is Tuesday",
        "Today is April Fools day",
    ]
)
len(vectors), len(vectors[0])
(3, 768)
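When a list is too long to send comfortably in one call, one option is to split it into chunks and embed each chunk in turn. The sketch below assumes only that the embedding function takes a list of strings and returns one vector per string. The names embed_in_batches, batch_size, and fake_embed are hypothetical helpers introduced for illustration; they are not part of langchain-google-genai, and fake_embed returns toy 2-dimensional vectors rather than real embeddings.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    embed_fn: Callable[[List[str]], List[Vector]],
    texts: Sequence[str],
    batch_size: int = 2,
) -> List[Vector]:
    """Call embed_fn on consecutive chunks of texts and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start : start + batch_size])
        vectors.extend(embed_fn(chunk))
    return vectors


def fake_embed(chunk: List[str]) -> List[Vector]:
    """Hypothetical stand-in for embeddings.embed_documents (toy 2-d vectors)."""
    return [[float(len(text)), float(text.count(" "))] for text in chunk]


docs = ["Today is Monday", "Today is Tuesday", "Today is April Fools day"]
vectors = embed_in_batches(fake_embed, docs, batch_size=2)
print(len(vectors), len(vectors[0]))
```

With a real client, embeddings.embed_documents could be passed as embed_fn in place of fake_embed.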
Task type
GoogleGenerativeAIEmbeddings optionally supports a task_type, which currently must be one of:
- task_type_unspecified
- retrieval_query
- retrieval_document
- semantic_similarity
- classification
- clustering
By default, we use retrieval_document in the embed_documents method and retrieval_query in the embed_query method. If you provide a task type, we will use that for all methods.
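The default rule described above can be pictured as a small lookup with an override. This is only an illustrative sketch of the stated behavior, not the library's actual implementation; resolve_task_type and DEFAULT_TASK_TYPES are hypothetical names.

```python
from typing import Optional

# Per-method defaults, as described in the text above
DEFAULT_TASK_TYPES = {
    "embed_documents": "retrieval_document",
    "embed_query": "retrieval_query",
}


def resolve_task_type(method: str, task_type: Optional[str] = None) -> str:
    """Return the task type that applies to a call of the given method."""
    if task_type is not None:
        # An explicitly provided task type is used for all methods
        return task_type
    return DEFAULT_TASK_TYPES[method]


print(resolve_task_type("embed_query"))
print(resolve_task_type("embed_query", task_type="clustering"))
```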
%pip install --upgrade --quiet matplotlib scikit-learn
Note: you may need to restart the kernel to use updated packages.
query_embeddings = GoogleGenerativeAIEmbeddings(
model="models/embedding-001", task_type="retrieval_query"
)
doc_embeddings = GoogleGenerativeAIEmbeddings(
model="models/embedding-001", task_type="retrieval_document"
)
# query, query_2, and answer_1 are text strings that must be defined beforehand (not shown here)
# All of these will be embedded with the 'retrieval_query' task set
query_vecs = [query_embeddings.embed_query(q) for q in [query, query_2, answer_1]]
# All of these will be embedded with the 'retrieval_document' task set
doc_vecs = [doc_embeddings.embed_query(q) for q in [query, query_2, answer_1]]
In retrieval, relative distance matters. In the image above, you can see the difference in similarity scores between the "relevant doc" and the "similar doc", with a stronger delta between the similar query and the relevant doc in the latter case.
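To make the idea of a similarity delta concrete, here is a minimal sketch that computes cosine similarity in plain Python and compares two scores. The three vectors are made-up 3-dimensional placeholders, not real embedding output, and cosine_similarity is a helper written for this example; with real data you would pass in vectors returned by embed_query or embed_documents.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Made-up placeholder vectors (assumption: real embeddings have many more dimensions)
query_vec = [1.0, 0.0, 0.0]
relevant_doc_vec = [0.8, 0.6, 0.0]
similar_doc_vec = [0.6, 0.8, 0.0]

relevant_score = cosine_similarity(query_vec, relevant_doc_vec)
similar_score = cosine_similarity(query_vec, similar_doc_vec)

# A larger delta means the relevant document is easier to tell apart
delta = relevant_score - similar_score
print(round(relevant_score, 3), round(similar_score, 3), round(delta, 3))
```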
Additional configuration
You can pass the following parameters to GoogleGenerativeAIEmbeddings in order to customize the SDK's behavior:
- client_options: Client options to pass to the Google API client, such as a custom client_options["api_endpoint"]
- transport: The transport method to use, such as rest, grpc, or grpc_asyncio.
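As a rough sketch of how these two options might be assembled, the helper below collects constructor keyword arguments and leaves out anything not set. build_embeddings_kwargs is a hypothetical helper, and the endpoint value is a placeholder rather than a real Google endpoint. The final constructor call is left commented out because it requires langchain-google-genai to be installed and GOOGLE_API_KEY to be set.

```python
from typing import Any, Dict, Optional


def build_embeddings_kwargs(
    model: str,
    api_endpoint: Optional[str] = None,
    transport: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect constructor keyword arguments, skipping options left unset."""
    kwargs: Dict[str, Any] = {"model": model}
    if api_endpoint is not None:
        kwargs["client_options"] = {"api_endpoint": api_endpoint}
    if transport is not None:
        kwargs["transport"] = transport
    return kwargs


kwargs = build_embeddings_kwargs(
    "models/embedding-001",
    api_endpoint="example-endpoint.invalid",  # placeholder, not a real endpoint
    transport="rest",
)
print(kwargs)

# With the package installed and credentials configured, the result could be passed on:
# from langchain_google_genai import GoogleGenerativeAIEmbeddings
# embeddings = GoogleGenerativeAIEmbeddings(**kwargs)
```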
Related
- Embedding model conceptual guide
- Embedding model how-to guides