GreenNodeEmbeddings
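GreenNode is a global AI solutions provider and an **NVIDIA Preferred Partner**. It delivers full-stack AI capabilities, from infrastructure to applications, to enterprises across the US, MENA, and APAC regions. Operating on **world-class infrastructure** (LEED Gold, TIA‑942, Uptime Tier III), GreenNode offers enterprises, startups, and researchers a comprehensive suite of AI services.
This notebook is a getting-started guide for `GreenNodeEmbeddings`. By generating high-quality vector representations of text, it lets you perform semantic document search over a variety of built-in connectors or your own custom data sources.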
Overview
Integration details
Provider | Package |
---|---|
GreenNode | langchain-greennode |
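Setup
To access GreenNode embedding models, you need to create a GreenNode account, get an API key, and install the `langchain-greennode` integration package.
Credentials
GreenNode requires an API key for authentication. You can pass it as the `api_key` parameter at initialization or set it as the `GREENNODE_API_KEY` environment variable. You can obtain an API key by registering for an account on GreenNode Serverless AI.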
import getpass
import os
if not os.getenv("GREENNODE_API_KEY"):
    os.environ["GREENNODE_API_KEY"] = getpass.getpass("Enter your GreenNode API key: ")
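The key can come either from the `api_key` argument or from `GREENNODE_API_KEY`. If your own code needs to decide which one to use before constructing the embeddings object, a minimal sketch follows. The helper name `resolve_greennode_api_key` is hypothetical, and the rule that an explicit argument wins over the environment variable is an assumption, not documented `langchain-greennode` behavior.

```python
import os
from typing import Optional


def resolve_greennode_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to GREENNODE_API_KEY."""
    # Assumption: an explicitly passed key takes precedence over the environment.
    if api_key:
        return api_key
    env_key = os.getenv("GREENNODE_API_KEY")
    if env_key:
        return env_key
    raise ValueError("No API key found: pass api_key=... or set GREENNODE_API_KEY")
```

You could then write `GreenNodeEmbeddings(api_key=resolve_greennode_api_key(), model="BAAI/bge-m3")`, which fails early with a clear message when no key is configured.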
If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
Installation
The LangChain GreenNode integration lives in the `langchain-greennode` package:
%pip install -qU langchain-greennode
Instantiation
The `GreenNodeEmbeddings` class can be instantiated with optional parameters for the API key and the model name:
from langchain_greennode import GreenNodeEmbeddings
# Initialize the embeddings model
embeddings = GreenNodeEmbeddings(
# api_key="YOUR_API_KEY", # You can pass the API key directly
model="BAAI/bge-m3" # The default embedding model
)
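Indexing and Retrieval
Embedding models play a key role in retrieval-augmented generation (RAG) workflows by enabling both the indexing of content and its efficient retrieval. Below, you will see how to index and retrieve data using the `embeddings` object initialized above. In this example, we index and retrieve a sample document in an `InMemoryVectorStore`.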
# Create a vector store with a sample text
from langchain_core.vectorstores import InMemoryVectorStore
text = "LangChain is the framework for building context-aware reasoning applications"
vectorstore = InMemoryVectorStore.from_texts(
[text],
embedding=embeddings,
)
# Use the vectorstore as a retriever
retriever = vectorstore.as_retriever()
# Retrieve the most similar text
retrieved_documents = retriever.invoke("What is LangChain?")
# show the retrieved document's content
retrieved_documents[0].page_content
'LangChain is the framework for building context-aware reasoning applications'
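Conceptually, an in-memory store keeps each text next to its embedding and answers a query by ranking the stored vectors by similarity to the query's embedding. The stdlib-only sketch below illustrates that idea. `TinyVectorStore` and `toy_embed` are hypothetical names, the word-count "embedding" is a stand-in so the example runs without an API key, and cosine similarity is assumed as the scoring function; this is not `InMemoryVectorStore`'s actual implementation.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical toy vocabulary; a real embedding model produces dense learned features.
VOCAB = ["langchain", "langgraph", "framework", "library", "stateful", "reasoning"]


def toy_embed(text: str) -> List[float]:
    """Stand-in for an embedding model: word counts over VOCAB (not GreenNode)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in VOCAB]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Keeps (text, vector) pairs in a list and ranks them by cosine similarity."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self.embed_fn = embed_fn
        self.entries: List[Tuple[str, List[float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Indexing: embed each text once and store the vector alongside it.
        for text in texts:
            self.entries.append((text, self.embed_fn(text)))

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        # Retrieval: embed the query, score every stored vector, return the top k texts.
        query_vec = self.embed_fn(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vec, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore(toy_embed)
store.add_texts([
    "LangChain is the framework for building context-aware reasoning applications",
    "LangGraph is a library for building stateful, multi-actor applications with LLMs",
])
print(store.similarity_search("What is LangChain?", k=1)[0])
# prints: LangChain is the framework for building context-aware reasoning applications
```

Swapping `toy_embed` for `embeddings.embed_query` would give the same structure with real GreenNode vectors, at the cost of one API call per text.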
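Direct Usage
The `GreenNodeEmbeddings` class can be used on its own to generate text embeddings without a vector store. This is useful for tasks such as similarity scoring, clustering, or custom processing pipelines.
Embed single texts
You can embed a single text or document with `embed_query`: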
single_vector = embeddings.embed_query(text)
print(str(single_vector)[:100]) # Show the first 100 characters of the vector
[-0.01104736328125, -0.0281982421875, 0.0035858154296875, -0.0311279296875, -0.0106201171875, -0.039
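Printing the first 100 characters of `str(vector)` is only a quick peek. When checking embeddings, the dimension and L2 norm are often more informative. The helper below is a hypothetical sketch that works on any list of floats, such as the one returned by `embed_query`.

```python
import math
from typing import Dict, Sequence


def summarize_vector(vector: Sequence[float], preview: int = 3) -> Dict[str, object]:
    """Hypothetical helper: dimension, L2 norm and the first few components of a vector."""
    l2_norm = math.sqrt(sum(x * x for x in vector))
    return {
        "dim": len(vector),
        "l2_norm": l2_norm,
        "preview": list(vector[:preview]),
    }


print(summarize_vector([3.0, 4.0]))
# prints: {'dim': 2, 'l2_norm': 5.0, 'preview': [3.0, 4.0]}
```

With the real model, call `summarize_vector(single_vector)`; the async example in this notebook reports a dimension of 1024 for `BAAI/bge-m3`.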
Embed multiple texts
You can embed multiple texts with `embed_documents`:
text2 = (
"LangGraph is a library for building stateful, multi-actor applications with LLMs"
)
two_vectors = embeddings.embed_documents([text, text2])
for vector in two_vectors:
    print(str(vector)[:100])  # Show the first 100 characters of the vector
[-0.01104736328125, -0.0281982421875, 0.0035858154296875, -0.0311279296875, -0.0106201171875, -0.039
[-0.07177734375, -0.00017452239990234375, -0.002044677734375, -0.0299072265625, -0.0184326171875, -0
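`embed_documents` takes the whole list in one call. For a large corpus you may prefer to send texts in smaller chunks, for example to bound request size. The helper below is a hypothetical sketch that accepts any callable shaped like `embed_documents`; the default batch size of 32 is an arbitrary assumption, not a documented GreenNode limit.

```python
from typing import Callable, List

EmbedFn = Callable[[List[str]], List[List[float]]]


def embed_in_batches(texts: List[str], embed_documents: EmbedFn, batch_size: int = 32) -> List[List[float]]:
    """Hypothetical helper: call embed_documents on consecutive slices of texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Each call returns one vector per input text, in the same order.
        vectors.extend(embed_documents(batch))
    return vectors
```

With the real model this would be `embed_in_batches(my_texts, embeddings.embed_documents, batch_size=32)`; the output order matches the input order, so `vectors[i]` belongs to `my_texts[i]`.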
Async Support
`GreenNodeEmbeddings` also supports asynchronous operations:
import asyncio
async def generate_embeddings_async():
    # Embed a single query
    query_result = await embeddings.aembed_query("What is the capital of France?")
    print(f"Async query embedding dimension: {len(query_result)}")
    # Embed multiple documents
    docs = [
        "Paris is the capital of France",
        "Berlin is the capital of Germany",
        "Rome is the capital of Italy",
    ]
    docs_result = await embeddings.aembed_documents(docs)
    print(f"Async document embeddings count: {len(docs_result)}")
# Top-level await works in Jupyter; in a plain Python script use
# asyncio.run(generate_embeddings_async()) instead.
await generate_embeddings_async()
Async query embedding dimension: 1024
Async document embeddings count: 3
Document Similarity Example
import numpy as np
from scipy.spatial.distance import cosine
# Create some documents
documents = [
"Machine learning algorithms build mathematical models based on sample data",
"Deep learning uses neural networks with many layers",
"Climate change is a major global environmental challenge",
"Neural networks are inspired by the human brain's structure",
]
# Embed the documents
embeddings_list = embeddings.embed_documents(documents)
# Function to calculate similarity
def calculate_similarity(embedding1, embedding2):
    # scipy's cosine() returns cosine *distance*, so similarity = 1 - distance
    return 1 - cosine(embedding1, embedding2)
# Print similarity matrix
print("Document Similarity Matrix:")
for i, emb_i in enumerate(embeddings_list):
    similarities = []
    for j, emb_j in enumerate(embeddings_list):
        similarity = calculate_similarity(emb_i, emb_j)
        similarities.append(f"{similarity:.4f}")
    print(f"Document {i + 1}: {similarities}")
Document Similarity Matrix:
Document 1: ['1.0000', '0.6005', '0.3542', '0.5788']
Document 2: ['0.6005', '1.0000', '0.4154', '0.6170']
Document 3: ['0.3542', '0.4154', '1.0000', '0.3528']
Document 4: ['0.5788', '0.6170', '0.3528', '1.0000']
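A full matrix is easier to act on once it is reduced to a ranked list of document pairs. The sketch below ranks the off-diagonal entries; `most_similar_pairs` is a hypothetical helper, and the matrix values are copied from the output printed above rather than recomputed.

```python
from itertools import combinations
from typing import List, Tuple


def most_similar_pairs(matrix: List[List[float]], top_n: int = 3) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: rank unique (i, j) pairs with i < j by similarity score."""
    pairs = [(i, j, matrix[i][j]) for i, j in combinations(range(len(matrix)), 2)]
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs[:top_n]


# Values copied from the similarity matrix printed above.
similarity_matrix = [
    [1.0000, 0.6005, 0.3542, 0.5788],
    [0.6005, 1.0000, 0.4154, 0.6170],
    [0.3542, 0.4154, 1.0000, 0.3528],
    [0.5788, 0.6170, 0.3528, 1.0000],
]

for i, j, score in most_similar_pairs(similarity_matrix):
    print(f"Document {i + 1} <-> Document {j + 1}: {score:.4f}")
# prints:
# Document 2 <-> Document 4: 0.6170
# Document 1 <-> Document 2: 0.6005
# Document 1 <-> Document 4: 0.5788
```

The top pairs all involve the machine-learning documents, while Document 3 (climate change) does not appear, which matches the intuition behind the example texts.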
API Reference
For more details about the GreenNode Serverless AI API, see the GreenNode Serverless AI documentation.