
Azure Cosmos DB for Apache Gremlin

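Azure Cosmos DB for Apache Gremlin is a graph database service that can store massive graphs with billions of vertices and edges. You can query the graphs with millisecond latency and evolve the graph structure easily.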

Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop, a project of the Apache Software Foundation.

This notebook shows how to use an LLM to provide a natural language interface to a graph database that you can query with the Gremlin query language.

Setting up

Install the library:

!pip3 install gremlinpython

You will need an Azure CosmosDB Graph database instance. One option is to create a free CosmosDB Graph database instance in Azure.

When you create your Cosmos DB account and Graph, use /type as the partition key.
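With /type as the partition key, each vertex is expected to carry a `type` property alongside its `id`. The snippet below is a minimal sketch of what a Gremlin query that adds such a vertex could look like. The helper names `escape_gremlin` and `add_vertex_query` are hypothetical, and this is not necessarily the query that `GremlinGraph` issues internally.

```python
def escape_gremlin(value: str) -> str:
    """Escape backslashes and single quotes for a single-quoted Gremlin string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def add_vertex_query(vertex_id: str, vertex_type: str, label: str) -> str:
    """Build a Gremlin query string (hypothetical helper, illustration only).

    The `type` property matches the /type partition key chosen for the graph.
    """
    return (
        f"g.addV('{escape_gremlin(label)}')"
        f".property('id', '{escape_gremlin(vertex_id)}')"
        f".property('type', '{escape_gremlin(vertex_type)}')"
    )


print(add_vertex_query("The Matrix", "movie", "movie"))
# g.addV('movie').property('id', 'The Matrix').property('type', 'movie')
```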

cosmosdb_name = "mycosmosdb"
cosmosdb_db_id = "graphtesting"
cosmosdb_db_graph_id = "mygraph"
cosmosdb_access_Key = "longstring=="

import nest_asyncio
from langchain_community.chains.graph_qa.gremlin import GremlinQAChain
from langchain_community.graphs import GremlinGraph
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_openai import AzureChatOpenAI

graph = GremlinGraph(
    url=f"wss://{cosmosdb_name}.gremlin.cosmos.azure.com:443/",
    username=f"/dbs/{cosmosdb_db_id}/colls/{cosmosdb_db_graph_id}",
    password=cosmosdb_access_Key,
)
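As a possible variation on the connection step, the sketch below derives the same `url` and `username` values from the account, database, and graph names, and reads the access key from an environment variable instead of hard-coding it. The function name `gremlin_connection_settings` and the variable name `COSMOSDB_ACCESS_KEY` are assumptions made for illustration; they are not part of LangChain or Azure.

```python
import os


def gremlin_connection_settings(
    account: str,
    database: str,
    graph: str,
    key_env: str = "COSMOSDB_ACCESS_KEY",  # hypothetical variable name
) -> dict:
    """Return the url/username/password values used when connecting."""
    return {
        "url": f"wss://{account}.gremlin.cosmos.azure.com:443/",
        "username": f"/dbs/{database}/colls/{graph}",
        # Empty string if the variable is unset; the connection would then fail.
        "password": os.environ.get(key_env, ""),
    }


settings = gremlin_connection_settings("mycosmosdb", "graphtesting", "mygraph")
print(settings["url"])
print(settings["username"])
```

The resulting dictionary uses the same keyword names as the constructor call shown earlier, so it could be passed along as `GremlinGraph(**settings)`.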

Seeding the database

Assuming your database is empty, you can populate it using GraphDocuments.

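For Gremlin, always add a property called "label" to each Node. If no label is set, Node.type is used as the label. For Cosmos DB, using natural IDs makes sense because they are visible in the graph explorer.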
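The label rule above can be sketched in plain Python. `SketchNode` is a hypothetical stand-in for LangChain's `Node` with only the fields needed here, and `effective_label` is an illustrative helper rather than the library's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for Node; the default type is an assumption."""

    id: str
    type: str = "Node"
    properties: dict = field(default_factory=dict)


def effective_label(node: SketchNode) -> str:
    """Use the explicit "label" property when present, else fall back to type."""
    return node.properties.get("label", node.type)


labelled = SketchNode(id="The Matrix", properties={"label": "movie"})
unlabelled = SketchNode(id="Keanu Reeves", type="actor")

print(effective_label(labelled))    # movie
print(effective_label(unlabelled))  # actor
```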

source_doc = Document(
    page_content="Matrix is a movie where Keanu Reeves, Laurence Fishburne and Carrie-Anne Moss acted."
)
movie = Node(id="The Matrix", properties={"label": "movie", "title": "The Matrix"})
actor1 = Node(id="Keanu Reeves", properties={"label": "actor", "name": "Keanu Reeves"})
actor2 = Node(
    id="Laurence Fishburne", properties={"label": "actor", "name": "Laurence Fishburne"}
)
actor3 = Node(
    id="Carrie-Anne Moss", properties={"label": "actor", "name": "Carrie-Anne Moss"}
)
rel1 = Relationship(
    id=5, type="ActedIn", source=actor1, target=movie, properties={"label": "ActedIn"}
)
rel2 = Relationship(
    id=6, type="ActedIn", source=actor2, target=movie, properties={"label": "ActedIn"}
)
rel3 = Relationship(
    id=7, type="ActedIn", source=actor3, target=movie, properties={"label": "ActedIn"}
)
rel4 = Relationship(
    id=8,
    type="Starring",
    source=movie,
    target=actor1,
    properties={"label": "Starring"},
)
rel5 = Relationship(
    id=9,
    type="Starring",
    source=movie,
    target=actor2,
    properties={"label": "Starring"},
)
rel6 = Relationship(
    id=10,
    type="Starring",
    source=movie,
    target=actor3,
    properties={"label": "Starring"},
)
graph_doc = GraphDocument(
    nodes=[movie, actor1, actor2, actor3],
    relationships=[rel1, rel2, rel3, rel4, rel5, rel6],
    source=source_doc,
)
# The underlying gremlinpython client has a problem when running in a notebook.
# The following line is a workaround for that problem.
nest_asyncio.apply()

# Add the document to the CosmosDB graph.
graph.add_graph_documents([graph_doc])

Refresh graph schema information

If the schema of the database changes (after updates), you can refresh the schema information.

graph.refresh_schema()
print(graph.schema)

Querying the graph

We can now use the Gremlin QA chain to ask questions about the graph.

chain = GremlinQAChain.from_llm(
    AzureChatOpenAI(
        temperature=0,
        azure_deployment="gpt-4-turbo",
    ),
    graph=graph,
    verbose=True,
)
chain.invoke("Who played in The Matrix?")
chain.invoke("How many people played in The Matrix?")
