Graph RAG

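This guide provides an introduction to Graph RAG. For detailed documentation of all supported features and configurations, see the Graph RAG project page.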

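Overview

The GraphRetriever from the langchain-graph-retriever package provides a LangChain retriever that combines unstructured similarity search on vectors with structured traversal of metadata properties. This enables graph-based retrieval over an existing vector store.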

Integration details

Retriever | Source | PyPI package | Latest | Project page
GraphRetriever | github.com/datastax/graph-rag | langchain-graph-retriever | PyPI - Version | Graph RAG

Benefits

Setup

Installation

This retriever lives in the langchain-graph-retriever package.

pip install -qU langchain-graph-retriever

Instantiation

The following examples show how to perform graph traversal over some sample documents about animals.

Prerequisites

  1. Ensure you have Python 3.10+ installed.

  2. Install the following package, which provides the sample data.

    pip install -qU graph_rag_example_helpers
  3. Download the test documents.

    from graph_rag_example_helpers.datasets.animals import fetch_documents
    animals = fetch_documents()
  4. Select an embeddings model. The available options are OpenAI, Azure, Google, AWS, HuggingFace, Ollama, Cohere, MistralAI, Nomic, NVIDIA, Voyage AI, IBM watsonx, and Fake. The OpenAI option is shown here:
pip install -qU langchain-openai

import getpass
import os

# Prompt for the API key only if it is not already set in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

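Populating the vector store

This section shows how to populate a variety of vector stores with the sample data.

For help choosing one of the vector stores below, or to add support for your own vector store, consult the documentation on adapters and supported stores.

Install langchain-graph-retriever with the astra extra:

pip install "langchain-graph-retriever[astra]"

Then create a vector store and load the test documents: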

from langchain_astradb import AstraDBVectorStore

vector_store = AstraDBVectorStore.from_documents(
    documents=animals,
    embedding=embeddings,
    collection_name="animals",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)

For the ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN credentials, see the AstraDB vector store guide.
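The snippet above uses ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN without defining them. One possible way to supply them is to read them from environment variables. The sketch below assumes the environment variables carry the same names as the Python variables, and require_env is a hypothetical helper, not part of any library.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before continuing.")
    return value


# Usage, once both variables are exported in your shell (assumed names):
# ASTRA_DB_API_ENDPOINT = require_env("ASTRA_DB_API_ENDPOINT")
# ASTRA_DB_APPLICATION_TOKEN = require_env("ASTRA_DB_APPLICATION_TOKEN")
```

Failing early with a named variable tends to be easier to debug than a NameError or an authentication error raised later by the vector store client.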

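Note

For faster initial testing, consider using the InMemory vector store.

Graph traversal

This graph retriever starts with the single animal that best matches the query, then traverses to other animals that share the same habitat and/or origin.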

from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever

traversal_retriever = GraphRetriever(
    store=vector_store,
    edges=[("habitat", "habitat"), ("origin", "origin")],
    strategy=Eager(k=5, start_k=1, max_depth=2),
)

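The above creates a graph traversal retriever that starts with the nearest animal (start_k=1), retrieves 5 documents (k=5), and limits the search to documents that are at most 2 steps away from the first animal (max_depth=2).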
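To make the roles of k, start_k, and max_depth concrete, here is a simplified breadth-first sketch in plain Python. It is not the library's implementation: the similarity ranking and the adjacency map are hypothetical, hard-coded inputs, and the real Eager strategy queries the vector store rather than an in-memory dictionary.

```python
from collections import deque

# Hypothetical inputs: ids ranked by similarity to the query, and a
# precomputed map of which documents are linked to each other.
ranked_by_similarity = ["capybara", "iguana", "heron", "frog", "camel"]
adjacency = {
    "capybara": ["heron", "frog"],
    "heron": ["capybara", "frog", "duck"],
    "frog": ["capybara", "heron"],
    "duck": ["heron", "swan"],
    "swan": ["duck"],
    "iguana": [],
    "camel": [],
}


def traverse(k, start_k, max_depth):
    """Breadth-first traversal from the start_k best matches, capped at k results."""
    queue = deque((doc_id, 0) for doc_id in ranked_by_similarity[:start_k])
    seen = {doc_id for doc_id, _ in queue}
    selected = []
    while queue and len(selected) < k:
        doc_id, depth = queue.popleft()
        selected.append((doc_id, depth))
        if depth == max_depth:
            continue  # do not expand documents at the depth limit
        for neighbor in adjacency.get(doc_id, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return selected


print(traverse(k=5, start_k=1, max_depth=2))
print(traverse(k=5, start_k=5, max_depth=0))
```

In this toy graph the first call reaches duck at depth 2 but never reaches swan, which would be 3 steps away. The second call never expands anything, so it simply returns the five most similar ids.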

The edges define how metadata values are used for traversal. In this case, every animal is connected to other animals with the same habitat and/or origin.
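As a rough illustration of what such edge definitions mean, the following plain-Python sketch links documents whose metadata values match. The sample metadata is made up for illustration and does not reflect the actual dataset, and neighbors is a hypothetical helper rather than a library function.

```python
# Hypothetical sample documents: id -> metadata (not the real dataset values).
docs = {
    "capybara": {"habitat": "wetlands", "origin": "south america"},
    "heron": {"habitat": "wetlands", "origin": "north america"},
    "jaguar": {"habitat": "rainforest", "origin": "south america"},
    "camel": {"habitat": "desert", "origin": "africa"},
}

# Same shape as the edges argument: (source field, target field) pairs.
edges = [("habitat", "habitat"), ("origin", "origin")]


def neighbors(doc_id):
    """Return the ids of documents linked to doc_id by any edge definition."""
    linked = set()
    for source_field, target_field in edges:
        value = docs[doc_id].get(source_field)
        if value is None:
            continue
        for other_id, metadata in docs.items():
            if other_id != doc_id and metadata.get(target_field) == value:
                linked.add(other_id)
    return sorted(linked)


print(neighbors("capybara"))  # linked via habitat (heron) and origin (jaguar)
print(neighbors("camel"))     # shares no value with any other document
```

Each pair names the field read on the current document and the field matched on the candidate, so the same idea would extend to edges between two different fields.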

results = traversal_retriever.invoke("what animals could be found near a capybara?")

for doc in results:
    print(f"{doc.id}: {doc.page_content}")
capybara: capybaras are the largest rodents in the world and are highly social animals.
heron: herons are wading birds known for their long legs and necks, often seen near water.
crocodile: crocodiles are large reptiles with powerful jaws and a long lifespan, often living over 70 years.
frog: frogs are amphibians known for their jumping ability and croaking sounds.
duck: ducks are waterfowl birds known for their webbed feet and quacking sounds.

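Graph traversal improves retrieval quality by leveraging structured relationships in the data. Unlike standard similarity search (see below), it provides a clear, explainable rationale for why each document was selected.

In this case, the documents capybara, heron, frog, crocodile, and newt all share the same habitat=wetlands, as defined by their metadata. This should improve document relevance and the quality of the LLM's answer.

Comparison with standard retrieval

When max_depth=0, the graph traversal retriever behaves like a standard retriever: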

standard_retriever = GraphRetriever(
    store=vector_store,
    edges=[("habitat", "habitat"), ("origin", "origin")],
    strategy=Eager(k=5, start_k=5, max_depth=0),
)

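This creates a retriever that starts with the 5 nearest animals (start_k=5) and returns them without any traversal (max_depth=0). The edge definitions are ignored in this case.

This is essentially the same as: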

standard_retriever = vector_store.as_retriever(search_kwargs={"k":5})

In either case, invoking the retriever returns:

results = standard_retriever.invoke("what animals could be found near a capybara?")

for doc in results:
    print(f"{doc.id}: {doc.page_content}")
capybara: capybaras are the largest rodents in the world and are highly social animals.
iguana: iguanas are large herbivorous lizards often found basking in trees and near water.
guinea pig: guinea pigs are small rodents often kept as pets due to their gentle and social nature.
hippopotamus: hippopotamuses are large semi-aquatic mammals known for their massive size and territorial behavior.
boar: boars are wild relatives of pigs, known for their tough hides and tusks.

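These documents are connected by similarity alone. Any structured data present in the store is ignored. Compared with graph retrieval, this can reduce document relevance, because the returned results are less likely to help answer the query.

Usage

As in the examples above, .invoke is used to run retrieval for a query.

Use within a chain

Like other retrievers, GraphRetriever can be incorporated into LLM applications via chains.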

pip install -qU "langchain[openai]"

import getpass
import os

# Prompt for the API key only if it is not already set in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)

def format_docs(docs):
    # Render each document's text and metadata into one context string.
    return "\n\n".join(f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs)

chain = (
    {"context": traversal_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke("what animals could be found near a capybara?")
Animals that could be found near a capybara include herons, crocodiles, frogs,
and ducks, as they all inhabit wetlands.
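To see what the prompt actually receives as context, it can help to run format_docs on its own. The sketch below repeats the function from the chain and applies it to a hypothetical Doc stand-in that exposes only the page_content and metadata attributes. The sample texts and metadata are illustrative, not taken from the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in exposing only the attributes format_docs reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Same formatting as in the chain: text and metadata, one blank line between documents.
    return "\n\n".join(f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs)


sample = [
    Doc("capybaras are the largest rodents in the world.", {"habitat": "wetlands"}),
    Doc("herons are wading birds.", {"habitat": "wetlands"}),
]
context = format_docs(sample)
print(context)
```

Because the metadata is included in the context, the model can ground statements such as a shared habitat in the retrieved documents rather than in its own prior knowledge.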

API reference

To explore all available parameters and advanced configurations, see the Graph RAG API reference.

