Astra DB (Cassandra)

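DataStax Astra DB is a serverless vector database built on Cassandra and made conveniently available through an easy-to-use JSON API.

In this walkthrough, we demonstrate the SelfQueryRetriever with an Astra DB vector store.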

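Creating an Astra DB vector store

First, we create an Astra DB VectorStore and seed it with some data. We've created a small demo set of documents containing movie summaries.

Note: The self-query retriever requires you to have lark installed (pip install lark). We also need the astrapy package.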

%pip install --upgrade --quiet lark astrapy langchain-openai
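
After the install, a quick check that the packages can be imported may save a confusing error later in the walkthrough. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the import names (`lark`, `astrapy`, `langchain_openai`) are assumed to correspond to the packages installed above.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


# Import names assumed for the packages installed above.
required = ["lark", "astrapy", "langchain_openai"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported missing, re-running the install command in the same environment as the notebook kernel is usually the first thing to try.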

We want to use OpenAIEmbeddings, so we have to get the OpenAI API key.

import os
from getpass import getpass

from langchain_openai.embeddings import OpenAIEmbeddings

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key:")

embeddings = OpenAIEmbeddings()
API Reference: OpenAIEmbeddings

Create the Astra DB VectorStore:

  • the API Endpoint looks like https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com
  • the Token looks like AstraCS:6gBhNmsk135....
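
The two shapes described above can be sanity-checked before any connection is attempted. This is a rough sketch, not an official validation: the pattern is inferred only from the example endpoint shown here (a database UUID, a region suffix, and the `apps.astra.datastax.com` domain), so real endpoints may legitimately differ. The helper names `looks_like_astra_endpoint` and `looks_like_astra_token` are hypothetical.

```python
import re

# Pattern inferred only from the example endpoint above (an assumption):
# https://<database UUID>-<region>.apps.astra.datastax.com
_ENDPOINT_RE = re.compile(
    r"^https://[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}-[a-z0-9-]+"
    r"\.apps\.astra\.datastax\.com$"
)


def looks_like_astra_endpoint(value):
    """Return True if the value has the same shape as the example endpoint."""
    return _ENDPOINT_RE.match(value.strip()) is not None


def looks_like_astra_token(value):
    """Return True if the value starts with the 'AstraCS:' prefix shown above."""
    return value.strip().startswith("AstraCS:")


example_endpoint = (
    "https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com"
)
print(looks_like_astra_endpoint(example_endpoint))
print(looks_like_astra_token("not-a-token"))
```

A failed check here only means the value does not resemble the examples; it is a hint to re-copy the value, not proof that it is invalid.
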
ASTRA_DB_API_ENDPOINT = input("ASTRA_DB_API_ENDPOINT = ")
ASTRA_DB_APPLICATION_TOKEN = getpass("ASTRA_DB_APPLICATION_TOKEN = ")

from langchain_community.vectorstores import AstraDB
from langchain_core.documents import Document

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
            "rating": 9.9,
        },
    ),
]

vectorstore = AstraDB.from_documents(
    docs,
    embeddings,
    collection_name="astra_self_query_demo",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)
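API Reference: AstraDB | Document

Creating our self-querying retriever

Now we can instantiate our retriever. To do this, we need to provide some information upfront about the metadata fields that our documents support, along with a short description of the document contents.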

from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import OpenAI

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
document_content_description = "Brief summary of a movie"
llm = OpenAI(temperature=0)

retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
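
To see why the field names, descriptions, and types matter, it can help to picture them as a schema that is described to the LLM so it knows which filters it may produce. The sketch below is only an illustration under that assumption: the format LangChain actually uses internally is not shown on this page, and `render_schema` is a hypothetical helper. Plain dicts stand in for the `AttributeInfo` objects so the example runs without LangChain.

```python
import json

# Same field descriptions as metadata_field_info, written as plain dicts
# so this sketch runs without LangChain installed.
fields = [
    {"name": "genre", "description": "The genre of the movie", "type": "string or list[string]"},
    {"name": "year", "description": "The year the movie was released", "type": "integer"},
    {"name": "director", "description": "The name of the movie director", "type": "string"},
    {"name": "rating", "description": "A 1-10 rating for the movie", "type": "float"},
]


def render_schema(content_description, field_list):
    """Build a JSON description of the document content and its metadata fields."""
    attributes = {
        f["name"]: {"description": f["description"], "type": f["type"]}
        for f in field_list
    }
    return json.dumps(
        {"content": content_description, "attributes": attributes}, indent=4
    )


schema_text = render_schema("Brief summary of a movie", fields)
print(schema_text)
```

A field that is missing from this list is invisible to the LLM, so it cannot be used in a generated filter even if the documents carry it.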

Testing it out

And now we can try actually using our retriever!

# This example only specifies a relevant query
retriever.invoke("What are some movies about dinosaurs?")
# This example only specifies a filter
retriever.invoke("I want to watch a movie rated higher than 8.5")
# This example specifies a query and a filter
retriever.invoke("Has Greta Gerwig directed any movies about women")
# This example specifies a composite filter
retriever.invoke("What's a highly rated (above 8.5), science fiction movie ?")
# This example specifies a query and composite filter
retriever.invoke(
    "What's a movie about toys after 1990 but before 2005, and is animated"
)
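
To make the composite filters above concrete, here is a plain-Python illustration of the conditions those two questions describe, applied to the metadata of the six demo documents. It is a sketch of the intended semantics only, not how LangChain or Astra DB evaluate filters. The function names are hypothetical, and treating a missing field as "does not match" is an assumption of this sketch.

```python
# Metadata of the demo documents, copied from the docs list above.
movies = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {"year": 1979, "director": "Andrei Tarkovsky", "genre": "science fiction", "rating": 9.9},
]


def is_highly_rated_scifi(metadata):
    """rating > 8.5 AND genre == 'science fiction' (missing fields do not match)."""
    rating = metadata.get("rating")
    return rating is not None and rating > 8.5 and metadata.get("genre") == "science fiction"


def is_animated_1990_to_2005(metadata):
    """year > 1990 AND year < 2005 AND genre == 'animated'."""
    year = metadata.get("year")
    return year is not None and 1990 < year < 2005 and metadata.get("genre") == "animated"


highly_rated_scifi = [m for m in movies if is_highly_rated_scifi(m)]
animated_nineties = [m for m in movies if is_animated_1990_to_2005(m)]
print(highly_rated_scifi)
print(animated_nineties)
```

Note that the 2006 document has a rating above 8.5 but no genre field, so it falls outside the first filter.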

Filter k

We can also use the self-query retriever to specify k: the number of documents to fetch.

We can do this by passing enable_limit=True to the constructor.

retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    verbose=True,
    enable_limit=True,
)
# This example specifies a relevant query and a limit
retriever.invoke("What are two movies about dinosaurs?")
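
Conceptually, a question such as "What are two movies about dinosaurs?" carries both a search query ("dinosaurs") and a limit (2). The following sketch shows that idea in isolation. `ParsedRequest` and `apply_limit` are hypothetical stand-ins, not LangChain classes, and the placeholder strings stand in for documents already ranked by similarity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedRequest:
    """Hypothetical stand-in for what is extracted from the user's question."""

    query: str
    limit: Optional[int] = None


def apply_limit(ranked_results, request):
    """Keep the top `limit` results, or all of them when no limit was extracted."""
    if request.limit is None:
        return list(ranked_results)
    return list(ranked_results[: request.limit])


# Placeholder results, assumed to be sorted by similarity already.
ranked = ["doc_a", "doc_b", "doc_c"]
two = apply_limit(ranked, ParsedRequest(query="dinosaurs", limit=2))
print(two)
```

Without enable_limit=True, a number mentioned in the question is not treated as a limit, which corresponds to the `limit=None` branch in this sketch.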

Cleanup

If you want to completely delete the collection from your Astra DB instance, run this.

(You will lose the data you stored in it.)

vectorstore.delete_collection()
