SAP HANA Cloud Vector Engine

SAP HANA Cloud Vector Engine is a vector store that is fully integrated into the SAP HANA Cloud database.

You need to install the langchain-community package with pip install -qU langchain-community to use this integration.

Setup

Install the HANA database driver.

# Pip install necessary package
%pip install --upgrade --quiet hdbcli

For OpenAIEmbeddings, we use the OpenAI API key from the environment.

import os
# Use OPENAI_API_KEY env variable
# os.environ["OPENAI_API_KEY"] = "Your OpenAI API key"
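If the environment variable is not already set, you can prompt for it interactively. A minimal sketch using Python's standard getpass module (not part of the original walkthrough):

import getpass

# Prompt for the key only if it is not already present in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")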

Create a database connection to a HANA Cloud instance.

from dotenv import load_dotenv
from hdbcli import dbapi

load_dotenv()
# Use connection settings from the environment
connection = dbapi.connect(
    address=os.environ.get("HANA_DB_ADDRESS"),
    port=os.environ.get("HANA_DB_PORT"),
    user=os.environ.get("HANA_DB_USER"),
    password=os.environ.get("HANA_DB_PASSWORD"),
    autocommit=True,
    sslValidateCertificate=False,
)
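
Optionally, you can verify that the connection is alive before proceeding; isconnected() is part of the hdbcli connection API.

# Optional sanity check on the connection
print(connection.isconnected())  # Should print True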

Example

Load the sample document "state_of_the_union.txt" and create chunks from it.

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.hanavector import HanaDB
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

text_documents = TextLoader("../../how_to/state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
text_chunks = text_splitter.split_documents(text_documents)
print(f"Number of document chunks: {len(text_chunks)}")

embeddings = OpenAIEmbeddings()
Number of document chunks: 88

Create a LangChain VectorStore interface for the HANA database and specify the table (collection) to use for accessing the vector embeddings.

db = HanaDB(
    embedding=embeddings, connection=connection, table_name="STATE_OF_THE_UNION"
)

Add the loaded document chunks to the table. For this example, we delete any previous content from the table that might exist from earlier runs.

# Delete already existing documents from the table
db.delete(filter={})

# add the loaded document chunks
db.add_documents(text_chunks)
[]

Perform a query to get the two best-matching document chunks from the ones added in the previous step. By default, "Cosine Similarity" is used for the search.

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)

for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.

Query the same content with "Euclidean Distance". The results should be the same as with "Cosine Similarity".

from langchain_community.vectorstores.utils import DistanceStrategy

db = HanaDB(
    embedding=embeddings,
    connection=connection,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
    table_name="STATE_OF_THE_UNION",
)

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query, k=2)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)

API Reference: DistanceStrategy
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.

Maximal Marginal Relevance Search (MMR)

Maximal marginal relevance optimizes for similarity to the query as well as for diversity among the selected documents. The first 20 (fetch_k) items are retrieved from the database. The MMR algorithm then finds the best 2 (k) matches.

docs = db.max_marginal_relevance_search(query, k=2, fetch_k=20)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.

Creating an HNSW Vector Index

A vector index can significantly speed up top-k nearest-neighbor queries for vectors. Users can create a Hierarchical Navigable Small World (HNSW) vector index using the create_hnsw_index function.

For more information about creating an index at the database level, please refer to the official documentation.

# HanaDB instance uses cosine similarity as default:
db_cosine = HanaDB(
    embedding=embeddings, connection=connection, table_name="STATE_OF_THE_UNION"
)

# Attempting to create the HNSW index with default parameters
db_cosine.create_hnsw_index() # If no other parameters are specified, the default values will be used
# Default values: m=64, ef_construction=128, ef_search=200
# The default index name will be: STATE_OF_THE_UNION_COSINE_SIMILARITY_IDX (verify this naming pattern in HanaDB class)


# Creating a HanaDB instance with L2 distance as the similarity function and defined values
db_l2 = HanaDB(
    embedding=embeddings,
    connection=connection,
    table_name="STATE_OF_THE_UNION",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,  # Specify L2 distance
)

# This will create an index based on L2 distance strategy.
db_l2.create_hnsw_index(
    index_name="STATE_OF_THE_UNION_L2_index",
    m=100,  # Max number of neighbors per graph node (valid range: 4 to 1000)
    ef_construction=200,  # Max number of candidates during graph construction (valid range: 1 to 100000)
    ef_search=500,  # Min number of candidates during the search (valid range: 1 to 100000)
)

# Use L2 index to perform MMR
docs = db_l2.max_marginal_relevance_search(query, k=2, fetch_k=20)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.

In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.

Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.

Key points:

  • Similarity function: The similarity function for the index defaults to cosine similarity. If you want to use a different similarity function (e.g., L2 distance), you need to specify it when initializing the HanaDB instance.
  • Default parameters: In the create_hnsw_index function, if the user does not provide custom values for parameters such as m, ef_construction, or ef_search, the default values (e.g., m=64, ef_construction=128, ef_search=200) are used automatically. These values ensure the index is created with reasonable performance without requiring user intervention.
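
To confirm that an index was actually created, you can query HANA's system catalog. A minimal sketch, assuming the SYS.INDEXES system view is accessible to your user:

# List the indexes on the table (sketch; assumes access to SYS.INDEXES)
cur = connection.cursor()
cur.execute(
    "SELECT INDEX_NAME FROM SYS.INDEXES "
    "WHERE SCHEMA_NAME = CURRENT_SCHEMA AND TABLE_NAME = 'STATE_OF_THE_UNION'"
)
for row in cur.fetchall():
    print(row[0])
cur.close()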

Basic Vectorstore Operations

db = HanaDB(
    connection=connection, embedding=embeddings, table_name="LANGCHAIN_DEMO_BASIC"
)

# Delete already existing documents from the table
db.delete(filter={})
True

We can add simple text documents to the existing table.

docs = [Document(page_content="Some text"), Document(page_content="Other docs")]
db.add_documents(docs)
[]

Add documents with metadata.

docs = [
    Document(
        page_content="foo",
        metadata={"start": 100, "end": 150, "doc_name": "foo.txt", "quality": "bad"},
    ),
    Document(
        page_content="bar",
        metadata={"start": 200, "end": 250, "doc_name": "bar.txt", "quality": "good"},
    ),
]
db.add_documents(docs)
[]

Query documents with specific metadata.

docs = db.similarity_search("foobar", k=2, filter={"quality": "bad"})
# With filtering on "quality"=="bad", only one document should be returned
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)
--------------------------------------------------------------------------------
foo
{'start': 100, 'end': 150, 'doc_name': 'foo.txt', 'quality': 'bad'}

Delete documents with specific metadata.

db.delete(filter={"quality": "bad"})

# Now the similarity search with the same filter will return no results
docs = db.similarity_search("foobar", k=2, filter={"quality": "bad"})
print(len(docs))
0

Advanced Filtering

In addition to the basic value-based filtering capabilities, more advanced filtering can be used. The table below shows the available filter operators.

Operator   Semantics
$eq        Equal (==)
$ne        Not equal (!=)
$lt        Less than (<)
$lte       Less than or equal (<=)
$gt        Greater than (>)
$gte       Greater than or equal (>=)
$in        Contained in a set of given values (in)
$nin       Not contained in a set of given values (not in)
$between   Between the range of two boundary values
$like      Text equality based on the "LIKE" semantics in SQL (using "%" as wildcard)
$and       Logical "and", supporting 2 or more operands
$or        Logical "or", supporting 2 or more operands

# Prepare some test documents
docs = [
    Document(
        page_content="First",
        metadata={"name": "adam", "is_active": True, "id": 1, "height": 10.0},
    ),
    Document(
        page_content="Second",
        metadata={"name": "bob", "is_active": False, "id": 2, "height": 5.7},
    ),
    Document(
        page_content="Third",
        metadata={"name": "jane", "is_active": True, "id": 3, "height": 2.4},
    ),
]

db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_ADVANCED_FILTER",
)

# Delete already existing documents from the table
db.delete(filter={})
db.add_documents(docs)


# Helper function for printing filter results
def print_filter_result(result):
    if len(result) == 0:
        print("<empty result>")
    for doc in result:
        print(doc.metadata)

Filtering with $ne, $gt, $gte, $lt, and $lte

advanced_filter = {"id": {"$ne": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$gt": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$gte": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$lt": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"id": {"$lte": 1}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'id': {'$ne': 1}}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
Filter: {'id': {'$gt': 1}}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
Filter: {'id': {'$gte': 1}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
Filter: {'id': {'$lt': 1}}
<empty result>
Filter: {'id': {'$lte': 1}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}

Filtering with $between, $in, and $nin

advanced_filter = {"id": {"$between": (1, 2)}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$in": ["adam", "bob"]}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$nin": ["adam", "bob"]}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'id': {'$between': (1, 2)}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'name': {'$in': ['adam', 'bob']}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'name': {'$nin': ['adam', 'bob']}}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}

Text filtering with $like

advanced_filter = {"name": {"$like": "a%"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"name": {"$like": "%a%"}}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'name': {'$like': 'a%'}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
Filter: {'name': {'$like': '%a%'}}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}

Combined filtering with $and and $or

advanced_filter = {"$or": [{"id": 1}, {"name": "bob"}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"$and": [{"id": 1}, {"id": 2}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))

advanced_filter = {"$or": [{"id": 1}, {"id": 2}, {"id": 3}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))
Filter: {'$or': [{'id': 1}, {'name': 'bob'}]}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
Filter: {'$and': [{'id': 1}, {'id': 2}]}
<empty result>
Filter: {'$or': [{'id': 1}, {'id': 2}, {'id': 3}]}
{'name': 'adam', 'is_active': True, 'id': 1, 'height': 10.0}
{'name': 'bob', 'is_active': False, 'id': 2, 'height': 5.7}
{'name': 'jane', 'is_active': True, 'id': 3, 'height': 2.4}
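
The operands of $and and $or are themselves filters, so comparison operators can be nested inside them. A short illustrative sketch (this specific combination is not from the original examples and assumes such nesting is supported):

# Illustrative: nest comparison operators inside a logical operator
advanced_filter = {"$and": [{"id": {"$gte": 2}}, {"height": {"$lt": 6.0}}]}
print(f"Filter: {advanced_filter}")
print_filter_result(db.similarity_search("just testing", k=5, filter=advanced_filter))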

Using a VectorStore as a retriever in chains for retrieval-augmented generation (RAG)

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

# Access the vector DB with a new table
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name="LANGCHAIN_DEMO_RETRIEVAL_CHAIN",
)

# Delete already existing entries from the table
db.delete(filter={})

# add the loaded document chunks from the "State Of The Union" file
db.add_documents(text_chunks)

# Create a retriever instance of the vector store
retriever = db.as_retriever()
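
The retriever can also be queried on its own; invoke is the standard entry point for LangChain retrievers and returns a list of Documents:

# Optional: use the retriever directly
relevant_docs = retriever.invoke("What did the president say about Ketanji Brown Jackson")
print(f"Number of retrieved document chunks: {len(relevant_docs)}")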

Define the prompt.

from langchain_core.prompts import PromptTemplate

prompt_template = """
You are an expert in state of the union topics. You are provided multiple context items that are related to the prompt you have to answer.
Use the following pieces of context to answer the question at the end.

'''
{context}
'''

Question: {question}
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}

API Reference: PromptTemplate

Create the ConversationalRetrievalChain, which handles the chat history and the retrieval of similar document chunks to be added to the prompt.

from langchain.chains import ConversationalRetrievalChain

llm = ChatOpenAI(model="gpt-3.5-turbo")
memory = ConversationBufferMemory(
    memory_key="chat_history", output_key="answer", return_messages=True
)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    db.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    memory=memory,
    verbose=False,
    combine_docs_chain_kwargs={"prompt": PROMPT},
)

Ask the first question (and verify how many text chunks have been used).

question = "What about Mexico and Guatemala?"

result = qa_chain.invoke({"question": question})
print("Answer from LLM:")
print("================")
print(result["answer"])

source_docs = result["source_documents"]
print("================")
print(f"Number of used source document chunks: {len(source_docs)}")
Answer from LLM:
================
The United States has set up joint patrols with Mexico and Guatemala to catch more human traffickers. This collaboration is part of the efforts to address immigration issues and secure the borders in the region.
================
Number of used source document chunks: 5

Examine the chunks used in the chain in detail. Check whether the best-ranked chunk contains the information about "Mexico and Guatemala" mentioned in the question.

for doc in source_docs:
    print("-" * 80)
    print(doc.page_content)
    print(doc.metadata)

Ask another question on the same conversational chain. The answer should relate to the previous answer given.

question = "What about other countries?"

result = qa_chain.invoke({"question": question})
print("Answer from LLM:")
print("================")
print(result["answer"])
Answer from LLM:
================
Mexico and Guatemala are involved in joint patrols to catch human traffickers.
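
Because the chain was created with a ConversationBufferMemory, the accumulated chat history can be inspected between turns; chat_memory.messages is part of that memory class:

# Optional: inspect the chat history collected so far
for message in memory.chat_memory.messages:
    print(f"{type(message).__name__}: {message.content[:80]}")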

Standard tables vs. "custom" tables with vector data

As the default behavior, the table for the embeddings is created with 3 columns:

  • A column VEC_TEXT, which contains the text of the Document
  • A column VEC_META, which contains the metadata of the Document
  • A column VEC_VECTOR, which contains the embedding vector of the Document's text

# Access the vector DB with a new table
db = HanaDB(
    connection=connection, embedding=embeddings, table_name="LANGCHAIN_DEMO_NEW_TABLE"
)

# Delete already existing entries from the table
db.delete(filter={})

# Add a simple document with some metadata
docs = [
    Document(
        page_content="A simple document",
        metadata={"start": 100, "end": 150, "doc_name": "simple.txt"},
    )
]
db.add_documents(docs)
[]

Show the columns in table "LANGCHAIN_DEMO_NEW_TABLE"

cur = connection.cursor()
cur.execute(
    "SELECT COLUMN_NAME, DATA_TYPE_NAME FROM SYS.TABLE_COLUMNS WHERE SCHEMA_NAME = CURRENT_SCHEMA AND TABLE_NAME = 'LANGCHAIN_DEMO_NEW_TABLE'"
)
rows = cur.fetchall()
for row in rows:
    print(row)
cur.close()
('VEC_META', 'NCLOB')
('VEC_TEXT', 'NCLOB')
('VEC_VECTOR', 'REAL_VECTOR')

Show the values of the inserted document in the three columns

cur = connection.cursor()
cur.execute(
    "SELECT VEC_TEXT, VEC_META, TO_NVARCHAR(VEC_VECTOR) FROM LANGCHAIN_DEMO_NEW_TABLE LIMIT 1"
)
rows = cur.fetchall()
print(rows[0][0]) # The text
print(rows[0][1]) # The metadata
print(rows[0][2]) # The vector
cur.close()

Custom tables must contain at least three columns that match the semantics of a standard table:

  • A column of type NCLOB or NVARCHAR for the text/context of the embeddings
  • A column of type NCLOB or NVARCHAR for the metadata
  • A column of type REAL_VECTOR for the embedding vector

The table can contain additional columns. When new Documents are inserted into the table, these additional columns must allow NULL values.

# Create a new table "MY_OWN_TABLE_ADD" with three "standard" columns and one additional column
my_own_table_name = "MY_OWN_TABLE_ADD"
cur = connection.cursor()
cur.execute(
    (
        f"CREATE TABLE {my_own_table_name} ("
        "SOME_OTHER_COLUMN NVARCHAR(42), "
        "MY_TEXT NVARCHAR(2048), "
        "MY_METADATA NVARCHAR(1024), "
        "MY_VECTOR REAL_VECTOR )"
    )
)

# Create a HanaDB instance with the own table
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name=my_own_table_name,
    content_column="MY_TEXT",
    metadata_column="MY_METADATA",
    vector_column="MY_VECTOR",
)

# Add a simple document with some metadata
docs = [
    Document(
        page_content="Some other text",
        metadata={"start": 400, "end": 450, "doc_name": "other.txt"},
    )
]
db.add_documents(docs)

# Check if data has been inserted into our own table
cur.execute(f"SELECT * FROM {my_own_table_name} LIMIT 1")
rows = cur.fetchall()
print(rows[0][0])  # Value of column "SOME_OTHER_COLUMN". Should be NULL/None
print(rows[0][1]) # The text
print(rows[0][2]) # The metadata
print(rows[0][3]) # The vector

cur.close()
None
Some other text
{"start": 400, "end": 450, "doc_name": "other.txt"}
<memory at 0x7f5edcb18d00>

Add another document and perform a similarity search on the custom table.

docs = [
    Document(
        page_content="Some more text",
        metadata={"start": 800, "end": 950, "doc_name": "more.txt"},
    )
]
db.add_documents(docs)

query = "What's up?"
docs = db.similarity_search(query, k=2)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
Some other text
--------------------------------------------------------------------------------
Some more text

Optimizing filter performance with custom columns

To allow flexible metadata values, all metadata is stored as JSON in the metadata column by default. If some of the used metadata keys and value types are known, they can instead be stored in additional columns by creating the target table with the key names as column names and passing them to the HanaDB constructor via the specific_metadata_columns list. Metadata keys that match those values are copied into the special columns during insert. For keys in the specific_metadata_columns list, filters use the special columns instead of the metadata JSON column.

# Create a new table "PERFORMANT_CUSTOMTEXT_FILTER" with three "standard" columns and one additional column
my_own_table_name = "PERFORMANT_CUSTOMTEXT_FILTER"
cur = connection.cursor()
cur.execute(
    (
        f"CREATE TABLE {my_own_table_name} ("
        "CUSTOMTEXT NVARCHAR(500), "
        "MY_TEXT NVARCHAR(2048), "
        "MY_METADATA NVARCHAR(1024), "
        "MY_VECTOR REAL_VECTOR )"
    )
)

# Create a HanaDB instance with the own table
db = HanaDB(
    connection=connection,
    embedding=embeddings,
    table_name=my_own_table_name,
    content_column="MY_TEXT",
    metadata_column="MY_METADATA",
    vector_column="MY_VECTOR",
    specific_metadata_columns=["CUSTOMTEXT"],
)

# Add a simple document with some metadata
docs = [
    Document(
        page_content="Some other text",
        metadata={
            "start": 400,
            "end": 450,
            "doc_name": "other.txt",
            "CUSTOMTEXT": "Filters on this value are very performant",
        },
    )
]
db.add_documents(docs)

# Check if data has been inserted into our own table
cur.execute(f"SELECT * FROM {my_own_table_name} LIMIT 1")
rows = cur.fetchall()
print(rows[0][0])  # Value of column "CUSTOMTEXT". Should be "Filters on this value are very performant"
print(rows[0][1])  # The text
print(rows[0][2])  # The metadata without the "CUSTOMTEXT" data, as this is extracted into a separate column
print(rows[0][3])  # The vector

cur.close()
Filters on this value are very performant
Some other text
{"start": 400, "end": 450, "doc_name": "other.txt", "CUSTOMTEXT": "Filters on this value are very performant"}
<memory at 0x7f5edcb193c0>

The special columns are completely transparent to the rest of the LangChain interface. Everything works as it did before, just more performant.

docs = [
    Document(
        page_content="Some more text",
        metadata={
            "start": 800,
            "end": 950,
            "doc_name": "more.txt",
            "CUSTOMTEXT": "Another customtext value",
        },
    )
]
db.add_documents(docs)

advanced_filter = {"CUSTOMTEXT": {"$like": "%value%"}}
query = "What's up?"
docs = db.similarity_search(query, k=2, filter=advanced_filter)
for doc in docs:
    print("-" * 80)
    print(doc.page_content)
--------------------------------------------------------------------------------
Some other text
--------------------------------------------------------------------------------
Some more text
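
If you want to reset the demo environment, the tables created throughout this walkthrough can be dropped again. This cleanup step is an addition to the original walkthrough; the table names mirror the ones used above:

# Optional cleanup: drop the demo tables created in this walkthrough
cur = connection.cursor()
for table in [
    "STATE_OF_THE_UNION",
    "LANGCHAIN_DEMO_BASIC",
    "LANGCHAIN_DEMO_ADVANCED_FILTER",
    "LANGCHAIN_DEMO_RETRIEVAL_CHAIN",
    "LANGCHAIN_DEMO_NEW_TABLE",
    "MY_OWN_TABLE_ADD",
    "PERFORMANT_CUSTOMTEXT_FILTER",
]:
    try:
        cur.execute(f"DROP TABLE {table}")
    except dbapi.Error:
        pass  # Ignore tables that do not exist
cur.close()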
