PGVector
An implementation of the LangChain vector store abstraction, using postgres as the backend and utilizing the pgvector extension. The code lives in an integration package called langchain_postgres.
Status
This code has been ported over from langchain_community into a dedicated package called langchain-postgres. The following changes have been made:
- langchain_postgres works only with psycopg3. Please update your connection strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@... (yes, the driver name is psycopg, not psycopg3, but it will use psycopg3).
- The schema of the embedding store and collection has been changed so that add_documents works correctly with user-specified ids.
- An explicit connection object must now be passed (see the sketch after this list).
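The sketch below builds such a connection object explicitly; it assumes PGVector's connection parameter also accepts a SQLAlchemy Engine (the plain connection-string form shown later in this guide works as well).
# Minimal sketch: an explicit SQLAlchemy engine on the psycopg (psycopg3) driver.
# Assumption: PGVector's `connection` parameter accepts an Engine as well as a string.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"
)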
Currently, there is **no mechanism** that supports easy data migration on schema changes. Any schema change in the vector store therefore requires the user to recreate the tables and re-add the documents. If this is a concern, please use a different vector store. If not, this implementation should be fine for your use case.
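If you do need to rebuild after a schema change, one hedged option is to drop the tables by hand and let a fresh PGVector instance re-create them before re-adding your documents. The table names used below are an assumption based on the package defaults; adjust them if your deployment differs.
# Hedged sketch of a manual rebuild after a schema change, reusing the `engine`
# from the sketch above. The table names (langchain_pg_embedding /
# langchain_pg_collection) are assumed package defaults.
from sqlalchemy import text

with engine.begin() as conn:
    conn.execute(text("DROP TABLE IF EXISTS langchain_pg_embedding"))
    conn.execute(text("DROP TABLE IF EXISTS langchain_pg_collection"))
# Re-instantiating PGVector afterwards re-creates the tables with the current
# schema, and the documents can then be re-added with add_documents().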
Setup
First download the partner package:
pip install -qU langchain_postgres
You can start a postgres container with the pgvector extension by running the following command:
%docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
Credentials
There are no credentials needed to run this notebook; just make sure you have downloaded the langchain_postgres package and correctly started the postgres container.
If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Instantiation
Select an embeddings model:
- OpenAI
- HuggingFace
- Fake Embeddings
pip install -qU langchain-openai
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
pip install -qU langchain-huggingface
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model="sentence-transformers/all-mpnet-base-v2")
pip install -qU langchain-core
from langchain_core.embeddings import FakeEmbeddings
embeddings = FakeEmbeddings(size=4096)
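Whichever option you choose, a quick sanity check is to embed a short string and inspect the resulting vector; a minimal sketch:
# Quick sanity check that the chosen embeddings object works.
vector = embeddings.embed_query("hello world")
print(len(vector))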
from langchain_core.documents import Document
from langchain_postgres import PGVector
# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain" # Uses psycopg3!
collection_name = "my_docs"
vector_store = PGVector(
embeddings=embeddings,
collection_name=collection_name,
connection=connection,
use_jsonb=True,
)
Manage vector store
Add items to vector store
Note that adding documents by ID will overwrite any existing documents that match that ID.
docs = [
Document(
page_content="there are cats in the pond",
metadata={"id": 1, "location": "pond", "topic": "animals"},
),
Document(
page_content="ducks are also found in the pond",
metadata={"id": 2, "location": "pond", "topic": "animals"},
),
Document(
page_content="fresh apples are available at the market",
metadata={"id": 3, "location": "market", "topic": "food"},
),
Document(
page_content="the market also sells fresh oranges",
metadata={"id": 4, "location": "market", "topic": "food"},
),
Document(
page_content="the new art exhibit is fascinating",
metadata={"id": 5, "location": "museum", "topic": "art"},
),
Document(
page_content="a sculpture exhibit is also at the museum",
metadata={"id": 6, "location": "museum", "topic": "art"},
),
Document(
page_content="a new coffee shop opened on Main Street",
metadata={"id": 7, "location": "Main Street", "topic": "food"},
),
Document(
page_content="the book club meets at the library",
metadata={"id": 8, "location": "library", "topic": "reading"},
),
Document(
page_content="the library hosts a weekly story time for kids",
metadata={"id": 9, "location": "library", "topic": "reading"},
),
Document(
page_content="a cooking class for beginners is offered at the community center",
metadata={"id": 10, "location": "community center", "topic": "classes"},
),
]
vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
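Because the ids are stable, re-adding a document with an existing id replaces the stored copy rather than creating a duplicate; a minimal sketch:
# Re-using id 1 overwrites the earlier "cats" document instead of duplicating it.
updated_doc = Document(
    page_content="there are cats and koi in the pond",
    metadata={"id": 1, "location": "pond", "topic": "animals"},
)
vector_store.add_documents([updated_doc], ids=[1])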
Delete items from vector store
vector_store.delete(ids=["3"])
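A hedged way to confirm the deletion took effect is to search for content that used to match the removed document:
# After deleting id 3, the "apples" document should no longer be returned.
results = vector_store.similarity_search("fresh apples", k=10)
print(any(doc.metadata["id"] == 3 for doc in results))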
Query vector store
Once your vector store has been created and the relevant documents have been added, you will most likely wish to query it during the running of your chain or agent.
Filtering Support
The vector store supports a set of filters that can be applied against the metadata fields of the documents.
Operator | Meaning/Category |
---|---|
\$eq | Equality (==) |
\$ne | Inequality (!=) |
\$lt | Less than (<) |
\$lte | Less than or equal (<=) |
\$gt | Greater than (>) |
\$gte | Greater than or equal (>=) |
\$in | Special cased (in) |
\$nin | Special cased (not in) |
\$between | Special cased (between) |
\$like | Text (like) |
\$ilike | Text (case-insensitive like) |
\$and | Logical (and) |
\$or | Logical (or) |
Query directly
Performing a simple similarity search can be done as follows:
results = vector_store.similarity_search(
"kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
for doc in results:
print(f"* {doc.page_content} [{doc.metadata}]")
* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]
If you provide a dict with multiple fields but no operators, the top level will be interpreted as a logical **AND** filter:
vector_store.similarity_search(
"ducks",
k=10,
filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
vector_store.similarity_search(
"ducks",
k=10,
filter={
"$and": [
{"id": {"$in": [1, 5, 2, 9]}},
{"location": {"$in": ["pond", "market"]}},
]
},
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
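The remaining operators from the table above work the same way; for example, a hedged sketch combining a numeric $between range on id with a case-insensitive $ilike text match on location (both fields at the top level, so they are ANDed together):
# Hedged sketch using two more operators from the filter table.
vector_store.similarity_search(
    "fresh fruit",
    k=10,
    filter={"id": {"$between": [1, 5]}, "location": {"$ilike": "%market%"}},
)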
If you want to execute a similarity search and receive the corresponding scores you can run:
results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
For a full list of the different searches you can execute on a PGVector vector store, please refer to the API reference.
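For example, if you already have a query embedding in hand, the standard similarity_search_by_vector method lets you search without embedding the query again; a minimal sketch:
# Minimal sketch: search with a pre-computed query embedding.
query_vector = embeddings.embed_query("cats")
for doc in vector_store.similarity_search_by_vector(query_vector, k=1):
    print(f"* {doc.page_content} [{doc.metadata}]")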
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
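The search_type and search_kwargs arguments control how the retriever queries the store, and the metadata filters shown earlier can be passed through as well; a hedged sketch:
# Hedged sketch: a similarity retriever restricted to documents about animals.
filtered_retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2, "filter": {"topic": {"$eq": "animals"}}},
)
filtered_retriever.invoke("kitty")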
Usage for retrieval-augmented generation
For guides on how to use this vector store for retrieval-augmented generation (RAG), see the relevant tutorials and how-to guides in the LangChain documentation.
API reference
For detailed documentation of all PGVector features and configurations, head to the API reference: https://python.langchain.ac.cn/v0.2/api_reference/postgres/vectorstores/langchain_postgres.vectorstores.PGVector.html