NanoPQ (Product Quantization)

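Product quantization (PQ) is a quantization algorithm that compresses database vectors, which makes k-NN semantic search practical on large datasets. In a nutshell, each embedding is split into M subspaces, and the sub-vectors within each subspace are clustered. After clustering, every sub-vector is mapped to the centroid of the cluster it belongs to, so a full vector is stored as M centroid indices rather than as its original values.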
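
The split, cluster, and map steps described above can be sketched in plain NumPy. This is a minimal illustration, not the nanopq implementation: the helper names (`kmeans`, `pq_train_encode`, `pq_decode`) and the random test data are assumptions made for the example.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=123):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)

    def nearest(points, centroids):
        # Squared L2 distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    # Initialise the centroids from k distinct training points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(x, centroids)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, nearest(x, centroids)


def pq_train_encode(vectors, m, ks):
    """Split vectors into m subspaces, cluster each one, return codebooks and codes."""
    n, d = vectors.shape
    assert d % m == 0, "dimension must be divisible by the number of subspaces"
    ds = d // m
    codebooks = np.empty((m, ks, ds), dtype=vectors.dtype)
    codes = np.empty((n, m), dtype=np.uint8)
    for i in range(m):
        sub = vectors[:, i * ds:(i + 1) * ds]
        codebooks[i], codes[:, i] = kmeans(sub, ks)
    return codebooks, codes


def pq_decode(codebooks, codes):
    """Rebuild approximate vectors by concatenating the selected centroids."""
    m = codebooks.shape[0]
    return np.hstack([codebooks[i][codes[:, i]] for i in range(m)])


# Made-up data: 100 vectors of dimension 8, split into M=2 subspaces of 4 clusters.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 8)).astype(np.float32)
codebooks, codes = pq_train_encode(vectors, m=2, ks=4)
approx = pq_decode(codebooks, codes)
print(codes.shape, approx.shape)  # (100, 2) (100, 8)
```

Each row of `codes` holds one small integer per subspace, which is where the compression comes from; `pq_decode` shows that the stored form is only an approximation of the original vector.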

This notebook shows how to use a retriever that, under the hood, uses product quantization as implemented by the nanopq package.

%pip install -qU langchain-community langchain-openai nanopq

from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain_community.retrievers import NanoPQRetriever

API Reference: SpacyEmbeddings | NanoPQRetriever

Create New Retriever with Texts

retriever = NanoPQRetriever.from_texts(
    ["Great world", "great words", "world", "planets of the world"],
    SpacyEmbeddings(model_name="en_core_web_sm"),
    clusters=2,
    subspace=2,
)
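
To get a feel for what `subspace` and `clusters` control, the storage arithmetic can be sketched as below. The helper `pq_footprint` is hypothetical and not part of LangChain or nanopq. It assumes that `subspace` corresponds to M and `clusters` to Ks in the log output further down, that embeddings are stored as 4-byte floats, that the spaCy model yields 96-dimensional vectors, and that each centroid index is stored in the smallest unsigned integer type that fits.

```python
def pq_footprint(n_vectors, dim, subspace, clusters, float_bytes=4):
    """Estimate storage for raw float vectors versus PQ codes (illustrative helper)."""
    if dim % subspace != 0:
        raise ValueError("embedding dimension must be divisible by `subspace`")
    if clusters >= n_vectors:
        # Assumption: clustering needs more training vectors than centroids.
        raise ValueError("`clusters` must be smaller than the number of texts")
    # Smallest unsigned integer width that can hold a centroid index.
    if clusters <= 2 ** 8:
        code_bytes = 1
    elif clusters <= 2 ** 16:
        code_bytes = 2
    else:
        code_bytes = 4
    return {
        "raw_bytes": n_vectors * dim * float_bytes,
        "code_bytes": n_vectors * subspace * code_bytes,
        # subspace * clusters centroids, each of length dim / subspace
        "codebook_bytes": clusters * dim * float_bytes,
    }


# The example on this page: 4 texts, assuming 96-dimensional embeddings.
small = pq_footprint(n_vectors=4, dim=96, subspace=2, clusters=2)
# A larger hypothetical corpus, where the compression starts to pay off.
large = pq_footprint(n_vectors=1_000_000, dim=96, subspace=4, clusters=128)
print(small)
print(large["raw_bytes"] / large["code_bytes"])
```

With only four texts the codebooks outweigh the codes, so the toy example demonstrates the API rather than any real saving; the benefit appears once the corpus is much larger than the number of clusters.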

Use Retriever

We can now use the retriever!

retriever.invoke("earth")

M: 2, Ks: 2, metric : <class 'numpy.uint8'>, code_dtype: l2
iter: 20, seed: 123
Training the subspace: 0 / 2
Training the subspace: 1 / 2
Encoding the subspace: 0 / 2
Encoding the subspace: 1 / 2

[Document(page_content='world'),
 Document(page_content='Great world'),
 Document(page_content='great words'),
 Document(page_content='planets of the world')]
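
One common way to rank PQ-encoded vectors against an uncompressed query is asymmetric distance computation: build a small table of distances from each query sub-vector to every centroid, then sum table lookups for each stored code. The sketch below illustrates that idea in NumPy with hand-written codebooks and codes; `adc_search` is a hypothetical helper, and this is not presented as the retriever's actual code.

```python
import numpy as np


def adc_search(query, codebooks, codes, k):
    """Rank PQ codes by approximate squared L2 distance to an uncompressed query."""
    m, ks, ds = codebooks.shape
    # Distance table: table[i, j] = squared distance between the i-th query
    # sub-vector and centroid j of subspace i.
    table = ((codebooks - query.reshape(m, 1, ds)) ** 2).sum(axis=2)
    # For each stored code, look up one table entry per subspace and add them.
    dists = table[np.arange(m), codes].sum(axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


# Made-up codebooks: M=2 subspaces, 2 centroids each, sub-vectors of length 2.
codebooks = np.array(
    [
        [[0.0, 0.0], [1.0, 1.0]],  # centroids of subspace 0
        [[0.0, 0.0], [2.0, 2.0]],  # centroids of subspace 1
    ]
)
# Four stored items, each reduced to one centroid index per subspace.
codes = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=np.uint8)
query = np.array([1.0, 1.0, 0.0, 0.0])

order, dists = adc_search(query, codebooks, codes, k=2)
print(order.tolist(), dists.tolist())  # [1, 0] [0.0, 2.0]
```

The query is never quantized, only the stored vectors are, which is why the computation is called asymmetric. Item 1 decodes to exactly the query vector, so its approximate distance is 0.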

Related

Retriever conceptual guide
Retriever how-to guides
