Pinecone Rerank

This notebook shows how to use PineconeRerank for two-stage vector retrieval reranking with Pinecone's hosted reranking API, as demonstrated in langchain_pinecone/libs/pinecone/rerank.py.

Setup

Install the langchain-pinecone package.

%pip install -qU "langchain-pinecone"

Credentials

Set your Pinecone API key to use the reranking API.

import os
from getpass import getpass

os.environ["PINECONE_API_KEY"] = os.getenv("PINECONE_API_KEY") or getpass(
    "Enter your Pinecone API key: "
)

Instantiation

Use PineconeRerank to rerank a list of documents by their relevance to a query.

from langchain_core.documents import Document
from langchain_pinecone import PineconeRerank

# Initialize reranker
reranker = PineconeRerank(model="bge-reranker-v2-m3")

# Sample documents
documents = [
    Document(page_content="Paris is the capital of France."),
    Document(page_content="Berlin is the capital of Germany."),
    Document(page_content="The Eiffel Tower is in Paris."),
]

# Rerank documents
query = "What is the capital of France?"
reranked_docs = reranker.compress_documents(documents, query)

# Print results
for doc in reranked_docs:
    score = doc.metadata.get("relevance_score")
    print(f"Score: {score:.4f} | Content: {doc.page_content}")

API Reference: Document | PineconeRerank

Score: 0.9998 | Content: Paris is the capital of France.
Score: 0.1950 | Content: The Eiffel Tower is in Paris.
Score: 0.0042 | Content: Berlin is the capital of Germany.
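
To make the two-stage idea concrete without calling any service, here is a minimal, library-free sketch. It assumes a cheap first stage that selects candidates by word overlap and a second stage that rescores only those candidates. The helpers `first_stage`, `second_stage_score`, and `retrieve_then_rerank` are hypothetical stand-ins for illustration; they are not Pinecone's retrieval or reranking model, and the scores they produce are not comparable to the relevance scores shown above.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def first_stage(query, corpus, k):
    """Cheap candidate retrieval: keep the k texts sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda t: len(q & tokenize(t)), reverse=True)
    return ranked[:k]


def second_stage_score(query, text):
    """Toy stand-in for a reranker: Jaccard similarity of the two word sets."""
    q, t = tokenize(query), tokenize(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve_then_rerank(query, corpus, k=3):
    """Stage 1 narrows the corpus to k candidates; stage 2 reorders only those."""
    candidates = first_stage(query, corpus, k)
    scored = [(second_stage_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
    "Bananas are yellow.",
]
results = retrieve_then_rerank("What is the capital of France?", corpus)
for score, text in results:
    print(f"Score: {score:.4f} | Content: {text}")
```

The point of the split is cost: the second-stage scorer only ever sees the k candidates, so it can afford to be much more expensive than the first-stage filter.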

Usage

Reranking with Top-N

Specify top_n to limit the number of documents returned.

# Return only top-1 result
reranker_top1 = PineconeRerank(model="bge-reranker-v2-m3", top_n=1)
top1_docs = reranker_top1.compress_documents(documents, query)
print("Top-1 Result:")
for doc in top1_docs:
    print(f"Score: {doc.metadata['relevance_score']:.4f} | Content: {doc.page_content}")

Top-1 Result:
Score: 0.9998 | Content: Paris is the capital of France.
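
The effect of top_n can be pictured as sorting by score and keeping a prefix. The sketch below is a toy illustration under that assumption, using the scores printed in the instantiation example; `keep_top_n` is a hypothetical helper, not part of langchain-pinecone, and it does not describe how the hosted API applies the limit internally.

```python
def keep_top_n(results, top_n=None):
    """Return the top_n highest-scoring results, or all of them if top_n is None."""
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be a positive integer")
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Scores copied from the instantiation example's output, deliberately shuffled.
results = [
    {"content": "The Eiffel Tower is in Paris.", "score": 0.1950},
    {"content": "Paris is the capital of France.", "score": 0.9998},
    {"content": "Berlin is the capital of Germany.", "score": 0.0042},
]

top1 = keep_top_n(results, top_n=1)
for r in top1:
    print(f"Score: {r['score']:.4f} | Content: {r['content']}")
```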

Rerank with custom rank fields

If your documents are dictionaries or have custom fields, use rank_fields to specify which field to rank on.

# Sample dictionary documents with 'text' field
docs_dict = [
    {
        "id": "doc1",
        "text": "Article about renewable energy.",
        "title": "Renewable Energy",
    },
    {"id": "doc2", "text": "Report on economic growth.", "title": "Economic Growth"},
    {
        "id": "doc3",
        "text": "News on climate policy changes.",
        "title": "Climate Policy",
    },
]

# Initialize reranker with rank_fields
reranker_text = PineconeRerank(model="bge-reranker-v2-m3", rank_fields=["text"])
climate_docs = reranker_text.rerank(docs_dict, "Latest news on climate change.")

# Show IDs and scores
for res in climate_docs:
    print(f"ID: {res['id']} | Score: {res['score']:.4f}")

ID: doc3 | Score: 0.9892
ID: doc1 | Score: 0.0006
ID: doc2 | Score: 0.0000

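The same reranker can be reused for another query. Note that reranker_text still ranks on the text field; the title is only read back from each result for display. To rank on the title instead, set rank_fields=["title"].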

economic_docs = reranker_text.rerank(docs_dict, "Economic forecast.")

# Show IDs and scores
for res in economic_docs:
    print(
        f"ID: {res['id']} | Score: {res['score']:.4f} | Title: {res['document']['title']}"
    )

ID: doc2 | Score: 0.8918 | Title: Economic Growth
ID: doc3 | Score: 0.0002 | Title: Climate Policy
ID: doc1 | Score: 0.0000 | Title: Renewable Energy
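
To show what choosing a rank field changes, here is a small offline sketch. It assumes the rank field simply determines which text the scorer sees; `text_for_ranking` and `toy_rerank` are hypothetical helpers that use word overlap as the score, so the numbers will not match the model scores above. The result shape (id, score, document) mirrors the keys accessed in the examples in this section.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def text_for_ranking(doc, rank_fields):
    """Join the values of the chosen fields; this is the only text the toy scorer sees."""
    return " ".join(str(doc[field]) for field in rank_fields)


def toy_rerank(docs, query, rank_fields):
    """Score each dict by word overlap between the query and the selected fields."""
    q = tokenize(query)
    results = [
        {
            "id": d["id"],
            "score": len(q & tokenize(text_for_ranking(d, rank_fields))),
            "document": d,
        }
        for d in docs
    ]
    return sorted(results, key=lambda r: r["score"], reverse=True)


docs_dict = [
    {"id": "doc1", "text": "Article about renewable energy.", "title": "Renewable Energy"},
    {"id": "doc2", "text": "Report on economic growth.", "title": "Economic Growth"},
    {"id": "doc3", "text": "News on climate policy changes.", "title": "Climate Policy"},
]

# Rank on the title field only.
by_title = toy_rerank(docs_dict, "Economic forecast.", rank_fields=["title"])
# Rank on the text field only.
by_text = toy_rerank(docs_dict, "Latest news on climate change.", rank_fields=["text"])

for res in by_title:
    print(f"ID: {res['id']} | Score: {res['score']} | Title: {res['document']['title']}")
```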

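Rerank with additional parameters

You can pass model-specific parameters (for example, truncate) directly to .rerank().

The truncate parameter controls how inputs longer than the model supports are handled. Accepted values are END and NONE. END truncates the input sequence at the input token limit. NONE returns an error when the input exceeds the input token limit.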

# Rerank with custom truncate parameter
docs_simple = [
    {"id": "docA", "text": "Quantum entanglement is a physical phenomenon..."},
    {"id": "docB", "text": "Classical mechanics describes motion..."},
]

reranked = reranker.rerank(
    documents=docs_simple,
    query="Explain the concept of quantum entanglement.",
    truncate="END",
)
# Print reranked IDs and scores
for res in reranked:
    print(f"ID: {res['id']} | Score: {res['score']:.4f}")

ID: docA | Score: 0.6950
ID: docB | Score: 0.0001
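
The END and NONE behaviours described in this section can be sketched locally. This is only an illustration under simplifying assumptions: tokens are whitespace-separated words, the limit of 5 is made up, and `apply_truncate` is a hypothetical helper rather than anything exposed by langchain-pinecone or the Pinecone API.

```python
def apply_truncate(tokens, max_tokens, truncate="END"):
    """Mimic the two documented modes on a list of tokens.

    END  -> cut the sequence at the token limit.
    NONE -> raise an error if the sequence exceeds the limit.
    """
    if truncate not in ("END", "NONE"):
        raise ValueError(f"unsupported truncate value: {truncate!r}")
    if len(tokens) <= max_tokens:
        return list(tokens)
    if truncate == "END":
        return list(tokens[:max_tokens])
    raise ValueError(
        f"input has {len(tokens)} tokens, which exceeds the limit of {max_tokens}"
    )


# Assumption: whitespace tokens and a limit of 5, chosen only for the demo.
tokens = "Quantum entanglement is a physical phenomenon between particles".split()

truncated = apply_truncate(tokens, max_tokens=5, truncate="END")
print(truncated)

try:
    apply_truncate(tokens, max_tokens=5, truncate="NONE")
except ValueError as err:
    print(f"Error: {err}")
```

END is the forgiving choice when some loss of trailing text is acceptable; NONE is useful when silently dropping content would hide a problem upstream, such as chunks that are larger than intended.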

Use within a chain

API reference

PineconeRerank(model, top_n, rank_fields, return_documents)
.rerank(documents, query, rank_fields=None, model=None, top_n=None, truncate="END")
.compress_documents(documents, query) (returns Document objects with relevance_score in their metadata)

Related

Retriever conceptual guide
Retriever how-to guides
