ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale.

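ScaNN combines search space pruning and quantization for Maximum Inner Product Search, and it also supports other distance functions such as Euclidean distance. The implementation is optimized for x86 processors with AVX2 support. See its Google Research github for more details.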
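To make the two distance notions concrete, here is a minimal brute-force sketch in plain Python. It is not ScaNN's algorithm (ScaNN approximates this search with pruning and quantization); the helper names `exact_mips` and `exact_nearest_euclidean` and the toy vectors are made up for illustration. It shows that the top result under maximum inner product can differ from the top result under Euclidean distance.

```python
import math


def inner_product(a, b):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    # Straight-line (L2) distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_mips(query, vectors, k=1):
    # Hypothetical helper: indices of the k vectors with the LARGEST inner product.
    order = sorted(
        range(len(vectors)),
        key=lambda i: inner_product(query, vectors[i]),
        reverse=True,
    )
    return order[:k]


def exact_nearest_euclidean(query, vectors, k=1):
    # Hypothetical helper: indices of the k vectors with the SMALLEST L2 distance.
    order = sorted(
        range(len(vectors)),
        key=lambda i: euclidean_distance(query, vectors[i]),
    )
    return order[:k]


vectors = [[1.0, 0.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]

mips_top = exact_mips(query, vectors, k=1)              # favors the long vector
l2_top = exact_nearest_euclidean(query, vectors, k=1)   # favors the identical vector
print(mips_top, l2_top)
```

The exhaustive scan above costs one comparison per stored vector for every query, which is the cost that pruning and quantization are meant to reduce on large datasets.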

You'll need to install langchain-community with pip install -qU langchain-community to use this integration.

Installation

Install ScaNN through pip. Alternatively, you can follow the instructions on the ScaNN website to install from source.

%pip install --upgrade --quiet  scann

Retrieval Demo

Below we show how to use ScaNN together with Huggingface embeddings.

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import ScaNN
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)


model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)

db = ScaNN.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)

docs[0]
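The chunk_size argument passed to CharacterTextSplitter above bounds how much text goes into each embedded document. As a rough illustration of that idea only, the sketch below greedily packs separator-delimited pieces into chunks of at most chunk_size characters. It is a simplified stand-in, not LangChain's implementation: the function name `split_text` is hypothetical, the "\n\n" separator is an assumption, and overlap is not modeled (matching chunk_overlap=0).

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    # Hypothetical helper: greedily merge pieces until adding the next one
    # would exceed chunk_size. A single piece longer than chunk_size is
    # kept whole as its own chunk rather than being cut.
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccccccccccc\n\ndd"
chunks = split_text(sample, chunk_size=10)
for chunk in chunks:
    print(repr(chunk))
```

Smaller chunks give more precise matches but less context per retrieved document; larger chunks do the opposite.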

RetrievalQA Demo

Next, we demonstrate using ScaNN together with the Google PaLM API.

You can obtain an API key from https://developers.generativeai.google/tutorials/setup

from langchain.chains import RetrievalQA
from langchain_community.chat_models.google_palm import ChatGooglePalm

palm_client = ChatGooglePalm(google_api_key="YOUR_GOOGLE_PALM_API_KEY")

qa = RetrievalQA.from_chain_type(
    llm=palm_client,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 10}),
)
print(qa.run("What did the president say about Ketanji Brown Jackson?"))
The president said that Ketanji Brown Jackson is one of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.
print(qa.run("What did the president say about Michael Phelps?"))
The president did not mention Michael Phelps in his speech.
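The chain_type="stuff" setting used above places all retrieved documents into a single prompt, here up to 10 because of search_kwargs={"k": 10}. The sketch below illustrates that idea in plain Python. The function name `build_stuff_prompt` and the prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    # Hypothetical helper: concatenate every retrieved document into one
    # context block, then append the question. The wording of the
    # instructions is an assumption made for illustration.
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = ["Chunk one text.", "Chunk two text."]
prompt = build_stuff_prompt("What was said?", retrieved)
print(prompt)
```

Because every retrieved chunk is sent to the model at once, the combined size of k chunks has to fit within the model's context limit.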

Saving and loading a local retrieval index

db.save_local("/tmp/db", "state_of_union")
restored_db = ScaNN.load_local("/tmp/db", embeddings, index_name="state_of_union")
