
Marqo

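This notebook shows how to use functionality related to the Marqo vector store.

Marqo is an open-source vector search engine. Marqo lets you store and query multimodal data such as text and images. Marqo creates the vectors for you using a large selection of open-source models; you can also supply your own fine-tuned models, and Marqo will handle the loading and inference for you.

To use this integration, install langchain-community with pip install -qU langchain-community.

To run this notebook with our Docker image, first run the following commands to get Marqo: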

docker pull marqoai/marqo:latest
docker rm -f marqo
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
%pip install --upgrade --quiet  marqo
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Marqo
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
import marqo

# initialize marqo
marqo_url = "http://localhost:8882" # if using marqo cloud replace with your endpoint (console.marqo.ai)
marqo_api_key = "" # if using marqo cloud replace with your api key (console.marqo.ai)

client = marqo.Client(url=marqo_url, api_key=marqo_api_key)

index_name = "langchain-demo"

docsearch = Marqo.from_documents(docs, index_name=index_name)

query = "What did the president say about Ketanji Brown Jackson"
result_docs = docsearch.similarity_search(query)
Index langchain-demo exists.
print(result_docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
result_docs = docsearch.similarity_search_with_score(query)
print(result_docs[0][0].page_content, result_docs[0][1], sep="\n")
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
0.68647254

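Additional features

One of the powerful features of Marqo as a vector store is that you can use indexes created externally. For example:

  • If you have a database of image and text pairs from another application, you can use it directly in LangChain with the Marqo vector store. Note that bringing your own multimodal index disables the add_texts method.

  • If you have a database of text documents, you can bring it into the LangChain framework and add more texts through add_texts.

The documents returned are customized by passing your own function to the page_content_builder callback in the search methods.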
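
As a rough illustration of the callback idea described above, the sketch below applies a builder function to hand-written dictionaries that mimic the shape of documents stored in a Marqo index. The sample hits, the example.com URL, and the `build_pages` helper are assumptions made for illustration only; nothing here calls Marqo or LangChain.

```python
from typing import Callable, Dict, List

# Hand-written stand-ins for documents returned by a search (assumed shape,
# not real Marqo output).
sample_hits: List[Dict[str, str]] = [
    {"caption": "Plane", "image": "https://example.com/plane.jpg"},
    {"Title": "Telephone", "Description": "A telephone is a telecommunications device."},
    {"text": "This is a document that is about elephants"},
]


def get_content(res: Dict[str, str]) -> str:
    """Turn one stored document into the text used as page_content."""
    if "text" in res:  # documents that carry a plain "text" field
        return res["text"]
    if "caption" in res:  # image and caption pairs from a multimodal index
        return f"{res['caption']}: {res['image']}"
    return res["Description"]  # documents from an externally created text index


def build_pages(
    hits: List[Dict[str, str]],
    page_content_builder: Callable[[Dict[str, str]], str],
) -> List[str]:
    """Hypothetical helper: apply the builder to every hit."""
    return [page_content_builder(hit) for hit in hits]


pages = build_pages(sample_hits, get_content)
for page in pages:
    print(page)
```

Because the builder receives the whole stored document, it can cope with whatever field names the external system used, which is why the index does not need to follow a LangChain-specific schema.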

Multimodal example

# use a new index
index_name = "langchain-multimodal-demo"

# in case the demo is re-run
try:
    client.delete_index(index_name)
except Exception:
    print(f"Creating {index_name}")

# This index could have been created by another system
settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}
client.create_index(index_name, **settings)
client.index(index_name).add_documents(
    [
        # image of a bus
        {
            "caption": "Bus",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
        },
        # image of a plane
        {
            "caption": "Plane",
            "image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
        },
    ],
)
{'errors': False,
'processingTimeMs': 2090.2822139996715,
'index_name': 'langchain-multimodal-demo',
'items': [{'_id': 'aa92fc1c-1fb2-4d86-b027-feb507c419f7',
'result': 'created',
'status': 201},
{'_id': '5142c258-ef9f-4bf2-a1a6-2307280173a0',
'result': 'created',
'status': 201}]}
def get_content(res):
    """Helper to format Marqo's documents into text to be used as page_content"""
    return f"{res['caption']}: {res['image']}"


docsearch = Marqo(client, index_name, page_content_builder=get_content)


query = "vehicles that fly"
doc_results = docsearch.similarity_search(query)
for doc in doc_results:
    print(doc.page_content)
Plane: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg
Bus: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg

Text only example

# use a new index
index_name = "langchain-byo-index-demo"

# in case the demo is re-run
try:
    client.delete_index(index_name)
except Exception:
    print(f"Creating {index_name}")

# This index could have been created by another system
client.create_index(index_name)
client.index(index_name).add_documents(
    [
        {
            "Title": "Smartphone",
            "Description": "A smartphone is a portable computer device that combines mobile telephone "
            "functions and computing functions into one unit.",
        },
        {
            "Title": "Telephone",
            "Description": "A telephone is a telecommunications device that permits two or more users to"
            "conduct a conversation when they are too far apart to be easily heard directly.",
        },
    ],
)
{'errors': False,
'processingTimeMs': 139.2144540004665,
'index_name': 'langchain-byo-index-demo',
'items': [{'_id': '27c05a1c-b8a9-49a5-ae73-fbf1eb51dc3f',
'result': 'created',
'status': 201},
{'_id': '6889afe0-e600-43c1-aa3b-1d91bf6db274',
'result': 'created',
'status': 201}]}
# Note text indexes retain the ability to use add_texts despite different field names in documents
# this is because the page_content_builder callback lets you handle these document fields as required


def get_content(res):
    """Helper to format Marqo's documents into text to be used as page_content"""
    if "text" in res:
        return res["text"]
    return res["Description"]


docsearch = Marqo(client, index_name, page_content_builder=get_content)

docsearch.add_texts(["This is a document that is about elephants"])
['9986cc72-adcd-4080-9d74-265c173a9ec3']
query = "modern communications devices"
doc_results = docsearch.similarity_search(query)

print(doc_results[0].page_content)
A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.
query = "elephants"
doc_results = docsearch.similarity_search(query, page_content_builder=get_content)

print(doc_results[0].page_content)
This is a document that is about elephants

Weighted queries

We also expose Marqo's weighted queries, which are a powerful way to compose complex semantic searches.

query = {"communications devices": 1.0}
doc_results = docsearch.similarity_search(query)
print(doc_results[0].page_content)
A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.
query = {"communications devices": 1.0, "technology post 2000": -1.0}
doc_results = docsearch.similarity_search(query)
print(doc_results[0].page_content)
A telephone is a telecommunications device that permits two or more users toconduct a conversation when they are too far apart to be easily heard directly.
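
To build intuition for why adding a negatively weighted term changed the top result above, here is a minimal sketch that combines query terms as a weighted sum of vectors and ranks documents by cosine similarity. The 2-D vectors are invented for illustration, and the weighted-sum scheme is an assumption about how such queries can work in general; Marqo's actual implementation and scores may differ.

```python
import math
from typing import Dict, List

# Toy 2-D "embeddings" invented for this sketch; real models use far more dimensions.
TERM_VECTORS: Dict[str, List[float]] = {
    "communications devices": [1.0, 0.0],
    "technology post 2000": [0.0, 1.0],
}
DOC_VECTORS: Dict[str, List[float]] = {
    "Smartphone": [0.9, 0.2],
    "Telephone": [0.8, -0.3],
}


def combine(query: Dict[str, float]) -> List[float]:
    """Return the weighted sum of the vectors for each term in the query."""
    dims = len(next(iter(TERM_VECTORS.values())))
    combined = [0.0] * dims
    for term, weight in query.items():
        for i, value in enumerate(TERM_VECTORS[term]):
            combined[i] += weight * value
    return combined


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query: Dict[str, float]) -> List[str]:
    """Return document names ordered from most to least similar to the query."""
    query_vector = combine(query)
    return sorted(
        DOC_VECTORS,
        key=lambda name: cosine(query_vector, DOC_VECTORS[name]),
        reverse=True,
    )


print(rank({"communications devices": 1.0}))
print(rank({"communications devices": 1.0, "technology post 2000": -1.0}))
```

In this toy setup the negative weight pushes the query vector away from the "technology post 2000" direction, so the document that leans that way drops in the ranking.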

Question answering with sources

This section shows how to use Marqo as part of a RetrievalQAWithSourcesChain. Marqo performs the search for information in the sources.

import getpass
import os

from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import OpenAI

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key:········
with open("../../how_to/state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)
index_name = "langchain-qa-with-retrieval"
docsearch = Marqo.from_documents(docs, index_name=index_name)
Index langchain-qa-with-retrieval exists.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0), chain_type="stuff", retriever=docsearch.as_retriever()
)
chain(
    {"question": "What did the president say about Justice Breyer"},
    return_only_outputs=True,
)
{'answer': ' The president honored Justice Breyer, thanking him for his service and noting that he is a retiring Justice of the United States Supreme Court.\n',
'sources': '../../../state_of_the_union.txt'}
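
Conceptually, a "stuff" chain places every retrieved document, together with its source, into a single prompt for the language model. The sketch below shows that idea in plain Python. The `RetrievedDoc` class, the `build_stuff_prompt` function, and the prompt wording are all hypothetical and are not LangChain's actual template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a document returned by the retriever."""

    page_content: str
    source: str


def build_stuff_prompt(question: str, docs: List[RetrievedDoc]) -> str:
    """Concatenate every document and its source into one prompt string."""
    extracts = "\n\n".join(
        f"Content: {doc.page_content}\nSource: {doc.source}" for doc in docs
    )
    return (
        "Answer the question using only the extracts below "
        "and list the sources you used.\n\n"
        f"{extracts}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    RetrievedDoc(
        page_content="Justice Breyer, thank you for your service.",
        source="state_of_the_union.txt",
    )
]
prompt = build_stuff_prompt(
    "What did the president say about Justice Breyer", retrieved
)
print(prompt)
```

Keeping the source next to each extract is what allows the model to report which documents it drew on, as in the 'sources' field of the output above.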
