SingleStoreDB

SingleStoreDB is a high-performance distributed SQL database designed to run both in the cloud and on-premises. It combines a broad feature set with flexible deployment options.

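A notable feature of SingleStoreDB is its support for vector storage and vector operations, which makes it a good fit for applications that need AI capabilities such as text similarity matching. Built-in vector functions, such as dot product and Euclidean distance, let developers implement these algorithms efficiently.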
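To make the two similarity measures concrete, here is a minimal pure-Python sketch of dot product and Euclidean distance. The database evaluates its own built-in functions inside SQL; the function names and toy vectors below are illustrative assumptions, not part of SingleStoreDB or LangChain.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; a larger value means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; a smaller value means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 2-dimensional vectors (real embeddings have hundreds of dimensions)
query = [1.0, 0.0]
close = [0.8, 0.6]
far = [0.0, 1.0]

print(dot_product(query, close), dot_product(query, far))
print(euclidean_distance(query, close), euclidean_distance(query, far))
```

Note that the two measures order results in opposite directions: the nearest neighbour has the highest dot product but the lowest Euclidean distance.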

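For developers who want to work with vector data in SingleStoreDB, a comprehensive tutorial is available that covers the details. This tutorial focuses on the vector store in SingleStoreDB and shows how it searches by vector similarity. With a vector index, queries run quickly, so relevant data can be retrieved fast.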

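In addition, the SingleStoreDB vector store integrates with Lucene-based full-text indexes, which enables text similarity search. Users can filter search results on selected fields of the document metadata object to make queries more precise.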

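What sets SingleStoreDB apart is that it can combine vector and full-text search in several ways. Developers can prefilter by text or vector similarity and then select the most relevant data, or compute the final similarity score as a weighted sum.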

In short, SingleStoreDB offers a comprehensive way to manage and query vector data for AI-driven applications.

To use this integration, install langchain-community with pip install -qU langchain-community.

# Establishing a connection to the database is facilitated through the singlestoredb Python connector.
# Please ensure that this connector is installed in your working environment.
%pip install --upgrade --quiet  singlestoredb

import getpass
import os

# We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

from langchain_community.vectorstores import SingleStoreDB
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

API Reference: SingleStoreDB | DistanceStrategy | Document | OpenAIEmbeddings

# loading docs
# we will use some artificial data for this example
docs = [
    Document(
        page_content="""In the parched desert, a sudden rainstorm brought relief,
            as the droplets danced upon the thirsty earth, rejuvenating the landscape
            with the sweet scent of petrichor.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Amidst the bustling cityscape, the rain fell relentlessly,
            creating a symphony of pitter-patter on the pavement, while umbrellas
            bloomed like colorful flowers in a sea of gray.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""High in the mountains, the rain transformed into a delicate
            mist, enveloping the peaks in a mystical veil, where each droplet seemed to
            whisper secrets to the ancient rocks below.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Blanketing the countryside in a soft, pristine layer, the
            snowfall painted a serene tableau, muffling the world in a tranquil hush
            as delicate flakes settled upon the branches of trees like nature's own 
            lacework.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""In the urban landscape, snow descended, transforming
            bustling streets into a winter wonderland, where the laughter of
            children echoed amidst the flurry of snowballs and the twinkle of
            holiday lights.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""Atop the rugged peaks, snow fell with an unyielding
            intensity, sculpting the landscape into a pristine alpine paradise,
            where the frozen crystals shimmered under the moonlight, casting a
            spell of enchantment over the wilderness below.""",
        metadata={"category": "snow"},
    ),
]

embeddings = OpenAIEmbeddings()

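There are several ways to establish a connection to the database. You can set environment variables or pass named parameters to the SingleStoreDB constructor. Alternatively, you can provide these parameters to the from_documents and from_texts methods.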

# Setup connection url as environment variable
os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

# Load documents to the store
docsearch = SingleStoreDB.from_documents(
    docs,
    embeddings,
    table_name="notebook",  # use table with a custom name
)
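As a small illustration of the environment-variable route, the sketch below assembles a URL in the same user:password@host:port/database shape used above. The helper build_connection_url is hypothetical (it is not part of singlestoredb or LangChain), and the credentials are the placeholder values from this page.

```python
import os


def build_connection_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Hypothetical helper: assemble a URL shaped like the example above."""
    return f"{user}:{password}@{host}:{port}/{database}"


url = build_connection_url("root", "pass", "localhost", 3306, "db")

# setdefault leaves an already exported SINGLESTOREDB_URL untouched
os.environ.setdefault("SINGLESTOREDB_URL", url)
print(url)

# The named-parameter route would instead pass the same pieces to the
# constructor (not executed here, since it needs a running database):
#   SingleStoreDB(embeddings, host="localhost", port=3306,
#                 user="root", password="pass", database="db")
```

Keeping credentials in the environment avoids hard-coding them in notebooks that get shared.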

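query = "trees in the snow"
# Store the results under a new name so the source `docs` list stays intact for later examples
results = docsearch.similarity_search(query)  # Find documents that correspond to the query
print(results[0].page_content)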

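SingleStoreDB lets users refine search results by prefiltering on metadata fields. This allows developers and data analysts to fine-tune queries so that results match their requirements. By filtering on specific metadata attributes, users narrow the query to the relevant subset of the data.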

query = "trees branches"
# Use a separate variable so the source `docs` list is not overwritten by search results
results = docsearch.similarity_search(
    query, filter={"category": "snow"}
)  # Find documents that correspond to the query and have category "snow"
print(results[0].page_content)
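Conceptually, metadata prefiltering discards non-matching rows first and only then ranks what remains by similarity. The following is a minimal in-memory sketch of that idea, assuming exact-match filters and dot-product ranking; the function names and toy vectors are illustrative and do not reflect how the database executes the query.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[str, Dict[str, Any], Sequence[float]]


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """True when every key in the filter has the same value in the metadata."""
    return all(metadata.get(key) == value for key, value in flt.items())


def filtered_search(query_vec: Sequence[float], rows: List[Row], flt: Dict[str, Any], k: int = 4) -> List[str]:
    # 1. Pre-filter: drop rows whose metadata does not match
    candidates = [row for row in rows if matches(row[1], flt)]
    # 2. Rank the survivors by dot product, highest first
    candidates.sort(
        key=lambda row: sum(q * v for q, v in zip(query_vec, row[2])),
        reverse=True,
    )
    return [text for text, _, _ in candidates[:k]]


# Toy rows: (text, metadata, hand-written 2-dimensional "embedding")
rows: List[Row] = [
    ("desert rainstorm", {"category": "rain"}, [0.9, 0.1]),
    ("snow on branches", {"category": "snow"}, [0.6, 0.8]),
    ("alpine snowfall", {"category": "snow"}, [0.1, 0.9]),
]

print(filtered_search([1.0, 0.0], rows, {"category": "snow"}, k=1))
```

With an empty filter the "rain" row would rank first for this query; the filter removes it before any similarity is compared.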

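Search efficiency can be improved with ANN vector indexes, which are available in SingleStoreDB 8.5 or later. To enable this feature, set use_vector_index=True when creating the vector store object. If your vectors have a different dimensionality than the default OpenAI embedding size of 1536, be sure to set the vector_size parameter accordingly.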
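One way to avoid a dimension mismatch is to measure the embedding size once and derive the keyword arguments from it. The sketch below assumes a stand-in embedding class; TinyEmbeddings and index_kwargs are hypothetical names, and only use_vector_index and vector_size come from the text above.

```python
from typing import Any, Dict, List

DEFAULT_VECTOR_SIZE = 1536  # default OpenAI embedding size named above


class TinyEmbeddings:
    """Hypothetical stand-in for a real embedding model (384 dimensions)."""

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * 384


def index_kwargs(embedder: Any) -> Dict[str, Any]:
    """Build keyword arguments for a vector store that uses an ANN index."""
    size = len(embedder.embed_query("dimension probe"))
    kwargs: Dict[str, Any] = {"use_vector_index": True}
    if size != DEFAULT_VECTOR_SIZE:
        # Only needed when the model's dimension differs from the default
        kwargs["vector_size"] = size
    return kwargs


print(index_kwargs(TinyEmbeddings()))
```

The resulting dictionary could then be unpacked into the store's creation call, for example SingleStoreDB.from_documents(docs, embeddings, **index_kwargs(embeddings)), assuming a reachable database.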

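SingleStoreDB provides several search strategies, each suited to particular use cases. The default VECTOR_ONLY strategy uses vector operations such as dot product or Euclidean distance to compute similarity scores between vectors directly, while TEXT_ONLY uses Lucene-based full-text search, which is especially useful for text-centric applications. For a balanced approach, FILTER_BY_TEXT first narrows the results by text similarity and then compares vectors, whereas FILTER_BY_VECTOR filters by vector similarity first and then evaluates text similarity to find the best matches. Both FILTER_BY_TEXT and FILTER_BY_VECTOR require a full-text index.

WEIGHTED_SUM computes the final similarity score by weighting vector and text similarity. It works only with dot-product distance and also requires a full-text index. Together, the hybrid strategies (FILTER_BY_TEXT, FILTER_BY_VECTOR, and WEIGHTED_SUM) combine vector-based and text-based search so that users can tune retrieval to their own needs.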

docsearch = SingleStoreDB.from_documents(
    docs,
    embeddings,
    distance_strategy=DistanceStrategy.DOT_PRODUCT,  # Use dot product for similarity search
    use_vector_index=True,  # Use vector index for faster search
    use_full_text_search=True,  # Use full text index
)

vectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreDB.SearchStrategy.VECTOR_ONLY,
    filter={"category": "rain"},
)
print(vectorResults[0].page_content)

textResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreDB.SearchStrategy.TEXT_ONLY,
)
print(textResults[0].page_content)

filteredByTextResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreDB.SearchStrategy.FILTER_BY_TEXT,
    filter_threshold=0.1,
)
print(filteredByTextResults[0].page_content)

filteredByVectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreDB.SearchStrategy.FILTER_BY_VECTOR,
    filter_threshold=0.1,
)
print(filteredByVectorResults[0].page_content)

weightedSumResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreDB.SearchStrategy.WEIGHTED_SUM,
    text_weight=0.2,
    vector_weight=0.8,
)
print(weightedSumResults[0].page_content)
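To show how the hybrid strategies differ in principle, here is a toy sketch of WEIGHTED_SUM and FILTER_BY_TEXT over precomputed scores. The scores are invented for illustration (they are not real query output), and the functions are a simplified model under those assumptions, not the library's implementation.

```python
from typing import Dict, List

# Toy per-document scores (illustrative values, not real output)
vector_scores: Dict[str, float] = {"desert": 0.82, "city": 0.74, "mountain": 0.70}
text_scores: Dict[str, float] = {"desert": 0.60, "city": 0.05, "mountain": 0.20}


def weighted_sum(text_weight: float, vector_weight: float, k: int = 1) -> List[str]:
    """Rank by text_weight * text_score + vector_weight * vector_score."""
    combined = {
        doc: text_weight * text_scores[doc] + vector_weight * vector_scores[doc]
        for doc in vector_scores
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]


def filter_by_text(filter_threshold: float, k: int = 1) -> List[str]:
    """Keep documents whose text score exceeds the threshold, then rank by vector score."""
    kept = [doc for doc, score in text_scores.items() if score > filter_threshold]
    return sorted(kept, key=vector_scores.get, reverse=True)[:k]


print(weighted_sum(text_weight=0.2, vector_weight=0.8, k=3))
print(filter_by_text(filter_threshold=0.1, k=3))
```

In this toy data the "city" document survives the weighted sum because its vector score is high, but FILTER_BY_TEXT drops it outright because its text score falls below the threshold.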

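Multi-modal Example: Leveraging CLIP and OpenClip Embeddings

In multi-modal data analysis, integrating different kinds of information, such as images and text, is becoming increasingly important. One tool that supports this integration is CLIP, a model that embeds both images and text into a shared semantic space. This makes it possible to retrieve relevant content across modalities through similarity search.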

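To illustrate, consider an application that needs to analyze multi-modal data. This example uses OpenClip multi-modal embeddings, which build on the CLIP framework. With OpenClip, text descriptions and their corresponding images can be embedded together, which supports analysis and retrieval tasks such as finding visually similar images from a text query or finding text passages related to specific visual content.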

%pip install -U langchain openai singlestoredb langchain-experimental # (newest versions required for multi-modal)

import os

from langchain_community.vectorstores import SingleStoreDB
from langchain_experimental.open_clip import OpenCLIPEmbeddings

os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

TEST_IMAGES_DIR = "../../modules/images"

docsearch = SingleStoreDB(OpenCLIPEmbeddings())

image_uris = sorted(
    [
        os.path.join(TEST_IMAGES_DIR, image_name)
        for image_name in os.listdir(TEST_IMAGES_DIR)
        if image_name.endswith(".jpg")
    ]
)

# Add images
docsearch.add_images(uris=image_uris)
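Because images and text share one embedding space, a text query can be matched against image vectors directly. The sketch below illustrates that idea with cosine similarity over hand-written toy vectors; the file names and numbers are hypothetical stand-ins and were not produced by CLIP or OpenClip.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hand-written toy vectors standing in for image embeddings
image_vectors: Dict[str, List[float]] = {
    "images/dog.jpg": [0.9, 0.1, 0.0],
    "images/beach.jpg": [0.1, 0.9, 0.2],
}
# Toy vector standing in for the embedding of a text query
text_query_vector = [0.8, 0.2, 0.1]

# The image whose vector points in the most similar direction wins
best = max(
    image_vectors,
    key=lambda uri: cosine_similarity(text_query_vector, image_vectors[uri]),
)
print(best)
```

With a populated store, the same lookup would be a plain text query such as docsearch.similarity_search("a dog"), assuming a reachable database and images already added.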

API Reference: SingleStoreDB | OpenCLIPEmbeddings

Related

Vector store conceptual guide
Vector store how-to guides
