DuckDB

DuckDB is an in-process SQL OLAP database management system.

The loader runs a DuckDB query and returns one document per result row.

%pip install --upgrade --quiet duckdb langchain-community
from langchain_community.document_loaders import DuckDBLoader
API Reference: DuckDBLoader
%%file example.csv
Team,Payroll
Nationals,81.34
Reds,82.20
Writing example.csv
loader = DuckDBLoader("SELECT * FROM read_csv_auto('example.csv')")

data = loader.load()
print(data)
[Document(page_content='Team: Nationals\nPayroll: 81.34', metadata={}), Document(page_content='Team: Reds\nPayroll: 82.2', metadata={})]
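
The output above suggests that each result row becomes one document whose text holds a `column: value` line per column. The sketch below illustrates that mapping under stated assumptions: it uses the standard library's `sqlite3` as a stand-in for DuckDB (only so the snippet runs without extra packages), and `rows_as_page_content` is a hypothetical helper, not part of LangChain. It also hints at why `82.20` in the CSV shows up as `82.2`: the value presumably comes back as a Python float, which drops the trailing zero when formatted.

```python
import sqlite3


def rows_as_page_content(con, query):
    """Run a query and format each result row as 'column: value' lines."""
    cur = con.execute(query)
    # DB-API cursors expose column names as the first item of each description entry.
    field_names = [col[0] for col in cur.description]
    return [
        "\n".join(f"{name}: {value}" for name, value in zip(field_names, row))
        for row in cur.fetchall()
    ]


# Stand-in data matching example.csv (sqlite3 replaces DuckDB in this sketch).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payroll (Team TEXT, Payroll REAL)")
con.executemany(
    "INSERT INTO payroll VALUES (?, ?)",
    [("Nationals", 81.34), ("Reds", 82.20)],
)

contents = rows_as_page_content(
    con, "SELECT Team, Payroll FROM payroll ORDER BY rowid"
)
for text in contents:
    print(repr(text))
con.close()
```

Each string in `contents` corresponds to the `page_content` of one document in the loader output shown above.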

Specifying Which Columns are Content vs Metadata

loader = DuckDBLoader(
    "SELECT * FROM read_csv_auto('example.csv')",
    page_content_columns=["Team"],
    metadata_columns=["Payroll"],
)

data = loader.load()
print(data)
[Document(page_content='Team: Nationals', metadata={'Payroll': 81.34}), Document(page_content='Team: Reds', metadata={'Payroll': 82.2})]

Adding Source to Metadata

loader = DuckDBLoader(
    "SELECT Team, Payroll, Team AS source FROM read_csv_auto('example.csv')",
    metadata_columns=["source"],
)

data = loader.load()
print(data)
[Document(page_content='Team: Nationals\nPayroll: 81.34\nsource: Nationals', metadata={'source': 'Nationals'}), Document(page_content='Team: Reds\nPayroll: 82.2\nsource: Reds', metadata={'source': 'Reds'})]
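
In the output above, `source` appears in both `page_content` and `metadata`. That is consistent with the page content defaulting to every column of the query result when `page_content_columns` is not given. The sketch below emulates that selection logic in plain Python. It is an illustration built on that assumption, not the loader's actual code, and `row_to_document` is a hypothetical helper that returns a plain dict instead of a LangChain `Document`.

```python
def row_to_document(row, page_content_columns=None, metadata_columns=None):
    """Emulate the column selection; `row` maps column name -> value."""
    # Assumption: with no explicit list, every column goes into the content.
    if page_content_columns is None:
        content_columns = list(row)
    else:
        content_columns = page_content_columns
    # Assumption: with no explicit list, the metadata stays empty.
    meta_columns = [] if metadata_columns is None else metadata_columns
    return {
        "page_content": "\n".join(f"{c}: {row[c]}" for c in content_columns),
        "metadata": {c: row[c] for c in meta_columns},
    }


row = {"Team": "Reds", "Payroll": 82.2, "source": "Reds"}

# Only metadata_columns given: `source` lands in both places.
default_doc = row_to_document(row, metadata_columns=["source"])

# Content columns listed explicitly: `source` stays in metadata only.
explicit_doc = row_to_document(
    row,
    page_content_columns=["Team", "Payroll"],
    metadata_columns=["source"],
)

print(default_doc)
print(explicit_doc)
```

With the real loader, the equivalent would presumably be to pass `page_content_columns=["Team", "Payroll"]` alongside `metadata_columns=["source"]`, which should keep the source value out of the document text.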
