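Writer Text Splitter
This notebook provides a quick overview for getting started with Writer's text splitter.
Writer's context-aware splitting endpoint provides intelligent text splitting for long documents (up to 4000 words). Unlike simple character-based splitting, it preserves semantic meaning and context between chunks, which makes it well suited to processing long-form content while maintaining coherence. The `langchain-writer` package exposes this endpoint as a LangChain text splitter.
Overview
Integration details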
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
WriterTextSplitter | langchain-writer | ❌ | ❌ | ❌ |
Setup
The `WriterTextSplitter` is available in the `langchain-writer` package:
%pip install --quiet -U langchain-writer
Credentials
Sign up for Writer AI Studio to generate an API key (the Writer quickstart walks through this). Then set the `WRITER_API_KEY` environment variable:
import getpass
import os
if not os.getenv("WRITER_API_KEY"):
    os.environ["WRITER_API_KEY"] = getpass.getpass("Enter your Writer API key: ")
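The `getpass` prompt above suits an interactive notebook. In a script or CI job there is nobody to answer the prompt, so it can be more convenient to fail fast when the key is missing. The sketch below shows one way to do that; `require_env` is a hypothetical helper written for this example and is not part of `langchain-writer`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail with a clear message instead of waiting on an interactive prompt.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("WRITER_API_KEY")` at the top of a script surfaces a missing key before any request is made.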
Setting up LangSmith for best-in-class observability is also helpful (but not required). If you wish to do so, set the `LANGSMITH_TRACING` and `LANGSMITH_API_KEY` environment variables:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()
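Instantiation
Create a `WriterTextSplitter` instance with the `strategy` parameter set to one of the following:
- `llm_split`: uses a language model for precise semantic splitting
- `fast_split`: uses a heuristic-based approach for quick splitting
- `hybrid_split`: combines both approaches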
from langchain_writer.text_splitter import WriterTextSplitter
splitter = WriterTextSplitter(strategy="fast_split")
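When the strategy name comes from configuration or user input, it can help to check it against the three documented values before constructing the splitter. The following is a minimal sketch; `VALID_STRATEGIES` and `validate_strategy` are hypothetical names introduced for this example, and whether `WriterTextSplitter` itself rejects other values is not covered here.

```python
# The three strategy names listed above.
VALID_STRATEGIES = ("llm_split", "fast_split", "hybrid_split")


def validate_strategy(name: str) -> str:
    """Return `name` unchanged if it is a known strategy, else raise ValueError."""
    if name not in VALID_STRATEGIES:
        options = ", ".join(VALID_STRATEGIES)
        raise ValueError(f"Unknown strategy {name!r}; expected one of: {options}")
    return name
```

The validated name can then be passed straight through, for example `WriterTextSplitter(strategy=validate_strategy("fast_split"))`.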
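Usage
The `WriterTextSplitter` can be used synchronously or asynchronously.
Synchronous usage
To use the `WriterTextSplitter` synchronously, call the `split_text` method with the text you want to split: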
text = """Reeeeeeeeeeeeeeeeeeeeeaally long text you want to divide into smaller chunks. For example you can add a poem multiple times:
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""
chunks = splitter.split_text(text)
chunks
Print the number of chunks to see how many were created:
print(len(chunks))
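Because the endpoint is described as handling documents of up to 4000 words, a longer input has to be divided before it is sent. The sketch below batches text by word count. It assumes a "word" is a whitespace-separated token, which may not match how Writer counts words, and `batch_by_words` is a hypothetical helper rather than part of `langchain-writer`.

```python
def batch_by_words(text: str, max_words: int = 4000) -> list[str]:
    """Split `text` into consecutive pieces of at most `max_words` whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Rejoin each slice with single spaces; original line breaks are not preserved.
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each piece can then be passed to `splitter.split_text` in turn. Note that this cuts at arbitrary word boundaries and collapses whitespace, so context is lost across batch boundaries; cutting at paragraph or section breaks would likely give better results.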
Asynchronous usage
To use the `WriterTextSplitter` asynchronously, call the `asplit_text` method with the text you want to split:
async_chunks = await splitter.asplit_text(text)
async_chunks
Print the number of chunks to see how many were created:
print(len(async_chunks))
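The async method is most useful when several documents need splitting at once. The sketch below runs multiple `asplit_text` calls concurrently with `asyncio.gather`. To keep it runnable without an API key, it uses `StubSplitter`, a stand-in written for this example that splits on blank lines; `split_many` is likewise a hypothetical helper. Any object with an `asplit_text` coroutine method, such as the `splitter` created earlier, could be passed instead.

```python
import asyncio


async def split_many(splitter, texts):
    """Split several texts concurrently; results come back in input order."""
    return await asyncio.gather(*(splitter.asplit_text(t) for t in texts))


class StubSplitter:
    """Stand-in for illustration only: splits on blank lines instead of calling Writer."""

    async def asplit_text(self, text):
        return [part for part in text.split("\n\n") if part]
```

In a notebook, `await split_many(splitter, [text_a, text_b])` works directly; in a plain script, wrap the call in `asyncio.run(...)`. Sending many requests at once may run into service rate limits, so a cap on concurrency may be needed in practice.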
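API reference
For detailed documentation of all `WriterTextSplitter` features and configurations, see the API reference.
Additional resources
Information about Writer's models (including costs, context windows, and supported input types) and tools is available in the Writer docs.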