
Writer Text Splitter

This notebook provides a quick overview for getting started with Writer's text splitter.

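Writer's context-aware splitting endpoint provides intelligent text splitting for long documents (up to 4000 words). Unlike simple character-based splitting, it preserves semantic meaning and context between chunks, which makes it well suited to processing long-form content while maintaining coherence. The langchain-writer package exposes this endpoint as a LangChain text splitter.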
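Because the endpoint handles documents of up to 4000 words, longer inputs have to be divided before they are sent. The following is a minimal sketch of one way to pre-batch text on paragraph boundaries. It assumes that a whitespace-separated word count is a reasonable approximation of how the limit is measured; `batch_by_words` and `max_words` are hypothetical names and are not part of langchain-writer.

```python
def batch_by_words(text: str, max_words: int = 4000) -> list[str]:
    """Group paragraphs into batches of at most `max_words` words.

    Words are counted by splitting on whitespace (an assumption; the
    endpoint may count words differently).
    """
    batches: list[str] = []
    current: list[str] = []
    count = 0

    for para in text.split("\n\n"):
        words = para.split()
        n = len(words)
        if n == 0:
            continue  # skip empty paragraphs

        if n > max_words:
            # A single oversized paragraph: flush what we have, then
            # fall back to a hard split on word boundaries.
            if current:
                batches.append("\n\n".join(current))
                current, count = [], 0
            for i in range(0, n, max_words):
                batches.append(" ".join(words[i:i + max_words]))
            continue

        if current and count + n > max_words:
            # Adding this paragraph would exceed the limit: start a new batch.
            batches.append("\n\n".join(current))
            current, count = [], 0

        current.append(para)
        count += n

    if current:
        batches.append("\n\n".join(current))
    return batches
```

Each returned batch can then be passed to the splitter separately. Batching on paragraph boundaries keeps related sentences together, so the context-aware splitting still has coherent input to work with.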

Overview

Integration details

Class | Package | Local | Serializable | JS support | Package downloads | Package latest
WriterTextSplitter | langchain-writer | | | | PyPI - Downloads | PyPI - Version

Setup

The WriterTextSplitter is available in the langchain-writer package:

%pip install --quiet -U langchain-writer

Credentials

Sign up for Writer AI Studio to generate an API key (you can follow this Quickstart). Then set the WRITER_API_KEY environment variable:

import getpass
import os

if not os.getenv("WRITER_API_KEY"):
    os.environ["WRITER_API_KEY"] = getpass.getpass("Enter your Writer API key: ")
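The getpass prompt works well in a notebook, but it blocks in non-interactive settings such as scheduled jobs. As a rough sketch for those cases, a small check can fail fast when the variable is missing. `require_env` is a hypothetical helper, not something langchain-writer provides.

```python
import os


def require_env(name: str = "WRITER_API_KEY") -> str:
    """Return the value of an environment variable or raise if it is unset.

    Hypothetical helper for non-interactive runs where prompting is not possible.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return value
```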

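It is also helpful (but not required) to set up LangSmith for best-in-class observability. If you wish to do so, you can set the LANGSMITH_TRACING and LANGSMITH_API_KEY environment variables: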

# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

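Instantiation

Instantiate a WriterTextSplitter with the strategy parameter set to one of the following:

  • llm_split: uses a language model for precise semantic splitting
  • fast_split: uses a heuristic-based approach for quick splitting
  • hybrid_split: combines both approaches

from langchain_writer.text_splitter import WriterTextSplitter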

splitter = WriterTextSplitter(strategy="fast_split")
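When the strategy name comes from configuration or user input, it can help to check it against the three documented values before constructing the splitter. The snippet below is a small sketch of that check. `VALID_STRATEGIES` and `validate_strategy` are hypothetical names introduced here for illustration; the splitter itself may perform its own validation.

```python
# The three strategy names documented for WriterTextSplitter.
VALID_STRATEGIES = ("llm_split", "fast_split", "hybrid_split")


def validate_strategy(name: str) -> str:
    """Return `name` unchanged if it is a documented strategy, else raise."""
    if name not in VALID_STRATEGIES:
        raise ValueError(
            f"Unknown strategy {name!r}; expected one of {', '.join(VALID_STRATEGIES)}"
        )
    return name
```

A call such as `WriterTextSplitter(strategy=validate_strategy(configured_name))` would then surface a typo locally, where `configured_name` stands for whatever value your application reads from its settings.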

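Usage

The WriterTextSplitter can be used synchronously or asynchronously.

Synchronous usage

To use the WriterTextSplitter synchronously, call the split_text method with the text you want to split: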

text = """Reeeeeeeeeeeeeeeeeeeeeaally long text you want to divide into smaller chunks. For example you can add a poem multiple times:
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""

chunks = splitter.split_text(text)
chunks

You can print the length of the chunks list to see how many chunks were created:

print(len(chunks))
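Beyond the total count, it is often useful to look at how large each chunk is. Since split_text is used above as returning a list of strings, a short summary can be built with plain Python. This is only a sketch; `summarize_chunks` is a hypothetical helper, and the word count is a simple whitespace split.

```python
def summarize_chunks(chunks: list[str], preview_chars: int = 40) -> list[tuple[int, int, str]]:
    """Return (index, word_count, preview) for each chunk."""
    summary = []
    for i, chunk in enumerate(chunks):
        # Collapse newlines so each preview fits on one line.
        preview = chunk[:preview_chars].replace("\n", " ")
        summary.append((i, len(chunk.split()), preview))
    return summary


# Example usage with the chunks produced above:
# for index, words, preview in summarize_chunks(chunks):
#     print(index, words, preview)
```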

Asynchronous usage

To use the WriterTextSplitter asynchronously, call the asplit_text method with the text you want to split:

async_chunks = await splitter.asplit_text(text)
async_chunks

Print the length of the chunks list to see how many chunks were created:

print(len(async_chunks))
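The top-level await shown above works in a notebook, where an event loop is already running. In a regular Python script the coroutine has to be driven explicitly, for example with asyncio.run. The sketch below also shows one way to split several texts concurrently. `split_many` is a hypothetical helper that accepts any object exposing an asplit_text coroutine method, such as the splitter created earlier.

```python
import asyncio


async def split_many(splitter, texts):
    """Split several texts concurrently.

    `splitter` is any object with an async `asplit_text(text)` method.
    Results are returned in the same order as `texts`.
    """
    return await asyncio.gather(*(splitter.asplit_text(t) for t in texts))


# Example usage in a script (outside a notebook):
# all_chunks = asyncio.run(split_many(splitter, [text, text]))
```

Running the requests concurrently can reduce total wall-clock time when several documents need splitting, although any rate limits on the API still apply.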

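API reference

For detailed documentation of all WriterTextSplitter features and configurations, head to the API reference.

Additional resources

You can find information about Writer's models (including costs, context windows, and supported input types) and tools in the Writer docs.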

