
How to select examples from a LangSmith dataset

📦 Compatibility
The code in this guide requires langsmith>=0.1.101 and langchain-core>=0.2.34. Make sure you have the correct packages installed.

LangSmith datasets have built-in support for similarity search, which makes them a great tool for building and querying few-shot examples.

In this guide we'll see how to use an indexed LangSmith dataset as a few-shot example selector.

Setup

Before getting started, make sure you've created a LangSmith account and set your credentials:

import getpass
import os

if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Set LangSmith API key:\n\n")

os.environ["LANGSMITH_TRACING"] = "true"
Set LangSmith API key:

········

We'll need to install the langsmith SDK. In this example we'll also use langchain, langchain-openai, and langchain-benchmarks:

%pip install -qU "langsmith>=0.1.101" "langchain-core>=0.2.34" langchain langchain-openai langchain-benchmarks

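Now we'll clone a public dataset and turn on indexing for it. Indexing can also be turned on through the LangSmith UI.

We'll clone the Multiverse math few-shot example dataset.

Turning on indexing makes the dataset searchable and ensures that any examples we update or add later are indexed as well.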

from langsmith import Client as LangSmith

ls_client = LangSmith()

dataset_name = "multiverse-math-few-shot-examples-v2"
dataset_public_url = (
    "https://smith.langchain.com/public/620596ee-570b-4d2b-8c8f-f828adbe5242/d"
)

ls_client.clone_public_dataset(dataset_public_url)

dataset_id = ls_client.read_dataset(dataset_name=dataset_name).id

ls_client.index_dataset(dataset_id=dataset_id)
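
Because indexing can take a few seconds, a search issued right after index_dataset may run before the index is ready. The sketch below is one way to wait for it. retry_until_ready is a hypothetical helper, not part of the langsmith SDK, and it assumes that an unready index shows up as a raised exception, which is not something this guide confirms.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_ready(
    fn: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> T:
    """Call fn until it returns without raising, or the attempts run out.

    Hypothetical helper for illustration; not part of the langsmith SDK.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as err:  # assumption: an unready index raises
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise RuntimeError(f"still failing after {attempts} attempts") from last_error


# Possible usage with the client defined above (not executed here):
# examples = retry_until_ready(
#     lambda: ls_client.similar_examples(
#         {"question": "negate 3"}, limit=3, dataset_id=dataset_id
#     )
# )
```

If the search instead returns an empty list while the index is building, the wrapped callable would need to raise on an empty result for this helper to retry.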

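Querying the dataset

Indexing may take a few seconds. Once the dataset is indexed, we can search for similar examples. Note that the input to the similar_examples method must have the same schema as the example inputs. In this case, the example inputs are a dictionary with a "question" key: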

examples = ls_client.similar_examples(
    {"question": "whats the negation of the negation of the negation of 3"},
    limit=3,
    dataset_id=dataset_id,
)
len(examples)
3
examples[0].inputs["question"]
'evaluate the negation of -100'
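
Since the query has to share the schema of the example inputs, it can help to check the keys locally before searching. The following is a minimal sketch: check_query_schema is a hypothetical helper, it compares top-level keys only, and the example_inputs dictionary stands in for examples[0].inputs. Whatever validation LangSmith itself performs is not covered here.

```python
def check_query_schema(query: dict, example_inputs: dict) -> None:
    """Raise ValueError if the query keys differ from the example input keys."""
    missing = set(example_inputs) - set(query)
    extra = set(query) - set(example_inputs)
    if missing or extra:
        raise ValueError(
            "query keys do not match example inputs: "
            f"missing={sorted(missing)}, unexpected={sorted(extra)}"
        )


# Stand-in for examples[0].inputs from the dataset above.
example_inputs = {"question": "evaluate the negation of -100"}

# Same top-level keys, so this call returns without raising.
check_query_schema({"question": "whats the negation of 3"}, example_inputs)
```

A query such as {"q": "..."} would raise a ValueError naming the missing "question" key and the unexpected "q" key.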

For this dataset, the outputs are the conversation that followed the question, in OpenAI message format:

examples[0].outputs["conversation"]
[{'role': 'assistant',
'content': None,
'tool_calls': [{'id': 'toolu_01HTpq4cYNUac6F7omUc2Wz3',
'type': 'function',
'function': {'name': 'negate', 'arguments': '{"a": -100}'}}]},
{'role': 'tool',
'content': '-100.0',
'tool_call_id': 'toolu_01HTpq4cYNUac6F7omUc2Wz3'},
{'role': 'assistant', 'content': 'So the answer is 100.'},
{'role': 'user',
'content': '100 is incorrect. Please refer to the output of your tool call.'},
{'role': 'assistant',
'content': [{'text': "You're right, my previous answer was incorrect. Let me re-evaluate using the tool output:",
'type': 'text'}],
'tool_calls': [{'id': 'toolu_01XsJQboYghGDygQpPjJkeRq',
'type': 'function',
'function': {'name': 'negate', 'arguments': '{"a": -100}'}}]},
{'role': 'tool',
'content': '-100.0',
'tool_call_id': 'toolu_01XsJQboYghGDygQpPjJkeRq'},
{'role': 'assistant', 'content': 'The answer is -100.0'},
{'role': 'user',
'content': 'You have the correct numerical answer but are returning additional text. Please only respond with the numerical answer.'},
{'role': 'assistant', 'content': '-100.0'}]
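
To see at a glance which tools an example conversation exercises, the tool calls can be pulled out of the message list. This is a sketch under the assumption that every tool call follows the structure printed above, with a JSON string under function.arguments. tool_calls_in is a hypothetical helper and the conversation below is an abbreviated copy of the output above.

```python
import json


def tool_calls_in(conversation: list[dict]) -> list[tuple[str, dict]]:
    """Return (tool name, parsed arguments) for every tool call, in order."""
    calls = []
    for msg in conversation:
        # Messages without a "tool_calls" key contribute nothing.
        for call in msg.get("tool_calls") or []:
            fn = call["function"]
            calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


# Abbreviated copy of the conversation printed above.
conversation = [
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "toolu_01HTpq4cYNUac6F7omUc2Wz3",
                "type": "function",
                "function": {"name": "negate", "arguments": '{"a": -100}'},
            }
        ],
    },
    {
        "role": "tool",
        "content": "-100.0",
        "tool_call_id": "toolu_01HTpq4cYNUac6F7omUc2Wz3",
    },
    {"role": "assistant", "content": "So the answer is 100."},
]

print(tool_calls_in(conversation))  # [('negate', {'a': -100})]
```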

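Creating dynamic few-shot prompts

The search returns the examples whose inputs are most similar to the query input. We can use them to few-shot prompt a model like so: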

from langchain.chat_models import init_chat_model
from langchain_benchmarks.tool_usage.tasks.multiverse_math import (
    add,
    cos,
    divide,
    log,
    multiply,
    negate,
    pi,
    power,
    sin,
    subtract,
)
from langchain_core.runnables import RunnableLambda
from langsmith import AsyncClient as AsyncLangSmith

async_ls_client = AsyncLangSmith()


def similar_examples(input_: dict) -> dict:
    examples = ls_client.similar_examples(input_, limit=5, dataset_id=dataset_id)
    return {**input_, "examples": examples}


async def asimilar_examples(input_: dict) -> dict:
    examples = await async_ls_client.similar_examples(
        input_, limit=5, dataset_id=dataset_id
    )
    return {**input_, "examples": examples}


def construct_prompt(input_: dict) -> list:
    instructions = """You are great at using mathematical tools."""
    examples = []
    for ex in input_["examples"]:
        examples.append({"role": "user", "content": ex.inputs["question"]})
        for msg in ex.outputs["conversation"]:
            if msg["role"] == "assistant":
                msg["name"] = "example_assistant"
            if msg["role"] == "user":
                msg["name"] = "example_user"
            examples.append(msg)
    return [
        {"role": "system", "content": instructions},
        *examples,
        {"role": "user", "content": input_["question"]},
    ]


tools = [add, cos, divide, log, multiply, negate, pi, power, sin, subtract]
llm = init_chat_model("gpt-4o-2024-08-06")
llm_with_tools = llm.bind_tools(tools)

example_selector = RunnableLambda(func=similar_examples, afunc=asimilar_examples)

chain = example_selector | construct_prompt | llm_with_tools
ai_msg = await chain.ainvoke({"question": "whats the negation of the negation of 3"})
ai_msg.tool_calls
[{'name': 'negate',
'args': {'a': 3},
'id': 'call_uMSdoTl6ehfHh5a6JQUb2NoZ',
'type': 'tool_call'}]

Looking at the LangSmith trace, we can see that the relevant examples were retrieved in the similar_examples step and passed as messages to ChatOpenAI: https://smith.langchain.com/public/9585e30f-765a-4ed9-b964-2211420cd2f8/r/fdea98d6-e90f-49d4-ac22-dfd012e9e0d9
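
The construct_prompt function above writes the "name" field directly into the message dictionaries of each retrieved example. If the same example objects are reused elsewhere, a variant that copies each message first may be preferable. The sketch below shows one such variant and runs it without any network access. construct_prompt_copying and the stub example are hypothetical, and the stub only imitates the .inputs and .outputs attributes used in this guide.

```python
from types import SimpleNamespace


def construct_prompt_copying(input_: dict) -> list:
    """Variant of construct_prompt that leaves the retrieved examples untouched."""
    messages = [
        {"role": "system", "content": "You are great at using mathematical tools."}
    ]
    for ex in input_["examples"]:
        messages.append({"role": "user", "content": ex.inputs["question"]})
        for msg in ex.outputs["conversation"]:
            tagged = dict(msg)  # shallow copy; the example's own dict is not modified
            if tagged["role"] == "assistant":
                tagged["name"] = "example_assistant"
            elif tagged["role"] == "user":
                tagged["name"] = "example_user"
            messages.append(tagged)
    messages.append({"role": "user", "content": input_["question"]})
    return messages


# Hypothetical stand-in for one object returned by similar_examples.
stub = SimpleNamespace(
    inputs={"question": "evaluate the negation of -100"},
    outputs={"conversation": [{"role": "assistant", "content": "-100.0"}]},
)

prompt = construct_prompt_copying({"question": "negate 3", "examples": [stub]})
print([m["role"] for m in prompt])  # ['system', 'user', 'assistant', 'user']
```

The resulting list has the same shape as the one built by construct_prompt, so it could take its place in the chain.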

