
UpTrain

UpTrain [github || website || docs] is an open-source platform to evaluate and improve LLM applications. It provides grades for 20+ preconfigured checks (covering language, code, embedding use cases), performs root cause analysis on failure cases, and gives guidance for resolving them.

UpTrain Callback Handler

This notebook showcases the UpTrain callback handler seamlessly integrating into your pipeline, facilitating diverse evaluations. We have chosen a few evaluations that we deemed apt for evaluating the chains. These evaluations run automatically, with results displayed in the output. More details on UpTrain's evaluations can be found here.

Selected retrievers from LangChain are highlighted for demonstration:

1. Vanilla RAG:

RAG plays a crucial role in retrieving context and generating a response. To ensure its performance and response quality, we conduct the following evaluations:

  • Context Relevance: Checks if the context extracted from the query is relevant to the response.
  • Factual Accuracy: Checks how factually accurate the response is.
  • Response Completeness: Checks if the response contains all the information that the query is asking for.

2. Multi Query Generation:

MultiQueryRetriever creates multiple variants of a question having a similar meaning to the original question. Given the complexity, we include the previous evaluations and add:

  • Multi Query Accuracy: Checks if the multi-queries generated mean the same as the original query.

3. Context Compression and Re-ranking:

Re-ranking involves reordering nodes based on relevance to the query and choosing the top n nodes. Since the number of nodes can reduce once the re-ranking is complete, we perform the following evaluations:

  • Context Reranking: Checks if the order of re-ranked nodes is more relevant to the query than the original order.
  • Context Conciseness: Checks if the reduced number of nodes still provides all the required information.

These evaluations collectively ensure the robustness and effectiveness of the RAG, MultiQueryRetriever, and the re-ranking process in the chain.

Install Dependencies

%pip install -qU langchain langchain_openai langchain-community uptrain faiss-cpu flashrank
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
WARNING: There was an error checking the latest version of pip.
Note: you may need to restart the kernel to use updated packages.

NOTE: You can also install faiss-gpu instead of faiss-cpu if you want to use the GPU enabled version of the library.

Import Libraries

from getpass import getpass

from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
)

Load the documents

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()

Split the documents into chunks

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)

Create the retriever

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()

Define the LLM

llm = ChatOpenAI(temperature=0, model="gpt-4")

Setup

UpTrain provides you with:

  1. Dashboards with advanced drill-down and filtering options
  2. Insights and common topics among failing cases
  3. Observability and real-time monitoring of production data
  4. Regression testing via seamless integration with your CI/CD pipelines

You can choose between the following options for evaluating using UpTrain:

1. UpTrain's Open-Source Software (OSS):

You can use the open-source evaluation service to evaluate your model. In this case, you will need to provide an OpenAI API key. UpTrain uses the GPT models to evaluate the responses generated by the LLM. You can get yours here.

In order to view your evaluations in the UpTrain dashboard, you will need to set it up by running the following commands in your terminal:

git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh

This will start the UpTrain dashboard on your local machine. You can access it at http://localhost:3000/dashboard.

Parameters:

  • key_type="openai"
  • api_key="OPENAI_API_KEY"
  • project_name="PROJECT_NAME"

2. UpTrain Managed Service and Dashboards:

Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account here and get free trial credits. If you want more trial credits, book a call with the maintainers of UpTrain here.

The benefits of using the managed service are:

  1. No need to set up the UpTrain dashboard on your local machine.
  2. Access to many LLMs without needing their API keys.

Once you perform the evaluations, you can view them in the UpTrain dashboard at https://dashboard.uptrain.ai/dashboard.

Parameters:

  • key_type="uptrain"
  • api_key="UPTRAIN_API_KEY"
  • project_name="PROJECT_NAME"

Note: The project_name is the project name under which the evaluations performed will be shown in the UpTrain dashboard.
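The two options configure the same callback handler and differ only in the arguments passed to it. A minimal sketch of the two configurations ("my_rag_project" and the key strings are hypothetical placeholders):

```python
# The two backends differ only in the arguments passed to UpTrainCallbackHandler.
# "my_rag_project" and the key values are hypothetical placeholders.
oss_config = {
    "key_type": "openai",  # evaluate via UpTrain OSS, which uses GPT models
    "api_key": "YOUR_OPENAI_API_KEY",
    "project_name": "my_rag_project",
}
managed_config = {
    "key_type": "uptrain",  # evaluate via UpTrain's managed service
    "api_key": "YOUR_UPTRAIN_API_KEY",
    "project_name": "my_rag_project",
}

# Either dict can then be unpacked into the handler:
# uptrain_callback = UpTrainCallbackHandler(**oss_config)
```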

Set the API key

The notebook will prompt you to enter the API key. You can choose between the OpenAI API key or the UpTrain API key by changing the key_type parameter in the cell below.

KEY_TYPE = "openai"  # or "uptrain"
API_KEY = getpass()

1. Vanilla RAG

The UpTrain callback handler will automatically capture the query, context and response once generated and will run the following three evaluations (graded from 0 to 1) on the response:

  • Context Relevance: Checks if the context extracted from the query is relevant to the response.
  • Factual Accuracy: Checks how factually accurate the response is.
  • Response Completeness: Checks if the response contains all the information that the query is asking for.

# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

# Create the chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Create the uptrain callback handler
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(query, config=config)
2024-04-17 17:03:44.969 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:05.809 | INFO  | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that she is a former top litigator in private practice, a former federal public defender, and comes from a family of public school educators and police officers. He described her as a consensus builder and noted that since her nomination, she has received a broad range of support from various groups, including the Fraternal Order of Police and former judges appointed by both Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0

2. Multi Query Generation

The MultiQueryRetriever is used to tackle the problem that the RAG pipeline might not return the best set of documents based on the query. It generates multiple queries that mean the same as the original query and then fetches documents for each of them.
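Conceptually, the per-variant results are then unioned with duplicates dropped. A toy sketch of that merge step (not the library's actual internals; `merge_results` is a hypothetical helper):

```python
# Toy sketch: given the documents fetched for each query variant, merge them
# with de-duplication - conceptually what MultiQueryRetriever does with the
# results of its generated queries.
def merge_results(results_per_query):
    seen, merged = set(), []
    for docs in results_per_query:
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Two query variants retrieving overlapping documents:
merged = merge_results([["doc_a", "doc_b"], ["doc_b", "doc_c"]])
# merged == ["doc_a", "doc_b", "doc_c"]
```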

To evaluate this retriever, UpTrain will run the following evaluation:

  • Multi Query Accuracy: Checks if the multi-queries generated mean the same as the original query.

# Create the retriever
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

chain = (
    {"context": multi_query_retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Invoke the chain with a query
question = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(question, config=config)
2024-04-17 17:04:10.675 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:16.804 | INFO  | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Multi Queries:
- How did the president comment on Ketanji Brown Jackson?
- What were the president's remarks regarding Ketanji Brown Jackson?
- What statements has the president made about Ketanji Brown Jackson?

Multi Query Accuracy Score: 0.5
2024-04-17 17:04:22.027 | INFO  | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:44.033 | INFO  | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that since her nomination, she has received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0

3. Context Compression and Re-ranking

The re-ranking process involves reordering nodes based on relevance to the query and choosing the top n nodes. Since the number of nodes can reduce once the re-ranking is complete, we perform the following evaluations:

  • Context Reranking: Checks if the order of re-ranked nodes is more relevant to the query than the original order.
  • Context Conciseness: Checks if the reduced number of nodes still provides all the required information.
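The rerank-then-truncate step that these two checks target can be sketched as follows (a toy illustration with dummy scores and a hypothetical `rerank_top_n` helper, not FlashrankRerank's actual implementation):

```python
# Toy sketch: reorder nodes by a relevance score and keep only the top n -
# the behavior that Context Reranking and Context Conciseness evaluate.
def rerank_top_n(nodes, scores, n):
    ranked = sorted(zip(nodes, scores), key=lambda pair: pair[1], reverse=True)
    return [node for node, _ in ranked[:n]]

nodes = ["node_1", "node_2", "node_3"]
scores = [0.2, 0.9, 0.5]  # dummy relevance scores against the query
top_nodes = rerank_top_n(nodes, scores, n=2)
# top_nodes == ["node_2", "node_3"]
```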
# Create the retriever
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# Create the chain
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)

# Create the uptrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
result = chain.invoke(query, config=config)
2024-04-17 17:04:46.462 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:53.561 | INFO  | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson

Context Conciseness Score: 0.0
Context Reranking Score: 1.0
2024-04-17 17:04:56.947 | INFO  | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:05:16.551 | INFO  | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!

Question: What did the president say about Ketanji Brown Jackson
Response: The President mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5

UpTrain's Dashboard and Insights

Here's a short video showcasing the dashboard and the insights:

langchain_uptrain.gif

