
How to cache LLM responses

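LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times.
It can speed up your application, because a cached completion is returned without another API call to the LLM provider.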
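The sketch below illustrates the first benefit in plain Python, without calling any real provider. It is not LangChain's implementation: `ToyLLMCache`, `CountingProvider`, `cached_call`, and the `llm_string` values are hypothetical names chosen for illustration. The assumption is that a response is stored under the pair (prompt, model configuration), so a repeated request with the same configuration never reaches the provider.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyLLMCache:
    """Illustrative in-memory cache keyed by (prompt, llm_string)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        # Return the stored response, or None on a cache miss
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._store[(prompt, llm_string)] = response


class CountingProvider:
    """Stand-in for a paid API: counts how often it is actually called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"response to: {prompt}"


def cached_call(
    cache: ToyLLMCache,
    provider: Callable[[str], str],
    prompt: str,
    llm_string: str,
) -> str:
    """Consult the cache first and call the provider only on a miss."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit
    response = provider(prompt)
    cache.update(prompt, llm_string, response)
    return response


cache = ToyLLMCache()
provider = CountingProvider()

first = cached_call(cache, provider, "Tell me a joke", "model=demo,n=2")
second = cached_call(cache, provider, "Tell me a joke", "model=demo,n=2")
# A different model configuration is a different key, so this is a miss
third = cached_call(cache, provider, "Tell me a joke", "model=demo,n=1")

print(provider.calls)  # 2: three requests, but only two provider calls
```

Keying on the configuration as well as the prompt is a design choice in this sketch: it keeps a response generated with one set of model parameters from being returned for a request made with another.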

%pip install -qU langchain_openai langchain_community

import os
from getpass import getpass

# Prompt for the OpenAI API key if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass()

from langchain_core.globals import set_llm_cache
from langchain_openai import OpenAI

# To make the caching really obvious, let's use a slower and older model.
# Caching supports newer chat models as well.
llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)

%%time
from langchain_core.caches import InMemoryCache

# Register an in-memory cache as the global LLM cache
set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Tell me a joke")
CPU times: user 546 ms, sys: 379 ms, total: 925 ms
Wall time: 1.11 s
"\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is, so it goes faster
llm.invoke("Tell me a joke")
CPU times: user 192 µs, sys: 77 µs, total: 269 µs
Wall time: 270 µs
"\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

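The timings above come from a notebook. Outside a notebook, a similar comparison can be made with `time.perf_counter`. The following is a minimal sketch under stated assumptions: `slow_completion` is a hypothetical stand-in that simulates provider latency with `time.sleep`, and `functools.lru_cache` plays the role of the cache here, which is a swapped-in technique and not how LangChain caches responses.

```python
import time
from functools import lru_cache
from typing import Callable, Tuple


def timed(fn: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Run fn(prompt) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(prompt)
    return result, time.perf_counter() - start


@lru_cache(maxsize=None)
def slow_completion(prompt: str) -> str:
    # Hypothetical stand-in for a network call to an LLM provider
    time.sleep(0.05)
    return f"response to: {prompt}"


first_result, first_seconds = timed(slow_completion, "Tell me a joke")
second_result, second_seconds = timed(slow_completion, "Tell me a joke")

print(f"miss: {first_seconds * 1000:.1f} ms")
print(f"hit:  {second_seconds * 1000:.3f} ms")
```

The exact numbers depend on the machine, so none are shown; the point is only that the second call skips the simulated latency.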
SQLite Cache

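# Remove any existing cache file so the first call below is a genuine miss
!rm -f .langchain.db

# We can do the same thing with a SQLite cache
from langchain_community.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
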
%%time
# The first time, it is not yet in cache, so it should take longer
llm.invoke("Tell me a joke")
CPU times: user 10.6 ms, sys: 4.21 ms, total: 14.8 ms
Wall time: 851 ms
"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is, so it goes faster
llm.invoke("Tell me a joke")
CPU times: user 59.7 ms, sys: 63.6 ms, total: 123 ms
Wall time: 134 ms
"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

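What a file-backed cache adds over an in-memory one is persistence: entries survive after the process that wrote them has exited. The sketch below shows that idea with the standard-library `sqlite3` module only. It is an illustration, not the `SQLiteCache` implementation: `ToySQLiteCache`, the `llm_cache` table, and its columns are hypothetical, and the database is written to a temporary directory so that no real `.langchain.db` file is touched.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class ToySQLiteCache:
    """Illustrative persistent cache keyed by (prompt, llm_string)."""

    def __init__(self, database_path: str) -> None:
        self._conn = sqlite3.connect(database_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT NOT NULL, llm TEXT NOT NULL, response TEXT NOT NULL, "
            "PRIMARY KEY (prompt, llm))"
        )
        self._conn.commit()

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm_string),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO llm_cache (prompt, llm, response) "
            "VALUES (?, ?, ?)",
            (prompt, llm_string, response),
        )
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "toy_cache.db")

# First "session": store a response, then close the connection
writer = ToySQLiteCache(db_path)
writer.update("Tell me a joke", "model=demo", "a cached joke")
writer.close()

# Second "session": a fresh connection still finds the stored entry
reader = ToySQLiteCache(db_path)
persisted = reader.lookup("Tell me a joke", "model=demo")
missing = reader.lookup("Tell me a riddle", "model=demo")
reader.close()

print(persisted)  # a cached joke
print(missing)    # None
```

Reading from disk is slower than reading from a dictionary, which is consistent with the SQLite cache hit above taking milliseconds where the in-memory hit took microseconds.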