[Qianfan SDK + Semantic Kernel] RAG Knowledge Retrieval Augmentation in Practice
💡 A quick tip before you start
Please open the repo and give it a 🌟: https://github.com/baidubce/bce-qianfan-sdk
Implementing RAG with SK
For scenarios that need external knowledge, we usually turn to RAG (Retrieval-Augmented Generation), which typically involves parsing documents, chunking them, retrieving chunks via vector embeddings, and generating the final output with an LLM. In Semantic ChatBot we built a simple chatbot on top of Semantic Kernel and used context variables to store and replay the chat history. With a very long external knowledge base, however, that approach demands an extremely long context and may not be able to record the full history at all. To address this, SK provides the Memory type to give the LLM a long-term memory capability.
```
! pip install semantic-kernel==0.4.5.dev0
```
Initialize authentication, import SK, and import the Qianfan implementation types adapted for SK:
```python
import os

os.environ["QIANFAN_ACCESS_KEY"] = "your_ak"
os.environ["QIANFAN_SECRET_KEY"] = "your_sk"

import semantic_kernel as sk
from qianfan.extensions.semantic_kernel import (
    QianfanChatCompletion,
    QianfanTextEmbedding,
)
```
SK Memory
SK Memory is a data framework that connects to all kinds of external data sources — web pages, databases, email, and so on — which are integrated into SK's built-in connectors. With QianfanTextEmbedding, the text from these sources can be turned into embedding vectors for later retrieval.
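As a quick illustration, the embedding service can also be called directly. A minimal sketch, assuming QianfanTextEmbedding follows SK's standard embedding-generation interface (generate_embeddings_async), which is how it is registered with the kernel below:

```python
from qianfan.extensions.semantic_kernel import QianfanTextEmbedding

# Sketch: embed one sentence directly with the Qianfan embedding service.
# Assumes SK's standard generate_embeddings_async interface.
qf_text_embedding = QianfanTextEmbedding(ai_model_id="Embedding-V1")
embeddings = await qf_text_embedding.generate_embeddings_async(["我爱打羽毛球"])
print(len(embeddings[0]))  # dimensionality of one Embedding-V1 vector
```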
Here we take VolatileMemoryStore as the Memory implementation. VolatileMemoryStore provides temporary in-memory storage; under the hood it is a per-collection key-value store backed by a Dict[str, Dict[str, MemoryRecord]].
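For intuition, that storage layout can be pictured as a nested dict keyed first by collection name and then by record id. A toy sketch (not SK's actual implementation):

```python
from typing import Dict


class ToyMemoryRecord:
    """Toy stand-in for SK's MemoryRecord: just an id and the stored text."""

    def __init__(self, id: str, text: str):
        self.id = id
        self.text = text


# Per-collection key-value storage, as in VolatileMemoryStore:
# outer key = collection name, inner key = record id.
store: Dict[str, Dict[str, ToyMemoryRecord]] = {}
store.setdefault("aboutMe", {})["info1"] = ToyMemoryRecord("info1", "我名字叫做小度")
print(store["aboutMe"]["info1"].text)
```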
```python
from semantic_kernel.memory import VolatileMemoryStore
from semantic_kernel.core_skills import TextMemorySkill

kernel = sk.Kernel()

qf_chat_service = QianfanChatCompletion(ai_model_id="ERNIE-Bot")
qf_text_embedding = QianfanTextEmbedding(ai_model_id="Embedding-V1")

kernel.add_chat_service("chat-qf", qf_chat_service)
kernel.add_text_embedding_generation_service("embed-eb", qf_text_embedding)

kernel.register_memory_store(memory_store=VolatileMemoryStore())
kernel.import_skill(TextMemorySkill())
```
Call the async function to add the data. Here we add several pieces of personal information to a collection named aboutMe:
```python
async def populate_memory(kernel: sk.Kernel) -> None:
    # Add some documents to the semantic memory
    await kernel.memory.save_information_async(collection="aboutMe", id="info1", text="我名字叫做小度")
    await kernel.memory.save_information_async(collection="aboutMe", id="info2", text="我工作在baidu")
    await kernel.memory.save_information_async(collection="aboutMe", id="info3", text="我来自中国")
    await kernel.memory.save_information_async(
        collection="aboutMe",
        id="info4",
        text="我曾去过北京,上海,深圳",
    )
    await kernel.memory.save_information_async(collection="aboutMe", id="info5", text="我爱打羽毛球")
```
Using the cosine-similarity vector comparison implemented in TextMemoryBase, we can search for the closest matching answers:
```python
async def search_memory_examples(kernel: sk.Kernel) -> None:
    questions = [
        "我的名字是?",
        "我在哪里工作?",
        "我去过哪些地方旅游?",
        "我的家乡是?",
        "我的爱好是?",
    ]
    for question in questions:
        print(f"Question: {question}")
        result = await kernel.memory.search_async("aboutMe", question)
        print(f"Answer: {result[0].text}\n")


await populate_memory(kernel)
await search_memory_examples(kernel)
```
```
[INFO] [02-19 16:28:48] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Question: 我的名字是?
[INFO] [02-19 16:28:48] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Answer: 我名字叫做小度

Question: 我在哪里工作?
[INFO] [02-19 16:28:49] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Answer: 我工作在baidu

Question: 我去过哪些地方旅游?
[INFO] [02-19 16:28:49] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Answer: 我曾去过北京,上海,深圳

Question: 我的家乡是?
[INFO] [02-19 16:28:50] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
Answer: 我来自中国

Question: 我的爱好是?
Answer: 我爱打羽毛球
```
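The relevance behind these matches is the cosine similarity between the question's embedding and each record's embedding. A minimal sketch of the formula (not SK's actual code):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    # cos(a, b) = (a · b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.707
```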
Combining with a chat scenario
How can we fuse an external knowledge base with a dialogue system? SK provides TextMemorySkill, which includes a recall function that takes an input and runs a similarity search over the kernel's Memory.
```python
from typing import Tuple


async def setup_chat_with_memory(
    kernel: sk.Kernel,
) -> Tuple[sk.SKFunctionBase, sk.SKContext]:
    from semantic_kernel.core_skills import TextMemorySkill

    sk_prompt = """
你是一个问答机器人,你的背景资料如下,
背景资料:
- {{$fact1}}: {{recall $fact1}}
- {{$fact2}}: {{recall $fact2}}
- {{$fact3}}: {{recall $fact3}}
- {{$fact4}}: {{recall $fact4}}
- {{$fact5}}: {{recall $fact5}}

聊天记录:
{{$chat_history}}

回答以下当前输入: {{$user_input}}
回答:
""".strip()

    chat_func = kernel.create_semantic_function(sk_prompt, temperature=0.8)

    context = kernel.create_new_context()
    context["fact1"] = "名字是?"
    context["fact2"] = "哪里工作?"
    context["fact3"] = "去过哪些地方旅游?"
    context["fact4"] = "家乡是?"
    context["fact5"] = "爱好是?"

    context[sk.core_skills.TextMemorySkill.COLLECTION_PARAM] = "aboutMe"
    context[sk.core_skills.TextMemorySkill.RELEVANCE_PARAM] = "0.7"

    context["chat_history"] = ""

    return chat_func, context
```
Here RELEVANCE_PARAM specifies the retrieval threshold and COLLECTION_PARAM specifies the collection name. The recall in sk_prompt is a native function of TextMemorySkill, roughly equivalent to the search_async call we used earlier.
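In other words, a {{recall $fact1}} invocation in the prompt behaves roughly like the direct memory search below (a sketch, assuming the aboutMe collection and the 0.7 threshold configured in the context above):

```python
# Roughly what {{recall $fact1}} does with the context configured above:
# search the "aboutMe" collection for text similar to the value of $fact1.
results = await kernel.memory.search_async(
    "aboutMe", "名字是?", limit=1, min_relevance_score=0.7
)
print(results[0].text if results else "(no match above threshold)")
```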
```python
async def chat(
    kernel: sk.Kernel, chat_func: sk.SKFunctionBase, context: sk.SKContext
) -> bool:
    try:
        user_input = input("用户:> ")
        context["user_input"] = user_input
        print(f"User:> {user_input}")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False

    print(context.variables)
    answer = await kernel.run_async(chat_func, input_vars=context.variables)
    context["chat_history"] += f"\n当前输入:> {user_input}\n回答:> {answer}\n"

    print(f"Bot:> {answer}")
    return True


await populate_memory(kernel)

# print("开始提问")
# await search_memory_examples(kernel)

print("构建prompt...")
chat_func, context = await setup_chat_with_memory(kernel)

print("开始对话 (type 'exit' to exit):\n")
chatting = True
while chatting:
    chatting = await chat(kernel, chat_func, context)
```
```
[INFO] [02-19 16:29:03] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-19 16:29:03] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-19 16:29:04] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-19 16:29:04] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
[INFO] [02-19 16:29:05] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
构建prompt...
开始对话 (type 'exit' to exit):

[INFO] [02-19 16:29:10] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
User:> 你的工作是什么
我叫["\u6211\u540d\u5b57\u53eb\u505a\u5c0f\u5ea6"],欢迎使用我的服务。
Bot:> 我是一名百度员工,负责回答用户的问题。
User:> exit

Exiting chat...
```
Adding external links to Memory
Often we have a large external knowledge base at hand. Next we will use SK's VolatileMemoryStore to load external links; for example, let's add the Qianfan SDK repo:
```python
github_files = {
    "https://github.com/baidubce/bce-qianfan-sdk/blob/main/README.md":
        "README: 千帆SDK介绍,安装,基础使用方法",
    "https://github.com/baidubce/bce-qianfan-sdk/blob/main/cookbook/finetune/trainer_finetune.ipynb":
        "Cookbook: 千帆SDK Trainer使用方法",
}
```
Unlike before, here save_reference_async stores the data and its reference source separately:
```python
memory_collection_name = "QianfanGithub"

i = 0
for entry, value in github_files.items():
    await kernel.memory.save_reference_async(
        collection=memory_collection_name,
        description=value,
        text=value,
        external_id=entry,
        external_source_name="GitHub",
    )
    i += 1
    print("  已添加 {} saved".format(i))
```
```python
ask = "我希望整体了解千帆SDK,有什么办法?"
print("===========================\n" + "Query: " + ask + "\n")

results = await kernel.memory.search_async(
    memory_collection_name, ask, limit=5, min_relevance_score=0.7
)

i = 0
for res in results:
    i += 1
    print(f"Result {i}:")
    print("  URL: : " + res.id)
    print("  Title : " + res.description)
    print("  Relevance: " + str(res.relevance))
    print()
```
```
[INFO] [02-19 16:29:52] openapi_requestor.py:275 [t:8406866752]: async requesting llm api endpoint: /embeddings/embedding-v1
===========================
Query: 我希望整体了解千帆SDK,有什么办法?

Result 1:
  URL: : https://github.com/baidubce/bce-qianfan-sdk/blob/main/README.md
  Title : README: 千帆SDK介绍,安装,基础使用方法
  Relevance: 0.7502846678234273
```
Beyond VolatileMemoryStore, we can also support large external knowledge bases by plugging in an external vector store; SK officially ships implementations for common stores such as Chroma and Pinecone. Replacing the memory_store directly connects the kernel to Chroma:
```python
from semantic_kernel.connectors.memory.chroma import ChromaMemoryStore

kernel.register_memory_store(memory_store=ChromaMemoryStore(persist_directory="./"))
```
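Since the kernel.memory interface stays the same, the populate and search helpers from earlier work unchanged against the Chroma-backed store. A quick sketch reusing them:

```python
# Re-populate and query; the calls are identical to the VolatileMemoryStore
# case, only the backing store differs.
await populate_memory(kernel)
results = await kernel.memory.search_async("aboutMe", "我的爱好是?")
print(results[0].text)
```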