【LangChain】Summarization

Overview

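The summarization chain can be used to summarize multiple documents. One approach is to split the input into many smaller documents (chunks) and process them with MapReduceDocumentsChain. You can also swap the chain that does the summarizing for StuffDocumentsChain or RefineDocumentsChain.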

Preparing the data

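First, we prepare the data. In this example we create several documents from one long document, but the documents can be obtained in any way (the focus of this notebook is what to do once you have them).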

from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
# LLM
llm = OpenAI(temperature=0)
# Initialize the text splitter
text_splitter = CharacterTextSplitter()
# Load the long text
with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)

from langchain.docstore.document import Document
# Wrap the first three chunks as Document objects
docs = [Document(page_content=t) for t in texts[:3]]
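Conceptually, CharacterTextSplitter splits the text on a separator and then greedily merges the pieces back into chunks of at most chunk_size characters. The sketch below illustrates that idea in plain Python. It is a simplification under stated assumptions, not LangChain's implementation: it assumes a "\n\n" separator, ignores chunk_overlap, and `Doc` and `split_text` are hypothetical stand-ins for `Document` and the splitter.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_text(text: str, separator: str = "\n\n", chunk_size: int = 4000) -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size
    characters. A single piece longer than chunk_size becomes its own oversized chunk."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for piece in pieces:
        # Extra length if this piece (plus a joining separator) were appended.
        added = len(piece) + (len(separator) if current else 0)
        if current and length + added > chunk_size:
            # Flush the current chunk and start a new one with this piece.
            chunks.append(separator.join(current))
            current, length = [], 0
            added = len(piece)
        current.append(piece)
        length += added
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
texts = split_text(text, chunk_size=35)
docs = [Doc(page_content=t) for t in texts[:3]]
```

With chunk_size=35 the first two paragraphs fit together exactly (35 characters including the separator), so the sketch produces two chunks.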

Quick start

If you just want to get started as quickly as possible, this is the recommended approach:

from langchain.chains.summarize import load_summarize_chain
# load_summarize_chain builds a ready-made summarization chain
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)

Result:

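'The AITO M5 is the first model from AITO, the premium brand jointly developed by Seres and Huawei. The AITO M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'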

For more control over, and insight into, what is happening, see the sections below.

stuff Chain

This section shows the results of summarizing with the stuff chain.

chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(docs)

Result:

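    ' The AITO M5 is the first model from AITO, the premium brand jointly developed by Seres and Huawei. The AITO M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'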
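The "stuff" strategy puts every document into a single prompt and makes one model call. The following is a minimal sketch of that idea, assuming a hypothetical `fake_llm` stub in place of a real model; the prompt text mirrors the default wording used elsewhere in this article, and none of this is LangChain's own code.

```python
from typing import Callable


def stuff_summarize(texts: list[str], llm: Callable[[str], str]) -> str:
    """'stuff' strategy: concatenate every document into ONE prompt, one LLM call."""
    joined = "\n\n".join(texts)
    prompt = f"Write a concise summary of the following:\n\n{joined}\n\nCONCISE SUMMARY:"
    return llm(prompt)


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: records the prompt, returns a fixed string."""
    calls.append(prompt)
    return "summary"


result = stuff_summarize(["doc one", "doc two", "doc three"], fake_llm)
```

The trade-off is that a single call is cheap and sees all the context at once, but all documents together must fit in the model's context window.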

Custom Prompts

You can also use your own prompt with this chain. In this example, the summary will be written in Italian.

prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
# The prompt above asks for a summary in Italian
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
# Build the summarize chain with the custom prompt
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
chain.run(docs)

Result: omitted here.
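PromptTemplate takes both a template string and a list of input_variables, so the two have to agree. The sketch below is a hypothetical mini version (`make_template` is not a LangChain function) that uses the standard library's string.Formatter to compare the {placeholders} in the template with the declared variables, then fills them in.

```python
from string import Formatter


def make_template(template: str, input_variables: list[str]):
    """Hypothetical mini PromptTemplate: verify that the declared variables match the
    {placeholders} in the template, then return a function that fills them in."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    placeholders = {name for _, name, _, _ in Formatter().parse(template) if name}
    if placeholders != set(input_variables):
        raise ValueError(
            f"template uses {sorted(placeholders)}, declared {sorted(input_variables)}"
        )

    def fmt(**kwargs: str) -> str:
        return template.format(**kwargs)

    return fmt


prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""

fill = make_template(prompt_template, ["text"])
filled = fill(text="Il gatto dorme.")
```

Declaring a variable name that does not appear in the template (for example "txt") raises a ValueError in this sketch instead of failing later at call time.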

map_reduce Chain

This section shows the results of summarizing with the map_reduce chain.

chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)
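
Result:

    ' The AITO M5 is the first model from AITO, the premium brand jointly developed by Seres and Huawei. The AITO M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'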
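The map_reduce strategy summarizes each chunk independently (map) and then summarizes the concatenated partial summaries (reduce). Below is a minimal sketch assuming a hypothetical `fake_llm` stub; LangChain's chain may additionally collapse the partial summaries when they are too long to combine in one call, a step this sketch omits.

```python
from typing import Callable

MAP_PROMPT = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"


def map_reduce_summarize(texts: list[str], llm: Callable[[str], str]) -> tuple[str, list[str]]:
    """Map: one LLM call per chunk. Reduce: one final call over the joined partial summaries.
    Returns (final_summary, partial_summaries)."""
    partial = [llm(MAP_PROMPT.format(text=t)) for t in texts]
    combined = llm(MAP_PROMPT.format(text="\n\n".join(partial)))
    return combined, partial


counter = {"calls": 0}


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: returns a numbered placeholder summary."""
    counter["calls"] += 1
    return f"summary {counter['calls']}"


final, steps = map_reduce_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
```

For N chunks this makes N + 1 calls; the map calls are independent of each other, which is what allows them to run in parallel.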

Intermediate steps

If we want to inspect them, we can also return the intermediate steps of the map_reduce chain. This is done with the return_intermediate_steps parameter.

chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True)

chain({"input_documents": docs}, return_only_outputs=True)

Result:

    {'map_steps': [" In response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains.",
      ' The United States and its European allies are taking action to punish Russia for its invasion of Ukraine, including seizing assets, closing off airspace, and providing economic and military assistance to Ukraine. The US is also mobilizing forces to protect NATO countries and has released 30 million barrels of oil from its Strategic Petroleum Reserve to help blunt gas prices. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens.',
      " President Biden and Vice President Harris ran for office with a new economic vision for America, and have since passed the American Rescue Plan and the Bipartisan Infrastructure Law to help struggling families and rebuild America's infrastructure. This includes creating jobs, modernizing roads, airports, ports, and waterways, replacing lead pipes, providing affordable high-speed internet, and investing in American products to support American jobs."],
     'output_text': " In response to Russia's aggression in Ukraine, the United States and its allies have imposed economic sanctions and are taking other measures to hold Putin accountable. The US is also providing economic and military assistance to Ukraine, protecting NATO countries, and passing legislation to help struggling families and rebuild America's infrastructure. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens."}
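The returned value is a plain dict, so it can be inspected with ordinary Python. The helper below is a hypothetical sketch (`describe_steps` is not part of LangChain) that assumes the dict has the shape printed above, a 'map_steps' list plus 'output_text'. Other LangChain versions may use a different key, such as 'intermediate_steps' in the refine output later in this article, which is why the key is a parameter. The sample dict is made up.

```python
def describe_steps(result: dict, steps_key: str = "map_steps") -> list[str]:
    """Return one line per intermediate summary (index, length, preview) plus the final text."""
    lines = []
    for i, step in enumerate(result.get(steps_key, []), start=1):
        step = step.strip()
        lines.append(f"step {i}: {len(step)} chars: {step[:40]}")
    lines.append(f"final: {result['output_text'].strip()[:40]}")
    return lines


# Made-up result dict shaped like the output above.
result = {
    "map_steps": [" First partial summary.", " Second partial summary."],
    "output_text": " Combined summary.",
}
for line in describe_steps(result):
    print(line)
```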

Custom prompts

You can also use your own prompts with this chain. In this example, the summary will be written in Italian.

# This prompt asks for a summary in Italian
prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
# Create the prompt template
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True, map_prompt=PROMPT, combine_prompt=PROMPT)
chain({"input_documents": docs}, return_only_outputs=True)
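In a map_reduce chain, map_prompt is applied to every chunk and combine_prompt is applied once to the joined partial results. Passing the same PROMPT for both, as above, means every one of the N + 1 calls asks for Italian. The sketch below makes that visible with a hypothetical `recording_llm` stub; it illustrates the idea and is not LangChain's implementation.

```python
from typing import Callable


def map_reduce_with_prompts(texts: list[str], llm: Callable[[str], str],
                            map_prompt: str, combine_prompt: str) -> str:
    """Apply map_prompt to each chunk, then combine_prompt to the joined partial results."""
    partial = [llm(map_prompt.format(text=t)) for t in texts]
    return llm(combine_prompt.format(text="\n\n".join(partial)))


PROMPT = ("Write a concise summary of the following:\n\n{text}\n\n"
          "CONCISE SUMMARY IN ITALIAN:")
seen: list[str] = []


def recording_llm(prompt: str) -> str:
    """Hypothetical model stub: records each prompt and returns a fixed Italian word."""
    seen.append(prompt)
    return "riassunto"


answer = map_reduce_with_prompts(["a", "b"], recording_llm,
                                 map_prompt=PROMPT, combine_prompt=PROMPT)
```

Using different templates for the two steps (for example, an English map step and an Italian combine step) is equally possible, since they are separate parameters.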

Custom MapReduceChain

Multi-input prompts

You can also use prompts with multiple inputs. In this example, we use a MapReduce chain to answer a specific question about our code.

from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
# First prompt (map step)
map_template_string = """Given the following python code, generate a description that explains what the code does and also mention the time complexity.
Code:
{code}

Return the description in the following format:
name of the function: description of the function
"""

# Second prompt (reduce step)
reduce_template_string = """Given the following python function names and descriptions, answer the following question
{code_description}
Question: {question}
Answer:
"""
# Template for the first (map) prompt
MAP_PROMPT = PromptTemplate(input_variables=["code"], template=map_template_string)
# Template for the second (reduce) prompt
REDUCE_PROMPT = PromptTemplate(input_variables=["code_description", "question"], template=reduce_template_string)
# LLM
llm = OpenAI()
# map chain
map_llm_chain = LLMChain(llm=llm, prompt=MAP_PROMPT)
# reduce chain
reduce_llm_chain = LLMChain(llm=llm, prompt=REDUCE_PROMPT)

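# Reduce step: stuff all map outputs into the {code_description} variable of the reduce prompt
generative_result_reduce_chain = StuffDocumentsChain(
    llm_chain=reduce_llm_chain,
    document_variable_name="code_description",
)

# Map step: run map_llm_chain on each chunk (passed as {code}), then hand the results to the reduce chain
combine_documents = MapReduceDocumentsChain(
    llm_chain=map_llm_chain,
    combine_document_chain=generative_result_reduce_chain,
    document_variable_name="code",
)

# Split the raw input text on "\n##\n" (one piece per function) before mapping
map_reduce = MapReduceChain(
    combine_documents_chain=combine_documents,
    text_splitter=CharacterTextSplitter(separator="\n##\n", chunk_size=100, chunk_overlap=0),
)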
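The chain above combines two patterns: a map step whose prompt has one input ({code}), and a reduce step whose prompt has two ({code_description} and an extra {question} passed through at run time). The following plain-Python sketch shows that data flow under stated assumptions: `answer_about_code` and the `fake_llm` stub are hypothetical, and the templates are shortened versions of the ones above.

```python
from typing import Callable

MAP_TEMPLATE = (
    "Given the following python code, generate a description that explains "
    "what the code does and also mention the time complexity.\nCode:\n{code}\n"
)
REDUCE_TEMPLATE = (
    "Given the following python function names and descriptions, "
    "answer the following question\n{code_description}\n"
    "Question: {question}\nAnswer:\n"
)


def answer_about_code(source: str, question: str, llm: Callable[[str], str],
                      separator: str = "\n##\n") -> str:
    """Map: one LLM call per separator-delimited snippet (fills {code}).
    Reduce: stuff every description, plus the extra {question} input, into one prompt."""
    snippets = [s for s in source.split(separator) if s.strip()]
    descriptions = [llm(MAP_TEMPLATE.format(code=s)) for s in snippets]
    reduce_prompt = REDUCE_TEMPLATE.format(
        code_description="\n\n".join(descriptions), question=question
    )
    return llm(reduce_prompt)


prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: records each prompt and returns a numbered placeholder."""
    prompts.append(prompt)
    return f"description {len(prompts)}"


answer = answer_about_code("def f(): pass\n##\ndef g(): pass", "Which is faster?", fake_llm)
```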

The code snippet:

code = """
def bubblesort(list):
   for iter_num in range(len(list)-1,0,-1):
      for idx in range(iter_num):
         if list[idx]>list[idx+1]:
            temp = list[idx]
            list[idx] = list[idx+1]
            list[idx+1] = temp
    return list
##
def insertion_sort(InputList):
   for i in range(1, len(InputList)):
      j = i-1
      nxt_element = InputList[i]
   while (InputList[j] > nxt_element) and (j >= 0):
      InputList[j+1] = InputList[j]
      j=j-1
   InputList[j+1] = nxt_element
   return InputList
##
def shellSort(input_list):
   gap = len(input_list) // 2
   while gap > 0:
      for i in range(gap, len(input_list)):
         temp = input_list[i]
         j = i
   while j >= gap and input_list[j - gap] > temp:
      input_list[j] = input_list[j - gap]
      j = j-gap
      input_list[j] = temp
   gap = gap//2
   return input_list

"""
# Which function has the better time complexity?
map_reduce.run(input_text=code, question="Which function has a better time complexity?")

Result:

    Created a chunk of size 247, which is longer than the specified 100
    Created a chunk of size 267, which is longer than the specified 100

    'shellSort has a better time complexity than both bubblesort and insertion_sort, as it has a time complexity of O(n^2), while the other two have a time complexity of O(n^2).'
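The two warnings above appear because the splitter can only cut on the separator "\n##\n": each function between separators is longer than chunk_size=100, so it is emitted as one oversized chunk. The hypothetical helper below (`oversized_pieces` and the sample text are illustrative, not part of LangChain) lists such pieces so they can be spotted before running a chain.

```python
def oversized_pieces(text: str, separator: str, chunk_size: int) -> list[int]:
    """Return the lengths of separator-delimited pieces that exceed chunk_size.
    A splitter that only cuts on this separator must emit such pieces as oversized chunks."""
    return [len(p) for p in text.split(separator) if len(p) > chunk_size]


# Made-up sample: a short function, then a long one, separated by "\n##\n".
sample = "def a():\n    return 1\n##\ndef b():\n" + "    x = 1\n" * 20 + "    return x\n"
print(oversized_pieces(sample, "\n##\n", 100))  # [222]
```

Raising chunk_size above the largest piece, or choosing a finer separator, avoids the warning.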

refine Chain

This section shows the results of summarizing with the refine chain.

chain = load_summarize_chain(llm, chain_type="refine")

chain.run(docs)

Result:

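The AITO M5 is the first model from AITO, the premium brand jointly developed by Seres and Huawei. The AITO M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.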
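The refine strategy summarizes the first chunk, then walks through the remaining chunks one at a time, asking the model to improve the running summary with each new piece of context. A minimal sketch, assuming a hypothetical `fake_llm` stub and shortened templates (not LangChain's own code), that also records each intermediate summary:

```python
from typing import Callable

QUESTION_TEMPLATE = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
REFINE_TEMPLATE = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "Refine the existing summary (only if needed) with the context below.\n"
    "------------\n{text}\n------------\n"
)


def refine_summarize(texts: list[str], llm: Callable[[str], str]) -> tuple[str, list[str]]:
    """Summarize texts[0], then refine the running summary with each later chunk.
    Returns (final_summary, intermediate_summaries)."""
    summary = llm(QUESTION_TEMPLATE.format(text=texts[0]))
    steps = [summary]
    for text in texts[1:]:
        summary = llm(REFINE_TEMPLATE.format(existing_answer=summary, text=text))
        steps.append(summary)
    return summary, steps


prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: records each prompt and returns a versioned placeholder."""
    prompts.append(prompt)
    return f"v{len(prompts)}"


final, steps = refine_summarize(["part 1", "part 2", "part 3"], fake_llm)
```

Unlike map_reduce, each call depends on the previous summary, so the calls cannot run in parallel.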

Intermediate steps

If we want to inspect them, we can also return the intermediate steps of the refine chain. This is done with the return_intermediate_steps parameter.

# Note the return_intermediate_steps argument
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True)

chain({"input_documents": docs}, return_only_outputs=True)
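# Result:
'The AITO M5 is the first model from AITO, the premium brand jointly developed by Seres and Huawei. The AITO M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'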

Custom prompts

You can also use your own prompts with this chain. In this example, the summary will be written in Italian.

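prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in Italian. "
    "If the context isn't useful, return the original summary."
)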
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True, question_prompt=PROMPT, refine_prompt=refine_prompt)
chain({"input_documents": docs}, return_only_outputs=True)
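The refine template is built from adjacent string literals, which Python joins with nothing in between, so each piece needs its own trailing space or "\n". A quick check is to format the template with sample values and read the assembled prompt before sending it. The sample values below are made up for illustration.

```python
# Adjacent string literals are concatenated with no separator in between.
broken = "refine the existing summary" "(only if needed)"

refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in Italian. "
    "If the context isn't useful, return the original summary."
)

# Made-up sample values, only for previewing the assembled prompt.
preview = refine_template.format(
    existing_answer="Riassunto iniziale.", text="Nuovo contesto."
)
print(preview)
```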

Result:

    {'intermediate_steps': ["\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia e bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi.",
      "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare,",
      "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."],
     'output_text': "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."}

Reference:

https://python.langchain.com/docs/modules/chains/popular/summarize
