In this story we are going to dive into LangChain’s LLMChain class. According to LangChain’s documentation, LLMChain allows you to define a prompt template and then send a list of key-value pairs to that template so that a large language model can process them.
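Conceptually, the chain fills the template’s placeholders from each key-value dictionary before sending the result to the model. Here is a minimal plain-Python sketch of that substitution step — a simplified stand-in using `str.format_map`, not LangChain’s actual implementation:

```python
# Simplified stand-in for the substitution an LLMChain performs: fill the
# prompt template's placeholders from a key-value dictionary.
template = "What is the sentiment of this {content} with this title: {title}?"

def render_prompt(template: str, key_values: dict) -> str:
    # str.format_map resolves {content} and {title} from the dictionary,
    # much like a PromptTemplate does before the prompt is sent to the LLM.
    return template.format_map(key_values)

print(render_prompt(template, {'content': "Factories pollute rivers.", 'title': "Pollution"}))
```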
You would typically use LLMChain in a workflow like this one:
These are the steps:
key_value_list = [
    {'content': "Across the UK, fossil fuel companies have failed to keep their promises, leaving scarred, polluted landscapes, and no one held accountable",
     'title': "The toxic legacy of opencast mining in Wales"},
    {'content': "The solution is not more fields, but better, more compact, cruelty-free and pollution-free factories",
     'title': "'Farming good, factory bad', we think"},
]
In order to better understand how you can use LLMChain, we created a simple script which applies multiple prompt templates to The Guardian’s (a British newspaper) RSS feeds. An RSS feed contains a list of text-based records which can be converted into a key-value list.
The purpose of this script is to get the sentiment and keywords of a set of articles and to classify the articles according to a predefined set of categories.
The script then counts the sentiments, categories and keywords found in a columnist’s RSS feed:
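The counting relies on `collections.Counter`; a small illustration with hypothetical sentiment labels (invented for this example, not real model output):

```python
from collections import Counter

# Hypothetical per-article sentiment labels, of the kind categorize_sentiment produces
sentiments = ['negative', 'very negative', 'negative', 'neutral']

sentiment_counter = Counter()
for sentiment in sentiments:
    sentiment_counter[sentiment] += 1

# most_common() orders labels by frequency, most frequent first
print(sentiment_counter.most_common())
```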
The model we used was “gpt-3.5-turbo”.
The script can be viewed on GitHub:
http://github.com/gilfernandes/llm_chain_out/blob/main/langchain_llm_chain_extract.py
The command-line script does the following:
for url in sys.argv[1:]:
    process_url(url)

def process_url(url):
    """
    Extracts the content of each RSS feed.
    Sends the content of each RSS feed to the LLMChain so that the prompts are applied to the extracted records.
    Creates a dataset for each RSS feed which combines the output of the LLM and generates HTML and Excel files from it.
    :param url: the URL of the RSS feed, like e.g: http://www.theguardian.com/profile/georgemonbiot/rss
    """
    print(f"Processing {url}")
    zipped_results = []
    llm_responses = []
    input_list = extract_rss(url)
    for prompt_template in prompt_templates:
        llm_responses.append(process_llm(input_list, prompt_template))
    sentiment_counter = Counter()
    categories_counter = Counter()
    for zipped in zip(input_list, *llm_responses):
        sentiment = {'sentiment': zipped[1]['text']}
        categorized_sentiment = categorize_sentiment(zipped[1]['text'])
        sentiment_counter[categorized_sentiment] += 1
        sentiment_category = {'sentiment_category': categorized_sentiment}
        keywords = {'keywords': zipped[2]['text']}
        raw_categories = zipped[3]['text']
        classification = {'classification': raw_categories}
        sanitized_topics = sanitize_categories(raw_categories)
        categories_counter.update(sanitized_topics)
        sanitized_categories = {'topics': ",".join(sanitized_topics)}
        full_record = {
            **zipped[0], **sentiment, **keywords,
            **sentiment_category, **classification, **sanitized_categories
        }
        zipped_results.append(full_record)
    result_df = pd.DataFrame(zipped_results)
    title = url.replace(":", "_").replace("/", "_")
    serialize_results(url, result_df, title, sentiment_counter, categories_counter)
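The zip of the input list with all the LLM responses pairs each article with the output of every prompt template. A small illustration with invented mock data — the dictionaries mimic the `{'text': ...}` shape that `LLMChain.apply` returns:

```python
# Invented mock data mimicking LLMChain.apply output: one list of
# {'text': ...} dicts per prompt template, aligned with the input articles.
input_list = [{'content': 'a', 'title': 't1'}, {'content': 'b', 'title': 't2'}]
sentiment_responses = [{'text': 'positive'}, {'text': 'negative'}]
keyword_responses = [{'text': 'Keywords: x'}, {'text': 'Keywords: y'}]

for zipped in zip(input_list, sentiment_responses, keyword_responses):
    # zipped[0] is the original article; zipped[1] and zipped[2] are the
    # per-template LLM outputs for that article.
    merged = {**zipped[0], 'sentiment': zipped[1]['text'], 'keywords': zipped[2]['text']}
    print(merged)
```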
def extract_rss(url):
    """
    Extracts the content and title from a URL.
    :param url The RSS feed URL, like e.g: http://www.theguardian.com/profile/georgemonbiot/rss
    """
    response = requests.get(url)
    tree = ElementTree.fromstring(response.content)
    contents = []
    for child in tree:
        if child.tag == 'channel':
            for channel_child in child:
                if channel_child.tag == 'item':
                    contents.append({'content': channel_child[2].text, 'title': channel_child[0].text})
    return contents
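The ElementTree traversal assumes the usual RSS 2.0 layout, where the children of each item appear in a fixed order (title first, description third). A self-contained check of that logic against a tiny inline feed (the sample XML is invented for illustration):

```python
from xml.etree import ElementTree

# Tiny invented RSS 2.0 feed for illustration
sample_rss = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First article</title>
      <link>http://example.com/1</link>
      <description>Body of the first article</description>
    </item>
  </channel>
</rss>"""

def extract_items(xml_text):
    tree = ElementTree.fromstring(xml_text)
    contents = []
    for child in tree:
        if child.tag == 'channel':
            for channel_child in child:
                if channel_child.tag == 'item':
                    # Index 0 is <title>, index 2 is <description> in this layout
                    contents.append({'content': channel_child[2].text,
                                     'title': channel_child[0].text})
    return contents

print(extract_items(sample_rss))
```

Note that this positional indexing is brittle: feeds that reorder or omit item children would need a lookup by tag name instead.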
prompt_templates = [
    ("Can you please tell me the sentiment of this {content} with this title: {title}? "
     "Is it very positive, positive, very negative, negative or neutral? "
     "Please reply using these expressions: 'very positive', 'positive', 'very negative', 'negative' or 'neutral'"),
    "Can you please extract the most relevant keywords from this {content} with this title: {title}. Please use the 'Keywords:' prefix before the keyword list.",
    "Can you please classify the following {content} with the title {title} using these categories: " + ",".join(accepted_categories)
]

model = 'gpt-3.5-turbo'

def process_llm(input_list: list, prompt_template):
    """
    Creates the LLMChain object using a specific model
    :param input_list A list of dicts with the content and title of each article
    :param prompt_template A single prompt template with content and title parameters
    """
    llm = ChatOpenAI(temperature=0, model=model)
    # You can also use another model. text-davinci-003 is more expensive than gpt-3.5-turbo
    # llm = OpenAI(temperature=0, model='text-davinci-003')
    llm_chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate.from_template(prompt_template)
    )
    return llm_chain.apply(input_list)
def sanitize_categories(text):
    text = text.lower()
    sanitized = []
    for cat in accepted_categories:
        if cat in text:
            sanitized.append(cat)
    return sanitized

def sanitize_keywords(text):
    text = text.lower()
    text = text.replace("keywords:", "").strip()
    sanitized = [re.sub(r"\.$", "", s.strip()) for s in text.split(",")]
    return sanitized

def categorize_sentiment(text):
    text = text.lower()
    if 'very negative' in text:
        return 'very negative'
    elif 'negative' in text:
        return 'negative'
    elif 'very positive' in text:
        return 'very positive'
    elif 'positive' in text:
        return 'positive'
    return 'neutral'
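To make the helper logic concrete, here is a self-contained sketch of the category and sentiment helpers with a hypothetical `accepted_categories` list (the real script defines its own), applied to invented LLM replies:

```python
# Illustrative stand-ins for the script's helpers; accepted_categories here is
# a hypothetical list, the real script defines its own.
accepted_categories = ['politics', 'environment', 'economy']

def sanitize_categories(text):
    # Keep only the accepted categories that the LLM reply mentions
    text = text.lower()
    return [cat for cat in accepted_categories if cat in text]

def categorize_sentiment(text):
    # Map a free-form LLM reply onto one of five sentiment labels;
    # the 'very ...' checks must come before the plain ones.
    text = text.lower()
    if 'very negative' in text:
        return 'very negative'
    elif 'negative' in text:
        return 'negative'
    elif 'very positive' in text:
        return 'very positive'
    elif 'positive' in text:
        return 'positive'
    return 'neutral'

print(categorize_sentiment("The sentiment is very negative."))  # very negative
print(sanitize_categories("Environment and Economy"))           # ['environment', 'economy']
```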
def serialize_results(url, result_df, title, sentiment_counter, categories_counter):
    """
    Converts the results to an Excel sheet or HTML page.
    The HTML page also contains the counter information.
    :param url The RSS feed URL
    :param result_df The combination of the original data with the LLM output
    :param title The RSS feed URL with some characters modified
    :param sentiment_counter The counter with the sentiment information
    :param categories_counter The counter with the category statistics
    """
    result_df.to_excel(target_folder/f"{title}.xlsx")
    html_file = target_folder/f"{title}.html"
    html_content = result_df.to_html(escape=False)
    # Make sure the file is written in UTF-8
    with open(html_file, "w", encoding="utf-8") as file:
        file.write(html_content)
    sentiment_html = generate_sentiment_table(sentiment_counter, "Sentiment")
    categories_html = generate_sentiment_table(categories_counter, "Category")
    with open(html_file, encoding="utf8") as f:
        content = f"""
{re.sub(r'.+?theguardian.com/profile', '', url)}
Sentiment Count
{sentiment_html}
Categories Count
{categories_html}
{f.read()}
"""
    content = content.replace('class="dataframe"', 'class="table table-striped table-hover dataframe"')
    with open(html_file, "w", encoding="utf8") as f:
        f.write(content)
Here is an example of the script’s output for the following columnist:
LangChain’s LLMChain offers a very convenient way to interact with LLMs whenever you have a list-based input to which you want to apply predefined, parameterized LLM prompts.
Gil Fernandes, Onepoint Consulting