In this story we are going to dive into LangChain’s LLMChain class. According to LangChain’s documentation, LLMChain allows you to define a prompt template and then send a list of key-value pairs to that template so that a large language model can process them.
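Conceptually, the chain fills the template’s placeholders from each key-value dictionary before sending the result to the model. Here is a minimal plain-Python sketch of that substitution step — a simplified stand-in using `str.format_map`, not LangChain’s actual implementation:

```python
# Simplified stand-in for the substitution an LLMChain performs: fill the
# prompt template's placeholders from a key-value dictionary.
template = "What is the sentiment of this {content} with this title: {title}?"

def render_prompt(template: str, key_values: dict) -> str:
    # str.format_map resolves {content} and {title} from the dictionary,
    # much like a PromptTemplate does before the prompt is sent to the LLM.
    return template.format_map(key_values)

print(render_prompt(template, {'content': "Factories pollute rivers.", 'title': "Pollution"}))
```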
You would typically use LLMChain in a workflow like this one:
These are the steps:
key_value_list = [
    {'content': "Across the UK, fossil fuel companies have failed to keep their promises, leaving scarred, polluted landscapes, and no one held accountable",
     'title': "The toxic legacy of opencast mining in Wales"},
    {'content': "The solution is not more fields, but better, more compact, cruelty-free and pollution-free factories",
     'title': "'Farming good, factory bad', we think"},
]
In order to better understand how you can use LLMChain, we created a simple script which applies multiple prompt templates to The Guardian’s (a British newspaper) RSS feeds. An RSS feed contains a list of text-based records which can be converted into a key-value list.
The purpose of this script is to get the sentiment and keywords of a set of articles and to classify the articles according to a predefined set of categories.
The script then counts the sentiments, categories and keywords found in a columnist’s RSS feed:
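The counting relies on `collections.Counter`; a small illustration with hypothetical sentiment labels (invented for this example, not real model output):

```python
from collections import Counter

# Hypothetical per-article sentiment labels, of the kind categorize_sentiment produces
sentiments = ['negative', 'very negative', 'negative', 'neutral']

sentiment_counter = Counter()
for sentiment in sentiments:
    sentiment_counter[sentiment] += 1

# most_common() orders labels by frequency, most frequent first
print(sentiment_counter.most_common())
```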
The model we used was “gpt-3.5-turbo”.
The script can be viewed on GitHub:
http://github.com/gilfernandes/llm_chain_out/blob/main/langchain_llm_chain_extract.py
The command-line script does the following:
for url in sys.argv[1:]:
    process_url(url)

def process_url(url):
    """
    Extracts the content of each RSS feed.
    Sends the content of each RSS feed to the LLMChain so that the prompts are applied to the extracted records.
    Creates a dataset for each RSS feed which combines the output of the LLM and generates HTML and Excel files from it.
    :param url: the URL of the RSS feed, like e.g: http://www.theguardian.com/profile/georgemonbiot/rss
    """
    print(f"Processing {url}")
    zipped_results = []
    llm_responses = []
    input_list = extract_rss(url)
    for prompt_template in prompt_templates:
        llm_responses.append(process_llm(input_list, prompt_template))
    sentiment_counter = Counter()
    categories_counter = Counter()
    for zipped in zip(input_list, *llm_responses):
        sentiment = {'sentiment': zipped[1]['text']}
        categorized_sentiment = categorize_sentiment(zipped[1]['text'])
        sentiment_counter[categorized_sentiment] += 1
        sentiment_category = {'sentiment_category': categorized_sentiment}
        keywords = {'keywords': zipped[2]['text']}
        raw_categories = zipped[3]['text']
        classification = {'classification': raw_categories}
        sanitized_topics = sanitize_categories(raw_categories)
        categories_counter.update(sanitized_topics)
        sanitized_categories = {'topics': ",".join(sanitized_topics)}
        full_record = {
            **zipped[0], **sentiment, **keywords,
            **sentiment_category, **classification, **sanitized_categories
        }
        zipped_results.append(full_record)
    result_df = pd.DataFrame(zipped_results)
    title = url.replace(":", "_").replace("/", "_")
    serialize_results(url, result_df, title, sentiment_counter, categories_counter)
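The zip of the input list with all the LLM responses pairs each article with the output of every prompt template. A small illustration with invented mock data — the dictionaries mimic the `{'text': ...}` shape that `LLMChain.apply` returns:

```python
# Invented mock data mimicking LLMChain.apply output: one list of
# {'text': ...} dicts per prompt template, aligned with the input articles.
input_list = [{'content': 'a', 'title': 't1'}, {'content': 'b', 'title': 't2'}]
sentiment_responses = [{'text': 'positive'}, {'text': 'negative'}]
keyword_responses = [{'text': 'Keywords: x'}, {'text': 'Keywords: y'}]

for zipped in zip(input_list, sentiment_responses, keyword_responses):
    # zipped[0] is the original article; zipped[1] and zipped[2] are the
    # per-template LLM outputs for that article.
    merged = {**zipped[0], 'sentiment': zipped[1]['text'], 'keywords': zipped[2]['text']}
    print(merged)
```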
def extract_rss(url):
    """
    Extracts the content and title from a URL.
    :param url The RSS feed URL, like e.g: http://www.theguardian.com/profile/georgemonbiot/rss
    """
    response = requests.get(url)
    tree = ElementTree.fromstring(response.content)
    contents = []
    for child in tree:
        if child.tag == 'channel':
            for channel_child in child:
                if channel_child.tag == 'item':
                    contents.append({'content': channel_child[2].text, 'title': channel_child[0].text})
    return contents
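The ElementTree traversal assumes the usual RSS 2.0 layout, where the children of each item appear in a fixed order (title first, description third). A self-contained check of that logic against a tiny inline feed (the sample XML is invented for illustration):

```python
from xml.etree import ElementTree

# Tiny invented RSS 2.0 feed for illustration
sample_rss = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First article</title>
      <link>http://example.com/1</link>
      <description>Body of the first article</description>
    </item>
  </channel>
</rss>"""

def extract_items(xml_text):
    tree = ElementTree.fromstring(xml_text)
    contents = []
    for child in tree:
        if child.tag == 'channel':
            for channel_child in child:
                if channel_child.tag == 'item':
                    # Index 0 is <title>, index 2 is <description> in this layout
                    contents.append({'content': channel_child[2].text,
                                     'title': channel_child[0].text})
    return contents

print(extract_items(sample_rss))
```

Note that this positional indexing is brittle: feeds that reorder or omit item children would need a lookup by tag name instead.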
prompt_templates = [
    ("Can you please tell me the sentiment of this {content} with this title: {title}? "
     "Is it very positive, positive, very negative, negative or neutral? "
     "Please reply using these expressions: 'very positive', 'positive', 'very negative', 'negative' or 'neutral'"),
    "Can you please extract the most relevant keywords from this {content} with this title: {title}. Please use the 'Keywords:' prefix before the keyword list.",
    "Can you please classify the following {content} with the title {title} using these categories: " + ",".join(accepted_categories)
]

model = 'gpt-3.5-turbo'

def process_llm(input_list: list, prompt_template):
    """
    Creates the LLMChain object using a specific model
    :param input_list A list of dicts with the content and title of each article
    :param prompt_template A single prompt template with content and title parameters
    """
    llm = ChatOpenAI(temperature=0, model=model)
    # You can also use another model. text-davinci-003 is more expensive than gpt-3.5-turbo
    # llm = OpenAI(temperature=0, model='text-davinci-003')
    llm_chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate.from_template(prompt_template)
    )
    return llm_chain.apply(input_list)
def sanitize_categories(text):
    text = text.lower()
    sanitized = []
    for cat in accepted_categories:
        if cat in text:
            sanitized.append(cat)
    return sanitized

def sanitize_keywords(text):
    text = text.lower()
    text = text.replace("keywords:", "").strip()
    sanitized = [re.sub(r"\.$", "", s.strip()) for s in text.split(",")]
    return sanitized

def categorize_sentiment(text):
    text = text.lower()
    if 'very negative' in text:
        return 'very negative'
    elif 'negative' in text:
        return 'negative'
    elif 'very positive' in text:
        return 'very positive'
    elif 'positive' in text:
        return 'positive'
    return 'neutral'
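To make the helper logic concrete, here is a self-contained sketch of the category and sentiment helpers with a hypothetical `accepted_categories` list (the real script defines its own), applied to invented LLM replies:

```python
# Illustrative stand-ins for the script's helpers; accepted_categories here is
# a hypothetical list, the real script defines its own.
accepted_categories = ['politics', 'environment', 'economy']

def sanitize_categories(text):
    # Keep only the accepted categories that the LLM reply mentions
    text = text.lower()
    return [cat for cat in accepted_categories if cat in text]

def categorize_sentiment(text):
    # Map a free-form LLM reply onto one of five sentiment labels;
    # the 'very ...' checks must come before the plain ones.
    text = text.lower()
    if 'very negative' in text:
        return 'very negative'
    elif 'negative' in text:
        return 'negative'
    elif 'very positive' in text:
        return 'very positive'
    elif 'positive' in text:
        return 'positive'
    return 'neutral'

print(categorize_sentiment("The sentiment is very negative."))  # very negative
print(sanitize_categories("Environment and Economy"))           # ['environment', 'economy']
```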
def serialize_results(url, result_df, title, sentiment_counter, categories_counter):
    """
    Converts the results to an Excel sheet or HTML page.
    The HTML page also contains the counter information.
    :param url The RSS feed URL
    :param result_df The combination of the original data with the LLM output
    :param title The RSS feed URL with some characters modified
    :param sentiment_counter The counter with the sentiment information
    :param categories_counter The counter with the category statistics
    """
    result_df.to_excel(target_folder/f"{title}.xlsx")
    html_file = target_folder/f"{title}.html"
    html_content = result_df.to_html(escape=False)
    # Make sure the file is written in UTF-8
    with open(html_file, "w", encoding="utf-8") as file:
        file.write(html_content)
    sentiment_html = generate_sentiment_table(sentiment_counter, "Sentiment")
    categories_html = generate_sentiment_table(categories_counter, "Category")
    with open(html_file, encoding="utf8") as f:
        content = f"""
{re.sub(r'.+?theguardian.com/profile', '', url)}
Sentiment Count
{sentiment_html}
Categories Count
{categories_html}
{f.read()}
"""
    content = content.replace('class="dataframe"', 'class="table table-striped table-hover dataframe"')
    with open(html_file, "w", encoding="utf8") as f:
        f.write(content)
Here is an example of the script’s output for the following columnist:
LangChain’s LLMChain offers a very convenient way to interact with LLMs whenever you have a list-based input to which you want to apply predefined, parameterized LLM prompts.
Gil Fernandes, Onepoint Consulting