
Thoughts on LLM function_call

Word count: 4.8k · Reading time: 24 min
2024/02/07

In this article we will:

  1. First dissect OpenAI's function_call feature and work out the underlying mechanics of how an LLM invokes function_call
  2. Then use langchain to build our own function_call capability, i.e. make an agent that can call tools
  3. Finally, when there are multiple tools, dissect the agent's Route (routing) logic

The conclusion up front: because each LLM output is stateless, we decompose function_call into three steps:

  • decide which function to call
  • call the function externally and obtain the result
  • feed the result, together with the original prompt, back into the LLM to get the final answer

This sounds abstract, and once you actually sit down to write an agent you will find plenty of details along the way that deserve attention!

Part 1: Dissecting OpenAI's official function_call feature

import json
import openai

# Example dummy function hard-coded to return the same weather.
# In production, this could be your backend API or an external API.
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    weather_info = {
        "location": location,
        "temperature": "72",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

# Define a function schema. This description format matters: later in the article
# we refer to it as the OpenAI function format. You can also think of it as a
# prompt written in the shape of a function.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

messages = [
    {
        "role": "user",
        "content": "What's the weather like in Boston?"
    }
]

# First call: let the model decide whether (and how) to call the function
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions
)
print(response)

{
  "id": "chatcmpl-8pIeueSu7OIViCojG6WbXs0nO7ACh",
  "object": "chat.completion",
  "created": 1707237144,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_current_weather",
          "arguments": "{\n \"location\": \"Boston, MA\"\n}"
        }
      },
      "logprobs": null,
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 82,
    "completion_tokens": 18,
    "total_tokens": 100
  },
  "system_fingerprint": null
}

Here we can see that in the first call's result, message["content"] is actually null, but a "function_call" is returned. function_call carries two important fields: name, the function to be called, and arguments, the call arguments the LLM parsed out of the user_input.

The function name here is actually determined by the LLM from its understanding of the functions['name'] and functions['description'] we passed in.

So to get an actual answer, we follow this thread: take the result of calling the function with those arguments, pass it back into the LLM together with the original message (the user's question), and obtain the final result.

Next, the code:

args = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])
observation = get_current_weather(**args)  # we call the function ourselves, outside the LLM

# Feed the function result back in as a "function"-role message
messages.append(
    {
        "role": "function",
        "name": "get_current_weather",
        "content": observation,
    }
)

# Second call: the model turns the observation into a final answer
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
)
print(response)

{
  "id": "chatcmpl-8pIytDN4Zj7WvgWUQPK9SCKYhH13G",
  "object": "chat.completion",
  "created": 1707238383,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The weather in Boston is currently 72\u00b0F. It is sunny and windy."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 16,
    "total_tokens": 72
  },
  "system_fingerprint": null
}

At this point we have indeed obtained the LLM's final output: The weather in Boston is currently 72°F. It is sunny and windy.

So, pulling the thread together: the so-called LLM function_call feature amounts to this. You turn the tools you have into a description (a well-defined document covering the function's purpose, its name, and the parameters it needs), pass it into the LLM together with the user_input, and let the LLM decide which function to call (with a well-defined output format) while parsing the call arguments out of the user_input. Then you call the function externally to get its result, and finally pass that result back into the LLM together with the original message (prompt / user_input) to get the answer.

The reason we go to the trouble of calling the LLM twice is fundamental: every LLM API call is stateless, so the LLM's decision output and its answer output have to be stitched together manually from the outside.
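
To make that statelessness concrete, here is a minimal sketch using the same openai 0.x client as above (two independent calls; the example messages are mine, not from the original run):

# The API remembers nothing between calls; each call sees only what is in `messages`
openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "My name is Yang."}],
)
openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What is my name?"}],
)
# The second call cannot answer reliably: it never saw the first message.
# In the same way, a function result only "exists" for the LLM once we append it to messages.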

There is still one question left here, as attentive readers will have noticed: what happens if the user_input has nothing to do with what the function does?

For example:

messages = [
    {
        "role": "user",
        "content": "hi!",  # a question that has nothing to do with weather
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
)

print(response)

{
  "id": "chatcmpl-8pJ8NOb8jn1rRDpRpyW8bsdOfflhi",
  "object": "chat.completion",
  "created": 1707238971,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 76,
    "completion_tokens": 10,
    "total_tokens": 86
  },
  "system_fingerprint": null
}

From this result we see that the response's message['content'] is not null: the model answered directly, and there is no function_call information at all. In other words, the LLM recognized on its own that this input has nothing to do with the function's purpose, so instead of calling the function it simply produced an answer.

Given the approach above, how do we know whether a given LLM output needs a function_call? One could add a check on whether the response's message['content'] is null, and that works, but looking closely at the two responses there is a cleaner signal: the finish_reason parameter.

The first response has "finish_reason": "function_call" and the second has "finish_reason": "stop", so finish_reason directly determines the LLM's next action.

We can define a route (routing) logic based on it:

# get_current_weather(location, unit="fahrenheit") and functions = [{...}] are
# the same as above, so they are omitted here
def openai_call(user_input):
    messages = [
        {
            "role": "user",
            "content": user_input,
        }
    ]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
    )
    # finish_reason == "function_call" means the model decided to call the tool
    if response["choices"][0]["finish_reason"] == "function_call":
        args = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])
        observation = get_current_weather(**args)
        messages.append(
            {
                "role": "function",
                "name": "get_current_weather",
                "content": observation,
            }
        )
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
        )
    return response["choices"][0]["message"]["content"]
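
A quick sanity check of this route, reusing the definitions above (the exact wording of the answers will vary from run to run):

print(openai_call("What's the weather like in Boston?"))  # goes through get_current_weather first
print(openai_call("hi!"))  # answered directly, finish_reason == "stop"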

With that, we have built a complete LLM function_call (agent) loop.

Along the way we have seen that the principle is actually very simple, but defining the functions=[{}] schema by hand is quite tedious. This is where we bring in langchain, which wraps a large number of helpers that simplify these definitions.

For instance, instead of hand-writing the function schema above, we only need to write get_current_weather() in a fixed style and call format_tool_to_openai_function(get_current_weather) to convert it into the function format, as sketched below.

langchain also wraps many ways of chaining prompts together, which will become especially apparent in the code that follows.
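
As a taste of that, here is a minimal sketch (assuming the @tool decorator from langchain.agents and the format_tool_to_openai_function helper from langchain.tools.render, both of which Part 2 uses; the dummy body is mine):

from langchain.agents import tool
from langchain.tools.render import format_tool_to_openai_function

@tool
def get_current_weather(location: str) -> str:
    """Get the current weather in a given location."""
    return f"72F and sunny in {location}"  # dummy body standing in for a real API

# The decorator wraps the function as a LangChain Tool; rendering it yields an
# OpenAI function schema much like the hand-written one in Part 1.
print(format_tool_to_openai_function(get_current_weather))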

Part 2: Build your own tools with LangChain

Before writing the langchain code, let's first fill in one piece of background: Pydantic syntax.

Pydantic syntax (building a class) [skip ahead if you already know Pydantic]

Pydantic is a Python library for data validation and parsing, built on Python's type annotations. It lets you define data models and validates incoming data at runtime, making sure the data matches the expected types and structure. This is very useful when handling API requests, parsing JSON, or defining configuration.

Here are some of Pydantic's basic syntax and concepts:

  1. Install Pydantic
    You can install Pydantic with pip:

    pip install pydantic
  2. Define a data model
    In Pydantic, you define a data model by subclassing BaseModel. Inside the model class, type annotations specify each field's type.

    from pydantic import BaseModel

    class User(BaseModel):
        name: str
        age: int
        email: str = None  # optional field
  3. Validate data
    When you create a model instance, Pydantic automatically validates the incoming data. If the data does not match the model definition, it throws a ValidationError, as the sketch below shows.

    user = User(name="Alice", age=30)
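
    A minimal failure sketch (the values are hypothetical, just to trigger the error):

    from pydantic import ValidationError

    try:
        User(name="Bob", age="not a number")  # age cannot be coerced to int
    except ValidationError as e:
        print(e)  # reports which field failed and why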
  4. Optional fields and default values
    You can give a field a default value; when you create an instance, the default is used if no value is provided for that field.

    class User(BaseModel):
        name: str
        age: int
        email: str = "default@example.com"  # default value
  5. Custom validation
    You can add custom validation logic with the validator decorator.

    from pydantic import validator

    class User(BaseModel):
        name: str
        age: int

        @validator('age')
        def check_age(cls, v):
            if v < 18:
                raise ValueError("Age must be at least 18")
            return v
  6. Serialization and deserialization
    Pydantic provides the dict() and json() methods to serialize a model instance into a dict or JSON string, and parse_obj() and parse_file() to deserialize data from a dict or a file.

    user_dict = user.dict()  # {'name': 'Alice', 'age': 30}
    user_json = user.json()  # '{"name": "Alice", "age": 30}'
  7. Configuration
    You can add an inner Config class to configure the model's behavior, e.g. its validation strategy or ORM support.

    class User(BaseModel):
        name: str
        age: int

        class Config:
            orm_mode = True  # lets the model be populated from ORM objects
  8. FastAPI integration
    Pydantic is tightly integrated with the FastAPI framework, which uses it to validate API requests and responses automatically.

    from fastapi import FastAPI
    from pydantic import BaseModel, Field

    app = FastAPI()

    class Item(BaseModel):
        name: str
        description: str = Field(..., example="An example item")
        price: float
        tax: float = 0.1

    @app.post("/items/")
    async def create_item(item: Item):
        return item

Pydantic's syntax is concise yet powerful, and it makes data validation and parsing simple and intuitive. With Pydantic you can make sure your application always receives data in the correct format and of the correct types.

The reason for this detour into Pydantic data structures is that we can call the convert_pydantic_to_openai_function() helper to turn a Pydantic class directly into the function_call format that can be handed to the LLM, as the following example shows:

from pydantic import BaseModel, Field
from langchain.utils.openai_functions import convert_pydantic_to_openai_function

class OpenMeteoInput(BaseModel):
    """Latitude of the location to fetch weather data for"""
    latitude: float = Field(..., description="Latitude of the location to fetch weather data for")
    longitude: float = Field(..., description="Longitude of the location to fetch weather data for")

openMeteoInput_function = [convert_pydantic_to_openai_function(OpenMeteoInput)]
print(openMeteoInput_function)

[{'name': 'OpenMeteoInput',
  'description': 'Latitude of the location to fetch weather data for',
  'parameters': {'title': 'OpenMeteoInput',
    'description': 'Latitude of the location to fetch weather data for',
    'type': 'object',
    'properties': {'latitude': {'title': 'Latitude',
      'description': 'Latitude of the location to fetch weather data for',
      'type': 'number'},
     'longitude': {'title': 'Longitude',
      'description': 'Longitude of the location to fetch weather data for',
      'type': 'number'}},
    'required': ['latitude', 'longitude']}}]

One thing to note here: for a Pydantic class to be parsed into an openai_function, it must contain a docstring ("""Latitude of the location to fetch weather data for""" in this example); after parsing, the docstring becomes the function's description.

Now we can move on to the stage of building an agent with LangChain:

import requests
import datetime
from pydantic import BaseModel, Field
from langchain.agents import tool

# Define the input schema
class OpenMeteoInput(BaseModel):
    latitude: float = Field(..., description="Latitude of the location to fetch weather data for")
    longitude: float = Field(..., description="Longitude of the location to fetch weather data for")

# @tool automatically parses get_current_temperature into the directly callable
# functions format discussed above (strictly speaking into langchain's tools
# format, but the principle is the same: one more layer of wrapping).
# args_schema=OpenMeteoInput makes the tool's input args follow the
# OpenMeteoInput description above.
@tool(args_schema=OpenMeteoInput)
def get_current_temperature(latitude: float, longitude: float) -> dict:
    """Fetch current temperature for given coordinates."""

    BASE_URL = "https://api.open-meteo.com/v1/forecast"

    # Parameters for the request
    params = {
        'latitude': latitude,
        'longitude': longitude,
        'hourly': 'temperature_2m',
        'forecast_days': 1,
    }

    # Make the request
    response = requests.get(BASE_URL, params=params)

    if response.status_code == 200:
        results = response.json()
    else:
        raise Exception(f"API Request failed with status code: {response.status_code}")

    current_utc_time = datetime.datetime.utcnow()
    time_list = [datetime.datetime.fromisoformat(time_str.replace('Z', '+00:00')) for time_str in results['hourly']['time']]
    temperature_list = results['hourly']['temperature_2m']

    closest_time_index = min(range(len(time_list)), key=lambda i: abs(time_list[i] - current_utc_time))
    current_temperature = temperature_list[closest_time_index]

    return f'The current temperature is {current_temperature}°C'


# This can likewise be converted into the OpenAI function format.
# format_tool_to_openai_function presumably wraps something like the
# convert_pydantic_to_openai_function helper; the principle is much the same.
from langchain.tools.render import format_tool_to_openai_function
get_current_temperature_function = format_tool_to_openai_function(get_current_temperature)

from langchain.chat_models import ChatOpenAI
model = ChatOpenAI(temperature=0).bind(functions=[get_current_temperature_function])
model.invoke("what is the weather in sf right now")
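
If everything is wired up correctly, the invoke call comes back as an AIMessage whose content is empty but whose additional_kwargs carries the function_call decision, along these lines (the exact coordinates will vary):

AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_current_temperature', 'arguments': '{\n  "latitude": 37.7749,\n  "longitude": -122.4194\n}'}})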

Routing (when there are multiple functions)

# A second tool
import wikipedia

@tool
def search_wikipedia(query: str) -> str:
    """Run Wikipedia search and get page summaries."""
    page_titles = wikipedia.search(query)
    summaries = []
    for page_title in page_titles[:3]:
        try:
            wiki_page = wikipedia.page(title=page_title, auto_suggest=False)
            summaries.append(f"Page: {page_title}\nSummary: {wiki_page.summary}")
        except (
            wikipedia.exceptions.PageError,
            wikipedia.exceptions.DisambiguationError,
        ):
            pass
    if not summaries:
        return "No good Wikipedia Search Result was found"
    return "\n\n".join(summaries)

# Build the functions list. This still uses OpenAI's own mechanism, which only
# makes a single-step decision; the final output logic is still ours to implement.
functions = [
    format_tool_to_openai_function(f) for f in [
        search_wikipedia, get_current_temperature
    ]
]
model = ChatOpenAI(temperature=0).bind(functions=functions)

model.invoke("what is the weather in sf right now")
# AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_current_temperature',
#   'arguments': '{\n  "latitude": 37.7749,\n  "longitude": -122.4194\n}'}})

model.invoke("what is langchain")
# AIMessage(content='', additional_kwargs={'function_call': {'name': 'search_wikipedia',
#   'arguments': '{\n  "query": "langchain"\n}'}})

## Next, build the same thing as a chain
from langchain.prompts import ChatPromptTemplate
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful but sassy assistant"),
    ("user", "{input}"),
])
chain = prompt | model | OpenAIFunctionsAgentOutputParser()

result = chain.invoke({"input": "what is the weather in sf right now"})
type(result)
# langchain.schema.agent.AgentActionMessageLog
result.tool
# 'get_current_temperature'
result.tool_input
# {'latitude': 37.7749, 'longitude': -122.4194}

result = chain.invoke({"input": "hi!"})
type(result)
# langchain.schema.agent.AgentFinish
result.return_values
# {'output': 'Hello! How can I assist you today?'}

Once the chain is built, the LLM's router logic is plain to see:

  • When a function call is needed, type(result) is the langchain.schema.agent.AgentActionMessageLog class, which exposes result.tool and result.tool_input
  • When no function call is needed, type(result) is the langchain.schema.agent.AgentFinish class, and we can return result.return_values

So we can write the Router logic on top of this:

from langchain.schema.agent import AgentFinish

def route(result):
    # In effect this just wraps the check type(result) == langchain.schema.agent.AgentFinish
    if isinstance(result, AgentFinish):
        return result.return_values['output']
    else:
        tools = {
            "search_wikipedia": search_wikipedia,
            "get_current_temperature": get_current_temperature,
        }
        # Once a function is decorated with @tool it can be converted into langchain's
        # function format, and it can also be invoked via function_name.run(args)
        return tools[result.tool].run(result.tool_input)

chain = prompt | model | OpenAIFunctionsAgentOutputParser() | route  # pipe composition
result = chain.invoke({"input": "What is the weather in san francisco right now?"})
result
# 'The current temperature is 11.6°C'
result = chain.invoke({"input": "What is langchain?"})
result
# 'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). As a language model integration framework, LangChain\'s use-cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.\n\n\n\nPage: OpenAI\nSummary: OpenAI is a U.S. based artificial intelligence (AI) research organization founded in December 2015, researching artificial intelligence with the goal of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work".\nAs one of the leading organizations of the AI Spring, it has developed several large language models, advanced image generation models, and previously, released open-source models. Its release of ChatGPT has been credited with starting the artificial intelligence spring.The organization consists of the non-profit OpenAI, Inc. registered in Delaware and its for-profit subsidiary OpenAI Global, LLC. It was founded by Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, Jessica Livingston, John Schulman, Pamela Vagata, and Wojciech Zaremba, with Sam Altman and Elon Musk serving as the initial board members. Microsoft provided OpenAI Global LLC with a $1 billion investment in 2019 and a $10 billion investment in 2023, with a significant portion of the investment in the form of compute resources on Microsoft\'s Azure cloud service.On November 17, 2023, the board removed Altman as CEO, while Brockman was removed as chairman and then resigned as president. Four days later, both returned after negotiations with the board, and most of the board members resigned. The new initial board included former Salesforce co-CEO Bret Taylor as chairman. It was also announced that Microsoft will have a non-voting board seat.\n\n\n\nPage: DataStax\nSummary: DataStax, Inc. is a real-time data for AI company based in Santa Clara, California. Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache Cassandra, and Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar. As of June 2022, the company has roughly 800 customers distributed in over 50 countries.\n\n'
result = chain.invoke({"input": "hi!"})
result
# 'Hello! How can I assist you today?'

To wrap up, here is the recipe for building an agent with function_call:

  • First write a function decorated with @tool

  • Run format_tool_to_openai_function over the tools to generate the callable function format

  • Build the prompt

  • Build the route:

    • isinstance(result, AgentFinish)
    • if so: return result.return_values['output'] (type(result) == langchain.schema.agent.AgentFinish)
    • else: return tools[result.tool].run(result.tool_input)
  • Build the chain: chain = prompt | model | OpenAIFunctionsAgentOutputParser() | route

  • Invoke the chain: result = chain.invoke({"input": "question?"})

    To sum it up in one diagram:

[Figure: overview diagram of the function_call agent pipeline]

Having come this far, I got curious: what exactly is this chain?

# Print the chain and take a look
chain = prompt | model | OpenAIFunctionsAgentOutputParser() | route
print(chain)
ChatPromptTemplate(
input_variables=['input'],
messages=[
SystemMessagePromptTemplate(
prompt=PromptTemplate(
input_variables=[],
template='You are helpful but sassy assistant'
)
),
HumanMessagePromptTemplate(
prompt=PromptTemplate(
input_variables=['input'],
template='{input}'
)
)
]
)
| RunnableBinding(bound=ChatOpenAI(
client=<class 'openai.api_resources.chat_completion.ChatCompletion'>,
temperature=0.0,
openai_api_key='syshk-',
openai_api_base='http://jupyter-api-proxy.internal.dlai/rev-proxy',
openai_organization='', openai_proxy=''),
kwargs={'functions':
[{'name': 'search_wikipedia',
'description': 'search_wikipedia(query: str) -> str - Run Wikipedia search and get page summaries.',
'parameters': {'title': 'search_wikipediaSchemaSchema',
'type' : 'object',
'properties': {'query': {'title':'Query', 'type':'string'}}, 'required': ['query']
}
},
{'name': 'get_current_temperature',
'description': 'get_current_temperature(latitude: float, longitude: float) -> dict - Fetch current temperature for given coordinates.', 'parameters': {'title':'OpenMeteoInput',
'type':'object',
'properties': {'latitude': {'title': 'Latitude',
'description': 'Latitude of the location to fetch weather data for',
'type': 'number'
},
'longitude': {'title': 'Longitude', 'description': 'Longitude of the location to fetch weather data for',
'type': 'number'}
},
'required': ['latitude', 'longitude']
}
}
]
}
)
| OpenAIFunctionsAgentOutputParser()
| RunnableLambda(...)

Above, I have formatted the printed result so the structure is easier to see.


Seeing this, it becomes clear that the chain's underlying mechanism is prompt concatenation. So, personally, I would say that so-called large-model application development is development built on prompts, and so-called langchain is really a bundle of methods for stitching prompts together plus methods for parsing LLM output!
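
The "|" composition itself is easy to demystify with a toy pipeline; a minimal sketch using RunnableLambda (the two stages here are made up for illustration):

from langchain.schema.runnable import RunnableLambda

# Each stage is a Runnable; "|" feeds the output of one stage into the next,
# exactly like prompt | model | OpenAIFunctionsAgentOutputParser() | route above.
to_prompt = RunnableLambda(lambda x: f"Question: {x['input']}")
shout = RunnableLambda(lambda s: s.upper())

toy_chain = to_prompt | shout
print(toy_chain.invoke({"input": "what is a chain?"}))
# QUESTION: WHAT IS A CHAIN?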

Here we actually passed route into the chain as the agent's decision step, so there is also another way to build an agent.

The underlying logic stays the same: the LLM decides on a function, the function is called, and the function_call result is spliced back into the LLM to produce the output.

Another way to define the routing

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.render import format_tool_to_openai_function
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.agents.format_scratchpad import format_to_openai_functions
from langchain.schema.agent import AgentFinish

tools = [
    get_current_temperature,
    search_wikipedia
]

functions = [format_tool_to_openai_function(f) for f in tools]

model = ChatOpenAI(temperature=0).bind(functions=functions)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful but sassy assistant"),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

chain = prompt | model | OpenAIFunctionsAgentOutputParser()

def run_agent(user_input):
    intermediate_steps = []
    while True:
        result = chain.invoke({
            "input": user_input,
            "agent_scratchpad": format_to_openai_functions(intermediate_steps)
        })
        if isinstance(result, AgentFinish):
            return result
        tool = {
            "search_wikipedia": search_wikipedia,
            "get_current_temperature": get_current_temperature,
        }[result.tool]
        observation = tool.run(result.tool_input)
        intermediate_steps.append((result, observation))

The idea: define an agent_scratchpad in the initial prompt, which you can think of as the LLM's notepad for intermediate steps (the function_call results and the function names). The loop maintains those intermediate steps: whenever the LLM decides it needs to call a function, the notepad contents are passed back into the LLM, and the LLM produces its next output based on them.
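
A usage sketch (the question is reused from earlier; run_agent loops, appending (action, observation) pairs to the scratchpad, until the model returns AgentFinish):

result = run_agent("what is the weather in sf right now")
print(result.return_values["output"])
# e.g. 'The current temperature is 11.6°C'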
