diff --git a/docs/source/building_agents.md b/docs/source/building_agents.md
new file mode 100644
index 0000000..8edb005
--- /dev/null
+++ b/docs/source/building_agents.md
@@ -0,0 +1,123 @@
+
+### The best agentic systems are the simplest: simplify the workflow as much as you can
+
+Giving an LLM some agency in your workflow introduces some risk of errors.
+
+Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct its mistakes. But to reduce the risk of LLM error as much as possible, you should simplify your workflow!
+
+Let's take again the example from [intro_agents]: a bot that answers user queries for a surf trip company.
+Instead of letting the agent make two different calls to a "travel distance API" and a "weather API" each time it is asked about a new surf spot, you could just make one unified tool, "return_spot_information": a function that calls both APIs at once and returns their concatenated outputs to the user.
+
+This will reduce costs, latency, and error risk!
+
+So our first actionable takeaway is: *group tools whenever possible*.
+
+
+### Improve the information flow to the LLM engine
+
+Remember that your LLM engine is like an ~intelligent~ robot, trapped in a room, whose only communication with the outside world is notes passed under a door.
+
+It won't know about anything that happened unless you explicitly put it into its prompt.
+
+For a `CodeAgent` using variables, it cannot access any variable not saved into its state.
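The same point can be shown as a minimal sketch. The functions below are toy stand-ins for real agent tools (their bodies are hypothetical, not an actual tool API): a value survives across steps only if the agent binds it to a variable in its Python state.

```python
# Toy stand-ins for real agent tools; the bodies are hypothetical sketches.
def image_generator(prompt: str) -> str:
    """Pretend to generate an image and return the path it was saved to."""
    return "/tmp/generated_image.png"

def final_answer(answer: str) -> str:
    """Pretend final-answer tool: just returns the agent's answer."""
    return answer

# Because the tool output is bound to a variable, it stays in the agent's
# state and a later step can reuse it directly...
image_path = image_generator(prompt="A cool futuristic sports car")
result = final_answer(image_path)

# ...whereas a bare call like `image_generator(prompt=...)` only shows up in
# the execution logs: the value itself is gone for any later step.
print(result)
```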
+For instance check out this agent trace for an LLM that I asked to make me a car picture: +``` +==================================================================================================== New task ==================================================================================================== +Make me a cool car picture +──────────────────────────────────────────────────────────────────────────────────────────────────── New step ──────────────────────────────────────────────────────────────────────────────────────────────────── +Agent is executing the code below: ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── +image_generator(prompt="A cool, futuristic sports car with LED headlights, aerodynamic design, and vibrant color, high-res, photorealistic") +────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── + +Last output from code snippet: ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── +/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png +Step 1: + +- Time taken: 16.35 seconds +- Input tokens: 1,383 +- Output tokens: 77 +──────────────────────────────────────────────────────────────────────────────────────────────────── New step ──────────────────────────────────────────────────────────────────────────────────────────────────── +Agent is executing the code below: ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
+final_answer("/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png")
+──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+Print outputs:
+
+Last output from code snippet: ───────────────────────────────────────────────────────────────────────────────────
+/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
+Final answer:
+/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
+```
+
+The LLM never explicitly saved the image output into a variable, so it cannot access it again except through the path that was logged when the image was saved. So properly logging the code execution made a big difference!
+
+Particular guidelines to follow:
+- Each tool should log (by simply using `print` statements inside the tool's `forward` method) everything that could be useful for the LLM engine.
+  - In particular, logging details on tool execution errors would help a lot!
+
+For instance, here's a tool that gets a weather report for a given location and time, in two versions.
+
+First, here's a poor version:
+```py
+from my_weather_api import convert_location_to_coordinates, get_weather_report_at_coordinates
+# Let's say "get_weather_report_at_coordinates" returns a list of [temperature in °C, risk of rain on a scale 0-1, wave height in m]
+import datetime
+
+@tool
+def get_weather_api(location: str, date_time: str) -> str:
+    """
+    Returns the weather report.
+
+    Args:
+        - location (`str`): the name of the place that you want the weather for.
+        - date_time (`str`): the date and time for which you want the report.
+    """
+    lon, lat = convert_location_to_coordinates(location)
+    date_time = datetime.datetime.strptime(date_time, '%m/%d/%y %H:%M:%S')
+    return str(get_weather_report_at_coordinates((lon, lat), date_time))
+```
+
+Why is it bad?
+- there's no precision about the format that should be used for `date_time`
+- there's no detail on how `location` should be specified
+- there's no logging mechanism to make failure cases explicit, like `location` not being in a proper format, or `date_time` not being properly formatted.
+- the output format is hard to understand
+
+If the tool call fails, the error trace logged in memory can help the LLM reverse-engineer the tool to fix the errors. But why leave it so much heavy lifting to do?
+
+Here's a better way to build this tool:
+```py
+from my_weather_api import convert_location_to_coordinates, get_weather_report_at_coordinates
+# Let's say "get_weather_report_at_coordinates" returns a list of [temperature in °C, risk of rain on a scale 0-1, wave height in m]
+import datetime
+
+@tool
+def get_weather_api(location: str, date_time: str) -> str:
+    """
+    Returns the weather report.
+
+    Args:
+        - location (`str`): the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco".
+        - date_time (`str`): the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'.
+    """
+    lon, lat = convert_location_to_coordinates(location)
+    try:
+        date_time = datetime.datetime.strptime(date_time, '%m/%d/%y %H:%M:%S')
+    except Exception as e:
+        raise ValueError("Conversion of `date_time` to datetime format failed, make sure to provide a string in format '%m/%d/%y %H:%M:%S'. Full trace: " + str(e))
+    temperature_celsius, risk_of_rain, wave_height = get_weather_report_at_coordinates((lon, lat), date_time)
+    return f"Weather report for {location}, {date_time}: Temperature will be {temperature_celsius}°C, risk of rain is {risk_of_rain*100:.0f}%, wave height is {wave_height}m."
+```
\ No newline at end of file
diff --git a/docs/source/intro_agents.md b/docs/source/intro_agents.md
index 827c652..c3ba507 100644
--- a/docs/source/intro_agents.md
+++ b/docs/source/intro_agents.md
@@ -15,8 +15,6 @@ rendered properly in your Markdown viewer.
 -->
 
 # Introduction to Agents
 
-[[open-in-colab]]
-
 ### Why do we need agentic systems?
 
 Current LLMs are like basic reasoning robots, that are trapped into a room.
@@ -31,22 +29,41 @@ The whole idea of agentic systems is to embed LLMs into a program where their in
 
 ### What is an agentic system ?
 
-Being "agentic" is not a 0-1 definition: instead, we should talk about "agency", defined as a spectrum.
+Being "agentic" is not a discrete, 0 or 1 definition: instead, we should talk about "agency" being a continuous spectrum.
 
-Any system leveraging LLMs will embed them into code. Then the influence of the LLM's input on the code workflow is the level of agency of LLMs in the system.
+Any system leveraging LLMs will embed them into code. The influence of the LLM's input on the code workflow is the level of agency of LLMs in the system.
 
-If the output of the LLM has no further impact on the way functions are run, this system is not agentic at all.
+If the output of the LLM has no further impact on the workflow, as in a program that just postprocesses an LLM's output and returns it, this system is not agentic at all.
 
-Once one an LLM output is used to determine which branch of an `if/else` switch is ran, that starts to be some level of agency: a router.
+Once an LLM output is used to determine which branch of an `if/else` switch is run, the system starts to have some level of agency: it's a router.
 
 Then it can get more agentic.
-
-If you use an LLM output to determine which function is run and with which arguments, that's tool calling.
-
-If you use an LLM output to determine if you should keep iterating in a while loop, you get a multi-step agent.
+- If you use an LLM output to determine which function is run and with which arguments, that's tool calling.
+- If you use an LLM output to determine if you should keep iterating in a while loop, you get a multi-step agent.
 
 And the workflow can become even more complex. That's up to you to decide.
 
+### When to use an agentic system ?
+
+Given the definition above, agents are useful when you need an LLM to help you determine the workflow of an app.
+But as a rule, you should use as little agentic behaviour as possible.
+
+For instance, let's say you're making an app that handles customer requests on a surfing trip website.
+
+If you know in advance that the requests will have to be classified into either of 2 buckets according to deterministic criteria, and you have a predefined workflow for each of these 2 cases, then you can make a fixed workflow.
+For instance, if you let the user click a button to determine their query, and it goes into either of these:
+1. Wants some knowledge on the trips. Then you give them access to a search bar to search your knowledge base.
+2. Wants to talk to sales. Then you let them type in a contact form.
+
+If that deterministic workflow fits all queries, by all means just hardcode everything: this will give you a 100% reliable system with no risk of error introduced by letting unpredictable LLMs meddle in your workflow.
+
+But what if the workflow can't be determined that well in advance? Say, 10% or 20% of users' requests won't fit properly into your rigid categories, and risk being mishandled by the program?
+
+Let's say a user wants to ask: "I can come on Monday, but I forgot my passport so risk being delayed to Wednesday, is it possible to take me and my stuff to surf on Tuesday morning, with a cancellation insurance?"
+This question brings into play many factors: availability of employees, weather, travelling distance, knowledge about cancellation policies...
+Probably none of the predetermined criteria above will work properly.
+
+That percentage of "won't fit in a predetermined workflow" means that you need more flexibility: making your system agentic will give it that flexibility. In our example, you could just make a multi-step agent that has access to a weather API tool, a Google Maps API tool to compute travel distance, an employee availability dashboard, and a RAG system on your knowledge base.
 
 ### Why {Agents}?
 
@@ -58,6 +75,7 @@ But once you start going for more complicated behaviours like letting an LLM cal
 - for a multi-step agent where the LLM output determines the loop, you need to give a different prompt to the LLM based on what happened in the last loop iteration: so you need some kind of memory.
 
 See? With these two examples, we already found the need for a few items to help us:
+- of course an LLM that acts as the engine powering the system
 - a list of tools that the agent can access
 - a parser that extracts tool calls from the LLM output
 - system prompt synced with the parser
@@ -84,4 +102,4 @@ Few existing framework build on this idea to make code agents first-class citize
 Especially, since code execution can be a security concern (arbitrary code execution!), we provide options at runtime:
 - a secure python interpreter to run code more safely in your environment
-- a sandbox `uv` environment.
\ No newline at end of file
+- a sandboxed `uv` environment.
\ No newline at end of file