diff --git a/README.md b/README.md index e616307..487b25b 100644 --- a/README.md +++ b/README.md @@ -29,14 +29,15 @@ limitations under the License. `smolagents` is a library that enables you to run powerful agents in a few lines of code. It offers: -✨ **Simplicity**: the logic for agents fits in ~thousand lines of code. We kept abstractions to their minimal shape above raw code! +✨ **Simplicity**: the logic for agents fits in ~thousand lines of code (see [agents.py](https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py)). We kept abstractions to their minimal shape above raw code! -🌐 **Support for any LLM**: it supports models hosted on the Hub loaded in their `transformers` version or through our inference API, but also models from OpenAI, Anthropic, and many more through our LiteLLM integration. - -🧑‍💻 **First-class support for Code Agents**, i.e. agents that write their actions in code (as opposed to "agents being used to write code"), [read more here](https://huggingface.co/docs/smolagents/tutorials/secure_code_execution). +🧑‍💻 **First-class support for Code Agents**, i.e. agents that write their actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via [E2B](https://e2b.dev/). + - On top of this [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) class, we still support the standard [`ToolCallingAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.ToolCallingAgent) that writes actions as JSON/text blobs. 🤗 **Hub integrations**: you can share and load tools to/from the Hub, and more is to come! +🌐 **Support for any LLM**: it supports models hosted on the Hub loaded in their `transformers` version or through our inference API, but also supports models from OpenAI, Anthropic and many others via our [LiteLLM](https://www.litellm.ai/) integration. 
+ ## Quick demo First install the package. @@ -70,6 +71,18 @@ Still, we implement several types of agents: `CodeAgent` writes its actions as P By the way, why use a framework at all? Well, because a big part of this stuff is non-trivial. For instance, the code agent has to keep a consistent format for code throughout its system prompt, its parser, the execution. So our framework handles this complexity for you. But of course we still encourage you to hack into the source code and use only the bits that you need, to the exclusion of everything else! +## How strong are open models for agentic workflows? + +We've created [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) instances with some leading models, and compared them on [this benchmark](https://huggingface.co/datasets/m-ric/agents_medium_benchmark_2) that gathers questions from a few different benchmarks to propose a varied blend of challenges. + +[Find the benchmark here](https://github.com/huggingface/smolagents/blob/main/examples/benchmark.ipynb) for more detail on the agentic setup used, and see a comparison of code agents versus tool calling agents (spoilers: code works better). + +

+ [Figure: benchmark of different models on agentic workflows] +

+ +This comparison shows that open source models can now take on the best closed models! + ## Citing smolagents If you use `smolagents` in your publication, please cite it by using the following BibTeX entry. diff --git a/docs/source/en/conceptual_guides/intro_agents.md b/docs/source/en/conceptual_guides/intro_agents.md index 063b062..c233b39 100644 --- a/docs/source/en/conceptual_guides/intro_agents.md +++ b/docs/source/en/conceptual_guides/intro_agents.md @@ -15,7 +15,7 @@ rendered properly in your Markdown viewer. --> # Introduction to Agents -### 🤔 What are agents? +## 🤔 What are agents? Any efficient system using AI will need to provide LLMs some kind of access to the real world: for instance the possibility to call a search tool to get external information, or to act on certain programs in order to solve a task. In other words, LLMs should have ***agency***. Agentic programs are the gateway to the outside world for LLMs. @@ -31,7 +31,7 @@ See in the table below how agency can vary across systems: | Agency Level | Description | How that's called | Example Pattern | | ------------ | ------------------------------------------------------- | ----------------- | -------------------------------------------------- | | ☆☆☆ | LLM output has no impact on program flow | Simple Processor | `process_llm_output(llm_response)` | -| ★☆☆ | LLM output determines basic control flow | Router | `if llm_decision(): path_a() else: path_b()` | +| ★☆☆ | LLM output determines an if/else switch | Router | `if llm_decision(): path_a() else: path_b()` | | ★★☆ | LLM output determines function execution | Tool Caller | `run_function(llm_chosen_tool, llm_chosen_args)` | | ★★★ | LLM output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` | | ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` | @@ -53,35 +53,32 @@ This 
agentic system runs in a loop, executing a new action at each step (the act -### When to use an agentic system ? - -Agents are useful when you need an LLM to determine the workflow of an app. - -The question to ask is: "Do I really need flexibility in the workflow to efficiently solve the task at hand?" - -If a fixed workflow can work, you might as well build it all in good old no-AI code for 100% robustness. For the sake of simplicity and robstness, it's advised to regularize towards not using any agentic behaviour. On the opposite, agents are useful when the fixed workflow is not sufficient. +## ✅ When to use agents / ⛔ when to avoid them +Agents are useful when you need an LLM to determine the workflow of an app. But they're often overkill. The question is: do I really need flexibility in the workflow to efficiently solve the task at hand? +If the pre-determined workflow falls short too often, that means you need more flexibility. Let's take an example: say you're making an app that handles customer requests on a surfing trip website. -You could know in advance that the requests will have to be classified in either of 2 buckets according to deterministic criteria, and you have a predefined workflow for each of these 2 cases. -For instance, this is if you let the user click a button to determine their query, and it goes into either of these buckets: +You could know in advance that the requests will belong to either of 2 buckets (based on user choice), and you have a predefined workflow for each of these 2 cases. -1. Want some knowledge on the trips? ⇒ Then you give them access to a search bar to search your knowledge base -2. Wants to talk to sales? ⇒ Then you let them type in a contact form. +1. Want some knowledge on the trips? ⇒ give them access to a search bar to search your knowledge base +2. Wants to talk to sales? ⇒ let them type in a contact form. -If that deterministic workflow fits all queries, by all means just code everything! 
This will give you a 100% reliable system with no risk of error introduced by letting unpredictable LLMs meddle in your workflow. +If that deterministic workflow fits all queries, by all means just code everything! This will give you a 100% reliable system with no risk of error introduced by letting unpredictable LLMs meddle in your workflow. For the sake of simplicity and robustness, it's advised to regularize towards not using any agentic behaviour. -But what if the workflow can't be determined that well in advance? Say, 20% or 40% of users requests do not fit properly into your rigid categories, and are thus not handled properly by your program? +But what if the workflow can't be determined that well in advance? -For instance, a user wants to ask : "I can come on Monday, but I forgot my passport so risk being delayed to Wednesday, is it possible to take me and my stuff to surf on Tuesday morning, with a cancellation insurance?" This question hinges on many factors, and probably none of the predetermined criteria above won't be sufficient for this request. +For instance, a user wants to ask : `"I can come on Monday, but I forgot my passport so risk being delayed to Wednesday, is it possible to take me and my stuff to surf on Tuesday morning, with a cancellation insurance?"` This question hinges on many factors, and probably none of the predetermined criteria above will suffice for this request. -If the pre-determined workflow falls short too often, that means you need more flexibility, which is just what an agentic setup provides. In the above example, you could just make a multi-step agent that has access to a weather API tool, a google maps API to compute travel distance, an employee availability dashboard and a RAG system on your knowledge base. +If the pre-determined workflow falls short too often, that means you need more flexibility. 
-Until recently, computer programs were restricted to pre-determined workflows (with possible piles of if/else switches), thus focused on extremely narrow tasks, like "compute the sum of these numbers" or "find the shortest path in this graph". +That is where an agentic setup helps. -But actually, most real-life tasks are like our trip example above, they do not fit in pre-determined workflows. Agentic systems open up the vast world of real-world tasks to programs! +In the above example, you could just make a multi-step agent that has access to a weather API for weather forecasts, Google Maps API to compute travel distance, an employee availability dashboard and a RAG system on your knowledge base. -### Why `smolagents`? +Until recently, computer programs were restricted to pre-determined workflows, trying to handle complexity by piling up if/else switches. They focused on extremely narrow tasks, like "compute the sum of these numbers" or "find the shortest path in this graph". But actually, most real-life tasks, like our trip example above, do not fit in pre-determined workflows. Agentic systems open up the vast world of real-world tasks to programs! + +## Why `smolagents`? For some low-level agentic use cases, like chains or routers, you can write all the code yourself. You'll be much better that way, since it will let you control and understand your system better. @@ -101,11 +98,17 @@ But wait, since we give room to LLMs in decisions, surely they will make mistake All these elements need tight coupling to make a well-functioning system. That's why we decided we needed to make basic building blocks to make all this stuff work together. 
-### Code agents +## Code agents -[Multiple](https://huggingface.co/papers/2402.01030) [research](https://huggingface.co/papers/2411.01747) [papers](https://huggingface.co/papers/2401.00812) have shown that having the LLM write its actions (the tool calls) in code is much better than the current standard format for tool calling, which is across the industry different shades of "writing actions as a JSON of tools names and arguments to use, which you then parse to know which tool to execute and with which arguments". -Why is code better? Well, because we crafted our code languages specifically to be great at expressing actions performed by a computer. If JSON snippets were a better way, JSON would be the top programming language and programming would be hell on earth. +In a multi-step agent, at each step, the LLM can write an action, in the form of some calls to external tools. A common format (used by Anthropic, OpenAI, and many others) for writing these actions is generally different shades of "writing actions as a JSON of tools names and arguments to use, which you then parse to know which tool to execute and with which arguments". +[Multiple](https://huggingface.co/papers/2402.01030) [research](https://huggingface.co/papers/2411.01747) [papers](https://huggingface.co/papers/2401.00812) have shown that having the LLM write its tool calls in code works much better. + +The reason for this is simply that *we crafted our code languages specifically to be the best possible way to express actions performed by a computer*. If JSON snippets were a better expression, JSON would be the top programming language and programming would be hell on earth. 
+ +The figure below, taken from [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030), illustrates some advantages of writing actions in code: + + Writing actions in code rather than JSON-like snippets provides better: @@ -113,9 +116,3 @@ Writing actions in code rather than JSON-like snippets provides better: - **Object management:** how do you store the output of an action like `generate_image` in JSON? - **Generality:** code is built to express simply anything you can have a computer do. - **Representation in LLM training data:** plenty of quality code actions is already included in LLMs' training data which means they're already trained for this! - -This is illustrated on the figure below, taken from [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030). - - - -This is why we put emphasis on proposing code agents, in this case python agents, which meant building secure python interpreters. \ No newline at end of file diff --git a/docs/source/en/reference/agents.md b/docs/source/en/reference/agents.md index 21bb809..c30c5a9 100644 --- a/docs/source/en/reference/agents.md +++ b/docs/source/en/reference/agents.md @@ -103,7 +103,7 @@ print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"])) ### HfApiModel -The `HfApiModel` is an engine that wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM. +The `HfApiModel` wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM. ```python from smolagents import HfApiModel @@ -121,3 +121,22 @@ print(model(messages)) >>> Of course! If you change your mind, feel free to reach out. Take care! ``` [[autodoc]] HfApiModel + +### LiteLLMModel + +The `LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers. 
+ +```python +from smolagents import LiteLLMModel + +messages = [ + {"role": "user", "content": "Hello, how are you?"}, + {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, + {"role": "user", "content": "No need to help, take it easy."}, +] + +model = LiteLLMModel("anthropic/claude-3-5-sonnet-latest") +print(model(messages)) +``` + +[[autodoc]] LiteLLMModel \ No newline at end of file diff --git a/docs/source/en/tutorials/building_good_agents.md b/docs/source/en/tutorials/building_good_agents.md index 5283fad..b1df02c 100644 --- a/docs/source/en/tutorials/building_good_agents.md +++ b/docs/source/en/tutorials/building_good_agents.md @@ -22,7 +22,7 @@ How to build into this latter category? In this guide, we're going to see best practices for building agents. > [!TIP] -> If you're new to building agents, make sure to first read the [intro to agents](./intro_agents) and the [guided tour of smolagents](../guided_tour). +> If you're new to building agents, make sure to first read the [intro to agents](../conceptual_guides/intro_agents) and the [guided tour of smolagents](../guided_tour). ### The best agentic systems are the simplest: simplify the workflow as much as you can diff --git a/docs/source/en/tutorials/secure_code_execution.md b/docs/source/en/tutorials/secure_code_execution.md index c617f17..2189c5b 100644 --- a/docs/source/en/tutorials/secure_code_execution.md +++ b/docs/source/en/tutorials/secure_code_execution.md @@ -18,7 +18,7 @@ rendered properly in your Markdown viewer. [[open-in-colab]] > [!TIP] -> If you're new to building agents, make sure to first read the [intro to agents](./intro_agents) and the [guided tour of smolagents](../guided_tour). +> If you're new to building agents, make sure to first read the [intro to agents](../conceptual_guides/intro_agents) and the [guided tour of smolagents](../guided_tour). 
### Code agents diff --git a/docs/source/en/tutorials/tools.md b/docs/source/en/tutorials/tools.md index 8ef3ec7..6ad4b92 100644 --- a/docs/source/en/tutorials/tools.md +++ b/docs/source/en/tutorials/tools.md @@ -20,7 +20,7 @@ rendered properly in your Markdown viewer. Here, we're going to see advanced tool usage. > [!TIP] -> If you're new to building agents, make sure to first read the [intro to agents](./intro_agents) and the [guided tour of smolagents](../guided_tour). +> If you're new to building agents, make sure to first read the [intro to agents](../conceptual_guides/intro_agents) and the [guided tour of smolagents](../guided_tour). ### Directly define a tool by subclassing Tool