Few doc improvements

Aymeric 2024-12-31 14:19:28 +01:00
parent aa06a7ad78
commit 696d147020
4 changed files with 27 additions and 34 deletions


@@ -47,7 +47,7 @@ Once you have setup the `doc-builder` and additional packages with the pip insta
 you can generate the documentation by typing the following command:
 ```bash
-doc-builder build smolagents docs/source/ --build_dir ~/tmp/test-build
+doc-builder build smolagents docs/source/en/ --build_dir ~/tmp/test-build
 ```
 You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate
@@ -59,7 +59,7 @@ Markdown editor.
 To preview the docs, run the following command:
 ```bash
-doc-builder preview smolagents docs/source/
+doc-builder preview smolagents docs/source/en/
 ```
 The docs will be viewable at [http://localhost:5173](http://localhost:5173). You can also preview the docs once you


@@ -15,39 +15,28 @@ rendered properly in your Markdown viewer.
 -->
 # Introduction to Agents
-### What is an agent?
-Any efficient system using AI will need to provide LLMs some kind of access to the real world: for instance the possibility to call a search tool to get external information, or to act on certain programs in order to solve a task.
-In other words, give them some ***agency***. Agentic programs are the gateway to the outside world for LLMs.
-For a rigorous definition, AI Agents are *“programs in which the workflow is determined by LLM outputs”*.
+### 🤔 What are agents?
+Any efficient system using AI will need to provide LLMs some kind of access to the real world: for instance the possibility to call a search tool to get external information, or to act on certain programs in order to solve a task. In other words, LLMs should have ***agency***. Agentic programs are the gateway to the outside world for LLMs.
+> [!TIP]
+> AI Agents are **programs where LLM outputs control the workflow**.
 Any system leveraging LLMs will integrate the LLM outputs into code. The influence of the LLM's input on the code workflow is the level of agency of LLMs in the system.
-Note that with this definition, "agent" is not a discrete, 0 or 1 definition: instead, "agency" evolves on a continuous spectrum, as you give more or less influence to the LLM on your workflow.
-- If the output of the LLM has no impact on the workflow, as in a program that just postprocesses a LLM's output and returns it, this system is not agentic at all.
-- If an LLM output is used to determine which branch of an `if/else` switch is ran, the system starts to have some level of agency: it's a router.
-Then it can get more agentic.
-- If you use an LLM output to determine which function is run and with which arguments, that's tool calling.
-- If you use an LLM output to determine if you should keep iterating in a while loop, you have a multi-step agent.
+Note that with this definition, "agent" is not a discrete, 0 or 1 definition: instead, "agency" evolves on a continuous spectrum, as you give more or less power to the LLM on your workflow.
+See in the table below how agency can vary across systems:
 | Agency Level | Description | How that's called | Example Pattern |
-|-------------|-------------|-------------|-----------------|
-| No Agency | LLM output has no impact on program flow | Simple Processor | `process_llm_output(llm_response)` |
-| Basic Agency | LLM output determines basic control flow | Router | `if llm_decision(): path_a() else: path_b()` |
-| Higher Agency | LLM output determines function execution | Tool Caller | `run_function(llm_chosen_tool, llm_chosen_args)` |
-| High Agency | LLM output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` |
-| High Agency | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` |
+| ------------ | ------------------------------------------------------- | ----------------- | -------------------------------------------------- |
+| ☆☆☆ | LLM output has no impact on program flow | Simple Processor | `process_llm_output(llm_response)` |
+| ★☆☆ | LLM output determines basic control flow | Router | `if llm_decision(): path_a() else: path_b()` |
+| ★★☆ | LLM output determines function execution | Tool Caller | `run_function(llm_chosen_tool, llm_chosen_args)` |
+| ★★★ | LLM output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` |
+| ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` |
-Since the systems versatility goes in lockstep with the level of agency that you give to the LLM, agentic systems can perform much broader tasks than any classic program.
-Programs are not just tools anymore, confined to an ultra-specialized task : they are agents.
-One type of agentic system is quite simple: the multi-step agent. It has this structure:
+The multi-step agent has this code structure:
 ```python
 memory = [user_defined_task]
@@ -57,7 +46,11 @@ while llm_should_continue(memory): # this loop is the multi-step part
 memory += [action, observations]
 ```
-This agentic system just runs in a loop, execution a new action at each step (the action can involve calling some pre-determined *tools* that are just functions), until its observations make it apparent that a satisfactory state has been reached to solve the given task.
+This agentic system runs in a loop, executing a new action at each step (the action can involve calling some pre-determined *tools* that are just functions), until its observations make it apparent that a satisfactory state has been reached to solve the given task. Here's an example of how a multi-step agent can solve a simple math question:
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif"/>
+</div>
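Fleshing out that pseudocode, here is a minimal runnable sketch of the same loop applied to a simple math question. The helper names (`llm_should_continue`, `llm_get_next_action`, `execute_action`) are illustrative stubs for this example, not smolagents APIs; a real agent backs each of them with LLM calls and tools.

```python
# Minimal sketch of a multi-step agent loop, with stubbed-out LLM and tool calls.
def llm_should_continue(memory: list) -> bool:
    # Stop once an observation signals that the task is solved.
    return not any("FINAL ANSWER" in str(step) for step in memory)

def llm_get_next_action(memory: list) -> str:
    # A real agent would ask the LLM to pick a tool and arguments based on the memory so far.
    return 'calculator("2 + 2")'

def execute_action(action: str) -> str:
    # A real agent would run the chosen tool and capture its output.
    return "Observation: 4. FINAL ANSWER: 4"

memory = ["user_defined_task: what is 2 + 2?"]
while llm_should_continue(memory):        # this loop is the multi-step part
    action = llm_get_next_action(memory)  # the LLM decides the next action
    observations = execute_action(action) # the action is executed and observed
    memory += [action, observations]

print(memory[-1])  # -> "Observation: 4. FINAL ANSWER: 4"
```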
 ### When to use an agentic system?
@@ -114,12 +107,12 @@ All these elements need tight coupling to make a well-functioning system. That's
 Why is code better? Well, because we crafted our code languages specifically to be great at expressing actions performed by a computer. If JSON snippets were a better way, JSON would be the top programming language and programming would be hell on earth.
-Code has better:
+Writing actions in code rather than JSON-like snippets provides better:
 - **Composability:** could you nest JSON actions within each other, or define a set of JSON actions to re-use later, the same way you could just define a python function?
 - **Object management:** how do you store the output of an action like `generate_image` in JSON?
-- **Generality:** code is built to express simply anything you can do have a computer do.
-- **Representation in LLM training corpuses:** why not leverage this benediction of the sky that plenty of quality actions have already been included in LLM training corpuses?
+- **Generality:** code is built to express simply anything you can have a computer do.
+- **Representation in LLM training data:** plenty of quality code actions are already included in LLMs' training data, which means they're already trained for this!
 This is illustrated on the figure below, taken from [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030).
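To make the contrast concrete, here is a small illustrative sketch; the tool names (`web_search`, `final_answer`) and the JSON schema are invented for the example, not a specific framework's format:

```python
# The same intent expressed two ways. Names and schema are illustrative only.

# 1) JSON-style action: one rigid call per snippet; chaining calls or reusing
#    intermediate results requires extra machinery around the format.
json_action = {
    "tool": "web_search",
    "arguments": {"query": "population of Guangzhou"},
}

# 2) Code action: the LLM writes plain Python, so calls compose naturally and
#    outputs are ordinary objects it can store and reuse in the next line.
code_action = """
pop_guangzhou = web_search("population of Guangzhou")
pop_shanghai = web_search("population of Shanghai")
final_answer(f"Guangzhou: {pop_guangzhou}, Shanghai: {pop_shanghai}")
"""
```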


@@ -192,8 +192,8 @@ translates to about 2,660,762 GWh/year.
 2021.
 ```
-Seems like we'll need some sizeable powerplants if the scaling hypothesis continues.
+Seems like we'll need some sizeable powerplants if the [scaling hypothesis](https://gwern.net/scaling-hypothesis) continues to hold true.
 Our agents managed to efficiently collaborate towards solving the task! ✅
-💡 You can easily extend this to more agents: one does the code execution, one the web search, one handles file loadings...
+💡 You can easily extend this orchestration to more agents: one does the code execution, one the web search, one handles file loadings...


@@ -131,7 +131,7 @@ agent.run(
 additional_args={"mp3_sound_file_url":'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3'}
 )
 ```
-For instance, use this to pass images or strings.
+For instance, you can use this `additional_args` argument to pass images or strings that you want your agent to leverage.
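As a further illustration, the same mechanism can carry an image; this sketch reuses the `agent` defined earlier in the guide, and the argument name `image_url` and the URL are placeholders for the example:

```python
agent.run(
    "Describe what is shown in the image I provided.",
    additional_args={
        # Placeholder key and URL: swap in whatever object you want the agent to leverage.
        "image_url": "https://example.com/some_image.png"
    },
)
```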