diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 37e1b47..b392929 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -1,25 +1,28 @@ -- sections: +- title: Get started + sections: - local: index title: 🤗 Agents - local: quicktour - title: Quick tour - title: Get started -- sections: - - local: building_good_agents - title: Building good agents - - local: tools + title: ⏱️ Quick tour +- title: Tutorials + sections: + - local: tutorials/building_good_agents + title: ✨ Building good agents + - local: tutorials/tools title: 🛠️ Tools - in-depth guide - title: Tutorials -- sections: - - local: intro_agents - title: An introduction to agentic systems - title: Conceptual guides -- sections: - - local: text_to_sql +- title: Conceptual guides + sections: + - local: conceptual_guides/intro_agents + title: 🤖 An introduction to agentic systems + - local: conceptual_guides/react + title: 🤔 ReAct agents +- title: Examples + sections: + - local: examples/text_to_sql title: Text-to-SQL - title: Examples -- sections: - - sections: - - local: main_classes/agent - title: Agents and Tools - title: Main Classes +- title: Reference + sections: + - local: reference/agents + title: Agent-related objects + - local: reference/tools + title: Tool-related objects diff --git a/docs/source/quicktour.md b/docs/source/quicktour.md index 8fb59e6..9df58e1 100644 --- a/docs/source/quicktour.md +++ b/docs/source/quicktour.md @@ -27,60 +27,25 @@ An agent is a system that uses an LLM as its engine, and it has access to functi These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them. 
-The agent can be programmed to: -- devise a series of actions/tools and run them all at once, like the [`CodeAgent`] -- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one, like the [`JsonAgent`] - -### Types of agents - -#### Code agent - -This agent has a planning step, then generates python code to execute all its actions at once. It natively handles different input and output types for its tools, thus it is the recommended choice for multimodal tasks. - -#### React agents - -This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations. - -We implement two versions of JsonAgent: -- [`JsonAgent`] generates tool calls as a JSON in its output. -- [`CodeAgent`] is a new type of JsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance. - -> [!TIP] -> Read [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more about ReAct agents. - -
- - -
- -![Framework of a React Agent](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open-source-llms-as-agents/ReAct.png) - -For example, here is how a ReAct Code agent would work its way through the following question. +For example, here is how a Code agent with access to a `web_search` tool would work its way through the following question. ```py3 agent.run( -"""How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture -proposed in Attention is All You Need?""" +"""How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need?""" ) ``` ```text =====New task===== -How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need? +How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need? 
 ====Agent is executing the code below:
-bert_blocks = search(query="number of blocks in BERT base encoder")
+bert_blocks = web_search(query="number of blocks in BERT base encoder")
 print("BERT blocks:", bert_blocks)
 ====
 
 Print outputs:
 BERT blocks: twelve encoder blocks
 
 ====Agent is executing the code below:
-attention_layer = search(query="number of layers in Attention is All You Need")
+attention_layer = web_search(query="number of layers in Attention is All You Need")
 print("Attention layers:", attention_layer)
 ====
 
 Print outputs:
@@ -459,4 +424,4 @@ with gr.Blocks() as demo:
 
 if __name__ == "__main__":
     demo.launch()
-```
+```
\ No newline at end of file
diff --git a/docs/source/main_classes/agent.md b/docs/source/reference/agents.md
similarity index 65%
rename from docs/source/main_classes/agent.md
rename to docs/source/reference/agents.md
index 9321d19..e5be35f 100644
--- a/docs/source/main_classes/agent.md
+++ b/docs/source/reference/agents.md
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer. -->
 
-# Agents & Tools
+# Agents
 
@@ -27,19 +27,16 @@ contains the API docs for the underlying classes.
 
 ## Agents
 
-We provide two types of agents, based on the main [`Agent`] class:
-- [`CodeAgent`] acts in one shot, generating code to solve the task, then executes it at once.
-- [`ReactAgent`] acts step by step, each step consisting of one thought, then one tool call and execution. It has two classes:
+Our agents inherit from [`ReactAgent`], which means they can act in multiple steps, each step consisting of one thought, then one tool call and its execution. Read more in [this conceptual guide](../conceptual_guides/react).
+
+We provide two types of agents, based on the main [`Agent`] class:
 - [`JsonAgent`] writes its tool calls in JSON.
 - [`CodeAgent`] writes its tool calls in Python code.
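The difference between the two formats can be illustrated with a small self-contained sketch. Everything below (the payload shape, the `web_search` stand-in) is hypothetical, for illustration only; the real parsing and execution live inside the library:

```python
import json

# A JsonAgent-style call: a structured blob naming the tool and its arguments.
# (Hypothetical payload shape, for illustration only.)
json_call = '{"tool_name": "web_search", "tool_arguments": {"query": "number of blocks in BERT base encoder"}}'
parsed = json.loads(json_call)
tool_name = parsed["tool_name"]
tool_args = parsed["tool_arguments"]

# A CodeAgent-style call: the same intent expressed directly as Python code.
code_call = 'result = web_search(query="number of blocks in BERT base encoder")'

# Executing the code form only requires exposing the tool as a callable:
def web_search(query: str) -> str:
    """Stand-in for a real search tool."""
    return f"results for: {query}"

namespace = {"web_search": web_search}
exec(code_call, namespace)  # runs the snippet; `result` lands in `namespace`
print(namespace["result"])
```

The JSON form is easier to constrain and validate; the code form lets a model with strong coding performance compose several calls and intermediate variables in a single step.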
-### Agent +### BaseAgent -[[autodoc]] Agent +[[autodoc]] BaseAgent -### CodeAgent - -[[autodoc]] CodeAgent ### React agents @@ -53,35 +50,10 @@ We provide two types of agents, based on the main [`Agent`] class: [[autodoc]] ManagedAgent -## Tools - -### load_tool - -[[autodoc]] load_tool - -### tool - -[[autodoc]] tool - -### Tool - -[[autodoc]] Tool - -### Toolbox - -[[autodoc]] Toolbox - -### launch_gradio_demo - -[[autodoc]] launch_gradio_demo - ### stream_to_gradio [[autodoc]] stream_to_gradio -### ToolCollection - -[[autodoc]] ToolCollection ## Engines @@ -129,33 +101,3 @@ HfApiEngine()(messages, stop_sequences=["conversation"]) ``` [[autodoc]] HfApiEngine - - -## Agent Types - -Agents can handle any type of object in-between tools; tools, being completely multimodal, can accept and return -text, image, audio, video, among other types. In order to increase compatibility between tools, as well as to -correctly render these returns in ipython (jupyter, colab, ipython notebooks, ...), we implement wrapper classes -around these types. - -The wrapped objects should continue behaving as initially; a text object should still behave as a string, an image -object should still behave as a `PIL.Image`. 
-
-These types have three specific purposes:
-
-- Calling `to_raw` on the type should return the underlying object
-- Calling `to_string` on the type should return the object as a string: that can be the string in case of an `AgentText`
-  but will be the path of the serialized version of the object in other instances
-- Displaying it in an ipython kernel should display the object correctly
-
-### AgentText
-
-[[autodoc]] agents.types.AgentText
-
-### AgentImage
-
-[[autodoc]] agents.types.AgentImage
-
-### AgentAudio
-
-[[autodoc]] agents.types.AgentAudio
diff --git a/docs/source/reference/tools.md b/docs/source/reference/tools.md
new file mode 100644
index 0000000..8f53a16
--- /dev/null
+++ b/docs/source/reference/tools.md
@@ -0,0 +1,82 @@
+
+# Tools
+
+<Tip warning={true}>
+
+Transformers Agents is an experimental API which is subject to change at any time. Results returned by the agents
+can vary as the APIs or underlying models are prone to change.
+
+</Tip>
+
+To learn more about agents and tools, make sure to read the [introductory guide](../index). This page
+contains the API docs for the underlying classes.
+
+## Tools
+
+### load_tool
+
+[[autodoc]] load_tool
+
+### tool
+
+[[autodoc]] tool
+
+### Tool
+
+[[autodoc]] Tool
+
+### Toolbox
+
+[[autodoc]] Toolbox
+
+### launch_gradio_demo
+
+[[autodoc]] launch_gradio_demo
+
+### ToolCollection
+
+[[autodoc]] ToolCollection
+
+## Agent Types
+
+Agents can pass any type of object between tools; tools, being completely multimodal, can accept and return
+text, image, audio, and video, among other types. To increase compatibility between tools, and to
+render these returns correctly in IPython environments (Jupyter, Colab, IPython notebooks, ...), we implement
+wrapper classes around these types.
+
+The wrapped objects should continue to behave like the original type; a text object should still behave as a string, and an image
+object should still behave as a `PIL.Image`.
+These types have three specific purposes:
+
+- Calling `to_raw` on the type should return the underlying object
+- Calling `to_string` on the type should return the object as a string: that can be the string itself in the case of an `AgentText`,
+  but will be the path of a serialized version of the object in other instances
+- Displaying it in an IPython kernel should display the object correctly
+
+### AgentText
+
+[[autodoc]] agents.types.AgentText
+
+### AgentImage
+
+[[autodoc]] agents.types.AgentImage
+
+### AgentAudio
+
+[[autodoc]] agents.types.AgentAudio
diff --git a/docs/source/tutorials/tools.md b/docs/source/tutorials/tools.md
index 7ea6619..1e6f44f 100644
--- a/docs/source/tutorials/tools.md
+++ b/docs/source/tutorials/tools.md
@@ -197,8 +197,8 @@ agent.run(
 ```
 
-> [!WARNING]
-> Beware when adding tools to an agent that already works well because it can bias selection towards your tool or select another tool other than the one already defined.
+> [!TIP]
+> Beware of adding too many tools to an agent: this can overwhelm weaker LLM engines. Use the `agent.toolbox.update_tool()` method to replace an existing tool in the agent's toolbox.
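The `to_raw`/`to_string` contract described in the Agent Types section added above can be sketched with a minimal text wrapper. This is a hypothetical illustration, not the real implementation, which lives in `agents.types`:

```python
# Hypothetical sketch of the wrapper contract: the object keeps behaving like
# its underlying type (here, str) while exposing to_raw() and to_string().
class SketchAgentText(str):
    def to_raw(self):
        # Return the underlying object; for text, that is the string itself.
        return str(self)

    def to_string(self):
        # For text, the string form is the object itself. Image or audio
        # wrappers would instead return the path of a serialized copy.
        return str(self)

wrapped = SketchAgentText("twelve encoder blocks")
print(wrapped.upper())   # the wrapper still supports ordinary str methods
print(wrapped.to_raw())
```

Because the wrapper subclasses `str`, any code expecting plain text keeps working, while agent internals can call `to_raw` and `to_string` uniformly across text, image, and audio outputs.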