diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..897944e
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,25 @@
+# Logging
+logs
+tmp
+wandb
+
+# Data
+data
+outputs
+
+# Apple
+.DS_Store
+
+# VS Code
+.vscode
+
+# Environments
+.env
+.venv
+env/
+venv/
+env.bak/
+venv.bak/
+
+# Jupyter Notebook
+.ipynb_checkpoints
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..fc64f02
--- /dev/null
+++ b/README.md
@@ -0,0 +1,277 @@
+
+
+

+Run your *raw* PyTorch training script on any kind of device
+
+## Easy to integrate
+
+🤗 Accelerate was created for PyTorch users who like to write the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.
+
+🤗 Accelerate abstracts exactly and only the boilerplate code related to multi-GPUs/TPU/fp16 and leaves the rest of your code unchanged.
+
+Here is an example:
+
+```diff
+  import torch
+  import torch.nn.functional as F
+  from datasets import load_dataset
++ from accelerate import Accelerator
+
++ accelerator = Accelerator()
+- device = 'cpu'
++ device = accelerator.device
+
+  model = torch.nn.Transformer().to(device)
+  optimizer = torch.optim.Adam(model.parameters())
+
+  dataset = load_dataset('my_dataset')
+  data = torch.utils.data.DataLoader(dataset, shuffle=True)
+
++ model, optimizer, data = accelerator.prepare(model, optimizer, data)
+
+  model.train()
+  for epoch in range(10):
+      for source, targets in data:
+          source = source.to(device)
+          targets = targets.to(device)
+
+          optimizer.zero_grad()
+
+          output = model(source)
+          loss = F.cross_entropy(output, targets)
+
+-         loss.backward()
++         accelerator.backward(loss)
+
+          optimizer.step()
+```
+
+As you can see in this example, by adding five lines to any standard PyTorch training script you can now run on any kind of single or distributed node setting (single CPU, single GPU, multi-GPUs and TPUs) as well as with or without mixed precision (fp8, fp16, bf16).
+
+In particular, the same code can then be run without modification on your local machine for debugging or your training environment.
+
+🤗 Accelerate even handles the device placement for you (which requires a few more changes to your code, but is safer in general), so you can even simplify your training loop further:
+
+```diff
+  import torch
+  import torch.nn.functional as F
+  from datasets import load_dataset
++ from accelerate import Accelerator
+
+- device = 'cpu'
++ accelerator = Accelerator()
+
+- model = torch.nn.Transformer().to(device)
++ model = torch.nn.Transformer()
+  optimizer = torch.optim.Adam(model.parameters())
+
+  dataset = load_dataset('my_dataset')
+  data = torch.utils.data.DataLoader(dataset, shuffle=True)
+
++ model, optimizer, data = accelerator.prepare(model, optimizer, data)
+
+  model.train()
+  for epoch in range(10):
+      for source, targets in data:
+-         source = source.to(device)
+-         targets = targets.to(device)
+
+          optimizer.zero_grad()
+
+          output = model(source)
+          loss = F.cross_entropy(output, targets)
+
+-         loss.backward()
++         accelerator.backward(loss)
+
+          optimizer.step()
+```
+
+Want to learn more? Check out the [documentation](https://huggingface.co/docs/accelerate) or have a look at our [examples](https://github.com/huggingface/accelerate/tree/main/examples).
+
+## Launching script
+
+🤗 Accelerate also provides an optional CLI tool that allows you to quickly configure and test your training environment before launching the scripts. No need to remember how to use `torch.distributed.run` or to write a specific launcher for TPU training!
+On your machine(s) just run:
+
+```bash
+accelerate config
+```
+
+and answer the questions asked.
+This will generate a config file that will be used automatically to properly set the default options when doing
+
+```bash
+accelerate launch my_script.py --args_to_my_script
+```
+
+For instance, here is how you would run the GLUE example on the MRPC task (from the root of the repo):
+
+```bash
+accelerate launch examples/nlp_example.py
+```
+
+This CLI tool is **optional**, and you can still use `python my_script.py` or `torchrun my_script.py` at your convenience.
+
+You can also pass the arguments you would give to `torchrun` directly to `accelerate launch` if you do not wish to run `accelerate config`.
+
+For example, here is how to launch on two GPUs:
+
+```bash
+accelerate launch --multi_gpu --num_processes 2 examples/nlp_example.py
+```
+
+To learn more, check the CLI documentation available [here](https://huggingface.co/docs/accelerate/package_reference/cli).
+
+Or view the configuration zoo [here](https://github.com/huggingface/accelerate/blob/main/examples/config_yaml_templates/)
+
+## Launching multi-CPU run using MPI
+
+🤗 Here is another way to launch a multi-CPU run using MPI. You can learn how to install Open MPI on [this page](https://www.open-mpi.org/faq/?category=building#easy-build). You can use Intel MPI or MVAPICH as well.
+Once you have MPI set up on your cluster, just run:
+```bash
+accelerate config
+```
+Answer the questions that are asked, selecting to run using multi-CPU, and answer "yes" when asked if you want accelerate to launch mpirun.
+Then, use `accelerate launch` with your script like:
+```bash
+accelerate launch examples/nlp_example.py
+```
+Alternatively, you can use mpirun directly, without using the CLI like:
+```bash
+mpirun -np 2 python examples/nlp_example.py
+```
+
+## Launching training using DeepSpeed
+
+🤗 Accelerate supports training on single/multiple GPUs using DeepSpeed. To use it, you don't need to change anything in your training code; you can set everything using just `accelerate config`.
+However, if you want to tweak DeepSpeed-related arguments from your Python script, you can use the `DeepSpeedPlugin`.
+
+```python
+from accelerate import Accelerator, DeepSpeedPlugin
+
+# deepspeed needs to know your gradient accumulation steps beforehand, so don't forget to pass it
+# Remember you still need to do gradient accumulation by yourself, just like you would have done without deepspeed
+deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=2)
+accelerator = Accelerator(mixed_precision='fp16', deepspeed_plugin=deepspeed_plugin)
+
+# How to save your 🤗 Transformer?
+accelerator.wait_for_everyone()
+unwrapped_model = accelerator.unwrap_model(model)
+unwrapped_model.save_pretrained(save_dir, save_function=accelerator.save, state_dict=accelerator.get_state_dict(model))
+```
+
+Note: DeepSpeed support is experimental for now. If you run into a problem, please open an issue.
+
+## Launching your training from a notebook
+
+🤗 Accelerate also provides a `notebook_launcher` function you can use in a notebook to launch distributed training. This is especially useful for Colab or Kaggle notebooks with a TPU backend. Just define your training loop in a `training_function`, then in your last cell, add:
+
+```python
+from accelerate import notebook_launcher
+
+notebook_launcher(training_function)
+```
+
+An example can be found in [this notebook](https://github.com/huggingface/notebooks/blob/main/examples/accelerate_examples/simple_nlp_example.ipynb). [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/accelerate_examples/simple_nlp_example.ipynb)
+
+## Why should I use 🤗 Accelerate?
+
+You should use 🤗 Accelerate when you want to easily run your training scripts in a distributed environment without having to renounce full control over your training loop.
+This is not a high-level framework above PyTorch, just a thin wrapper so you don't have to learn a new library. In fact, the whole API of 🤗 Accelerate is in one class, the `Accelerator` object.
+
+## Why shouldn't I use 🤗 Accelerate?
+
+You shouldn't use 🤗 Accelerate if you don't want to write a training loop yourself. There are plenty of high-level libraries above PyTorch that will offer you that; 🤗 Accelerate is not one of them.
+
+## Frameworks using 🤗 Accelerate
+
+If you like the simplicity of 🤗 Accelerate but would prefer a higher-level abstraction around its capabilities, some frameworks and libraries that are built on top of 🤗 Accelerate are listed below:
+
+* [Amphion](https://github.com/open-mmlab/Amphion) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
+* [Animus](https://github.com/Scitator/animus) is a minimalistic framework to run machine learning experiments. Animus highlights common "breakpoints" in ML experiments and provides a unified interface for them within [IExperiment](https://github.com/Scitator/animus/blob/main/animus/core.py#L76).
+* [Catalyst](https://github.com/catalyst-team/catalyst#getting-started) is a PyTorch framework for Deep Learning Research and Development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop. Catalyst provides a [Runner](https://catalyst-team.github.io/catalyst/api/core.html#runner) to connect all parts of the experiment: hardware backend, data transformations, model training, and inference logic.
+* [fastai](https://github.com/fastai/fastai#installing) is a PyTorch framework for Deep Learning that simplifies training fast and accurate neural nets using modern best practices.
+fastai provides a [Learner](https://docs.fast.ai/learner.html#Learner) to handle the training, fine-tuning, and inference of deep learning algorithms.
+* [Finetuner](https://github.com/jina-ai/finetuner) is a service that enables models to create higher-quality embeddings for semantic search, visual similarity search, cross-modal text<->image search, recommendation systems, clustering, duplication detection, anomaly detection, or other uses.
+* [InvokeAI](https://github.com/invoke-ai/InvokeAI) is a creative engine for Stable Diffusion models that offers an industry-leading WebUI and terminal usage support, and serves as the foundation for many commercial products.
+* [Kornia](https://kornia.readthedocs.io/en/latest/get-started/introduction.html) is a differentiable library that allows classical computer vision to be integrated into deep learning models. Kornia provides a [Trainer](https://kornia.readthedocs.io/en/latest/x.html#kornia.x.Trainer) with the specific purpose to train and fine-tune the supported deep learning algorithms within the library.
+* [Open Assistant](https://projects.laion.ai/Open-Assistant/) is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
+* [pytorch-accelerated](https://github.com/Chris-hughes10/pytorch-accelerated) is a lightweight training library, with a streamlined feature set centered around a general-purpose [Trainer](https://pytorch-accelerated.readthedocs.io/en/latest/trainer.html), that places a huge emphasis on simplicity and transparency, enabling users to understand exactly what is going on under the hood without having to write and maintain the boilerplate themselves!
+* [Stable Diffusion web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) is an open-source browser-based easy-to-use interface based on the Gradio library for Stable Diffusion.
+* [torchkeras](https://github.com/lyhue1991/torchkeras) is a simple tool for training PyTorch models in a Keras style; it provides a dynamic plot in notebooks to monitor your loss or metric.
+* [transformers](https://github.com/huggingface/transformers) is a tool for helping train state-of-the-art machine learning models in PyTorch, TensorFlow, and JAX. (Accelerate is the backend for the PyTorch side).
+
+
+## Installation
+
+This repository is tested on Python 3.8+ and PyTorch 1.10.0+
+
+You should install 🤗 Accelerate in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
+
+First, create a virtual environment with the version of Python you're going to use and activate it.
+
+Then, you will need to install PyTorch: refer to the [official installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform. Then 🤗 Accelerate can be installed using pip as follows:
+
+```bash
+pip install accelerate
+```
+
+## Supported integrations
+
+- CPU only
+- multi-CPU on one node (machine)
+- multi-CPU on several nodes (machines)
+- single GPU
+- multi-GPU on one node (machine)
+- multi-GPU on several nodes (machines)
+- TPU
+- FP16/BFloat16 mixed precision
+- FP8 mixed precision with [Transformer Engine](https://github.com/NVIDIA/TransformerEngine) or [MS-AMP](https://github.com/Azure/MS-AMP/)
+- DeepSpeed support (Experimental)
+- PyTorch Fully Sharded Data Parallel (FSDP) support (Experimental)
+- Megatron-LM support (Experimental)
+
+## Citing 🤗 Accelerate
+
+If you use 🤗 Accelerate in your publication, please cite it by using the following BibTeX entry.
+ +```bibtex +@Misc{accelerate, + title = {Accelerate: Training and inference at scale made simple, efficient and adaptable.}, + author = {Sylvain Gugger and Lysandre Debut and Thomas Wolf and Philipp Schmid and Zachary Mueller and Sourab Mangrulkar and Marc Sun and Benjamin Bossan}, + howpublished = {\url{https://github.com/huggingface/accelerate}}, + year = {2022} +} +``` diff --git a/agents/__init__.py b/agents/__init__.py new file mode 100644 index 0000000..70762c2 --- /dev/null +++ b/agents/__init__.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from typing import TYPE_CHECKING + +from ..utils import ( + OptionalDependencyNotAvailable, + _LazyModule, + is_torch_available, +) + + +_import_structure = { + "agents": ["Agent", "CodeAgent", "ManagedAgent", "ReactAgent", "ReactCodeAgent", "ReactJsonAgent", "Toolbox"], + "llm_engine": ["HfApiEngine", "TransformersEngine"], + "monitoring": ["stream_to_gradio"], + "tools": ["PipelineTool", "Tool", "ToolCollection", "launch_gradio_demo", "load_tool", "tool"], +} + +try: + if not is_torch_available(): + raise OptionalDependencyNotAvailable() +except OptionalDependencyNotAvailable: + pass +else: + _import_structure["default_tools"] = ["FinalAnswerTool", "PythonInterpreterTool"] + _import_structure["document_question_answering"] = ["DocumentQuestionAnsweringTool"] + _import_structure["image_question_answering"] = ["ImageQuestionAnsweringTool"] + _import_structure["search"] = ["DuckDuckGoSearchTool", "VisitWebpageTool"] + _import_structure["speech_to_text"] = ["SpeechToTextTool"] + _import_structure["text_to_speech"] = ["TextToSpeechTool"] + _import_structure["translation"] = ["TranslationTool"] + +if TYPE_CHECKING: + from .agents import Agent, CodeAgent, ManagedAgent, ReactAgent, ReactCodeAgent, ReactJsonAgent, Toolbox + from .llm_engine import HfApiEngine, TransformersEngine + from .monitoring import stream_to_gradio + from .tools import PipelineTool, Tool, ToolCollection, launch_gradio_demo, load_tool, tool + + try: + if not is_torch_available(): + raise OptionalDependencyNotAvailable() + except OptionalDependencyNotAvailable: + pass + else: + from .default_tools import FinalAnswerTool, PythonInterpreterTool + from .document_question_answering import DocumentQuestionAnsweringTool + from .image_question_answering import ImageQuestionAnsweringTool + from .search import DuckDuckGoSearchTool, VisitWebpageTool + from .speech_to_text import SpeechToTextTool + from .text_to_speech import TextToSpeechTool + from .translation import TranslationTool +else: + import sys + + 
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__) diff --git a/agents/agent_types.py b/agents/agent_types.py new file mode 100644 index 0000000..f5be746 --- /dev/null +++ b/agents/agent_types.py @@ -0,0 +1,260 @@ +# coding=utf-8 +# Copyright 2024 HuggingFace Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import os +import pathlib +import tempfile +import uuid + +import numpy as np + +from ..utils import is_soundfile_availble, is_torch_available, is_vision_available, logging + + +logger = logging.get_logger(__name__) + +if is_vision_available(): + from PIL import Image + from PIL.Image import Image as ImageType +else: + ImageType = object + +if is_torch_available(): + import torch + from torch import Tensor +else: + Tensor = object + +if is_soundfile_availble(): + import soundfile as sf + + +class AgentType: + """ + Abstract class to be reimplemented to define types that can be returned by agents. + + These objects serve three purposes: + + - They behave as they were the type they're meant to be, e.g., a string for text, a PIL.Image for images + - They can be stringified: str(object) in order to return a string defining the object + - They should be displayed correctly in ipython notebooks/colab/jupyter + """ + + def __init__(self, value): + self._value = value + + def __str__(self): + return self.to_string() + + def to_raw(self): + logger.error( + "This is a raw AgentType of unknown type. 
Display in notebooks and string conversion will be unreliable" + ) + return self._value + + def to_string(self) -> str: + logger.error( + "This is a raw AgentType of unknown type. Display in notebooks and string conversion will be unreliable" + ) + return str(self._value) + + +class AgentText(AgentType, str): + """ + Text type returned by the agent. Behaves as a string. + """ + + def to_raw(self): + return self._value + + def to_string(self): + return str(self._value) + + +class AgentImage(AgentType, ImageType): + """ + Image type returned by the agent. Behaves as a PIL.Image. + """ + + def __init__(self, value): + AgentType.__init__(self, value) + ImageType.__init__(self) + + if not is_vision_available(): + raise ImportError("PIL must be installed in order to handle images.") + + self._path = None + self._raw = None + self._tensor = None + + if isinstance(value, ImageType): + self._raw = value + elif isinstance(value, (str, pathlib.Path)): + self._path = value + elif isinstance(value, torch.Tensor): + self._tensor = value + elif isinstance(value, np.ndarray): + self._tensor = torch.from_numpy(value) + else: + raise TypeError(f"Unsupported type for {self.__class__.__name__}: {type(value)}") + + def _ipython_display_(self, include=None, exclude=None): + """ + Displays correctly this type in an ipython notebook (ipython, colab, jupyter, ...) + """ + from IPython.display import Image, display + + display(Image(self.to_string())) + + def to_raw(self): + """ + Returns the "raw" version of that object. In the case of an AgentImage, it is a PIL.Image. + """ + if self._raw is not None: + return self._raw + + if self._path is not None: + self._raw = Image.open(self._path) + return self._raw + + if self._tensor is not None: + array = self._tensor.cpu().detach().numpy() + return Image.fromarray((255 - array * 255).astype(np.uint8)) + + def to_string(self): + """ + Returns the stringified version of that object. 
In the case of an AgentImage, it is a path to the serialized + version of the image. + """ + if self._path is not None: + return self._path + + if self._raw is not None: + directory = tempfile.mkdtemp() + self._path = os.path.join(directory, str(uuid.uuid4()) + ".png") + self._raw.save(self._path) + return self._path + + if self._tensor is not None: + array = self._tensor.cpu().detach().numpy() + + # There is likely simpler than load into image into save + img = Image.fromarray((255 - array * 255).astype(np.uint8)) + + directory = tempfile.mkdtemp() + self._path = os.path.join(directory, str(uuid.uuid4()) + ".png") + + img.save(self._path) + + return self._path + + def save(self, output_bytes, format, **params): + """ + Saves the image to a file. + Args: + output_bytes (bytes): The output bytes to save the image to. + format (str): The format to use for the output image. The format is the same as in PIL.Image.save. + **params: Additional parameters to pass to PIL.Image.save. + """ + img = self.to_raw() + img.save(output_bytes, format, **params) + + +class AgentAudio(AgentType, str): + """ + Audio type returned by the agent. + """ + + def __init__(self, value, samplerate=16_000): + super().__init__(value) + + if not is_soundfile_availble(): + raise ImportError("soundfile must be installed in order to handle audio.") + + self._path = None + self._tensor = None + + self.samplerate = samplerate + if isinstance(value, (str, pathlib.Path)): + self._path = value + elif is_torch_available() and isinstance(value, torch.Tensor): + self._tensor = value + elif isinstance(value, tuple): + self.samplerate = value[0] + if isinstance(value[1], np.ndarray): + self._tensor = torch.from_numpy(value[1]) + else: + self._tensor = torch.tensor(value[1]) + else: + raise ValueError(f"Unsupported audio type: {type(value)}") + + def _ipython_display_(self, include=None, exclude=None): + """ + Displays correctly this type in an ipython notebook (ipython, colab, jupyter, ...) 
+ """ + from IPython.display import Audio, display + + display(Audio(self.to_string(), rate=self.samplerate)) + + def to_raw(self): + """ + Returns the "raw" version of that object. It is a `torch.Tensor` object. + """ + if self._tensor is not None: + return self._tensor + + if self._path is not None: + tensor, self.samplerate = sf.read(self._path) + self._tensor = torch.tensor(tensor) + return self._tensor + + def to_string(self): + """ + Returns the stringified version of that object. In the case of an AgentAudio, it is a path to the serialized + version of the audio. + """ + if self._path is not None: + return self._path + + if self._tensor is not None: + directory = tempfile.mkdtemp() + self._path = os.path.join(directory, str(uuid.uuid4()) + ".wav") + sf.write(self._path, self._tensor, samplerate=self.samplerate) + return self._path + + +AGENT_TYPE_MAPPING = {"string": AgentText, "image": AgentImage, "audio": AgentAudio} +INSTANCE_TYPE_MAPPING = {str: AgentText, ImageType: AgentImage} + +if is_torch_available(): + INSTANCE_TYPE_MAPPING[Tensor] = AgentAudio + + +def handle_agent_inputs(*args, **kwargs): + args = [(arg.to_raw() if isinstance(arg, AgentType) else arg) for arg in args] + kwargs = {k: (v.to_raw() if isinstance(v, AgentType) else v) for k, v in kwargs.items()} + return args, kwargs + + +def handle_agent_outputs(output, output_type=None): + if output_type in AGENT_TYPE_MAPPING: + # If the class has defined outputs, we can map directly according to the class definition + decoded_outputs = AGENT_TYPE_MAPPING[output_type](output) + return decoded_outputs + else: + # If the class does not have defined output, then we map according to the type + for _k, _v in INSTANCE_TYPE_MAPPING.items(): + if isinstance(output, _k): + return _v(output) + return output diff --git a/agents/agents.py b/agents/agents.py new file mode 100644 index 0000000..08c30d5 --- /dev/null +++ b/agents/agents.py @@ -0,0 +1,1278 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 
2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import json +import logging +import re +import time +from typing import Any, Callable, Dict, List, Optional, Tuple, Union + +from .. import is_torch_available +from ..utils import logging as transformers_logging +from ..utils.import_utils import is_pygments_available +from .agent_types import AgentAudio, AgentImage +from .default_tools import BASE_PYTHON_TOOLS, FinalAnswerTool, setup_default_tools +from .llm_engine import HfApiEngine, MessageRole +from .monitoring import Monitor +from .prompts import ( + DEFAULT_CODE_SYSTEM_PROMPT, + DEFAULT_REACT_CODE_SYSTEM_PROMPT, + DEFAULT_REACT_JSON_SYSTEM_PROMPT, + PLAN_UPDATE_FINAL_PLAN_REDACTION, + PROMPTS_FOR_INITIAL_PLAN, + PROMPTS_FOR_PLAN_UPDATE, + SUPPORTED_PLAN_TYPES, + SYSTEM_PROMPT_FACTS, + SYSTEM_PROMPT_FACTS_UPDATE, + USER_PROMPT_FACTS_UPDATE, +) +from .python_interpreter import LIST_SAFE_MODULES, evaluate_python_code +from .tools import ( + DEFAULT_TOOL_DESCRIPTION_TEMPLATE, + Tool, + get_tool_description_with_args, + load_tool, +) + + +if is_pygments_available(): + from pygments import highlight + from pygments.formatters import Terminal256Formatter + from pygments.lexers import PythonLexer + + +class CustomFormatter(logging.Formatter): + grey = "\x1b[38;20m" + bold_yellow = "\x1b[33;1m" + red = "\x1b[31;20m" + green = "\x1b[32;20m" + bold_green = "\x1b[32;20;1m" + bold_red = "\x1b[31;1m" + bold_white = "\x1b[37;1m" 
+ orange = "\x1b[38;5;214m" + bold_orange = "\x1b[38;5;214;1m" + reset = "\x1b[0m" + format = "%(message)s" + + FORMATS = { + logging.DEBUG: grey + format + reset, + logging.INFO: format, + logging.WARNING: bold_yellow + format + reset, + logging.ERROR: red + format + reset, + logging.CRITICAL: bold_red + format + reset, + 31: reset + format + reset, + 32: green + format + reset, + 33: bold_green + format + reset, + 34: bold_white + format + reset, + 35: orange + format + reset, + 36: bold_orange + format + reset, + } + + def format(self, record): + log_fmt = self.FORMATS.get(record.levelno) + formatter = logging.Formatter(log_fmt) + return formatter.format(record) + + +logger = transformers_logging.get_logger(__name__) +logger.propagate = False +ch = logging.StreamHandler() +ch.setFormatter(CustomFormatter()) +logger.addHandler(ch) + + +def parse_json_blob(json_blob: str) -> Dict[str, str]: + try: + first_accolade_index = json_blob.find("{") + last_accolade_index = [a.start() for a in list(re.finditer("}", json_blob))][-1] + json_blob = json_blob[first_accolade_index : last_accolade_index + 1].replace('\\"', "'") + json_data = json.loads(json_blob, strict=False) + return json_data + except json.JSONDecodeError as e: + place = e.pos + if json_blob[place - 1 : place + 2] == "},\n": + raise ValueError( + "JSON is invalid: you probably tried to provide multiple tool calls in one action. PROVIDE ONLY ONE TOOL CALL." + ) + raise ValueError( + f"The JSON blob you used is invalid due to the following error: {e}.\n" + f"JSON blob was: {json_blob}, decoding failed on that specific part of the blob:\n" + f"'{json_blob[place-4:place+5]}'." 
+ ) + except Exception as e: + raise ValueError(f"Error in parsing the JSON blob: {e}") + + +def parse_code_blob(code_blob: str) -> str: + try: + pattern = r"```(?:py|python)?\n(.*?)\n```" + match = re.search(pattern, code_blob, re.DOTALL) + return match.group(1).strip() + except Exception as e: + raise ValueError( + f""" +The code blob you used is invalid: due to the following error: {e} +This means that the regex pattern {pattern} was not respected: make sure to include code with the correct pattern, for instance: +Thoughts: Your thoughts +Code: +```py +# Your python code here +```""" + ) + + +def parse_json_tool_call(json_blob: str) -> Tuple[str, Dict[str, str]]: + json_blob = json_blob.replace("```json", "").replace("```", "") + tool_call = parse_json_blob(json_blob) + if "action" in tool_call and "action_input" in tool_call: + return tool_call["action"], tool_call["action_input"] + elif "action" in tool_call: + return tool_call["action"], None + else: + raise ValueError( + f"Missing keys: {[key for key in ['action', 'action_input'] if key not in tool_call]} in blob {tool_call}" + ) + + +def parse_text_tool_call(text: str) -> Tuple[str, Union[str, Dict[str, str]]]: + """ + Expects a text in the format: 'Action:', 'Action input:', 'Observation:'. 'Action input:' contains a json string with input arguments. + """ + try: + if "Observation:" in text: + text = text.split("Observation:")[0] + if "Action:" in text: + text = text.split("Action:")[1] + tool_name, tool_input = text.split("Action input:") + if "{" in tool_input: + tool_input = parse_json_blob(tool_input) + else: + tool_input = tool_input.strip().replace('"', "") + return tool_name.strip().replace('"', "").replace("\\", ""), tool_input + except Exception as e: + raise ValueError( + f"Error in parsing the text tool call: {e}. Be sure to provide the correct format. DO NOT repeat your previous incorrect tool call." 
+        )
+
+
+def to_text(input: Union[List[Dict[str, str]], Dict[str, str], str]) -> str:
+    if isinstance(input, list):
+        return "\n".join([m["content"] for m in input])
+    elif isinstance(input, dict):
+        return input["content"]
+    else:
+        return input
+
+
+HUGGINGFACE_DEFAULT_TOOLS = {}
+_tools_are_initialized = False
+
+
+class Toolbox:
+    """
+    The toolbox contains all tools that the agent can perform operations with, as well as a few methods to
+    manage them.
+
+    Args:
+        tools (`List[Tool]`):
+            The list of tools to instantiate the toolbox with
+        add_base_tools (`bool`, *optional*, defaults to `False`):
+            Whether to add the tools available within `transformers` to the toolbox.
+    """
+
+    def __init__(self, tools: List[Tool], add_base_tools: bool = False):
+        self._tools = {tool.name: tool for tool in tools}
+        if add_base_tools:
+            self.add_base_tools()
+        self._load_tools_if_needed()
+
+    def add_base_tools(self, add_python_interpreter: bool = False):
+        global _tools_are_initialized
+        global HUGGINGFACE_DEFAULT_TOOLS
+        if not _tools_are_initialized:
+            HUGGINGFACE_DEFAULT_TOOLS = setup_default_tools(logger)
+            _tools_are_initialized = True
+        for tool in HUGGINGFACE_DEFAULT_TOOLS.values():
+            if tool.name != "python_interpreter" or add_python_interpreter:
+                self.add_tool(tool)
+        self._load_tools_if_needed()
+
+    @property
+    def tools(self) -> Dict[str, Tool]:
+        """Get all tools currently in the toolbox"""
+        return self._tools
+
+    def show_tool_descriptions(self, tool_description_template: str = None) -> str:
+        """
+        Returns the description of all tools in the toolbox
+
+        Args:
+            tool_description_template (`str`, *optional*):
+                The template to use to describe the tools. If not provided, the default template will be used.
+ """ + return "\n".join( + [get_tool_description_with_args(tool, tool_description_template) for tool in self._tools.values()] + ) + + def add_tool(self, tool: Tool): + """ + Adds a tool to the toolbox + + Args: + tool (`Tool`): + The tool to add to the toolbox. + """ + if tool.name in self._tools: + raise KeyError(f"Error: tool '{tool.name}' already exists in the toolbox.") + self._tools[tool.name] = tool + + def remove_tool(self, tool_name: str): + """ + Removes a tool from the toolbox + + Args: + tool_name (`str`): + The tool to remove from the toolbox. + """ + if tool_name not in self._tools: + raise KeyError( + f"Error: tool {tool_name} not found in toolbox for removal, should be instead one of {list(self._tools.keys())}." + ) + del self._tools[tool_name] + + def update_tool(self, tool: Tool): + """ + Updates a tool in the toolbox according to its name. + + Args: + tool (`Tool`): + The tool to update to the toolbox. + """ + if tool.name not in self._tools: + raise KeyError( + f"Error: tool {tool.name} not found in toolbox for update, should be instead one of {list(self._tools.keys())}." 
+ ) + self._tools[tool.name] = tool + + def clear_toolbox(self): + """Clears the toolbox""" + self._tools = {} + + def _load_tools_if_needed(self): + for name, tool in self._tools.items(): + if not isinstance(tool, Tool): + task_or_repo_id = tool.task if tool.repo_id is None else tool.repo_id + self._tools[name] = load_tool(task_or_repo_id) + + def __repr__(self): + toolbox_description = "Toolbox contents:\n" + for tool in self._tools.values(): + toolbox_description += f"\t{tool.name}: {tool.description}\n" + return toolbox_description + + +class AgentError(Exception): + """Base class for other agent-related exceptions""" + + def __init__(self, message): + super().__init__(message) + self.message = message + + +class AgentParsingError(AgentError): + """Exception raised for errors in parsing in the agent""" + + pass + + +class AgentExecutionError(AgentError): + """Exception raised for errors in execution in the agent""" + + pass + + +class AgentMaxIterationsError(AgentError): + """Exception raised when the agent reaches its maximum number of iterations""" + + pass + + +class AgentGenerationError(AgentError): + """Exception raised for errors in generation in the agent""" + + pass + + +def format_prompt_with_tools(toolbox: Toolbox, prompt_template: str, tool_description_template: str) -> str: + tool_descriptions = toolbox.show_tool_descriptions(tool_description_template) + prompt = prompt_template.replace("<>", tool_descriptions) + + if "<>" in prompt: + tool_names = [f"'{tool_name}'" for tool_name in toolbox.tools.keys()] + prompt = prompt.replace("<>", ", ".join(tool_names)) + + return prompt + + +def show_agents_descriptions(managed_agents: dict): + managed_agents_descriptions = """ +You can also give requests to team members. +Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'request', a long string explaining your request. +Given that this team member is a real human, you should be very verbose in your request.
+Here is a list of the team members that you can call:""" + for agent in managed_agents.values(): + managed_agents_descriptions += f"\n- {agent.name}: {agent.description}" + return managed_agents_descriptions + + +def format_prompt_with_managed_agents_descriptions(prompt_template, managed_agents=None) -> str: + if managed_agents is not None: + return prompt_template.replace("<>", show_agents_descriptions(managed_agents)) + else: + return prompt_template.replace("<>", "") + + +def format_prompt_with_imports(prompt_template: str, authorized_imports: List[str]) -> str: + if "<>" not in prompt_template: + raise AgentError("Tag '<>' should be provided in the prompt.") + return prompt_template.replace("<>", str(authorized_imports)) + + +class Agent: + def __init__( + self, + tools: Union[List[Tool], Toolbox], + llm_engine: Callable = None, + system_prompt: Optional[str] = None, + tool_description_template: Optional[str] = None, + additional_args: Dict = {}, + max_iterations: int = 6, + tool_parser: Optional[Callable] = None, + add_base_tools: bool = False, + verbose: int = 0, + grammar: Optional[Dict[str, str]] = None, + managed_agents: Optional[List] = None, + step_callbacks: Optional[List[Callable]] = None, + monitor_metrics: bool = True, + ): + if system_prompt is None: + system_prompt = DEFAULT_REACT_CODE_SYSTEM_PROMPT + if tool_parser is None: + tool_parser = parse_json_tool_call + self.agent_name = self.__class__.__name__ + self.llm_engine = llm_engine + self.system_prompt_template = system_prompt + self.tool_description_template = ( + tool_description_template if tool_description_template else DEFAULT_TOOL_DESCRIPTION_TEMPLATE + ) + self.additional_args = additional_args + self.max_iterations = max_iterations + self.logger = logger + self.tool_parser = tool_parser + self.grammar = grammar + + self.managed_agents = None + if managed_agents is not None: + self.managed_agents = {agent.name: agent for agent in managed_agents} + + if isinstance(tools, Toolbox): + 
self._toolbox = tools + if add_base_tools: + if not is_torch_available(): + raise ImportError("Using the base tools requires torch to be installed.") + + self._toolbox.add_base_tools(add_python_interpreter=(self.__class__ == ReactJsonAgent)) + else: + self._toolbox = Toolbox(tools, add_base_tools=add_base_tools) + self._toolbox.add_tool(FinalAnswerTool()) + + self.system_prompt = format_prompt_with_tools( + self._toolbox, self.system_prompt_template, self.tool_description_template + ) + self.system_prompt = format_prompt_with_managed_agents_descriptions(self.system_prompt, self.managed_agents) + self.prompt = None + self.logs = [] + self.task = None + + if verbose == 0: + logger.setLevel(logging.WARNING) + elif verbose == 1: + logger.setLevel(logging.INFO) + elif verbose == 2: + logger.setLevel(logging.DEBUG) + + # Initialize step callbacks + self.step_callbacks = step_callbacks if step_callbacks is not None else [] + + # Initialize Monitor if monitor_metrics is True + self.monitor = None + if monitor_metrics: + self.monitor = Monitor(self.llm_engine) + self.step_callbacks.append(self.monitor.update_metrics) + + @property + def toolbox(self) -> Toolbox: + """Get the toolbox currently available to the agent""" + return self._toolbox + + def initialize_for_run(self): + self.token_count = 0 + self.system_prompt = format_prompt_with_tools( + self._toolbox, + self.system_prompt_template, + self.tool_description_template, + ) + self.system_prompt = format_prompt_with_managed_agents_descriptions(self.system_prompt, self.managed_agents) + if hasattr(self, "authorized_imports"): + self.system_prompt = format_prompt_with_imports( + self.system_prompt, list(set(LIST_SAFE_MODULES) | set(self.authorized_imports)) + ) + self.logs = [{"system_prompt": self.system_prompt, "task": self.task}] + self.logger.log(33, "======== New task ========") + self.logger.log(34, self.task) + self.logger.debug("System prompt is as follows:") + self.logger.debug(self.system_prompt) + + def 
write_inner_memory_from_logs(self, summary_mode: Optional[bool] = False) -> List[Dict[str, str]]: + """ + Reads past llm_outputs, actions, and observations or errors from the logs into a series of messages + that can be used as input to the LLM. + """ + prompt_message = {"role": MessageRole.SYSTEM, "content": self.logs[0]["system_prompt"]} + task_message = { + "role": MessageRole.USER, + "content": "Task: " + self.logs[0]["task"], + } + if summary_mode: + memory = [task_message] + else: + memory = [prompt_message, task_message] + for i, step_log in enumerate(self.logs[1:]): + if "llm_output" in step_log and not summary_mode: + thought_message = {"role": MessageRole.ASSISTANT, "content": step_log["llm_output"].strip()} + memory.append(thought_message) + if "facts" in step_log: + thought_message = { + "role": MessageRole.ASSISTANT, + "content": "[FACTS LIST]:\n" + step_log["facts"].strip(), + } + memory.append(thought_message) + + if "plan" in step_log and not summary_mode: + thought_message = {"role": MessageRole.ASSISTANT, "content": "[PLAN]:\n" + step_log["plan"].strip()} + memory.append(thought_message) + + if "tool_call" in step_log and summary_mode: + tool_call_message = { + "role": MessageRole.ASSISTANT, + "content": f"[STEP {i} TOOL CALL]: " + str(step_log["tool_call"]).strip(), + } + memory.append(tool_call_message) + + if "task" in step_log: + tool_call_message = { + "role": MessageRole.USER, + "content": "New task:\n" + step_log["task"], + } + memory.append(tool_call_message) + + if "error" in step_log or "observation" in step_log: + if "error" in step_log: + message_content = ( + f"[OUTPUT OF STEP {i}] -> Error:\n" + + str(step_log["error"]) + + "\nNow let's retry: take care not to repeat previous errors! 
If you have retried several times, try a completely different approach.\n" + ) + elif "observation" in step_log: + message_content = f"[OUTPUT OF STEP {i}] -> Observation:\n{step_log['observation']}" + tool_response_message = {"role": MessageRole.TOOL_RESPONSE, "content": message_content} + memory.append(tool_response_message) + + return memory + + def get_succinct_logs(self): + return [{key: value for key, value in log.items() if key != "agent_memory"} for log in self.logs] + + def extract_action(self, llm_output: str, split_token: str) -> str: + """ + Parse action from the LLM output + + Args: + llm_output (`str`): Output of the LLM + split_token (`str`): Separator for the action. Should match the example in the system prompt. + """ + try: + split = llm_output.split(split_token) + rationale, action = ( + split[-2], + split[-1], + ) # NOTE: using indexes starting from the end solves for when you have more than one split_token in the output + except Exception as e: + self.logger.error(e, exc_info=1) + raise AgentParsingError( + f"Error: No '{split_token}' token provided in your output.\nYour output:\n{llm_output}\n. Be sure to include an action, prefaced with '{split_token}'!" + ) + return rationale.strip(), action.strip() + + def execute_tool_call(self, tool_name: str, arguments: Dict[str, str]) -> Any: + """ + Execute tool with the provided input and returns the result. + This method replaces arguments with the actual values from the state if they refer to state variables. + + Args: + tool_name (`str`): Name of the Tool to execute (should be one from self.toolbox). + arguments (Dict[str, str]): Arguments passed to the Tool. + """ + available_tools = self.toolbox.tools + if self.managed_agents is not None: + available_tools = {**available_tools, **self.managed_agents} + if tool_name not in available_tools: + error_msg = f"Error: unknown tool {tool_name}, should be instead one of {list(available_tools.keys())}." 
+ self.logger.error(error_msg, exc_info=1) + raise AgentExecutionError(error_msg) + + try: + if isinstance(arguments, str): + observation = available_tools[tool_name](arguments) + elif isinstance(arguments, dict): + for key, value in arguments.items(): + # if the value is the name of a state variable like "image.png", replace it with the actual value + if isinstance(value, str) and value in self.state: + arguments[key] = self.state[value] + observation = available_tools[tool_name](**arguments) + else: + raise AgentExecutionError( + f"Arguments passed to tool should be a dict or string: got a {type(arguments)}." + ) + return observation + except Exception as e: + if tool_name in self.toolbox.tools: + raise AgentExecutionError( + f"Error in tool call execution: {e}\nYou should only use this tool with a correct input.\n" + f"As a reminder, this tool's description is the following:\n{get_tool_description_with_args(available_tools[tool_name])}" + ) + elif tool_name in self.managed_agents: + raise AgentExecutionError( + f"Error in calling team member: {e}\nYou should only ask this team member with a correct request.\n" + f"As a reminder, this team member's description is the following:\n{available_tools[tool_name]}" + ) + + def log_rationale_code_action(self, rationale: str, code_action: str) -> None: + self.logger.warning("=== Agent thoughts:") + self.logger.log(31, rationale) + self.logger.warning(">>> Agent is executing the code below:") + if is_pygments_available(): + self.logger.log( + 31, highlight(code_action, PythonLexer(ensurenl=False), Terminal256Formatter(style="nord")) + ) + else: + self.logger.log(31, code_action) + self.logger.warning("====") + + def run(self, **kwargs): + """To be implemented in the child class""" + raise NotImplementedError + + +class CodeAgent(Agent): + """ + A class for an agent that solves the given task using a single block of code. It plans all its actions, then executes all in one shot. 
+ """ + + def __init__( + self, + tools: List[Tool], + llm_engine: Optional[Callable] = None, + system_prompt: Optional[str] = None, + tool_description_template: Optional[str] = None, + grammar: Optional[Dict[str, str]] = None, + additional_authorized_imports: Optional[List[str]] = None, + **kwargs, + ): + if llm_engine is None: + llm_engine = HfApiEngine() + if system_prompt is None: + system_prompt = DEFAULT_CODE_SYSTEM_PROMPT + if tool_description_template is None: + tool_description_template = DEFAULT_TOOL_DESCRIPTION_TEMPLATE + super().__init__( + tools=tools, + llm_engine=llm_engine, + system_prompt=system_prompt, + tool_description_template=tool_description_template, + grammar=grammar, + **kwargs, + ) + + if not is_pygments_available(): + transformers_logging.warning_once( + logger, + "pygments isn't installed. Installing pygments will enable color syntax highlighting in the " + "CodeAgent.", + ) + + self.python_evaluator = evaluate_python_code + self.additional_authorized_imports = additional_authorized_imports if additional_authorized_imports else [] + self.authorized_imports = list(set(LIST_SAFE_MODULES) | set(self.additional_authorized_imports)) + self.system_prompt = self.system_prompt.replace("<>", str(self.authorized_imports)) + + def parse_code_blob(self, result: str) -> str: + """ + Override this method if you want to change the way the code is + cleaned in the `run` method. + """ + return parse_code_blob(result) + + def run(self, task: str, return_generated_code: bool = False, **kwargs): + """ + Runs the agent for the given task. + + Args: + task (`str`): The task to perform + return_generated_code (`bool`, *optional*, defaults to `False`): Whether to return the generated code instead of running it + kwargs (additional keyword arguments, *optional*): + Any keyword argument to send to the agent when evaluating the code. 
+ + Example: + + ```py + from transformers.agents import CodeAgent + + agent = CodeAgent(tools=[]) + agent.run("What is the result of 2 power 3.7384?") + ``` + """ + self.task = task + if len(kwargs) > 0: + self.task += f"\nYou have been provided with these initial arguments: {str(kwargs)}." + self.state = kwargs.copy() + self.initialize_for_run() + + # Run LLM + prompt_message = {"role": MessageRole.SYSTEM, "content": self.system_prompt} + task_message = { + "role": MessageRole.USER, + "content": "Task: " + self.task, + } + + self.prompt = [prompt_message, task_message] + self.logger.info("====Executing with this prompt====") + self.logger.info(self.prompt) + + additional_args = {"grammar": self.grammar} if self.grammar is not None else {} + llm_output = self.llm_engine(self.prompt, stop_sequences=[""], **additional_args) + + if return_generated_code: + return llm_output + + # Parse + try: + rationale, code_action = self.extract_action(llm_output=llm_output, split_token="Code:") + except Exception as e: + self.logger.debug( + f"Error in extracting action, trying to parse the whole output as code. Error trace: {e}" + ) + rationale, code_action = "", llm_output + + try: + code_action = self.parse_code_blob(code_action) + except Exception as e: + error_msg = f"Error in code parsing: {e}. Be sure to provide correct code" + self.logger.error(error_msg, exc_info=1) + return error_msg + + # Execute + self.log_rationale_code_action(rationale, code_action) + try: + available_tools = {**BASE_PYTHON_TOOLS.copy(), **self.toolbox.tools} + output = self.python_evaluator( + code_action, + static_tools=available_tools, + custom_tools={}, + state=self.state, + authorized_imports=self.authorized_imports, + ) + self.logger.info(self.state["print_outputs"]) + return output + except Exception as e: + error_msg = f"Error in execution: {e}. Be sure to provide correct code." 
+ self.logger.error(error_msg, exc_info=1) + return error_msg + + +class ReactAgent(Agent): + """ + This agent solves the given task step by step, using the ReAct framework: + While the objective is not reached, the agent will perform a cycle of thinking and acting. + The action will be parsed from the LLM output: it consists of calls to tools from the toolbox, with arguments chosen by the LLM engine. + """ + + def __init__( + self, + tools: List[Tool], + llm_engine: Optional[Callable] = None, + system_prompt: Optional[str] = None, + tool_description_template: Optional[str] = None, + grammar: Optional[Dict[str, str]] = None, + plan_type: Optional[str] = None, + planning_interval: Optional[int] = None, + **kwargs, + ): + if llm_engine is None: + llm_engine = HfApiEngine() + if system_prompt is None: + system_prompt = DEFAULT_REACT_CODE_SYSTEM_PROMPT + if tool_description_template is None: + tool_description_template = DEFAULT_TOOL_DESCRIPTION_TEMPLATE + if plan_type is None: + plan_type = SUPPORTED_PLAN_TYPES[0] + else: + assert plan_type in SUPPORTED_PLAN_TYPES, f"plan type {plan_type} is not supported" + super().__init__( + tools=tools, + llm_engine=llm_engine, + system_prompt=system_prompt, + tool_description_template=tool_description_template, + grammar=grammar, + **kwargs, + ) + self.planning_interval = planning_interval + self.plan_type = plan_type + + def provide_final_answer(self, task) -> str: + """ + This method provides a final answer to the task, based on the logs of the agent's interactions. + """ + self.prompt = [ + { + "role": MessageRole.SYSTEM, + "content": "An agent tried to answer a user query but it got stuck and failed to do so. You are tasked with providing an answer instead. 
Here is the agent's memory:", + } + ] + self.prompt += self.write_inner_memory_from_logs()[1:] + self.prompt += [ + { + "role": MessageRole.USER, + "content": f"Based on the above, please provide an answer to the following user request:\n{task}", + } + ] + try: + return self.llm_engine(self.prompt) + except Exception as e: + return f"Error in generating final llm output: {e}." + + def run(self, task: str, stream: bool = False, reset: bool = True, **kwargs): + """ + Runs the agent for the given task. + + Args: + task (`str`): The task to perform + + Example: + ```py + from transformers.agents import ReactCodeAgent + agent = ReactCodeAgent(tools=[]) + agent.run("What is the result of 2 power 3.7384?") + ``` + """ + self.task = task + if len(kwargs) > 0: + self.task += f"\nYou have been provided with these initial arguments: {str(kwargs)}." + self.state = kwargs.copy() + if reset: + self.initialize_for_run() + else: + self.logs.append({"task": task}) + if stream: + return self.stream_run(task) + else: + return self.direct_run(task) + + def stream_run(self, task: str): + """ + Runs the agent in streaming mode, yielding steps as they are executed: should be launched only in the `run` method. 
+ """ + final_answer = None + iteration = 0 + while final_answer is None and iteration < self.max_iterations: + step_start_time = time.time() + step_log_entry = {"iteration": iteration, "start_time": step_start_time} + try: + self.step(step_log_entry) + if "final_answer" in step_log_entry: + final_answer = step_log_entry["final_answer"] + except AgentError as e: + self.logger.error(e, exc_info=1) + step_log_entry["error"] = e + finally: + step_end_time = time.time() + step_log_entry["step_end_time"] = step_end_time + step_log_entry["step_duration"] = step_end_time - step_start_time + self.logs.append(step_log_entry) + for callback in self.step_callbacks: + callback(step_log_entry) + iteration += 1 + yield step_log_entry + + if final_answer is None and iteration == self.max_iterations: + error_message = "Reached max iterations." + final_step_log = {"error": AgentMaxIterationsError(error_message)} + self.logs.append(final_step_log) + self.logger.error(error_message, exc_info=1) + final_answer = self.provide_final_answer(task) + final_step_log["final_answer"] = final_answer + final_step_log["step_duration"] = 0 + for callback in self.step_callbacks: + callback(final_step_log) + yield final_step_log + + yield final_answer + + def direct_run(self, task: str): + """ + Runs the agent in direct mode, returning outputs only at the end: should be launched only in the `run` method. 
+ """ + final_answer = None + iteration = 0 + while final_answer is None and iteration < self.max_iterations: + step_start_time = time.time() + step_log_entry = {"iteration": iteration, "start_time": step_start_time} + try: + if self.planning_interval is not None and iteration % self.planning_interval == 0: + self.planning_step(task, is_first_step=(iteration == 0), iteration=iteration) + self.step(step_log_entry) + if "final_answer" in step_log_entry: + final_answer = step_log_entry["final_answer"] + except AgentError as e: + self.logger.error(e, exc_info=1) + step_log_entry["error"] = e + finally: + step_end_time = time.time() + step_log_entry["step_end_time"] = step_end_time + step_log_entry["step_duration"] = step_end_time - step_start_time + self.logs.append(step_log_entry) + for callback in self.step_callbacks: + callback(step_log_entry) + iteration += 1 + + if final_answer is None and iteration == self.max_iterations: + error_message = "Reached max iterations." + final_step_log = {"error": AgentMaxIterationsError(error_message)} + self.logs.append(final_step_log) + self.logger.error(error_message, exc_info=1) + final_answer = self.provide_final_answer(task) + final_step_log["final_answer"] = final_answer + final_step_log["step_duration"] = 0 + for callback in self.step_callbacks: + callback(final_step_log) + + return final_answer + + def planning_step(self, task, is_first_step: bool = False, iteration: int = None): + """ + Used periodically by the agent to plan the next steps to reach the objective. + + Args: + task (`str`): The task to perform + is_first_step (`bool`): If this step is not the first one, the plan should be an update over a previous plan. + iteration (`int`): The number of the current step, used as an indication for the LLM. 
+ """ + if is_first_step: + message_prompt_facts = {"role": MessageRole.SYSTEM, "content": SYSTEM_PROMPT_FACTS} + message_prompt_task = { + "role": MessageRole.USER, + "content": f"""Here is the task: +``` +{task} +``` +Now begin!""", + } + + answer_facts = self.llm_engine([message_prompt_facts, message_prompt_task]) + + message_system_prompt_plan = { + "role": MessageRole.SYSTEM, + "content": PROMPTS_FOR_INITIAL_PLAN[self.plan_type]["system"], + } + message_user_prompt_plan = { + "role": MessageRole.USER, + "content": PROMPTS_FOR_INITIAL_PLAN[self.plan_type]["user"].format( + task=task, + tool_descriptions=self._toolbox.show_tool_descriptions(self.tool_description_template), + managed_agents_descriptions=( + show_agents_descriptions(self.managed_agents) if self.managed_agents is not None else "" + ), + answer_facts=answer_facts, + ), + } + answer_plan = self.llm_engine( + [message_system_prompt_plan, message_user_prompt_plan], stop_sequences=[""] + ) + + final_plan_redaction = f"""Here is the plan of action that I will follow to solve the task: +``` +{answer_plan} +```""" + final_facts_redaction = f"""Here are the facts that I know so far: +``` +{answer_facts} +```""".strip() + self.logs.append({"plan": final_plan_redaction, "facts": final_facts_redaction}) + self.logger.log(36, "===== Initial plan =====") + self.logger.log(35, final_plan_redaction) + else: # update plan + agent_memory = self.write_inner_memory_from_logs( + summary_mode=False + ) # This will not log the plan but will log facts + + # Redact updated facts + facts_update_system_prompt = { + "role": MessageRole.SYSTEM, + "content": SYSTEM_PROMPT_FACTS_UPDATE, + } + facts_update_message = { + "role": MessageRole.USER, + "content": USER_PROMPT_FACTS_UPDATE, + } + facts_update = self.llm_engine([facts_update_system_prompt] + agent_memory + [facts_update_message]) + + # Redact updated plan + plan_update_message = { + "role": MessageRole.SYSTEM, + "content": 
PROMPTS_FOR_PLAN_UPDATE[self.plan_type]["system"].format(task=task), + } + plan_update_message_user = { + "role": MessageRole.USER, + "content": PROMPTS_FOR_PLAN_UPDATE[self.plan_type]["user"].format( + task=task, + tool_descriptions=self._toolbox.show_tool_descriptions(self.tool_description_template), + managed_agents_descriptions=( + show_agents_descriptions(self.managed_agents) if self.managed_agents is not None else "" + ), + facts_update=facts_update, + remaining_steps=(self.max_iterations - iteration), + ), + } + plan_update = self.llm_engine( + [plan_update_message] + agent_memory + [plan_update_message_user], stop_sequences=[""] + ) + + # Log final facts and plan + final_plan_redaction = PLAN_UPDATE_FINAL_PLAN_REDACTION.format(task=task, plan_update=plan_update) + final_facts_redaction = f"""Here is the updated list of the facts that I know: +``` +{facts_update} +```""" + self.logs.append({"plan": final_plan_redaction, "facts": final_facts_redaction}) + self.logger.log(36, "===== Updated plan =====") + self.logger.log(35, final_plan_redaction) + + +class ReactJsonAgent(ReactAgent): + """ + This agent solves the given task step by step, using the ReAct framework: + While the objective is not reached, the agent will perform a cycle of thinking and acting. + The tool calls will be formulated by the LLM in JSON format, then parsed and executed.
+ """ + + def __init__( + self, + tools: List[Tool], + llm_engine: Optional[Callable] = None, + system_prompt: Optional[str] = None, + tool_description_template: Optional[str] = None, + grammar: Optional[Dict[str, str]] = None, + planning_interval: Optional[int] = None, + **kwargs, + ): + if llm_engine is None: + llm_engine = HfApiEngine() + if system_prompt is None: + system_prompt = DEFAULT_REACT_JSON_SYSTEM_PROMPT + if tool_description_template is None: + tool_description_template = DEFAULT_TOOL_DESCRIPTION_TEMPLATE + super().__init__( + tools=tools, + llm_engine=llm_engine, + system_prompt=system_prompt, + tool_description_template=tool_description_template, + grammar=grammar, + planning_interval=planning_interval, + **kwargs, + ) + + def step(self, log_entry: Dict[str, Any]): + """ + Perform one step in the ReAct framework: the agent thinks, acts, and observes the result. + The errors are raised here, they are caught and logged in the run() method. + """ + agent_memory = self.write_inner_memory_from_logs() + + self.prompt = agent_memory + self.logger.debug("===== New step =====") + + # Add new step in logs + log_entry["agent_memory"] = agent_memory.copy() + + self.logger.info("===== Calling LLM with this last message: =====") + self.logger.info(self.prompt[-1]) + + try: + additional_args = {"grammar": self.grammar} if self.grammar is not None else {} + llm_output = self.llm_engine( + self.prompt, stop_sequences=["", "Observation:"], **additional_args + ) + except Exception as e: + raise AgentGenerationError(f"Error in generating llm output: {e}.") + self.logger.debug("===== Output message of the LLM: =====") + self.logger.debug(llm_output) + log_entry["llm_output"] = llm_output + + # Parse + self.logger.debug("===== Extracting action =====") + rationale, action = self.extract_action(llm_output=llm_output, split_token="Action:") + + try: + tool_name, arguments = self.tool_parser(action) + except Exception as e: + raise AgentParsingError(f"Could not parse the 
given action: {e}.") + + log_entry["rationale"] = rationale + log_entry["tool_call"] = {"tool_name": tool_name, "tool_arguments": arguments} + + # Execute + self.logger.warning("=== Agent thoughts:") + self.logger.log(31, rationale) + self.logger.warning(f">>> Calling tool: '{tool_name}' with arguments: {arguments}") + if tool_name == "final_answer": + if isinstance(arguments, dict): + if "answer" in arguments: + answer = arguments["answer"] + if ( + isinstance(answer, str) and answer in self.state.keys() + ): # if the answer is a state variable, return the value + answer = self.state[answer] + else: + answer = arguments + else: + answer = arguments + log_entry["final_answer"] = answer + return answer + else: + if arguments is None: + arguments = {} + observation = self.execute_tool_call(tool_name, arguments) + observation_type = type(observation) + if observation_type in [AgentImage, AgentAudio]: + if observation_type == AgentImage: + observation_name = "image.png" + elif observation_type == AgentAudio: + observation_name = "audio.mp3" + # TODO: observation naming could allow for different names of same type + + self.state[observation_name] = observation + updated_information = f"Stored '{observation_name}' in memory." + else: + updated_information = str(observation).strip() + self.logger.info(updated_information) + log_entry["observation"] = updated_information + return log_entry + + +class ReactCodeAgent(ReactAgent): + """ + This agent solves the given task step by step, using the ReAct framework: + While the objective is not reached, the agent will perform a cycle of thinking and acting. + The tool calls will be formulated by the LLM in code format, then parsed and executed.
+ """ + + def __init__( + self, + tools: List[Tool], + llm_engine: Optional[Callable] = None, + system_prompt: Optional[str] = None, + tool_description_template: Optional[str] = None, + grammar: Optional[Dict[str, str]] = None, + additional_authorized_imports: Optional[List[str]] = None, + planning_interval: Optional[int] = None, + **kwargs, + ): + if llm_engine is None: + llm_engine = HfApiEngine() + if system_prompt is None: + system_prompt = DEFAULT_REACT_CODE_SYSTEM_PROMPT + if tool_description_template is None: + tool_description_template = DEFAULT_TOOL_DESCRIPTION_TEMPLATE + super().__init__( + tools=tools, + llm_engine=llm_engine, + system_prompt=system_prompt, + tool_description_template=tool_description_template, + grammar=grammar, + planning_interval=planning_interval, + **kwargs, + ) + + if not is_pygments_available(): + transformers_logging.warning_once( + logger, + "pygments isn't installed. Installing pygments will enable color syntax highlighting in the " + "ReactCodeAgent.", + ) + + self.python_evaluator = evaluate_python_code + self.additional_authorized_imports = additional_authorized_imports if additional_authorized_imports else [] + self.authorized_imports = list(set(LIST_SAFE_MODULES) | set(self.additional_authorized_imports)) + self.system_prompt = self.system_prompt.replace("<>", str(self.authorized_imports)) + self.custom_tools = {} + + def step(self, log_entry: Dict[str, Any]): + """ + Perform one step in the ReAct framework: the agent thinks, acts, and observes the result. + The errors are raised here, they are caught and logged in the run() method. 
+ """ + agent_memory = self.write_inner_memory_from_logs() + + self.prompt = agent_memory.copy() + self.logger.debug("===== New step =====") + + # Add new step in logs + log_entry["agent_memory"] = agent_memory.copy() + + self.logger.info("===== Calling LLM with these last messages: =====") + self.logger.info(self.prompt[-2:]) + + try: + additional_args = {"grammar": self.grammar} if self.grammar is not None else {} + llm_output = self.llm_engine( + self.prompt, stop_sequences=["", "Observation:"], **additional_args + ) + except Exception as e: + raise AgentGenerationError(f"Error in generating llm output: {e}.") + + self.logger.debug("=== Output message of the LLM:") + self.logger.debug(llm_output) + log_entry["llm_output"] = llm_output + + # Parse + self.logger.debug("=== Extracting action ===") + try: + rationale, raw_code_action = self.extract_action(llm_output=llm_output, split_token="Code:") + except Exception as e: + self.logger.debug(f"Error in extracting action, trying to parse the whole output. Error trace: {e}") + rationale, raw_code_action = llm_output, llm_output + + try: + code_action = parse_code_blob(raw_code_action) + except Exception as e: + error_msg = f"Error in code parsing: {e}. 
Make sure to provide correct code" + raise AgentParsingError(error_msg) + + log_entry["rationale"] = rationale + log_entry["tool_call"] = {"tool_name": "code interpreter", "tool_arguments": code_action} + + # Execute + self.log_rationale_code_action(rationale, code_action) + try: + static_tools = { + **BASE_PYTHON_TOOLS.copy(), + **self.toolbox.tools, + } + if self.managed_agents is not None: + static_tools = {**static_tools, **self.managed_agents} + result = self.python_evaluator( + code_action, + static_tools=static_tools, + custom_tools=self.custom_tools, + state=self.state, + authorized_imports=self.authorized_imports, + ) + self.logger.warning("Print outputs:") + self.logger.log(32, self.state["print_outputs"]) + observation = "Print outputs:\n" + self.state["print_outputs"] + if result is not None: + self.logger.warning("Last output from code snippet:") + self.logger.log(32, str(result)) + observation += "Last output from code snippet:\n" + str(result)[:100000] + log_entry["observation"] = observation + except Exception as e: + error_msg = f"Code execution failed due to the following error:\n{str(e)}" + if "'dict' object has no attribute 'read'" in str(e): + error_msg += "\nYou get this error because you passed a dict as input for one of the arguments instead of a string." + raise AgentExecutionError(error_msg) + for line in code_action.split("\n"): + if line[: len("final_answer")] == "final_answer": + self.logger.log(33, "Final answer:") + self.logger.log(32, result) + log_entry["final_answer"] = result + return result + + +LENGTH_TRUNCATE_REPORTS = 1000 + + +class ManagedAgent: + def __init__(self, agent, name, description, additional_prompting=None, provide_run_summary=False): + self.agent = agent + self.name = name + self.description = description + self.additional_prompting = additional_prompting + self.provide_run_summary = provide_run_summary + + def write_full_task(self, task): + full_task = f"""You're a helpful agent named '{self.name}'. 
+You have been submitted this task by your manager. +--- +Task: +{task} +--- +You're helping your manager solve a wider task: so make sure to not provide a one-line answer, but give as much information as possible so that they have a clear understanding of the answer. + +Your final_answer WILL HAVE to contain these parts: +### 1. Task outcome (short version): +### 2. Task outcome (extremely detailed version): +### 3. Additional context (if relevant): + +Put all these in your final_answer tool, everything that you do not pass as an argument to final_answer will be lost. +And even if your task resolution is not successful, please return as much context as possible, so that your manager can act upon this feedback. +<>""" + if self.additional_prompting: + full_task = full_task.replace("\n<>", self.additional_prompting).strip() + else: + full_task = full_task.replace("\n<>", "").strip() + return full_task + + def __call__(self, request, **kwargs): + full_task = self.write_full_task(request) + output = self.agent.run(full_task, **kwargs) + if self.provide_run_summary: + answer = f"Here is the final answer from your managed agent '{self.name}':\n" + answer += str(output) + answer += f"\n\nFor more detail, find below a summary of this agent's work:\nSUMMARY OF WORK FROM AGENT '{self.name}':\n" + for message in self.agent.write_inner_memory_from_logs(summary_mode=True): + content = message["content"] + if len(str(content)) < LENGTH_TRUNCATE_REPORTS or "[FACTS LIST]" in str(content): + answer += "\n" + str(content) + "\n---" + else: + answer += ( + "\n" + + str(content)[:LENGTH_TRUNCATE_REPORTS] + + "\n(...Step was truncated because too long)...\n---" + ) + answer += f"\nEND OF SUMMARY OF WORK FROM AGENT '{self.name}'." 
+ return answer + else: + return output diff --git a/agents/default_tools.py b/agents/default_tools.py new file mode 100644 index 0000000..3946aa9 --- /dev/null +++ b/agents/default_tools.py @@ -0,0 +1,187 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import importlib.util +import json +import math +from dataclasses import dataclass +from math import sqrt +from typing import Dict + +from huggingface_hub import hf_hub_download, list_spaces + +from ..utils import is_offline_mode +from .python_interpreter import LIST_SAFE_MODULES, evaluate_python_code +from .tools import TOOL_CONFIG_FILE, TOOL_MAPPING, Tool + + +def custom_print(*args): + return None + + +BASE_PYTHON_TOOLS = { + "print": custom_print, + "isinstance": isinstance, + "range": range, + "float": float, + "int": int, + "bool": bool, + "str": str, + "set": set, + "list": list, + "dict": dict, + "tuple": tuple, + "round": round, + "ceil": math.ceil, + "floor": math.floor, + "log": math.log, + "exp": math.exp, + "sin": math.sin, + "cos": math.cos, + "tan": math.tan, + "asin": math.asin, + "acos": math.acos, + "atan": math.atan, + "atan2": math.atan2, + "degrees": math.degrees, + "radians": math.radians, + "pow": math.pow, + "sqrt": sqrt, + "len": len, + "sum": sum, + "max": max, + "min": min, + "abs": abs, + "enumerate": enumerate, + "zip": zip, + "reversed": reversed, + "sorted": sorted, + "all": all, 
+ "any": any, + "map": map, + "filter": filter, + "ord": ord, + "chr": chr, + "next": next, + "iter": iter, + "divmod": divmod, + "callable": callable, + "getattr": getattr, + "hasattr": hasattr, + "setattr": setattr, + "issubclass": issubclass, + "type": type, +} + + +@dataclass +class PreTool: + name: str + inputs: Dict[str, str] + output_type: type + task: str + description: str + repo_id: str + + +HUGGINGFACE_DEFAULT_TOOLS_FROM_HUB = [ + "image-transformation", + "text-to-image", +] + + +def get_remote_tools(logger, organization="huggingface-tools"): + if is_offline_mode(): + logger.info("You are in offline mode, so remote tools are not available.") + return {} + + spaces = list_spaces(author=organization) + tools = {} + for space_info in spaces: + repo_id = space_info.id + resolved_config_file = hf_hub_download(repo_id, TOOL_CONFIG_FILE, repo_type="space") + with open(resolved_config_file, encoding="utf-8") as reader: + config = json.load(reader) + task = repo_id.split("/")[-1] + tools[config["name"]] = PreTool( + task=task, + description=config["description"], + repo_id=repo_id, + name=task, + inputs=config["inputs"], + output_type=config["output_type"], + ) + + return tools + + +def setup_default_tools(logger): + default_tools = {} + main_module = importlib.import_module("transformers") + tools_module = main_module.agents + + for task_name, tool_class_name in TOOL_MAPPING.items(): + tool_class = getattr(tools_module, tool_class_name) + tool_instance = tool_class() + default_tools[tool_class.name] = PreTool( + name=tool_instance.name, + inputs=tool_instance.inputs, + output_type=tool_instance.output_type, + task=task_name, + description=tool_instance.description, + repo_id=None, + ) + + return default_tools + + +class PythonInterpreterTool(Tool): + name = "python_interpreter" + description = "This is a tool that evaluates python code. It can be used to perform calculations." 
+ + output_type = "string" + + def __init__(self, *args, authorized_imports=None, **kwargs): + if authorized_imports is None: + self.authorized_imports = list(set(LIST_SAFE_MODULES)) + else: + self.authorized_imports = list(set(LIST_SAFE_MODULES) | set(authorized_imports)) + self.inputs = { + "code": { + "type": "string", + "description": ( + "The code snippet to evaluate. All variables used in this snippet must be defined in this same snippet, " + f"else you will get an error. This code can only import the following python libraries: {self.authorized_imports}." + ), + } + } + super().__init__(*args, **kwargs) + + def forward(self, code): + output = str( + evaluate_python_code(code, static_tools=BASE_PYTHON_TOOLS, authorized_imports=self.authorized_imports) + ) + return output + + +class FinalAnswerTool(Tool): + name = "final_answer" + description = "Provides a final answer to the given problem." + inputs = {"answer": {"type": "any", "description": "The final answer to the problem"}} + output_type = "any" + + def forward(self, answer): + return answer diff --git a/agents/document_question_answering.py b/agents/document_question_answering.py new file mode 100644 index 0000000..23ae5b0 --- /dev/null +++ b/agents/document_question_answering.py @@ -0,0 +1,89 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import re + +import numpy as np +import torch + +from ..models.auto import AutoProcessor +from ..models.vision_encoder_decoder import VisionEncoderDecoderModel +from ..utils import is_vision_available +from .tools import PipelineTool + + +if is_vision_available(): + from PIL import Image + + +class DocumentQuestionAnsweringTool(PipelineTool): + default_checkpoint = "naver-clova-ix/donut-base-finetuned-docvqa" + description = "This is a tool that answers a question about a document (pdf). It returns a string that contains the answer to the question." + name = "document_qa" + pre_processor_class = AutoProcessor + model_class = VisionEncoderDecoderModel + + inputs = { + "document": { + "type": "image", + "description": "The image containing the information. Can be a PIL Image or a string path to the image.", + }, + "question": {"type": "string", "description": "The question in English"}, + } + output_type = "string" + + def __init__(self, *args, **kwargs): + if not is_vision_available(): + raise ValueError("Pillow must be installed to use the DocumentQuestionAnsweringTool.") + + super().__init__(*args, **kwargs) + + def encode(self, document: "Image", question: str): + task_prompt = "{user_input}" + prompt = task_prompt.replace("{user_input}", question) + decoder_input_ids = self.pre_processor.tokenizer( + prompt, add_special_tokens=False, return_tensors="pt" + ).input_ids + if isinstance(document, str): + img = Image.open(document).convert("RGB") + img_array = np.array(img).transpose(2, 0, 1) + document = torch.from_numpy(img_array) + pixel_values = self.pre_processor(document, return_tensors="pt").pixel_values + + return {"decoder_input_ids": decoder_input_ids, "pixel_values": pixel_values} + + def forward(self, inputs): + return self.model.generate( + inputs["pixel_values"].to(self.device), + decoder_input_ids=inputs["decoder_input_ids"].to(self.device), + max_length=self.model.decoder.config.max_position_embeddings, + early_stopping=True, + 
pad_token_id=self.pre_processor.tokenizer.pad_token_id, + eos_token_id=self.pre_processor.tokenizer.eos_token_id, + use_cache=True, + num_beams=1, + bad_words_ids=[[self.pre_processor.tokenizer.unk_token_id]], + return_dict_in_generate=True, + ).sequences + + def decode(self, outputs): + sequence = self.pre_processor.batch_decode(outputs)[0] + sequence = sequence.replace(self.pre_processor.tokenizer.eos_token, "") + sequence = sequence.replace(self.pre_processor.tokenizer.pad_token, "") + sequence = re.sub(r"<.*?>", "", sequence, count=1).strip() # remove first task start token + sequence = self.pre_processor.token2json(sequence) + + return sequence["answer"] diff --git a/agents/evaluate_agent.py b/agents/evaluate_agent.py new file mode 100644 index 0000000..90dfd4f --- /dev/null +++ b/agents/evaluate_agent.py @@ -0,0 +1,414 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from .agents import BASE_PYTHON_TOOLS +from .python_interpreter import InterpreterError, evaluate + + +### Fake tools for test +def classifier(text, labels): + return f"This is the classification of {text} along {labels}." + + +def translator(text, src_lang, tgt_lang): + return f"This is the translation of {text} from {src_lang} to {tgt_lang}." + + +def speaker(text): + return f"This is actually a sound reading {text}." 
+ + +def transcriber(audio): + if "sound" not in audio: + raise ValueError(f"`audio` ({audio}) is not a sound.") + return f"This is the transcribed text from {audio}." + + +def image_generator(prompt): + return f"This is actually an image representing {prompt}." + + +def image_captioner(image): + if "image" not in image: + raise ValueError(f"`image` ({image}) is not an image.") + return f"This is a description of {image}." + + +def image_transformer(image, prompt): + if "image" not in image: + raise ValueError(f"`image` ({image}) is not an image.") + return f"This is a transformation of {image} according to {prompt}." + + +def question_answerer(text, question): + return f"This is the answer to {question} from {text}." + + +def image_qa(image, question): + if "image" not in image: + raise ValueError(f"`image` ({image}) is not an image.") + return f"This is the answer to {question} from {image}." + + +def text_downloader(url): + return f"This is the content of {url}." + + +def summarizer(text): + return f"This is a summary of {text}." + + +def video_generator(prompt, seconds=2): + return f"A video of {prompt}" + + +def document_qa(image, question): + return f"This is the answer to {question} from the document {image}." + + +def image_segmenter(image, prompt): + return f"This is the mask of {prompt} in {image}" + + +TEST_TOOLS = { + "text_classifier": classifier, + "translator": translator, + "text_reader": speaker, + "summarizer": summarizer, + "transcriber": transcriber, + "image_generator": image_generator, + "image_captioner": image_captioner, + "image_transformer": image_transformer, + "text_qa": question_answerer, + "text_downloader": text_downloader, + "image_qa": image_qa, + "video_generator": video_generator, + "document_qa": document_qa, + "image_segmenter": image_segmenter, +} + + +class Problem: + """ + A class regrouping all the information to solve a problem on which we will evaluate agents. 
+ + Args: + task (`str` or `list[str]`): + One or several descriptions of the task to perform. If a list, it should contain variations on the + phrasing, but for the same task. + inputs (`list[str]` or `dict[str, str]`): + The inputs that will be fed to the tools. For this testing environment, only strings are accepted as + values. Pass along a dictionary when you want to specify the value of each input, or just the list of + inputs expected (the value used will be `<<input_name>>` in this case). + answer (`str` or `list[str]`): + The theoretical answer (or list of possible valid answers) to the problem, as code. + """ + + def __init__(self, task, inputs, answer): + self.task = task + self.inputs = inputs + self.answer = answer + + +### The list of problems the agent will be evaluated on. +EVALUATION_TASKS = [ + Problem( + task=[ + "Is the following `text` (in Spanish) positive or negative?", + "Is the text in the variable `text` (in Spanish) positive or negative?", + "Translate the following `text` from Spanish to English then tell me if its positive or negative.", + ], + inputs=["text"], + answer="""text_classifier(translator(text, src_lang="Spanish", tgt_lang="English"), labels=["positive", "negative"])""", + ), + Problem( + task=[ + "Tell me out loud what the `image` contains.", + "Describe the following `image` out loud.", + "Find what is in the picture stored in `image` then read it out loud.", + ], + inputs=["image"], + answer=[ + "text_reader(image_captioner(image))", + "text_reader(image_qa(image, question='What is in the image?'))", + ], + ), + Problem( + task=[ + "Generate an image from the text given in `text_input`. 
Then transform it according to the text in `prompt`.", + "Use the following `text_input` to generate an image, then transform it by using the text in `prompt`.", + ], + inputs=["text_input", "prompt"], + answer="image_transformer(image_generator(text_input), prompt)", + ), + Problem( + task=[ + "Download the content of `url`, summarize it then generate an image from its content.", + "Use a summary of the web page at `url` to generate an image.", + "Summarize the content of the web page at `url`, and use the result to generate an image.", + ], + inputs=["url"], + answer="image_generator(summarizer(text_downloader(url)))", + ), + Problem( + task=[ + "Transform the following `image` using the prompt in `text`. The prompt is in Spanish.", + "Use the text prompt in `text` (in Spanish) to transform the following `image`.", + "Translate the `text` from Spanish to English then use it to transform the picture in `image`.", + ], + inputs=["text", "image"], + answer="image_transformer(image, translator(text, src_lang='Spanish', tgt_lang='English'))", + ), + Problem( + task=[ + "Download the content of `url`, summarize it then read it out loud to me.", + "Read me a summary of the web page at `url`.", + ], + inputs=["url"], + answer="text_reader(summarizer(text_downloader(url)))", + ), + Problem( + task=[ + "Generate an image from the text given in `text_input`.", + ], + inputs=["text_input"], + answer="image_generator(text_input)", + ), + Problem( + task=[ + "Replace the beaver in the `image` by the `prompt`.", + "Transform the `image` so that it contains the `prompt`.", + "Use `prompt` to transform this `image`.", + ], + inputs=["image", "prompt"], + answer="image_transformer(image, prompt)", + ), + Problem( + task=[ + "Provide me the summary of the `text`, then read it to me before transcribing it and translating it in French.", + "Summarize `text`, read it out loud then transcribe the audio and translate it in French.", + "Read me a summary of the `text` out loud. 
Transcribe this and translate it in French.", + ], + inputs=["text"], + answer="translator(transcriber(text_reader(summarizer(text))), src_lang='English', tgt_lang='French')", + ), + Problem( + task=["Generate a video of the `prompt`", "Animate a `prompt`", "Make me a short video using `prompt`."], + inputs={"prompt": "A lobster swimming"}, + answer="video_generator('A lobster swimming')", + ), + Problem( + task=[ + "Download the following file `url`, summarize it in a few words and generate a video from it.", + "Fetch the file at this `url`, summarize it, and create an animation out of it.", + ], + inputs=["url"], + answer="video_generator(summarizer(text_downloader(url)))", + ), +] + + +def get_theoretical_tools(agent_answer, theoretical_answer, code_answer): + if not isinstance(theoretical_answer, list): + return {name for name in TEST_TOOLS if name in code_answer} + + if isinstance(agent_answer, dict): + for one_answer, one_code in zip(theoretical_answer, code_answer): + if one_answer in agent_answer.values(): + return {name for name in TEST_TOOLS if name in one_code} + + for one_answer, one_code in zip(theoretical_answer, code_answer): + if agent_answer == one_answer: + return {name for name in TEST_TOOLS if name in one_code} + + return {name for name in TEST_TOOLS if name in code_answer[0]} + + +def evaluate_code(code, inputs=None, state=None, verbose=False, return_interpretor_error=False): + tools = BASE_PYTHON_TOOLS.copy() + for name, tool in TEST_TOOLS.items(): + if name not in code: + continue + tools[name] = tool + + if isinstance(inputs, dict): + inputs = inputs.copy() + elif inputs is not None: + inputs = {inp: f"<<{inp}>>" for inp in inputs} + + if state is not None: + state.update(inputs) + else: + state = inputs + + try: + return evaluate(code, tools, state) + except InterpreterError as e: + return str(e) + except Exception as e: + if verbose: + print(e) + return None + + +def score_code(agent_answer, theoretical_answer, verbose: bool = False): + if 
verbose: + print(agent_answer, theoretical_answer) + theoretical_answer = theoretical_answer if isinstance(theoretical_answer, list) else [theoretical_answer] + + if agent_answer in theoretical_answer: + if verbose: + print("Perfect!") + return 1 + elif isinstance(agent_answer, dict) and any(v in theoretical_answer for v in agent_answer.values()): + if verbose: + print("Almost perfect, result in state!") + return 0.75 + else: + if verbose: + print("Result is not the right one but code executed.") + return 0.3 + + +def evaluate_one_result(code, agent_answer, theoretical_answer, answer, verbose=False): + tools_in_code = {name for name in TEST_TOOLS if f"`{name}`" in code} + theoretical_tools = get_theoretical_tools(agent_answer, theoretical_answer, answer) + if tools_in_code == theoretical_tools: + tool_selection_score = 1.0 + tool_selection_errors = None + else: + missing_tools = len(theoretical_tools - tools_in_code) + unexpected_tools = len(tools_in_code - theoretical_tools) + tool_selection_score = max(0, 1.0 - 0.25 * missing_tools - 0.25 * unexpected_tools) + + tool_selection_errors = { + "selected_tools": tools_in_code, + "theoretical_tools": theoretical_tools, + } + + tools_in_code = {name for name in TEST_TOOLS if name in code} + if tools_in_code == theoretical_tools: + tool_used_score = 1.0 + tool_used_errors = None + else: + missing_tools = len(theoretical_tools - tools_in_code) + unexpected_tools = len(tools_in_code - theoretical_tools) + tool_used_score = max(0, 1.0 - 0.25 * missing_tools - 0.25 * unexpected_tools) + + tool_used_errors = { + "selected_tools": tools_in_code, + "theoretical_tools": theoretical_tools, + } + + score = score_code(agent_answer, theoretical_answer, verbose=verbose) + if score < 1.0: + code_errors = { + "code_produced": code, + "evaluation": agent_answer, + "theoretical_answer": theoretical_answer, + } + else: + code_errors = None + + return (tool_selection_score, tool_used_score, score), (tool_selection_errors, tool_used_errors, 
code_errors) + + +def evaluate_agent(agent, batch_size=8, verbose=False, return_errors=False): + """ + Evaluates a new agent on all `EVALUATION_TASKS`. + + Example: + + ```py + agent = NewOpenAiAgent(model="text-davinci-003", api_key=your_api_key) + bads = evaluate_agent(agent) + for bad in bads: + print(bad) + ``` + """ + # Sanity check + agent_tools = set(agent.toolbox.keys()) + if agent_tools != set(TEST_TOOLS): + missing_tools = set(TEST_TOOLS) - agent_tools + unexpected_tools = set(agent_tools) - set(TEST_TOOLS) + raise ValueError( + f"Fix the test tools in the evaluate_agent module. Tools missing: {missing_tools}. Extra tools: {unexpected_tools}." + ) + + eval_tasks = [] + eval_idx = [] + for idx, pb in enumerate(EVALUATION_TASKS): + if isinstance(pb.task, list): + eval_tasks.extend(pb.task) + eval_idx.extend([idx] * len(pb.task)) + else: + eval_tasks.append(pb.task) + eval_idx.append(idx) + + tool_selection_score = 0 + tool_used_score = 0 + code_score = 0 + + if return_errors: + tool_selection_errors = {} + tool_used_errors = {} + code_errors = {} + + for start_idx in range(0, len(eval_tasks), batch_size): + end_idx = min(start_idx + batch_size, len(eval_tasks)) + batch_tasks = eval_tasks[start_idx:end_idx] + + results = [agent.run(task, return_generated_code=True) for task in batch_tasks] + + for idx, result in enumerate(results): + problem = EVALUATION_TASKS[eval_idx[start_idx + idx]] + if verbose: + print(f"====Task {start_idx + idx}====\n{batch_tasks[idx]}\n") + code = agent.extract_action(result, split_token="Answer:") + + # Evaluate agent answer and code answer + agent_answer = evaluate_code(code, problem.inputs, verbose=verbose) + if isinstance(problem.answer, list): + theoretical_answer = [evaluate_code(answer, problem.inputs) for answer in problem.answer] + else: + theoretical_answer = evaluate_code(problem.answer, problem.inputs) + + scores, errors = evaluate_one_result( + code, agent_answer, theoretical_answer, problem.answer, verbose=verbose + ) + 
+ tool_selection_score += scores[0] + tool_used_score += scores[1] + code_score += scores[2] + + if return_errors: + if errors[0] is not None: + tool_selection_errors[batch_tasks[idx]] = errors[0] + if errors[1] is not None: + tool_used_errors[batch_tasks[idx]] = errors[1] + if errors[2] is not None: + code_errors[batch_tasks[idx]] = errors[2] + + scores = { + "tool selection score": 100 * (tool_selection_score / len(eval_tasks)), + "tool used score": 100 * (tool_used_score / len(eval_tasks)), + "code score": 100 * (code_score / len(eval_tasks)), + } + + if return_errors: + return scores, tool_selection_errors, tool_used_errors, code_errors + else: + return scores diff --git a/agents/image_question_answering.py b/agents/image_question_answering.py new file mode 100644 index 0000000..de0efb7 --- /dev/null +++ b/agents/image_question_answering.py @@ -0,0 +1,58 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +from PIL import Image + +from ..models.auto import AutoModelForVisualQuestionAnswering, AutoProcessor +from ..utils import requires_backends +from .tools import PipelineTool + + +class ImageQuestionAnsweringTool(PipelineTool): + default_checkpoint = "dandelin/vilt-b32-finetuned-vqa" + description = ( + "This is a tool that answers a question about an image. It " + "returns a text that is the answer to the question." 
+ ) + name = "image_qa" + pre_processor_class = AutoProcessor + model_class = AutoModelForVisualQuestionAnswering + + inputs = { + "image": { + "type": "image", + "description": "The image containing the information. Can be a PIL Image or a string path to the image.", + }, + "question": {"type": "string", "description": "The question in English"}, + } + output_type = "string" + + def __init__(self, *args, **kwargs): + requires_backends(self, ["vision"]) + super().__init__(*args, **kwargs) + + def encode(self, image: "Image", question: str): + return self.pre_processor(image, question, return_tensors="pt") + + def forward(self, inputs): + with torch.no_grad(): + return self.model(**inputs).logits + + def decode(self, outputs): + idx = outputs.argmax(-1).item() + return self.model.config.id2label[idx] diff --git a/agents/llm_engine.py b/agents/llm_engine.py new file mode 100644 index 0000000..afa4d62 --- /dev/null +++ b/agents/llm_engine.py @@ -0,0 +1,238 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from copy import deepcopy +from enum import Enum +from typing import Dict, List, Optional + +from huggingface_hub import InferenceClient + +from .. 
import AutoTokenizer +from ..pipelines.base import Pipeline +from ..utils import logging + + +logger = logging.get_logger(__name__) + + +class MessageRole(str, Enum): + USER = "user" + ASSISTANT = "assistant" + SYSTEM = "system" + TOOL_CALL = "tool-call" + TOOL_RESPONSE = "tool-response" + + @classmethod + def roles(cls): + return [r.value for r in cls] + + +def get_clean_message_list(message_list: List[Dict[str, str]], role_conversions: Dict[str, str] = {}): + """ + Subsequent messages with the same role will be concatenated to a single message. + + Args: + message_list (`List[Dict[str, str]]`): List of chat messages. + """ + final_message_list = [] + message_list = deepcopy(message_list) # Avoid modifying the original list + for message in message_list: + if not set(message.keys()) == {"role", "content"}: + raise ValueError("Message should contain only 'role' and 'content' keys!") + + role = message["role"] + if role not in MessageRole.roles(): + raise ValueError(f"Incorrect role {role}, only {MessageRole.roles()} are supported for now.") + + if role in role_conversions: + message["role"] = role_conversions[role] + + if len(final_message_list) > 0 and message["role"] == final_message_list[-1]["role"]: + final_message_list[-1]["content"] += "\n=======\n" + message["content"] + else: + final_message_list.append(message) + return final_message_list + + +llama_role_conversions = { + MessageRole.TOOL_RESPONSE: MessageRole.USER, +} + + +class HfEngine: + def __init__(self, model_id: Optional[str] = None): + self.last_input_token_count = None + self.last_output_token_count = None + if model_id is None: + model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct" + logger.warning(f"Using default model for token counting: '{model_id}'") + try: + self.tokenizer = AutoTokenizer.from_pretrained(model_id) + except Exception as e: + logger.warning(f"Failed to load tokenizer for model {model_id}: {e}. 
Loading default tokenizer instead.") + self.tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct") + + def get_token_counts(self): + return { + "input_token_count": self.last_input_token_count, + "output_token_count": self.last_output_token_count, + } + + def generate( + self, messages: List[Dict[str, str]], stop_sequences: Optional[List[str]] = None, grammar: Optional[str] = None + ): + raise NotImplementedError + + def __call__( + self, messages: List[Dict[str, str]], stop_sequences: Optional[List[str]] = None, grammar: Optional[str] = None + ) -> str: + """Process the input messages and return the model's response. + + This method sends a list of messages to the Hugging Face Inference API, optionally with stop sequences and grammar customization. + + Parameters: + messages (`List[Dict[str, str]]`): + A list of message dictionaries to be processed. Each dictionary should have the structure `{"role": "user/system", "content": "message content"}`. + stop_sequences (`List[str]`, *optional*): + A list of strings that will stop the generation if encountered in the model's output. + grammar (`str`, *optional*): + The grammar or formatting structure to use in the model's response. + + Returns: + `str`: The text content of the model's response. + + Example: + ```python + >>> engine = HfApiEngine( + ... model="meta-llama/Meta-Llama-3.1-8B-Instruct", + ... token="your_hf_token_here", + ... max_tokens=2000 + ... ) + >>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}] + >>> response = engine(messages, stop_sequences=["END"]) + >>> print(response) + "Quantum mechanics is the branch of physics that studies..." 
+ ``` + """ + if not isinstance(messages, List): + raise ValueError("Messages should be a list of dictionaries with 'role' and 'content' keys.") + if stop_sequences is None: + stop_sequences = [] + response = self.generate(messages, stop_sequences, grammar) + self.last_input_token_count = len(self.tokenizer.apply_chat_template(messages, tokenize=True)) + self.last_output_token_count = len(self.tokenizer.encode(response)) + + # Remove stop sequences from LLM output + for stop_seq in stop_sequences: + if response[-len(stop_seq) :] == stop_seq: + response = response[: -len(stop_seq)] + return response + + +class HfApiEngine(HfEngine): + """A class to interact with Hugging Face's Inference API for language model interaction. + + This engine allows you to communicate with Hugging Face's models using the Inference API. It can be used in both serverless mode or with a dedicated endpoint, supporting features like stop sequences and grammar customization. + + Parameters: + model (`str`, *optional*, defaults to `"meta-llama/Meta-Llama-3.1-8B-Instruct"`): + The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub. + token (`str`, *optional*): + Token used by the Hugging Face API for authentication. + If not provided, the class will use the token stored in the Hugging Face CLI configuration. + max_tokens (`int`, *optional*, defaults to 1500): + The maximum number of tokens allowed in the output. + timeout (`int`, *optional*, defaults to 120): + Timeout for the API request, in seconds. + + Raises: + ValueError: + If the model name is not provided. 
+ """ + + def __init__( + self, + model: str = "meta-llama/Meta-Llama-3.1-8B-Instruct", + token: Optional[str] = None, + max_tokens: Optional[int] = 1500, + timeout: Optional[int] = 120, + ): + super().__init__(model_id=model) + self.model = model + self.client = InferenceClient(self.model, token=token, timeout=timeout) + self.max_tokens = max_tokens + + def generate( + self, messages: List[Dict[str, str]], stop_sequences: Optional[List[str]] = None, grammar: Optional[str] = None + ) -> str: + # Get clean message list + messages = get_clean_message_list(messages, role_conversions=llama_role_conversions) + + # Send messages to the Hugging Face Inference API + if grammar is not None: + response = self.client.chat_completion( + messages, stop=stop_sequences, max_tokens=self.max_tokens, response_format=grammar + ) + else: + response = self.client.chat_completion(messages, stop=stop_sequences, max_tokens=self.max_tokens) + + response = response.choices[0].message.content + return response + + +class TransformersEngine(HfEngine): + """This engine uses a pre-initialized local text-generation pipeline.""" + + def __init__(self, pipeline: Pipeline, model_id: Optional[str] = None): + super().__init__(model_id) + self.pipeline = pipeline + + def generate( + self, + messages: List[Dict[str, str]], + stop_sequences: Optional[List[str]] = None, + grammar: Optional[str] = None, + max_length: int = 1500, + ) -> str: + # Get clean message list + messages = get_clean_message_list(messages, role_conversions=llama_role_conversions) + + # Get LLM output + if stop_sequences is not None and len(stop_sequences) > 0: + stop_strings = stop_sequences + else: + stop_strings = None + + output = self.pipeline( + messages, + stop_strings=stop_strings, + max_length=max_length, + tokenizer=self.pipeline.tokenizer, + ) + + response = output[0]["generated_text"][-1]["content"] + return response + + +DEFAULT_JSONAGENT_REGEX_GRAMMAR = { + "type": "regex", + "value": 'Thought: 
.+?\\nAction:\\n\\{\\n\\s{4}"action":\\s"[^"\\n]+",\\n\\s{4}"action_input":\\s"[^"\\n]+"\\n\\}\\n', +} + +DEFAULT_CODEAGENT_REGEX_GRAMMAR = { + "type": "regex", + "value": "Thought: .+?\\nCode:\\n```(?:py|python)?\\n(?:.|\\s)+?\\n```", +} diff --git a/agents/monitoring.py b/agents/monitoring.py new file mode 100644 index 0000000..7126e72 --- /dev/null +++ b/agents/monitoring.py @@ -0,0 +1,117 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from ..utils import logging +from .agent_types import AgentAudio, AgentImage, AgentText + + +logger = logging.get_logger(__name__) + + +def pull_message(step_log: dict, test_mode: bool = True): + try: + from gradio import ChatMessage + except ImportError: + if test_mode: + + class ChatMessage: + def __init__(self, role, content, metadata=None): + self.role = role + self.content = content + self.metadata = metadata + else: + raise ImportError("Gradio should be installed in order to launch a gradio demo.") + + if step_log.get("rationale"): + yield ChatMessage(role="assistant", content=step_log["rationale"]) + if step_log.get("tool_call"): + used_code = step_log["tool_call"]["tool_name"] == "code interpreter" + content = step_log["tool_call"]["tool_arguments"] + if used_code: + content = f"```py\n{content}\n```" + yield ChatMessage( + role="assistant", + metadata={"title": f"🛠️ Used tool {step_log['tool_call']['tool_name']}"}, + content=str(content), + ) + if step_log.get("observation"): + yield ChatMessage(role="assistant", content=f"```\n{step_log['observation']}\n```") + if step_log.get("error"): + yield ChatMessage( + role="assistant", + content=str(step_log["error"]), + metadata={"title": "💥 Error"}, + ) + + +def stream_to_gradio(agent, task: str, test_mode: bool = False, **kwargs): + """Runs an agent with the given task and streams the messages from the agent as gradio ChatMessages.""" + + try: + from gradio import ChatMessage + except ImportError: + if test_mode: + + class ChatMessage: + def __init__(self, role, content, metadata=None): + self.role = role + self.content = content + self.metadata = metadata + else: + raise ImportError("Gradio should be installed in order to launch a gradio demo.") + + for step_log in agent.run(task, stream=True, **kwargs): + if isinstance(step_log, dict): + for message in pull_message(step_log, test_mode=test_mode): + yield message + + final_answer = step_log # Last log is the run's final_answer + + if 
isinstance(final_answer, AgentText): + yield ChatMessage(role="assistant", content=f"**Final answer:**\n```\n{final_answer.to_string()}\n```") + elif isinstance(final_answer, AgentImage): + yield ChatMessage( + role="assistant", + content={"path": final_answer.to_string(), "mime_type": "image/png"}, + ) + elif isinstance(final_answer, AgentAudio): + yield ChatMessage( + role="assistant", + content={"path": final_answer.to_string(), "mime_type": "audio/wav"}, + ) + else: + yield ChatMessage(role="assistant", content=str(final_answer)) + + +class Monitor: + def __init__(self, tracked_llm_engine): + self.step_durations = [] + self.tracked_llm_engine = tracked_llm_engine + if getattr(self.tracked_llm_engine, "last_input_token_count", "Not found") != "Not found": + self.total_input_token_count = 0 + self.total_output_token_count = 0 + + def update_metrics(self, step_log): + step_duration = step_log["step_duration"] + self.step_durations.append(step_duration) + logger.info(f"Step {len(self.step_durations)}:") + logger.info(f"- Time taken: {step_duration:.2f} seconds (valid only if step succeeded)") + + if getattr(self.tracked_llm_engine, "last_input_token_count", None) is not None: + self.total_input_token_count += self.tracked_llm_engine.last_input_token_count + self.total_output_token_count += self.tracked_llm_engine.last_output_token_count + logger.info(f"- Input tokens: {self.total_input_token_count}") + logger.info(f"- Output tokens: {self.total_output_token_count}") diff --git a/agents/prompts.py b/agents/prompts.py new file mode 100644 index 0000000..7a84b1d --- /dev/null +++ b/agents/prompts.py @@ -0,0 +1,789 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import re + +from ..utils import cached_file + + +# docstyle-ignore +CHAT_MESSAGE_PROMPT = """ +Human: <> + +Assistant: """ + + +DEFAULT_PROMPTS_REPO = "huggingface-tools/default-prompts" +PROMPT_FILES = {"chat": "chat_prompt_template.txt", "run": "run_prompt_template.txt"} + + +def download_prompt(prompt_or_repo_id, agent_name, mode="run"): + """ + Downloads and caches the prompt from a repo (if necessary) and returns its contents. + """ + if prompt_or_repo_id is None: + prompt_or_repo_id = DEFAULT_PROMPTS_REPO + + # A value containing any whitespace is treated as the prompt itself; otherwise it is a repo ID + if re.search("\\s", prompt_or_repo_id) is not None: + return prompt_or_repo_id + + prompt_file = cached_file( + prompt_or_repo_id, PROMPT_FILES[mode], repo_type="dataset", user_agent={"agent": agent_name} + ) + with open(prompt_file, "r", encoding="utf-8") as f: + return f.read() + + +DEFAULT_CODE_SYSTEM_PROMPT = """You will be given a task to solve, your job is to come up with a series of simple commands in Python that will perform the task. +To help you, I will give you access to a set of tools that you can use. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns. +You should first explain which tool you will use to perform the task and for what reason, then write the code in Python. +Each instruction in Python should be a simple assignment. You can print intermediate results if it makes sense to do so. 
+In the end, use tool 'final_answer' to return your answer, its argument will be what gets returned. +You can use imports in your code, but only from the following list of modules: <> +Be sure to provide a 'Code:' token, else the run will fail. + +Tools: +<> + +Examples: +--- +Task: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French." + +Thought: I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image. +Code: +```py +translated_question = translator(question=question, src_lang="French", tgt_lang="English") +print(f"The translated question is {translated_question}.") +answer = image_qa(image=image, question=translated_question) +final_answer(f"The answer is {answer}") +``` + +--- +Task: "Identify the oldest person in the `document` and create an image showcasing the result." + +Thought: I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer. +Code: +```py +answer = document_qa(document, question="What is the oldest person?") +print(f"The answer is {answer}.") +image = image_generator(answer) +final_answer(image) +``` + +--- +Task: "Generate an image using the text given in the variable `caption`." + +Thought: I will use the following tool: `image_generator` to generate an image. +Code: +```py +image = image_generator(prompt=caption) +final_answer(image) +``` + +--- +Task: "Summarize the text given in the variable `text` and read it out loud." + +Thought: I will use the following tools: `summarizer` to create a summary of the input text, then `text_reader` to read it out loud. 
+Code: +```py +summarized_text = summarizer(text) +print(f"Summary: {summarized_text}") +audio_summary = text_reader(summarized_text) +final_answer(audio_summary) +``` + +--- +Task: "Answer the question in the variable `question` about the text in the variable `text`. Use the answer to generate an image." + +Thought: I will use the following tools: `text_qa` to create the answer, then `image_generator` to generate an image according to the answer. +Code: +```py +answer = text_qa(text=text, question=question) +print(f"The answer is {answer}.") +image = image_generator(answer) +final_answer(image) +``` + +--- +Task: "Caption the following `image`." + +Thought: I will use the following tool: `image_captioner` to generate a caption for the image. +Code: +```py +caption = image_captioner(image) +final_answer(caption) +``` + +--- +The above examples used tools that might not exist for you. You only have access to these tools: +<> + +Remember to make sure that variables you use are all defined. +Be sure to provide a 'Code:\n```' sequence before the code and '```' after, else you will get an error. +DO NOT pass the arguments as a dict as in 'answer = ask_search_agent({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = ask_search_agent(query="What is the place where James Bond lives?")'. + +Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000. +""" + + +DEFAULT_REACT_JSON_SYSTEM_PROMPT = """You are an expert assistant who can solve any task using JSON tool calls. You will be given a task to solve as best you can. +To do so, you have been given access to the following tools: <> +The way you use the tools is by specifying a json blob, ending with ''. +Specifically, this json should have an `action` key (name of the tool to use) and an `action_input` key (input to the tool). + +The $ACTION_JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. 
It should be formatted in json. Do not try to escape special characters. Here is the template of a valid $ACTION_JSON_BLOB: +{ + "action": $TOOL_NAME, + "action_input": $INPUT +} + +Make sure to have the $INPUT as a dictionary in the right format for the tool you are using, and do not put variable names as input if you can find the right values. + +You should ALWAYS use the following format: + +Thought: you should always think about one action to take. Then use the action as follows: +Action: +$ACTION_JSON_BLOB +Observation: the result of the action +... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $ACTION_JSON_BLOB must only use a SINGLE action at a time.) + +You can use the result of the previous action as input for the next action. +The observation will always be a string: it can represent a file, like "image_1.jpg". +Then you can use it as input for the next action. You can do it for instance as follows: + +Observation: "image_1.jpg" + +Thought: I need to transform the image that I received in the previous observation to make it green. +Action: +{ + "action": "image_transformer", + "action_input": {"image": "image_1.jpg"} +} + +To provide the final answer to the task, use an action blob with "action": "final_answer" tool. It is the only way to complete the task, else you will be stuck on a loop. So your final output should look like this: +Action: +{ + "action": "final_answer", + "action_input": {"answer": "insert your final answer here"} +} + + +Here are a few examples using notional tools: +--- +Task: "Generate an image of the oldest person in this document." + +Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer. 
+Action: +{ + "action": "document_qa", + "action_input": {"document": "document.pdf", "question": "Who is the oldest person mentioned?"} +} +Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland." + + +Thought: I will now generate an image showcasing the oldest person. +Action: +{ + "action": "image_generator", + "action_input": {"prompt": "A portrait of John Doe, a 55-year-old man living in Canada."} +} +Observation: "image.png" + +Thought: I will now return the generated image. +Action: +{ + "action": "final_answer", + "action_input": "image.png" +} + +--- +Task: "What is the result of the following operation: 5 + 3 + 1294.678?" + +Thought: I will use python code evaluator to compute the result of the operation and then return the final answer using the `final_answer` tool +Action: +{ + "action": "python_interpreter", + "action_input": {"code": "5 + 3 + 1294.678"} +} +Observation: 1302.678 + +Thought: Now that I know the result, I will return it. +Action: +{ + "action": "final_answer", + "action_input": "1302.678" +} + +--- +Task: "Which city has the highest population, Guangzhou or Shanghai?" + +Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities. +Action: +{ + "action": "search", + "action_input": "Population Guangzhou" +} +Observation: ['Guangzhou has a population of 15 million inhabitants as of 2021.'] + + +Thought: Now let's get the population of Shanghai using the tool 'search'. +Action: +{ + "action": "search", + "action_input": "Population Shanghai" +} +Observation: '26 million (2019)' + +Thought: Now I know that Shanghai has a larger population. Let's return the result. +Action: +{ + "action": "final_answer", + "action_input": "Shanghai" +} + + +The above examples used notional tools that might not exist for you. 
You only have access to these tools: +<> + +Here are the rules you should always follow to solve your task: +1. ALWAYS provide a 'Thought:' sequence, and an 'Action:' sequence that ends with , else you will fail. +2. Always use the right arguments for the tools. Never use variable names in the 'action_input' field, use the value instead. +3. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself. +4. Never re-do a tool call that you previously did with the exact same parameters. + +Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000. +""" + + +DEFAULT_REACT_CODE_SYSTEM_PROMPT = """You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can. +To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code. +To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences. + +At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use. +Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '' sequence. +During each intermediate step, you can use 'print()' to save whatever important information you will then need. +These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step. +In the end you have to return a final answer using the `final_answer` tool. + +Here are a few examples using notional tools: +--- +Task: "Generate an image of the oldest person in this document." + +Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer. 
+Code: +```py +answer = document_qa(document=document, question="Who is the oldest person mentioned?") +print(answer) +``` +Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland." + +Thought: I will now generate an image showcasing the oldest person. +Code: +```py +image = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.") +final_answer(image) +``` + +--- +Task: "What is the result of the following operation: 5 + 3 + 1294.678?" + +Thought: I will use python code to compute the result of the operation and then return the final answer using the `final_answer` tool +Code: +```py +result = 5 + 3 + 1294.678 +final_answer(result) +``` + +--- +Task: "Which city has the highest population: Guangzhou or Shanghai?" + +Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities. +Code: +```py +population_guangzhou = search("Guangzhou population") +print("Population Guangzhou:", population_guangzhou) +population_shanghai = search("Shanghai population") +print("Population Shanghai:", population_shanghai) +``` +Observation: +Population Guangzhou: ['Guangzhou has a population of 15 million inhabitants as of 2021.'] +Population Shanghai: '26 million (2019)' + +Thought: Now I know that Shanghai has the highest population. +Code: +```py +final_answer("Shanghai") +``` + +--- +Task: "What is the current age of the pope, raised to the power 0.36?" + +Thought: I will use the tool `wiki` to get the age of the pope, then raise it to the power 0.36. +Code: +```py +pope_age = wiki(query="current pope age") +print("Pope age:", pope_age) +``` +Observation: +Pope age: "The pope Francis is currently 85 years old." + +Thought: I know that the pope is 85 years old. Let's compute the result using python code. 
+Code: +```py +pope_current_age = 85 ** 0.36 +final_answer(pope_current_age) +``` + +The above examples used notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you have access to these tools (and no other tool): + +<> + +<> + +Here are the rules you should always follow to solve your task: +1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```' sequence, else you will fail. +2. Use only variables that you have defined! +3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'. +4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block. +5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters. +6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'. +7. Never create any notional variables in your code, as having these in your logs might derail you from the true variables. +8. You can use imports in your code, but only from the following list of modules: <> +9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist. +10. Don't give up! You're in charge of solving the task, not providing directions to solve it. + +Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000. +""" + +SYSTEM_PROMPT_FACTS = """Below I will present you a task. 
+ +You will now build a comprehensive preparatory survey of which facts we have at our disposal and which ones we still need. +To do so, you will have to read the task and identify things that must be discovered in order to successfully complete it. +Don't make any assumptions. For each item, provide thorough reasoning. Here is how you will structure this survey: + +--- +### 1. Facts given in the task +List here the specific facts given in the task that could help you (there might be nothing here). + +### 2. Facts to look up +List here any facts that we may need to look up. +Also list where to find each of these, for instance a website, a file... - maybe the task contains some sources that you should re-use here. + +### 3. Facts to derive +List here anything that we want to derive from the above by logical reasoning, for instance computation or simulation. + +Keep in mind that "facts" will typically be specific names, dates, values, etc. Your answer should use the headings below: +### 1. Facts given in the task +### 2. Facts to look up +### 3. Facts to derive +Do not add anything else.""" + +SYSTEM_PROMPT_PLAN = """You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools. + +Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts. +This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. +Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS. +After writing the final step of the plan, write the '\n' tag and stop there.""" + +USER_PROMPT_PLAN = """ +Here is your task: + +Task: +``` +{task} +``` + +Your plan can leverage any of these tools: +{tool_descriptions} + +{managed_agents_descriptions} + +List of facts that you know: +``` +{answer_facts} +``` + +Now begin! 
Write your plan below.""" + +SYSTEM_PROMPT_FACTS_UPDATE = """ +You are a world expert at gathering known and unknown facts based on a conversation. +Below you will find a task, and a history of attempts made to solve the task. You will have to produce a list of these: +### 1. Facts given in the task +### 2. Facts that we have learned +### 3. Facts still to look up +### 4. Facts still to derive +Find the task and history below.""" + +USER_PROMPT_FACTS_UPDATE = """Earlier we've built a list of facts. +But in your previous steps you may have learned useful new facts or invalidated some false ones. +Please update your list of facts based on the previous history, and provide these headings: +### 1. Facts given in the task +### 2. Facts that we have learned +### 3. Facts still to look up +### 4. Facts still to derive + +Now write your new list of facts below.""" + +SYSTEM_PROMPT_PLAN_UPDATE = """You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools. + +You have been given a task: +``` +{task} +``` + +Find below the record of what has been tried so far to solve it. Then you will be asked to make an updated plan to solve the task. +If the previous tries so far have met some success, you can make an updated plan based on these actions. +If you are stalled, you can make a completely new plan starting from scratch. +""" + +USER_PROMPT_PLAN_UPDATE = """You're still working towards solving this task: +``` +{task} +``` + +You have access to these tools and only these: +{tool_descriptions} + +{managed_agents_descriptions} + +Here is the up-to-date list of facts that you know: +``` +{facts_update} +``` + +Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts. +This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. +Beware that you have {remaining_steps} steps remaining. 
+Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS. +After writing the final step of the plan, write the '\n' tag and stop there. + +Now write your new plan below.""" + +SYSTEM_PROMPT_PLAN_STRUCTURED = """Output a step-by-step plan to solve the task using the given tools. +This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. Each step should be structured as follows: +Step #n: { + "description": + "tool": , + "params": { + + } + "output_var": +} +Each step must be necessary to reach the final answer. Steps should reuse outputs produced by earlier steps. The last step must be the final answer. + +Below are some examples: + +Example 1: +------ +Inputs: +--- +Task: +How many encoder blocks were in the first attention-only ML architecture published? + +[FACTS LIST]: +### 1. Facts given in the task +- The paper first introduced an attention-only ML architecture. +- The specific information required is the page number where the number of encoder blocks is stated. +- No local files are provided for access. + +### 2. Facts to look up +- The title and authors of the paper that first introduced an attention-only ML architecture. + - Source: Online search (e.g., Google Scholar, arXiv, or other academic databases) +- The full text of the identified paper. + - Source: Online academic repositories (e.g., arXiv, journal websites) +- The specific page number in the paper where the number of encoder blocks is mentioned. + - Source: The content of the identified paper + +### 3. Facts to derive +- By identifying the correct paper and locating the specific page, we will derive the page number where the number of encoder blocks is stated. + - Logical steps: Identify the correct paper, access its content, search for the term "encoder blocks," and note the page number where this information is found. 
+``` + +[STEP 1 TOOL CALL]: {'tool_name': 'code interpreter', 'tool_arguments': '# Step 1: Identify the title and authors of the paper that first introduced an attention-only ML architecture.\nanswer = ask_search_agent(query="Can you find the title and authors of the paper that first introduced an attention-only machine learning architecture? Please provide the full citation.")\nprint(answer)'} +[OUTPUT OF STEP 1] Observation: **Title**: Attention Is All You Need +**Authors**: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin +[STEP 2 TOOL CALL]: {'tool_name': 'code interpreter', 'tool_arguments': '# Step 1: Find the full text of the identified paper on arXiv\\npaper_url = "https://arxiv.org/pdf/1706.03762.pdf"\\nprint(paper_url)'} +[OUTPUT OF STEP 2] Observation: https://arxiv.org/pdf/1706.03762.pdf +--- + +Output plan: +--- +Step #1: { + "description": "Open the PDF of the paper from the provided URL and search within the text of the paper for the mention of "encoder blocks"", + "tool": "inspect_file_as_text", + "params": { + "file_path": "https://arxiv.org/pdf/1706.03762.pdf", + "question": "On which page is the number of encoder blocks mentioned?" + }, + "output_var": "page_number" +} + +Step #2: { + "description": "Provide the final answer", + "tool": "final_answer", + "params": { + "answer": "{page_number}" + }, + "output_var": "" +} +------ + +Example 2: +------ +Inputs: +--- +Task: +How many golf balls fit into a Boeing-747? + +[FACTS LIST]: +### 1. Facts given in the task +- The task requires calculating the number of golf balls that fit into a Boeing-747 +### 2. Facts to look up +- The volume of a golf ball +- The volume of a Boeing-747 +### 3. 
Facts to derive +- Once the volumes are known the final answer can be calculated +--- +Output plan: +--- +Step #1: { + "description": "Find the volume of a Boeing-747", + "tool": "web_search", + "params": { + "query": "What is the internal volume of a Boeing-747 in cubic meters?" + }, + "output_var": "boeing_volume" +} + +Step #2: { + "description": "Find the volume of a standard golf ball", + "tool": "ask_search_agent", + "params": { + "query": "What is the volume of a standard golf ball in cubic centimeters?" + }, + "output_var": "golf_ball_volume" +} + +Step #3: { + "description": "Convert the volume of a golf ball from cubic centimeters to cubic meters. Calculate the number of golf balls that fit into the Boeing-747 by dividing the internal volume of the Boeing-747 by the volume of a golf ball.", + "tool": "python_code", + "params": { + "code": "golf_ball_volume_m3 = golf_ball_volume / 1e6\nnumber_of_golf_balls = boeing_volume / golf_ball_volume_m3" + }, + "output_var": "number_of_golf_balls" +} + +Step #4: { + "description": "Provide the final answer", + "tool": "final_answer", + "params": { + "answer": "{number_of_golf_balls}" + }, + "output_var": "" +} +------ +The above examples used tools that might not exist for you. +Your goal is to create a plan to solve the task.""" + +USER_PROMPT_PLAN_STRUCTURED = """ +Here are your inputs: + +Task: +``` +{task} +``` + +Your plan can leverage any of these tools: +{tool_descriptions} +These tools are Python functions which you can call with code. You also have access to a Python interpreter so you can run Python code. + +List of facts that you know: +``` +{answer_facts} +``` + +Now for the given task, create a plan taking into account the list of facts. +After writing the final step of the plan, write the '\n' tag and stop there. Output the plan only and nothing else.""" + +SYSTEM_PROMPT_PLAN_UPDATE_STRUCTURED = """Output a step-by-step plan to solve the task using the given tools. 
+This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. Each step should be structured as follows: +Step #n: {{ + "description": + "tool": , + "params": {{ + + }} + "output_var": +}} +Each step must be necessary to reach the final answer. Steps should reuse outputs produced by earlier steps. The last step must be the final answer. + +Below are some examples: + +Example 1: +------ +Inputs: +--- +Task: +How many encoder blocks were in the first attention-only ML architecture published? + +[FACTS LIST]: +### 1. Facts given in the task +- The paper first introduced an attention-only ML architecture. +- The specific information required is the page number where the number of encoder blocks is stated. +- No local files are provided for access. + +### 2. Facts to look up +- The title and authors of the paper that first introduced an attention-only ML architecture. + - Source: Online search (e.g., Google Scholar, arXiv, or other academic databases) +- The full text of the identified paper. + - Source: Online academic repositories (e.g., arXiv, journal websites) +- The specific page number in the paper where the number of encoder blocks is mentioned. + - Source: The content of the identified paper + +### 3. Facts to derive +- By identifying the correct paper and locating the specific page, we will derive the page number where the number of encoder blocks is stated. + - Logical steps: Identify the correct paper, access its content, search for the term "encoder blocks," and note the page number where this information is found. +``` + +[STEP 1 TOOL CALL]: {{'tool_name': 'code interpreter', 'tool_arguments': '# Step 1: Identify the title and authors of the paper that first introduced an attention-only ML architecture.\nanswer = ask_search_agent(query="Can you find the title and authors of the paper that first introduced an attention-only machine learning architecture? 
Please provide the full citation.")\nprint(answer)'}} +[OUTPUT OF STEP 1] Observation: **Title**: Attention Is All You Need +**Authors**: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin +[STEP 2 TOOL CALL]: {{'tool_name': 'code interpreter', 'tool_arguments': '# Step 1: Find the full text of the identified paper on arXiv\\npaper_url = "https://arxiv.org/pdf/1706.03762.pdf"\\nprint(paper_url)'}} +[OUTPUT OF STEP 2] Observation: https://arxiv.org/pdf/1706.03762.pdf +--- + +Output plan: +--- +Step #1: {{ + "description": "Open the PDF of the paper from the provided URL and search within the text of the paper for the mention of "encoder blocks"", + "tool": "inspect_file_as_text", + "params": {{ + "file_path": "https://arxiv.org/pdf/1706.03762.pdf", + "question": "On which page is the number of encoder blocks mentioned?" + }}, + "output_var": "page_number" +}} + +Step #2: {{ + "description": "Provide the final answer", + "tool": "final_answer", + "params": {{ + "answer": "{{page_number}}" + }}, + "output_var": "" +}} +------ + +Example 2: +------ +Inputs: +--- +Task: +How many golf balls fit into a Boeing-747? + +[FACTS LIST]: +### 1. Facts given in the task +- The task requires calculating the number of golf balls that fit into a Boeing-747 +### 2. Facts to look up +- The volume of a golf ball +- The volume of a Boeing-747 +### 3. Facts to derive +- Once the volumes are known the final answer can be calculated +--- +Output plan: +--- +Step #1: {{ + "description": "Find the volume of a Boeing-747", + "tool": "web_search", + "params": {{ + "query": "What is the internal volume of a Boeing-747 in cubic meters?" + }}, + "output_var": "boeing_volume" +}} + +Step #2: {{ + "description": "Find the volume of a standard golf ball", + "tool": "ask_search_agent", + "params": {{ + "query": "What is the volume of a standard golf ball in cubic centimeters?" 
+ }}, + "output_var": "golf_ball_volume" +}} + +Step #3: {{ + "description": "Convert the volume of a golf ball from cubic centimeters to cubic meters. Calculate the number of golf balls that fit into the Boeing-747 by dividing the internal volume of the Boeing-747 by the volume of a golf ball.", + "tool": "python_code", + "params": {{ + "code": "golf_ball_volume_m3 = golf_ball_volume / 1e6\nnumber_of_golf_balls = boeing_volume / golf_ball_volume_m3" + }}, + "output_var": "number_of_golf_balls" +}} + +Step #4: {{ + "description": "Provide the final answer", + "tool": "final_answer", + "params": {{ + "answer": "{{number_of_golf_balls}}" + }}, + "output_var": "" +}} +------ +The above examples use tools that might not exist for you. +Find below the record of what has been tried so far to solve it. Your goal is to create an updated plan to solve the task.""" + +USER_PROMPT_PLAN_UPDATE_STRUCTURED = """ +Here are your inputs: + +Task: +``` +{task} +``` + +Your plan can leverage any of these tools: +{tool_descriptions} +These tools are Python functions which you can call with code. You also have access to a Python interpreter so you can run Python code. + +List of facts that you know: +``` +{facts_update} +``` + +Now for the given task, create a plan taking into account the above inputs and list of facts. +Beware that you have {remaining_steps} steps remaining. +After writing the final step of the plan, write the '\n' tag and stop there. 
Output the plan only and nothing else.""" + +PLAN_UPDATE_FINAL_PLAN_REDACTION = """I still need to solve the task I was given: +``` +{task} +``` + +Here is my new/updated plan of action to solve the task: +``` +{plan_update} +```""" + +SUPPORTED_PLAN_TYPES = ["default", "structured"] + +PROMPTS_FOR_INITIAL_PLAN = { + "default": {"system": SYSTEM_PROMPT_PLAN, "user": USER_PROMPT_PLAN}, + "structured": {"system": SYSTEM_PROMPT_PLAN_STRUCTURED, "user": USER_PROMPT_PLAN_STRUCTURED}, +} + +PROMPTS_FOR_PLAN_UPDATE = { + "default": {"system": SYSTEM_PROMPT_PLAN_UPDATE, "user": USER_PROMPT_PLAN_UPDATE}, + "structured": {"system": SYSTEM_PROMPT_PLAN_UPDATE_STRUCTURED, "user": USER_PROMPT_PLAN_UPDATE_STRUCTURED}, +} diff --git a/agents/python_interpreter.py b/agents/python_interpreter.py new file mode 100644 index 0000000..6e90f35 --- /dev/null +++ b/agents/python_interpreter.py @@ -0,0 +1,908 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import ast +import builtins +import difflib +from collections.abc import Mapping +from importlib import import_module +from typing import Any, Callable, Dict, List, Optional + +import numpy as np + +from ..utils import is_pandas_available + + +if is_pandas_available(): + import pandas as pd + + +class InterpreterError(ValueError): + """ + An error raised when the interpreter cannot evaluate a Python expression, due to syntax error or unsupported + operations. + """ + + pass + + +ERRORS = { + name: getattr(builtins, name) + for name in dir(builtins) + if isinstance(getattr(builtins, name), type) and issubclass(getattr(builtins, name), BaseException) +} + + +LIST_SAFE_MODULES = [ + "random", + "collections", + "math", + "time", + "queue", + "itertools", + "re", + "stat", + "statistics", + "unicodedata", +] + +PRINT_OUTPUTS, MAX_LEN_OUTPUT = "", 50000 +OPERATIONS_COUNT, MAX_OPERATIONS = 0, 10000000 + + +class BreakException(Exception): + pass + + +class ContinueException(Exception): + pass + + +class ReturnException(Exception): + def __init__(self, value): + self.value = value + + +def get_iterable(obj): + if isinstance(obj, list): + return obj + elif hasattr(obj, "__iter__"): + return list(obj) + else: + raise InterpreterError("Object is not iterable") + + +def evaluate_unaryop(expression, state, static_tools, custom_tools): + operand = evaluate_ast(expression.operand, state, static_tools, custom_tools) + if isinstance(expression.op, ast.USub): + return -operand + elif isinstance(expression.op, ast.UAdd): + return operand + elif isinstance(expression.op, ast.Not): + return not operand + elif isinstance(expression.op, ast.Invert): + return ~operand + else: + raise InterpreterError(f"Unary operation {expression.op.__class__.__name__} is not supported.") + + +def evaluate_lambda(lambda_expression, state, static_tools, custom_tools): + args = [arg.arg for arg in lambda_expression.args.args] + + def lambda_func(*values): + new_state = state.copy() + for arg, value in 
zip(args, values): + new_state[arg] = value + return evaluate_ast(lambda_expression.body, new_state, static_tools, custom_tools) + + return lambda_func + + +def evaluate_while(while_loop, state, static_tools, custom_tools): + max_iterations = 1000 + iterations = 0 + while evaluate_ast(while_loop.test, state, static_tools, custom_tools): + for node in while_loop.body: + try: + evaluate_ast(node, state, static_tools, custom_tools) + except BreakException: + return None + except ContinueException: + break + iterations += 1 + if iterations > max_iterations: + raise InterpreterError(f"Maximum number of {max_iterations} iterations in While loop exceeded") + return None + + +def create_function(func_def, state, static_tools, custom_tools): + def new_func(*args, **kwargs): + func_state = state.copy() + arg_names = [arg.arg for arg in func_def.args.args] + default_values = [evaluate_ast(d, state, static_tools, custom_tools) for d in func_def.args.defaults] + + # Apply default values + defaults = dict(zip(arg_names[-len(default_values) :], default_values)) + + # Set positional arguments + for name, value in zip(arg_names, args): + func_state[name] = value + + # # Set keyword arguments + for name, value in kwargs.items(): + func_state[name] = value + + # Handle variable arguments + if func_def.args.vararg: + vararg_name = func_def.args.vararg.arg + func_state[vararg_name] = args + + if func_def.args.kwarg: + kwarg_name = func_def.args.kwarg.arg + func_state[kwarg_name] = kwargs + + # Set default values for arguments that were not provided + for name, value in defaults.items(): + if name not in func_state: + func_state[name] = value + + # Update function state with self and __class__ + if func_def.args.args and func_def.args.args[0].arg == "self": + if args: + func_state["self"] = args[0] + func_state["__class__"] = args[0].__class__ + + result = None + try: + for stmt in func_def.body: + result = evaluate_ast(stmt, func_state, static_tools, custom_tools) + except 
ReturnException as e: + result = e.value + return result + + return new_func + + +def create_class(class_name, class_bases, class_body): + class_dict = {} + for key, value in class_body.items(): + class_dict[key] = value + return type(class_name, tuple(class_bases), class_dict) + + +def evaluate_function_def(func_def, state, static_tools, custom_tools): + custom_tools[func_def.name] = create_function(func_def, state, static_tools, custom_tools) + return custom_tools[func_def.name] + + +def evaluate_class_def(class_def, state, static_tools, custom_tools): + class_name = class_def.name + bases = [evaluate_ast(base, state, static_tools, custom_tools) for base in class_def.bases] + class_dict = {} + + for stmt in class_def.body: + if isinstance(stmt, ast.FunctionDef): + class_dict[stmt.name] = evaluate_function_def(stmt, state, static_tools, custom_tools) + elif isinstance(stmt, ast.Assign): + for target in stmt.targets: + if isinstance(target, ast.Name): + class_dict[target.id] = evaluate_ast(stmt.value, state, static_tools, custom_tools) + elif isinstance(target, ast.Attribute): + class_dict[target.attr] = evaluate_ast(stmt.value, state, static_tools, custom_tools) + else: + raise InterpreterError(f"Unsupported statement in class body: {stmt.__class__.__name__}") + + new_class = type(class_name, tuple(bases), class_dict) + state[class_name] = new_class + return new_class + + +def evaluate_augassign(expression, state, static_tools, custom_tools): + # Helper function to get current value and set new value based on the target type + def get_current_value(target): + if isinstance(target, ast.Name): + return state.get(target.id, 0) + elif isinstance(target, ast.Subscript): + obj = evaluate_ast(target.value, state, static_tools, custom_tools) + key = evaluate_ast(target.slice, state, static_tools, custom_tools) + return obj[key] + elif isinstance(target, ast.Attribute): + obj = evaluate_ast(target.value, state, static_tools, custom_tools) + return getattr(obj, target.attr) 
+ elif isinstance(target, ast.Tuple): + return tuple(get_current_value(elt) for elt in target.elts) + elif isinstance(target, ast.List): + return [get_current_value(elt) for elt in target.elts] + else: + raise InterpreterError(f"AugAssign not supported for {type(target)} targets.") + + current_value = get_current_value(expression.target) + value_to_add = evaluate_ast(expression.value, state, static_tools, custom_tools) + + # Determine the operation and apply it + if isinstance(expression.op, ast.Add): + if isinstance(current_value, list): + if not isinstance(value_to_add, list): + raise InterpreterError(f"Cannot add non-list value {value_to_add} to a list.") + updated_value = current_value + value_to_add + else: + updated_value = current_value + value_to_add + elif isinstance(expression.op, ast.Sub): + updated_value = current_value - value_to_add + elif isinstance(expression.op, ast.Mult): + updated_value = current_value * value_to_add + elif isinstance(expression.op, ast.Div): + updated_value = current_value / value_to_add + elif isinstance(expression.op, ast.Mod): + updated_value = current_value % value_to_add + elif isinstance(expression.op, ast.Pow): + updated_value = current_value**value_to_add + elif isinstance(expression.op, ast.FloorDiv): + updated_value = current_value // value_to_add + elif isinstance(expression.op, ast.BitAnd): + updated_value = current_value & value_to_add + elif isinstance(expression.op, ast.BitOr): + updated_value = current_value | value_to_add + elif isinstance(expression.op, ast.BitXor): + updated_value = current_value ^ value_to_add + elif isinstance(expression.op, ast.LShift): + updated_value = current_value << value_to_add + elif isinstance(expression.op, ast.RShift): + updated_value = current_value >> value_to_add + else: + raise InterpreterError(f"Operation {type(expression.op).__name__} is not supported.") + + # Update the state + set_value(expression.target, updated_value, state, static_tools, custom_tools) + + return 
updated_value + + +def evaluate_boolop(node, state, static_tools, custom_tools): + if isinstance(node.op, ast.And): + for value in node.values: + if not evaluate_ast(value, state, static_tools, custom_tools): + return False + return True + elif isinstance(node.op, ast.Or): + for value in node.values: + if evaluate_ast(value, state, static_tools, custom_tools): + return True + return False + + +def evaluate_binop(binop, state, static_tools, custom_tools): + # Recursively evaluate the left and right operands + left_val = evaluate_ast(binop.left, state, static_tools, custom_tools) + right_val = evaluate_ast(binop.right, state, static_tools, custom_tools) + + # Determine the operation based on the type of the operator in the BinOp + if isinstance(binop.op, ast.Add): + return left_val + right_val + elif isinstance(binop.op, ast.Sub): + return left_val - right_val + elif isinstance(binop.op, ast.Mult): + return left_val * right_val + elif isinstance(binop.op, ast.Div): + return left_val / right_val + elif isinstance(binop.op, ast.Mod): + return left_val % right_val + elif isinstance(binop.op, ast.Pow): + return left_val**right_val + elif isinstance(binop.op, ast.FloorDiv): + return left_val // right_val + elif isinstance(binop.op, ast.BitAnd): + return left_val & right_val + elif isinstance(binop.op, ast.BitOr): + return left_val | right_val + elif isinstance(binop.op, ast.BitXor): + return left_val ^ right_val + elif isinstance(binop.op, ast.LShift): + return left_val << right_val + elif isinstance(binop.op, ast.RShift): + return left_val >> right_val + else: + raise NotImplementedError(f"Binary operation {type(binop.op).__name__} is not implemented.") + + +def evaluate_assign(assign, state, static_tools, custom_tools): + result = evaluate_ast(assign.value, state, static_tools, custom_tools) + if len(assign.targets) == 1: + target = assign.targets[0] + set_value(target, result, state, static_tools, custom_tools) + else: + if len(assign.targets) != len(result): + raise 
InterpreterError(f"Assign failed: expected {len(result)} values but got {len(assign.targets)}.") + expanded_values = [] + for tgt in assign.targets: + if isinstance(tgt, ast.Starred): + expanded_values.extend(result) + else: + expanded_values.append(result) + for tgt, val in zip(assign.targets, expanded_values): + set_value(tgt, val, state, static_tools, custom_tools) + return result + + +def set_value(target, value, state, static_tools, custom_tools): + if isinstance(target, ast.Name): + if target.id in static_tools: + raise InterpreterError(f"Cannot assign to name '{target.id}': doing this would erase the existing tool!") + state[target.id] = value + elif isinstance(target, ast.Tuple): + if not isinstance(value, tuple): + if hasattr(value, "__iter__") and not isinstance(value, (str, bytes)): + value = tuple(value) + else: + raise InterpreterError("Cannot unpack non-tuple value") + if len(target.elts) != len(value): + raise InterpreterError("Cannot unpack tuple of wrong size") + for i, elem in enumerate(target.elts): + set_value(elem, value[i], state, static_tools, custom_tools) + elif isinstance(target, ast.Subscript): + obj = evaluate_ast(target.value, state, static_tools, custom_tools) + key = evaluate_ast(target.slice, state, static_tools, custom_tools) + obj[key] = value + elif isinstance(target, ast.Attribute): + obj = evaluate_ast(target.value, state, static_tools, custom_tools) + setattr(obj, target.attr, value) + + +def evaluate_call(call, state, static_tools, custom_tools): + if not (isinstance(call.func, ast.Attribute) or isinstance(call.func, ast.Name)): + raise InterpreterError(f"This is not a correct function: {call.func}).") + if isinstance(call.func, ast.Attribute): + obj = evaluate_ast(call.func.value, state, static_tools, custom_tools) + func_name = call.func.attr + if not hasattr(obj, func_name): + raise InterpreterError(f"Object {obj} has no attribute {func_name}") + func = getattr(obj, func_name) + + elif isinstance(call.func, ast.Name): + 
func_name = call.func.id + if func_name in state: + func = state[func_name] + elif func_name in static_tools: + func = static_tools[func_name] + elif func_name in custom_tools: + func = custom_tools[func_name] + elif func_name in ERRORS: + func = ERRORS[func_name] + else: + raise InterpreterError( + f"It is not permitted to evaluate other functions than the provided tools or functions defined in previous code (tried to execute {call.func.id})." + ) + + args = [] + for arg in call.args: + if isinstance(arg, ast.Starred): + args.extend(evaluate_ast(arg.value, state, static_tools, custom_tools)) + else: + args.append(evaluate_ast(arg, state, static_tools, custom_tools)) + + args = [] + for arg in call.args: + if isinstance(arg, ast.Starred): + unpacked = evaluate_ast(arg.value, state, static_tools, custom_tools) + if not hasattr(unpacked, "__iter__") or isinstance(unpacked, (str, bytes)): + raise InterpreterError(f"Cannot unpack non-iterable value {unpacked}") + args.extend(unpacked) + else: + args.append(evaluate_ast(arg, state, static_tools, custom_tools)) + + kwargs = {keyword.arg: evaluate_ast(keyword.value, state, static_tools, custom_tools) for keyword in call.keywords} + + if isinstance(func, type) and len(func.__module__.split(".")) > 1: # Check for user-defined classes + # Instantiate the class using its constructor + obj = func.__new__(func) # Create a new instance of the class + if hasattr(obj, "__init__"): # Check if the class has an __init__ method + obj.__init__(*args, **kwargs) # Call the __init__ method correctly + return obj + else: + if func_name == "super": + if not args: + if "__class__" in state and "self" in state: + return super(state["__class__"], state["self"]) + else: + raise InterpreterError("super() needs at least one argument") + cls = args[0] + if not isinstance(cls, type): + raise InterpreterError("super() argument 1 must be type") + if len(args) == 1: + return super(cls) + elif len(args) == 2: + instance = args[1] + return super(cls, 
instance) + else: + raise InterpreterError("super() takes at most 2 arguments") + else: + if func_name == "print": + output = " ".join(map(str, args)) + global PRINT_OUTPUTS + PRINT_OUTPUTS += output + "\n" + # cap the number of lines + return None + else: # Assume it's a callable object + output = func(*args, **kwargs) + return output + + +def evaluate_subscript(subscript, state, static_tools, custom_tools): + index = evaluate_ast(subscript.slice, state, static_tools, custom_tools) + value = evaluate_ast(subscript.value, state, static_tools, custom_tools) + + if isinstance(value, str) and isinstance(index, str): + raise InterpreterError("You're trying to subscript a string with a string index, which is impossible") + if isinstance(value, pd.core.indexing._LocIndexer): + parent_object = value.obj + return parent_object.loc[index] + if isinstance(value, (pd.DataFrame, pd.Series, np.ndarray)): + return value[index] + elif isinstance(value, pd.core.groupby.generic.DataFrameGroupBy): + return value[index] + elif isinstance(index, slice): + return value[index] + elif isinstance(value, (list, tuple)): + if not (-len(value) <= index < len(value)): + raise InterpreterError(f"Index {index} out of bounds for list of length {len(value)}") + return value[int(index)] + elif isinstance(value, str): + if not (-len(value) <= index < len(value)): + raise InterpreterError(f"Index {index} out of bounds for string of length {len(value)}") + return value[index] + elif index in value: + return value[index] + elif isinstance(index, str) and isinstance(value, Mapping): + close_matches = difflib.get_close_matches(index, list(value.keys())) + if len(close_matches) > 0: + return value[close_matches[0]] + raise InterpreterError(f"Could not index {value} with '{index}'.") + + +def evaluate_name(name, state, static_tools, custom_tools): + if name.id in state: + return state[name.id] + elif name.id in static_tools: + return static_tools[name.id] + elif name.id in ERRORS: + return ERRORS[name.id] 
+ close_matches = difflib.get_close_matches(name.id, list(state.keys())) + if len(close_matches) > 0: + return state[close_matches[0]] + raise InterpreterError(f"The variable `{name.id}` is not defined.") + + +def evaluate_condition(condition, state, static_tools, custom_tools): + left = evaluate_ast(condition.left, state, static_tools, custom_tools) + comparators = [evaluate_ast(c, state, static_tools, custom_tools) for c in condition.comparators] + ops = [type(op) for op in condition.ops] + + result = True + current_left = left + + for op, comparator in zip(ops, comparators): + if op == ast.Eq: + current_result = current_left == comparator + elif op == ast.NotEq: + current_result = current_left != comparator + elif op == ast.Lt: + current_result = current_left < comparator + elif op == ast.LtE: + current_result = current_left <= comparator + elif op == ast.Gt: + current_result = current_left > comparator + elif op == ast.GtE: + current_result = current_left >= comparator + elif op == ast.Is: + current_result = current_left is comparator + elif op == ast.IsNot: + current_result = current_left is not comparator + elif op == ast.In: + current_result = current_left in comparator + elif op == ast.NotIn: + current_result = current_left not in comparator + else: + raise InterpreterError(f"Operator not supported: {op}") + + result = result & current_result + current_left = comparator + + if isinstance(result, bool) and not result: + break + + return result if isinstance(result, (bool, pd.Series)) else result.all() + + +def evaluate_if(if_statement, state, static_tools, custom_tools): + result = None + test_result = evaluate_ast(if_statement.test, state, static_tools, custom_tools) + if test_result: + for line in if_statement.body: + line_result = evaluate_ast(line, state, static_tools, custom_tools) + if line_result is not None: + result = line_result + else: + for line in if_statement.orelse: + line_result = evaluate_ast(line, state, static_tools, custom_tools) + if 
line_result is not None: + result = line_result + return result + + +def evaluate_for(for_loop, state, static_tools, custom_tools): + result = None + iterator = evaluate_ast(for_loop.iter, state, static_tools, custom_tools) + for counter in iterator: + set_value(for_loop.target, counter, state, static_tools, custom_tools) + for node in for_loop.body: + try: + line_result = evaluate_ast(node, state, static_tools, custom_tools) + if line_result is not None: + result = line_result + except BreakException: + break + except ContinueException: + continue + else: + continue + break + return result + + +def evaluate_listcomp(listcomp, state, static_tools, custom_tools): + def inner_evaluate(generators, index, current_state): + if index >= len(generators): + return [evaluate_ast(listcomp.elt, current_state, static_tools, custom_tools)] + generator = generators[index] + iter_value = evaluate_ast(generator.iter, current_state, static_tools, custom_tools) + result = [] + for value in iter_value: + new_state = current_state.copy() + if isinstance(generator.target, ast.Tuple): + for idx, elem in enumerate(generator.target.elts): + new_state[elem.id] = value[idx] + else: + new_state[generator.target.id] = value + if all(evaluate_ast(if_clause, new_state, static_tools, custom_tools) for if_clause in generator.ifs): + result.extend(inner_evaluate(generators, index + 1, new_state)) + return result + + return inner_evaluate(listcomp.generators, 0, state) + + +def evaluate_try(try_node, state, static_tools, custom_tools): + try: + for stmt in try_node.body: + evaluate_ast(stmt, state, static_tools, custom_tools) + except Exception as e: + matched = False + for handler in try_node.handlers: + if handler.type is None or isinstance(e, evaluate_ast(handler.type, state, static_tools, custom_tools)): + matched = True + if handler.name: + state[handler.name] = e + for stmt in handler.body: + evaluate_ast(stmt, state, static_tools, custom_tools) + break + if not matched: + raise e + else: + 
if try_node.orelse: + for stmt in try_node.orelse: + evaluate_ast(stmt, state, static_tools, custom_tools) + finally: + if try_node.finalbody: + for stmt in try_node.finalbody: + evaluate_ast(stmt, state, static_tools, custom_tools) + + +def evaluate_raise(raise_node, state, static_tools, custom_tools): + if raise_node.exc is not None: + exc = evaluate_ast(raise_node.exc, state, static_tools, custom_tools) + else: + exc = None + if raise_node.cause is not None: + cause = evaluate_ast(raise_node.cause, state, static_tools, custom_tools) + else: + cause = None + if exc is not None: + if cause is not None: + raise exc from cause + else: + raise exc + else: + raise InterpreterError("Re-raise is not supported without an active exception") + + +def evaluate_assert(assert_node, state, static_tools, custom_tools): + test_result = evaluate_ast(assert_node.test, state, static_tools, custom_tools) + if not test_result: + if assert_node.msg: + msg = evaluate_ast(assert_node.msg, state, static_tools, custom_tools) + raise AssertionError(msg) + else: + # Include the failing condition in the assertion message + test_code = ast.unparse(assert_node.test) + raise AssertionError(f"Assertion failed: {test_code}") + + +def evaluate_with(with_node, state, static_tools, custom_tools): + contexts = [] + for item in with_node.items: + context_expr = evaluate_ast(item.context_expr, state, static_tools, custom_tools) + if item.optional_vars: + state[item.optional_vars.id] = context_expr.__enter__() + contexts.append(state[item.optional_vars.id]) + else: + context_var = context_expr.__enter__() + contexts.append(context_var) + + try: + for stmt in with_node.body: + evaluate_ast(stmt, state, static_tools, custom_tools) + except Exception as e: + for context in reversed(contexts): + context.__exit__(type(e), e, e.__traceback__) + raise + else: + for context in reversed(contexts): + context.__exit__(None, None, None) + + +def import_modules(expression, state, authorized_imports): + def 
check_module_authorized(module_name): + module_path = module_name.split(".") + module_subpaths = [".".join(module_path[:i]) for i in range(1, len(module_path) + 1)] + return any(subpath in authorized_imports for subpath in module_subpaths) + + if isinstance(expression, ast.Import): + for alias in expression.names: + if check_module_authorized(alias.name): + module = import_module(alias.name) + state[alias.asname or alias.name] = module + else: + raise InterpreterError( + f"Import of {alias.name} is not allowed. Authorized imports are: {str(authorized_imports)}" + ) + return None + elif isinstance(expression, ast.ImportFrom): + if check_module_authorized(expression.module): + module = __import__(expression.module, fromlist=[alias.name for alias in expression.names]) + for alias in expression.names: + state[alias.asname or alias.name] = getattr(module, alias.name) + else: + raise InterpreterError(f"Import from {expression.module} is not allowed.") + return None + + +def evaluate_dictcomp(dictcomp, state, static_tools, custom_tools): + result = {} + for gen in dictcomp.generators: + iter_value = evaluate_ast(gen.iter, state, static_tools, custom_tools) + for value in iter_value: + new_state = state.copy() + set_value(gen.target, value, new_state, static_tools, custom_tools) + if all(evaluate_ast(if_clause, new_state, static_tools, custom_tools) for if_clause in gen.ifs): + key = evaluate_ast(dictcomp.key, new_state, static_tools, custom_tools) + val = evaluate_ast(dictcomp.value, new_state, static_tools, custom_tools) + result[key] = val + return result + + +def evaluate_ast( + expression: ast.AST, + state: Dict[str, Any], + static_tools: Dict[str, Callable], + custom_tools: Dict[str, Callable], + authorized_imports: List[str] = LIST_SAFE_MODULES, +): + """ + Evaluate an abstract syntax tree using the content of the variables stored in a state and only evaluating a given + set of functions. + + This function will recurse through the nodes of the tree provided. 
+ + Args: + expression (`ast.AST`): + The code to evaluate, as an abstract syntax tree. + state (`Dict[str, Any]`): + A dictionary mapping variable names to values. The `state` is updated if need be when the evaluation + encounters assignments. + static_tools (`Dict[str, Callable]`): + Functions that may be called during the evaluation. Trying to change one of these static_tools will raise an error. + custom_tools (`Dict[str, Callable]`): + Functions that may be called during the evaluation. These custom_tools can be overwritten. + authorized_imports (`List[str]`): + The list of modules that can be imported by the code. By default, only a few safe modules are allowed. + Add more at your own risk! + """ + global OPERATIONS_COUNT + if OPERATIONS_COUNT >= MAX_OPERATIONS: + raise InterpreterError( + f"Reached the max number of operations of {MAX_OPERATIONS}. Maybe there is an infinite loop somewhere in the code, or you're just asking for too many calculations." + ) + OPERATIONS_COUNT += 1 + if isinstance(expression, ast.Assign): + # Assignment -> we evaluate the assignment which should update the state + # We return the variable assigned as it may be used to determine the final result. 
+ return evaluate_assign(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.AugAssign): + return evaluate_augassign(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Call): + # Function call -> we return the value of the function call + return evaluate_call(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Constant): + # Constant -> just return the value + return expression.value + elif isinstance(expression, ast.Tuple): + return tuple(evaluate_ast(elt, state, static_tools, custom_tools) for elt in expression.elts) + elif isinstance(expression, (ast.ListComp, ast.GeneratorExp)): + return evaluate_listcomp(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.UnaryOp): + return evaluate_unaryop(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Starred): + return evaluate_ast(expression.value, state, static_tools, custom_tools) + elif isinstance(expression, ast.BoolOp): + # Boolean operation -> evaluate the operation + return evaluate_boolop(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Break): + raise BreakException() + elif isinstance(expression, ast.Continue): + raise ContinueException() + elif isinstance(expression, ast.BinOp): + # Binary operation -> execute operation + return evaluate_binop(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Compare): + # Comparison -> evaluate the comparison + return evaluate_condition(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Lambda): + return evaluate_lambda(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.FunctionDef): + return evaluate_function_def(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Dict): + # Dict -> evaluate all keys and values + keys = [evaluate_ast(k, state, static_tools, custom_tools) for k in 
expression.keys] + values = [evaluate_ast(v, state, static_tools, custom_tools) for v in expression.values] + return dict(zip(keys, values)) + elif isinstance(expression, ast.Expr): + # Expression -> evaluate the content + return evaluate_ast(expression.value, state, static_tools, custom_tools) + elif isinstance(expression, ast.For): + # For loop -> execute the loop + return evaluate_for(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.FormattedValue): + # Formatted value (part of f-string) -> evaluate the content and return + return evaluate_ast(expression.value, state, static_tools, custom_tools) + elif isinstance(expression, ast.If): + # If -> execute the right branch + return evaluate_if(expression, state, static_tools, custom_tools) + elif hasattr(ast, "Index") and isinstance(expression, ast.Index): + return evaluate_ast(expression.value, state, static_tools, custom_tools) + elif isinstance(expression, ast.JoinedStr): + return "".join([str(evaluate_ast(v, state, static_tools, custom_tools)) for v in expression.values]) + elif isinstance(expression, ast.List): + # List -> evaluate all elements + return [evaluate_ast(elt, state, static_tools, custom_tools) for elt in expression.elts] + elif isinstance(expression, ast.Name): + # Name -> pick up the value in the state + return evaluate_name(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Subscript): + # Subscript -> return the value of the indexing + return evaluate_subscript(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.IfExp): + test_val = evaluate_ast(expression.test, state, static_tools, custom_tools) + if test_val: + return evaluate_ast(expression.body, state, static_tools, custom_tools) + else: + return evaluate_ast(expression.orelse, state, static_tools, custom_tools) + elif isinstance(expression, ast.Attribute): + value = evaluate_ast(expression.value, state, static_tools, custom_tools) + return 
getattr(value, expression.attr) + elif isinstance(expression, ast.Slice): + return slice( + evaluate_ast(expression.lower, state, static_tools, custom_tools) + if expression.lower is not None + else None, + evaluate_ast(expression.upper, state, static_tools, custom_tools) + if expression.upper is not None + else None, + evaluate_ast(expression.step, state, static_tools, custom_tools) if expression.step is not None else None, + ) + elif isinstance(expression, ast.DictComp): + return evaluate_dictcomp(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.While): + return evaluate_while(expression, state, static_tools, custom_tools) + elif isinstance(expression, (ast.Import, ast.ImportFrom)): + return import_modules(expression, state, authorized_imports) + elif isinstance(expression, ast.ClassDef): + return evaluate_class_def(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Try): + return evaluate_try(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Raise): + return evaluate_raise(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Assert): + return evaluate_assert(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.With): + return evaluate_with(expression, state, static_tools, custom_tools) + elif isinstance(expression, ast.Set): + return {evaluate_ast(elt, state, static_tools, custom_tools) for elt in expression.elts} + elif isinstance(expression, ast.Return): + raise ReturnException( + evaluate_ast(expression.value, state, static_tools, custom_tools) if expression.value else None + ) + else: + # For now we refuse anything else. Let's add things as we need them. 
+ raise InterpreterError(f"{expression.__class__.__name__} is not supported.") + + +def truncate_print_outputs(print_outputs: str, max_len_outputs: int = MAX_LEN_OUTPUT) -> str: + if len(print_outputs) < max_len_outputs: + return print_outputs + else: + return f"Print outputs:\n{print_outputs[:max_len_outputs]}\n_Print outputs have been truncated over the limit of {max_len_outputs} characters._\n" + + +def evaluate_python_code( + code: str, + static_tools: Optional[Dict[str, Callable]] = None, + custom_tools: Optional[Dict[str, Callable]] = None, + state: Optional[Dict[str, Any]] = None, + authorized_imports: List[str] = LIST_SAFE_MODULES, +): + """ + Evaluate a python expression using the content of the variables stored in a state and only evaluating a given set + of functions. + + This function will recurse through the nodes of the tree provided. + + Args: + code (`str`): + The code to evaluate. + static_tools (`Dict[str, Callable]`): + The functions that may be called during the evaluation. + These tools cannot be overwritten in the code: any assignment to their name will raise an error. + custom_tools (`Dict[str, Callable]`): + The functions that may be called during the evaluation. + These tools can be overwritten in the code: any assignment to their name will overwrite them. + state (`Dict[str, Any]`): + A dictionary mapping variable names to values. The `state` should contain the initial inputs but will be + updated by this function to contain all variables as they are evaluated. + The print outputs will be stored in the state under the key 'print_outputs'. 
+ """ + try: + expression = ast.parse(code) + except SyntaxError as e: + raise SyntaxError(f"The code generated by the agent is not valid.\n{e}") + if state is None: + state = {} + if static_tools is None: + static_tools = {} + if custom_tools is None: + custom_tools = {} + result = None + global PRINT_OUTPUTS + PRINT_OUTPUTS = "" + global OPERATIONS_COUNT + OPERATIONS_COUNT = 0 + try: + for node in expression.body: + result = evaluate_ast(node, state, static_tools, custom_tools, authorized_imports) + state["print_outputs"] = truncate_print_outputs(PRINT_OUTPUTS, max_len_outputs=MAX_LEN_OUTPUT) + return result + except InterpreterError as e: + msg = truncate_print_outputs(PRINT_OUTPUTS, max_len_outputs=MAX_LEN_OUTPUT) + msg += f"EXECUTION FAILED:\nEvaluation stopped at line '{ast.get_source_segment(code, node)}' because of the following error:\n{e}" + raise InterpreterError(msg) diff --git a/agents/search.py b/agents/search.py new file mode 100644 index 0000000..1c2c339 --- /dev/null +++ b/agents/search.py @@ -0,0 +1,77 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import re + +import requests +from requests.exceptions import RequestException + +from .tools import Tool + + +class DuckDuckGoSearchTool(Tool): + name = "web_search" + description = """Perform a web search based on your query (think a Google search) then returns the top search results as a list of dict elements. + Each result has keys 'title', 'href' and 'body'.""" + inputs = {"query": {"type": "string", "description": "The search query to perform."}} + output_type = "any" + + def forward(self, query: str) -> str: + try: + from duckduckgo_search import DDGS + except ImportError: + raise ImportError( + "You must install package `duckduckgo_search` to run this tool: for instance run `pip install duckduckgo-search`." + ) + results = DDGS().text(query, max_results=7) + return results + + +class VisitWebpageTool(Tool): + name = "visit_webpage" + description = "Visits a webpage at the given url and returns its content as a markdown string." + inputs = { + "url": { + "type": "string", + "description": "The url of the webpage to visit.", + } + } + output_type = "string" + + def forward(self, url: str) -> str: + try: + from markdownify import markdownify + except ImportError: + raise ImportError( + "You must install package `markdownify` to run this tool: for instance run `pip install markdownify`." 
+ ) + try: + # Send a GET request to the URL + response = requests.get(url) + response.raise_for_status() # Raise an exception for bad status codes + + # Convert the HTML content to Markdown + markdown_content = markdownify(response.text).strip() + + # Remove multiple line breaks + markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content) + + return markdown_content + + except RequestException as e: + return f"Error fetching the webpage: {str(e)}" + except Exception as e: + return f"An unexpected error occurred: {str(e)}" diff --git a/agents/speech_to_text.py b/agents/speech_to_text.py new file mode 100644 index 0000000..8061651 --- /dev/null +++ b/agents/speech_to_text.py @@ -0,0 +1,39 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from ..models.whisper import WhisperForConditionalGeneration, WhisperProcessor +from .tools import PipelineTool + + +class SpeechToTextTool(PipelineTool): + default_checkpoint = "distil-whisper/distil-large-v3" + description = "This is a tool that transcribes an audio into text. It returns the transcribed text." 
+ name = "transcriber" + pre_processor_class = WhisperProcessor + model_class = WhisperForConditionalGeneration + + inputs = {"audio": {"type": "audio", "description": "The audio to transcribe"}} + output_type = "string" + + def encode(self, audio): + return self.pre_processor(audio, return_tensors="pt") + + def forward(self, inputs): + return self.model.generate(inputs["input_features"]) + + def decode(self, outputs): + return self.pre_processor.batch_decode(outputs, skip_special_tokens=True)[0] diff --git a/agents/text_to_speech.py b/agents/text_to_speech.py new file mode 100644 index 0000000..ed41ef6 --- /dev/null +++ b/agents/text_to_speech.py @@ -0,0 +1,67 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2024 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +from ..models.speecht5 import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor +from ..utils import is_datasets_available +from .tools import PipelineTool + + +if is_datasets_available(): + from datasets import load_dataset + + +class TextToSpeechTool(PipelineTool): + default_checkpoint = "microsoft/speecht5_tts" + description = ( + "This is a tool that reads an English text out loud. It returns a waveform object containing the sound." 
+ ) + name = "text_to_speech" + pre_processor_class = SpeechT5Processor + model_class = SpeechT5ForTextToSpeech + post_processor_class = SpeechT5HifiGan + + inputs = {"text": {"type": "string", "description": "The text to read out loud (in English)"}} + output_type = "audio" + + def setup(self): + if self.post_processor is None: + self.post_processor = "microsoft/speecht5_hifigan" + super().setup() + + def encode(self, text, speaker_embeddings=None): + inputs = self.pre_processor(text=text, return_tensors="pt", truncation=True) + + if speaker_embeddings is None: + if not is_datasets_available(): + raise ImportError("Datasets needs to be installed if not passing speaker embeddings.") + + embeddings_dataset = load_dataset( + "Matthijs/cmu-arctic-xvectors", split="validation", trust_remote_code=True + ) + speaker_embeddings = torch.tensor(embeddings_dataset[7305]["xvector"]).unsqueeze(0) + + return {"input_ids": inputs["input_ids"], "speaker_embeddings": speaker_embeddings} + + def forward(self, inputs): + with torch.no_grad(): + return self.model.generate_speech(**inputs) + + def decode(self, outputs): + with torch.no_grad(): + return self.post_processor(outputs).cpu().detach() diff --git a/agents/tools.py b/agents/tools.py new file mode 100644 index 0000000..7597046 --- /dev/null +++ b/agents/tools.py @@ -0,0 +1,1003 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +import ast +import base64 +import importlib +import inspect +import io +import json +import os +import tempfile +from functools import lru_cache, wraps +from pathlib import Path +from typing import Any, Callable, Dict, List, Optional, Union + +from huggingface_hub import create_repo, get_collection, hf_hub_download, metadata_update, upload_folder +from huggingface_hub.utils import RepositoryNotFoundError, build_hf_headers, get_session +from packaging import version + +from ..dynamic_module_utils import ( + custom_object_save, + get_class_from_dynamic_module, + get_imports, +) +from ..models.auto import AutoProcessor +from ..utils import ( + CONFIG_NAME, + TypeHintParsingException, + cached_file, + get_json_schema, + is_accelerate_available, + is_torch_available, + is_vision_available, + logging, +) +from .agent_types import ImageType, handle_agent_inputs, handle_agent_outputs + + +logger = logging.get_logger(__name__) + + +if is_torch_available(): + import torch + +if is_accelerate_available(): + from accelerate import PartialState + from accelerate.utils import send_to_device + + +TOOL_CONFIG_FILE = "tool_config.json" + + +def get_repo_type(repo_id, repo_type=None, **hub_kwargs): + if repo_type is not None: + return repo_type + try: + hf_hub_download(repo_id, TOOL_CONFIG_FILE, repo_type="space", **hub_kwargs) + return "space" + except RepositoryNotFoundError: + try: + hf_hub_download(repo_id, TOOL_CONFIG_FILE, repo_type="model", **hub_kwargs) + return "model" + except RepositoryNotFoundError: + raise EnvironmentError(f"`{repo_id}` does not seem to be a valid repo identifier on the Hub.") + except Exception: + return "model" + except Exception: + return "space" + + +# docstyle-ignore +APP_FILE_TEMPLATE = """from transformers import launch_gradio_demo +from {module_name} import {class_name} + +launch_gradio_demo({class_name}) +""" + + +def 
validate_after_init(cls, do_validate_forward: bool = True): + original_init = cls.__init__ + + @wraps(original_init) + def new_init(self, *args, **kwargs): + original_init(self, *args, **kwargs) + if not isinstance(self, PipelineTool): + self.validate_arguments(do_validate_forward=do_validate_forward) + + cls.__init__ = new_init + return cls + + +CONVERSION_DICT = {"str": "string", "int": "integer", "float": "number"} + + +class Tool: + """ + A base class for the functions used by the agent. Subclass this and implement the `forward` method as well as the + following class attributes: + + - **description** (`str`) -- A short description of what your tool does, the inputs it expects and the output(s) it + will return. For instance 'This is a tool that downloads a file from a `url`. It takes the `url` as input, and + returns the text contained in the file'. + - **name** (`str`) -- A performative name that will be used for your tool in the prompt to the agent. For instance + `"text-classifier"` or `"image_generator"`. + - **inputs** (`Dict[str, Dict[str, Union[str, type]]]`) -- The dict of modalities expected for the inputs. + Each input has one `type` key and one `description` key. + This is used by `launch_gradio_demo` or to make a nice space from your tool, and also can be used in the generated + description for your tool. + - **output_type** (`type`) -- The type of the tool output. This is used by `launch_gradio_demo` + or to make a nice space from your tool, and also can be used in the generated description for your tool. + + You can also override the method [`~Tool.setup`] if your tool has an expensive operation to perform before being + usable (such as loading a model). [`~Tool.setup`] will be called the first time you use your tool, but not at + instantiation. 
+ """ + + name: str + description: str + inputs: Dict[str, Dict[str, Union[str, type]]] + output_type: type + + def __init__(self, *args, **kwargs): + self.is_initialized = False + + def __init_subclass__(cls, **kwargs): + super().__init_subclass__(**kwargs) + validate_after_init(cls, do_validate_forward=False) + + def validate_arguments(self, do_validate_forward: bool = True): + required_attributes = { + "description": str, + "name": str, + "inputs": dict, + "output_type": str, + } + authorized_types = ["string", "integer", "number", "image", "audio", "any", "boolean"] + + for attr, expected_type in required_attributes.items(): + attr_value = getattr(self, attr, None) + if attr_value is None: + raise TypeError(f"You must set an attribute {attr}.") + if not isinstance(attr_value, expected_type): + raise TypeError( + f"Attribute {attr} should have type {expected_type.__name__}, got {type(attr_value)} instead." + ) + for input_name, input_content in self.inputs.items(): + assert isinstance(input_content, dict), f"Input '{input_name}' should be a dictionary." + assert ( + "type" in input_content and "description" in input_content + ), f"Input '{input_name}' should have keys 'type' and 'description', has only {list(input_content.keys())}." + if input_content["type"] not in authorized_types: + raise Exception( + f"Input '{input_name}': type '{input_content['type']}' is not an authorized value, should be one of {authorized_types}." + ) + + assert getattr(self, "output_type", None) in authorized_types + if do_validate_forward: + if not isinstance(self, PipelineTool): + signature = inspect.signature(self.forward) + if not set(signature.parameters.keys()) == set(self.inputs.keys()): + raise Exception( + "Tool's 'forward' method should take 'self' as its first argument, then its next arguments should match the keys of tool attribute 'inputs'." 
+ ) + + def forward(self, *args, **kwargs): + raise NotImplementedError("Write this method in your subclass of `Tool`.") + + def __call__(self, *args, **kwargs): + args, kwargs = handle_agent_inputs(*args, **kwargs) + outputs = self.forward(*args, **kwargs) + return handle_agent_outputs(outputs, self.output_type) + + def setup(self): + """ + Override this method for any expensive operation that needs to be executed before you start using your + tool, such as loading a big model. + """ + self.is_initialized = True + + def save(self, output_dir): + """ + Saves the relevant code files for your tool so it can be pushed to the Hub. This will copy the code of your + tool in `output_dir` as well as autogenerate: + + - a config file named `tool_config.json` + - an `app.py` file so that your tool can be converted to a space + - a `requirements.txt` containing the names of the modules used by your tool (as detected when inspecting its + code) + + You should only use this method to save tools that are defined in a separate module (not `__main__`). + + Args: + output_dir (`str`): The folder in which you want to save your tool. + """ + os.makedirs(output_dir, exist_ok=True) + # Save module file + if self.__module__ == "__main__": + raise ValueError( + f"We can't save the code defining {self} in {output_dir} as it's been defined in __main__. You " + "have to put this code in a separate module so we can include it in the saved folder." 
+ ) + module_files = custom_object_save(self, output_dir) + + module_name = self.__class__.__module__ + last_module = module_name.split(".")[-1] + full_name = f"{last_module}.{self.__class__.__name__}" + + # Save config file + config_file = os.path.join(output_dir, "tool_config.json") + if os.path.isfile(config_file): + with open(config_file, "r", encoding="utf-8") as f: + tool_config = json.load(f) + else: + tool_config = {} + + tool_config = { + "tool_class": full_name, + "description": self.description, + "name": self.name, + "inputs": self.inputs, + "output_type": str(self.output_type), + } + with open(config_file, "w", encoding="utf-8") as f: + f.write(json.dumps(tool_config, indent=2, sort_keys=True) + "\n") + + # Save app file + app_file = os.path.join(output_dir, "app.py") + with open(app_file, "w", encoding="utf-8") as f: + f.write(APP_FILE_TEMPLATE.format(module_name=last_module, class_name=self.__class__.__name__)) + + # Save requirements file + requirements_file = os.path.join(output_dir, "requirements.txt") + imports = [] + for module in module_files: + imports.extend(get_imports(module)) + imports = list(set(imports)) + with open(requirements_file, "w", encoding="utf-8") as f: + f.write("\n".join(imports) + "\n") + + @classmethod + def from_hub( + cls, + repo_id: str, + token: Optional[str] = None, + **kwargs, + ): + """ + Loads a tool defined on the Hub. + + + + Loading a tool from the Hub means that you'll download the tool and execute it locally. + ALWAYS inspect the tool you're downloading before loading it within your runtime, as you would do when + installing a package using pip/npm/apt. + + + + Args: + repo_id (`str`): + The name of the repo on the Hub where your tool is defined. + token (`str`, *optional*): + The token to identify you on hf.co. If unset, will use the token generated when running + `huggingface-cli login` (stored in `~/.huggingface`). 
+ kwargs (additional keyword arguments, *optional*): + Additional keyword arguments that will be split in two: all arguments relevant to the Hub (such as + `cache_dir`, `revision`, `subfolder`) will be used when downloading the files for your tool, and the + others will be passed along to its init. + """ + hub_kwargs_names = [ + "cache_dir", + "force_download", + "resume_download", + "proxies", + "revision", + "repo_type", + "subfolder", + "local_files_only", + ] + hub_kwargs = {k: v for k, v in kwargs.items() if k in hub_kwargs_names} + + # Try to get the tool config first. + hub_kwargs["repo_type"] = get_repo_type(repo_id, **hub_kwargs) + resolved_config_file = cached_file( + repo_id, + TOOL_CONFIG_FILE, + token=token, + **hub_kwargs, + _raise_exceptions_for_gated_repo=False, + _raise_exceptions_for_missing_entries=False, + _raise_exceptions_for_connection_errors=False, + ) + is_tool_config = resolved_config_file is not None + if resolved_config_file is None: + resolved_config_file = cached_file( + repo_id, + CONFIG_NAME, + token=token, + **hub_kwargs, + _raise_exceptions_for_gated_repo=False, + _raise_exceptions_for_missing_entries=False, + _raise_exceptions_for_connection_errors=False, + ) + if resolved_config_file is None: + raise EnvironmentError( + f"{repo_id} does not appear to provide a valid configuration in `tool_config.json` or `config.json`." + ) + + with open(resolved_config_file, encoding="utf-8") as reader: + config = json.load(reader) + + if not is_tool_config: + if "custom_tool" not in config: + raise EnvironmentError( + f"{repo_id} does not provide a mapping to custom tools in its configuration `config.json`." 
+ ) + custom_tool = config["custom_tool"] + else: + custom_tool = config + + tool_class = custom_tool["tool_class"] + tool_class = get_class_from_dynamic_module(tool_class, repo_id, token=token, **hub_kwargs) + + if len(tool_class.name) == 0: + tool_class.name = custom_tool["name"] + if tool_class.name != custom_tool["name"]: + logger.warning( + f"{tool_class.__name__} implements a different name in its configuration and class. Using the tool " + "configuration name." + ) + tool_class.name = custom_tool["name"] + + if len(tool_class.description) == 0: + tool_class.description = custom_tool["description"] + if tool_class.description != custom_tool["description"]: + logger.warning( + f"{tool_class.__name__} implements a different description in its configuration and class. Using the " + "tool configuration description." + ) + tool_class.description = custom_tool["description"] + + if tool_class.inputs != custom_tool["inputs"]: + tool_class.inputs = custom_tool["inputs"] + if tool_class.output_type != custom_tool["output_type"]: + tool_class.output_type = custom_tool["output_type"] + + if not isinstance(tool_class.inputs, dict): + tool_class.inputs = ast.literal_eval(tool_class.inputs) + + return tool_class(**kwargs) + + def push_to_hub( + self, + repo_id: str, + commit_message: str = "Upload tool", + private: Optional[bool] = None, + token: Optional[Union[bool, str]] = None, + create_pr: bool = False, + ) -> str: + """ + Upload the tool to the Hub. + + For this method to work properly, your tool must have been defined in a separate module (not `__main__`). + For instance: + ``` + from my_tool_module import MyTool + my_tool = MyTool() + my_tool.push_to_hub("my-username/my-space") + ``` + + Parameters: + repo_id (`str`): + The name of the repository you want to push your tool to. It should contain your organization name when + pushing to a given organization. + commit_message (`str`, *optional*, defaults to `"Upload tool"`): + Message to commit while pushing. 
+ private (`bool`, *optional*): + Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists. + token (`bool` or `str`, *optional*): + The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated + when running `huggingface-cli login` (stored in `~/.huggingface`). + create_pr (`bool`, *optional*, defaults to `False`): + Whether to create a PR with the uploaded files or commit directly. + """ + repo_url = create_repo( + repo_id=repo_id, + token=token, + private=private, + exist_ok=True, + repo_type="space", + space_sdk="gradio", + ) + repo_id = repo_url.repo_id + metadata_update(repo_id, {"tags": ["tool"]}, repo_type="space") + + with tempfile.TemporaryDirectory() as work_dir: + # Save all files. + self.save(work_dir) + logger.info(f"Uploading the following files to {repo_id}: {','.join(os.listdir(work_dir))}") + return upload_folder( + repo_id=repo_id, + commit_message=commit_message, + folder_path=work_dir, + token=token, + create_pr=create_pr, + repo_type="space", + ) + + @staticmethod + def from_space( + space_id: str, name: str, description: str, api_name: Optional[str] = None, token: Optional[str] = None + ): + """ + Creates a [`Tool`] from a Space given its id on the Hub. + + Args: + space_id (`str`): + The id of the Space on the Hub. + name (`str`): + The name of the tool. + description (`str`): + The description of the tool. + api_name (`str`, *optional*): + The specific api_name to use, if the space has several tabs. If not specified, will default to the first available api. + token (`str`, *optional*): + Add your token to access private spaces or increase your GPU quotas. + Returns: + [`Tool`]: + The Space, as a tool. 
+ + Examples: + ``` + image_generator = Tool.from_space( + space_id="black-forest-labs/FLUX.1-schnell", + name="image-generator", + description="Generate an image from a prompt" + ) + image = image_generator("Generate an image of a cool surfer in Tahiti") + ``` + ``` + face_swapper = Tool.from_space( + "tuan2308/face-swap", + "face_swapper", + "Tool that puts the face shown on the first image on the second image. You can give it paths to images.", + ) + image = face_swapper('./aymeric.jpeg', './ruth.jpg') + ``` + """ + from gradio_client import Client, handle_file + from gradio_client.utils import is_http_url_like + + class SpaceToolWrapper(Tool): + def __init__( + self, + space_id: str, + name: str, + description: str, + api_name: Optional[str] = None, + token: Optional[str] = None, + ): + self.client = Client(space_id, hf_token=token) + self.name = name + self.description = description + space_description = self.client.view_api(return_format="dict", print_info=False)["named_endpoints"] + + # If api_name is not defined, take the first of the available APIs for this space + if api_name is None: + api_name = list(space_description.keys())[0] + logger.warning( + f"Since `api_name` was not defined, it was automatically set to the first available API: `{api_name}`." 
+ ) + self.api_name = api_name + + try: + space_description_api = space_description[api_name] + except KeyError: + raise KeyError(f"Could not find specified {api_name=} among available api names.") + + self.inputs = {} + for parameter in space_description_api["parameters"]: + if not parameter["parameter_has_default"]: + parameter_type = parameter["type"]["type"] + if parameter_type == "object": + parameter_type = "any" + self.inputs[parameter["parameter_name"]] = { + "type": parameter_type, + "description": parameter["python_type"]["description"], + } + output_component = space_description_api["returns"][0]["component"] + if output_component == "Image": + self.output_type = "image" + elif output_component == "Audio": + self.output_type = "audio" + else: + self.output_type = "any" + + def sanitize_argument_for_prediction(self, arg): + if isinstance(arg, ImageType): + temp_file = tempfile.NamedTemporaryFile(suffix=".png", delete=False) + arg.save(temp_file.name) + arg = temp_file.name + if (isinstance(arg, (str, Path)) and Path(arg).exists() and Path(arg).is_file()) or is_http_url_like( + arg + ): + arg = handle_file(arg) + return arg + + def forward(self, *args, **kwargs): + # Preprocess args and kwargs: + args = list(args) + for i, arg in enumerate(args): + args[i] = self.sanitize_argument_for_prediction(arg) + for arg_name, arg in kwargs.items(): + kwargs[arg_name] = self.sanitize_argument_for_prediction(arg) + + output = self.client.predict(*args, api_name=self.api_name, **kwargs) + if isinstance(output, (tuple, list)): + return output[ + 0 + ] # Sometimes the space also returns the generation seed, in which case the result is at index 0 + return output + + return SpaceToolWrapper(space_id, name, description, api_name=api_name, token=token) + + @staticmethod + def from_gradio(gradio_tool): + """ + Creates a [`Tool`] from a gradio tool. 
+ """ + import inspect + + class GradioToolWrapper(Tool): + def __init__(self, _gradio_tool): + self.name = _gradio_tool.name + self.description = _gradio_tool.description + self.output_type = "string" + self._gradio_tool = _gradio_tool + func_args = list(inspect.signature(_gradio_tool.run).parameters.items()) + self.inputs = { + key: {"type": CONVERSION_DICT[value.annotation], "description": ""} for key, value in func_args + } + self.forward = self._gradio_tool.run + + return GradioToolWrapper(gradio_tool) + + @staticmethod + def from_langchain(langchain_tool): + """ + Creates a [`Tool`] from a langchain tool. + """ + + class LangChainToolWrapper(Tool): + def __init__(self, _langchain_tool): + self.name = _langchain_tool.name.lower() + self.description = _langchain_tool.description + self.inputs = _langchain_tool.args.copy() + for input_content in self.inputs.values(): + if "title" in input_content: + input_content.pop("title") + input_content["description"] = "" + self.output_type = "string" + self.langchain_tool = _langchain_tool + + def forward(self, *args, **kwargs): + tool_input = kwargs.copy() + # Map each positional argument to the input key at the same position + input_keys = list(self.inputs) + for index, argument in enumerate(args): + if index < len(input_keys): + tool_input[input_keys[index]] = argument + return self.langchain_tool.run(tool_input) + + return LangChainToolWrapper(langchain_tool) + + +DEFAULT_TOOL_DESCRIPTION_TEMPLATE = """ +- {{ tool.name }}: {{ tool.description }} + Takes inputs: {{tool.inputs}} + Returns an output of type: {{tool.output_type}} +""" + + +def get_tool_description_with_args(tool: Tool, description_template: str = DEFAULT_TOOL_DESCRIPTION_TEMPLATE) -> str: + compiled_template = compile_jinja_template(description_template) + rendered = compiled_template.render( + tool=tool, + ) + return rendered + + +@lru_cache +def compile_jinja_template(template): + try: + import jinja2 + from jinja2.exceptions import TemplateError + from jinja2.sandbox import ImmutableSandboxedEnvironment + except 
ImportError: + raise ImportError("template requires jinja2 to be installed.") + + if version.parse(jinja2.__version__) < version.parse("3.1.0"): + raise ImportError(f"template requires jinja2>=3.1.0 to be installed. Your version is {jinja2.__version__}.") + + def raise_exception(message): + raise TemplateError(message) + + jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True) + jinja_env.globals["raise_exception"] = raise_exception + return jinja_env.from_string(template) + + +class PipelineTool(Tool): + """ + A [`Tool`] tailored towards Transformer models. On top of the class attributes of the base class [`Tool`], you will + need to specify: + + - **model_class** (`type`) -- The class to use to load the model in this tool. + - **default_checkpoint** (`str`) -- The default checkpoint that should be used when the user doesn't specify one. + - **pre_processor_class** (`type`, *optional*, defaults to [`AutoProcessor`]) -- The class to use to load the + pre-processor. + - **post_processor_class** (`type`, *optional*, defaults to [`AutoProcessor`]) -- The class to use to load the + post-processor (when different from the pre-processor). + + Args: + model (`str` or [`PreTrainedModel`], *optional*): + The name of the checkpoint to use for the model, or the instantiated model. If unset, will default to the + value of the class attribute `default_checkpoint`. + pre_processor (`str` or `Any`, *optional*): + The name of the checkpoint to use for the pre-processor, or the instantiated pre-processor (can be a + tokenizer, an image processor, a feature extractor or a processor). Will default to the value of `model` if + unset. + post_processor (`str` or `Any`, *optional*): + The name of the checkpoint to use for the post-processor, or the instantiated post-processor (can be a + tokenizer, an image processor, a feature extractor or a processor). Will default to the `pre_processor` if + unset. 
+ device (`int`, `str` or `torch.device`, *optional*): + The device on which to execute the model. Will default to any accelerator available (GPU, MPS etc...), the + CPU otherwise. + device_map (`str` or `dict`, *optional*): + If passed along, will be used to instantiate the model. + model_kwargs (`dict`, *optional*): + Any keyword argument to send to the model instantiation. + token (`str`, *optional*): + The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when + running `huggingface-cli login` (stored in `~/.huggingface`). + hub_kwargs (additional keyword arguments, *optional*): + Any additional keyword argument to send to the methods that will load the data from the Hub. + """ + + pre_processor_class = AutoProcessor + model_class = None + post_processor_class = AutoProcessor + default_checkpoint = None + description = "This is a pipeline tool" + name = "pipeline" + inputs = {"prompt": str} + output_type = str + + def __init__( + self, + model=None, + pre_processor=None, + post_processor=None, + device=None, + device_map=None, + model_kwargs=None, + token=None, + **hub_kwargs, + ): + if not is_torch_available(): + raise ImportError("Please install torch in order to use this tool.") + + if not is_accelerate_available(): + raise ImportError("Please install accelerate in order to use this tool.") + + if model is None: + if self.default_checkpoint is None: + raise ValueError("This tool does not implement a default checkpoint, you need to pass one.") + model = self.default_checkpoint + if pre_processor is None: + pre_processor = model + + self.model = model + self.pre_processor = pre_processor + self.post_processor = post_processor + self.device = device + self.device_map = device_map + self.model_kwargs = {} if model_kwargs is None else model_kwargs + if device_map is not None: + self.model_kwargs["device_map"] = device_map + self.hub_kwargs = hub_kwargs + self.hub_kwargs["token"] = token + + super().__init__() + + 
def setup(self): + """ + Instantiates the `pre_processor`, `model` and `post_processor` if necessary. + """ + if isinstance(self.pre_processor, str): + self.pre_processor = self.pre_processor_class.from_pretrained(self.pre_processor, **self.hub_kwargs) + + if isinstance(self.model, str): + self.model = self.model_class.from_pretrained(self.model, **self.model_kwargs, **self.hub_kwargs) + + if self.post_processor is None: + self.post_processor = self.pre_processor + elif isinstance(self.post_processor, str): + self.post_processor = self.post_processor_class.from_pretrained(self.post_processor, **self.hub_kwargs) + + if self.device is None: + if self.device_map is not None: + self.device = list(self.model.hf_device_map.values())[0] + else: + self.device = PartialState().default_device + + if self.device_map is None: + self.model.to(self.device) + + super().setup() + + def encode(self, raw_inputs): + """ + Uses the `pre_processor` to prepare the inputs for the `model`. + """ + return self.pre_processor(raw_inputs) + + def forward(self, inputs): + """ + Sends the inputs through the `model`. + """ + with torch.no_grad(): + return self.model(**inputs) + + def decode(self, outputs): + """ + Uses the `post_processor` to decode the model output. 
+ """ + return self.post_processor(outputs) + + def __call__(self, *args, **kwargs): + args, kwargs = handle_agent_inputs(*args, **kwargs) + + if not self.is_initialized: + self.setup() + + encoded_inputs = self.encode(*args, **kwargs) + + tensor_inputs = {k: v for k, v in encoded_inputs.items() if isinstance(v, torch.Tensor)} + non_tensor_inputs = {k: v for k, v in encoded_inputs.items() if not isinstance(v, torch.Tensor)} + + encoded_inputs = send_to_device(tensor_inputs, self.device) + outputs = self.forward({**encoded_inputs, **non_tensor_inputs}) + outputs = send_to_device(outputs, "cpu") + decoded_outputs = self.decode(outputs) + + return handle_agent_outputs(decoded_outputs, self.output_type) + + +def launch_gradio_demo(tool_class: Tool): + """ + Launches a gradio demo for a tool. The corresponding tool class needs to properly implement the class attributes + `inputs` and `output_type`. + + Args: + tool_class (`type`): The class of the tool for which to launch the demo. + """ + try: + import gradio as gr + except ImportError: + raise ImportError("Gradio should be installed in order to launch a gradio demo.") + + tool = tool_class() + + def fn(*args, **kwargs): + return tool(*args, **kwargs) + + TYPE_TO_COMPONENT_CLASS_MAPPING = { + "image": gr.Image, + "audio": gr.Audio, + "string": gr.Textbox, + "integer": gr.Textbox, + "number": gr.Textbox, + } + + gradio_inputs = [] + for input_name, input_details in tool_class.inputs.items(): + input_gradio_component_class = TYPE_TO_COMPONENT_CLASS_MAPPING[input_details["type"]] + new_component = input_gradio_component_class(label=input_name) + gradio_inputs.append(new_component) + + output_gradio_component_class = TYPE_TO_COMPONENT_CLASS_MAPPING[tool_class.output_type] + # Use a fixed label: reusing `input_name` here would show the last input's name on the output + gradio_output = output_gradio_component_class(label="Output") + + gr.Interface( + fn=fn, + inputs=gradio_inputs, + outputs=gradio_output, + title=tool_class.__name__, + article=tool.description, + ).launch() + + +TOOL_MAPPING = { + 
"document_question_answering": "DocumentQuestionAnsweringTool", + "image_question_answering": "ImageQuestionAnsweringTool", + "speech_to_text": "SpeechToTextTool", + "text_to_speech": "TextToSpeechTool", + "translation": "TranslationTool", + "python_interpreter": "PythonInterpreterTool", + "web_search": "DuckDuckGoSearchTool", +} + + +def load_tool(task_or_repo_id, model_repo_id=None, token=None, **kwargs): + """ + Main function to quickly load a tool, be it on the Hub or in the Transformers library. + + + + Loading a tool means that you'll download the tool and execute it locally. + ALWAYS inspect the tool you're downloading before loading it within your runtime, as you would do when + installing a package using pip/npm/apt. + + + + Args: + task_or_repo_id (`str`): + The task for which to load the tool or a repo ID of a tool on the Hub. Tasks implemented in Transformers + are: + + - `"document_question_answering"` + - `"image_question_answering"` + - `"speech_to_text"` + - `"text_to_speech"` + - `"translation"` + + model_repo_id (`str`, *optional*): + Use this argument to use a different model than the default one for the tool you selected. + token (`str`, *optional*): + The token to identify you on hf.co. If unset, will use the token generated when running `huggingface-cli + login` (stored in `~/.huggingface`). + kwargs (additional keyword arguments, *optional*): + Additional keyword arguments that will be split in two: all arguments relevant to the Hub (such as + `cache_dir`, `revision`, `subfolder`) will be used when downloading the files for your tool, and the others + will be passed along to its init. 
+ """ + if task_or_repo_id in TOOL_MAPPING: + tool_class_name = TOOL_MAPPING[task_or_repo_id] + main_module = importlib.import_module("transformers") + tools_module = main_module.agents + tool_class = getattr(tools_module, tool_class_name) + return tool_class(model_repo_id, token=token, **kwargs) + else: + logger.warning_once( + f"You're loading a tool from the Hub from {task_or_repo_id}. Please make sure this is a source that you " + "trust as the code within that tool will be executed on your machine. Always verify the code of " + "the tools that you load. We recommend specifying a `revision` to ensure you're loading the " + "code that you have checked." + ) + return Tool.from_hub(task_or_repo_id, model_repo_id=model_repo_id, token=token, **kwargs) + + +def add_description(description): + """ + A decorator that adds a description to a function. + """ + + def inner(func): + func.description = description + func.name = func.__name__ + return func + + return inner + + +# Will move to the Hub +class EndpointClient: + def __init__(self, endpoint_url: str, token: Optional[str] = None): + self.headers = { + **build_hf_headers(token=token), + "Content-Type": "application/json", + } + self.endpoint_url = endpoint_url + + @staticmethod + def encode_image(image): + _bytes = io.BytesIO() + image.save(_bytes, format="PNG") + b64 = base64.b64encode(_bytes.getvalue()) + return b64.decode("utf-8") + + @staticmethod + def decode_image(raw_image): + if not is_vision_available(): + raise ImportError( + "This tool returned an image but Pillow is not installed. Please install it (`pip install Pillow`)." 
+ ) + + from PIL import Image + + b64 = base64.b64decode(raw_image) + _bytes = io.BytesIO(b64) + return Image.open(_bytes) + + def __call__( + self, + inputs: Optional[Union[str, Dict, List[str], List[List[str]]]] = None, + params: Optional[Dict] = None, + data: Optional[bytes] = None, + output_image: bool = False, + ) -> Any: + # Build payload + payload = {} + if inputs: + payload["inputs"] = inputs + if params: + payload["parameters"] = params + + # Make API call + response = get_session().post(self.endpoint_url, headers=self.headers, json=payload, data=data) + + # By default, parse the response for the user. + if output_image: + return self.decode_image(response.content) + else: + return response.json() + + +class ToolCollection: + """ + Tool collections enable loading all Spaces from a collection in order to be added to the agent's toolbox. + + > [!NOTE] + > Only Spaces will be fetched, so you can feel free to add models and datasets to your collection if you'd + > like for this collection to showcase them. + + Args: + collection_slug (str): + The collection slug referencing the collection. + token (str, *optional*): + The authentication token if the collection is private. 
+ + Example: + + ```py + >>> from transformers import ToolCollection, ReactCodeAgent + + >>> image_tool_collection = ToolCollection(collection_slug="huggingface-tools/diffusion-tools-6630bb19a942c2306a2cdb6f") + >>> agent = ReactCodeAgent(tools=[*image_tool_collection.tools], add_base_tools=True) + + >>> agent.run("Please draw me a picture of rivers and lakes.") + ``` + """ + + def __init__(self, collection_slug: str, token: Optional[str] = None): + self._collection = get_collection(collection_slug, token=token) + self._hub_repo_ids = {item.item_id for item in self._collection.items if item.item_type == "space"} + self.tools = {Tool.from_hub(repo_id) for repo_id in self._hub_repo_ids} + + +def tool(tool_function: Callable) -> Tool: + """ + Converts a function into an instance of a Tool subclass. + + Args: + tool_function: Your function. Should have type hints for each input and a type hint for the output. + Should also have a docstring description including an 'Args:' part where each argument is described. 
+ """ + parameters = get_json_schema(tool_function)["function"] + if "return" not in parameters: + raise TypeHintParsingException("Tool return type not found: make sure your function has a return type hint!") + class_name = f"{parameters['name'].capitalize()}Tool" + + class SpecificTool(Tool): + name = parameters["name"] + description = parameters["description"] + inputs = parameters["parameters"]["properties"] + output_type = parameters["return"]["type"] + + @wraps(tool_function) + def forward(self, *args, **kwargs): + return tool_function(*args, **kwargs) + + original_signature = inspect.signature(tool_function) + new_parameters = [inspect.Parameter("self", inspect.Parameter.POSITIONAL_OR_KEYWORD)] + list( + original_signature.parameters.values() + ) + new_signature = original_signature.replace(parameters=new_parameters) + SpecificTool.forward.__signature__ = new_signature + + SpecificTool.__name__ = class_name + return SpecificTool() diff --git a/agents/translation.py b/agents/translation.py new file mode 100644 index 0000000..7ae61f9 --- /dev/null +++ b/agents/translation.py @@ -0,0 +1,279 @@ +#!/usr/bin/env python +# coding=utf-8 + +# Copyright 2023 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from ..models.auto import AutoModelForSeq2SeqLM, AutoTokenizer +from .tools import PipelineTool + + +LANGUAGE_CODES = { + "Acehnese Arabic": "ace_Arab", + "Acehnese Latin": "ace_Latn", + "Mesopotamian Arabic": "acm_Arab", + "Ta'izzi-Adeni Arabic": "acq_Arab", + "Tunisian Arabic": "aeb_Arab", + "Afrikaans": "afr_Latn", + "South Levantine Arabic": "ajp_Arab", + "Akan": "aka_Latn", + "Amharic": "amh_Ethi", + "North Levantine Arabic": "apc_Arab", + "Modern Standard Arabic": "arb_Arab", + "Modern Standard Arabic Romanized": "arb_Latn", + "Najdi Arabic": "ars_Arab", + "Moroccan Arabic": "ary_Arab", + "Egyptian Arabic": "arz_Arab", + "Assamese": "asm_Beng", + "Asturian": "ast_Latn", + "Awadhi": "awa_Deva", + "Central Aymara": "ayr_Latn", + "South Azerbaijani": "azb_Arab", + "North Azerbaijani": "azj_Latn", + "Bashkir": "bak_Cyrl", + "Bambara": "bam_Latn", + "Balinese": "ban_Latn", + "Belarusian": "bel_Cyrl", + "Bemba": "bem_Latn", + "Bengali": "ben_Beng", + "Bhojpuri": "bho_Deva", + "Banjar Arabic": "bjn_Arab", + "Banjar Latin": "bjn_Latn", + "Standard Tibetan": "bod_Tibt", + "Bosnian": "bos_Latn", + "Buginese": "bug_Latn", + "Bulgarian": "bul_Cyrl", + "Catalan": "cat_Latn", + "Cebuano": "ceb_Latn", + "Czech": "ces_Latn", + "Chokwe": "cjk_Latn", + "Central Kurdish": "ckb_Arab", + "Crimean Tatar": "crh_Latn", + "Welsh": "cym_Latn", + "Danish": "dan_Latn", + "German": "deu_Latn", + "Southwestern Dinka": "dik_Latn", + "Dyula": "dyu_Latn", + "Dzongkha": "dzo_Tibt", + "Greek": "ell_Grek", + "English": "eng_Latn", + "Esperanto": "epo_Latn", + "Estonian": "est_Latn", + "Basque": "eus_Latn", + "Ewe": "ewe_Latn", + "Faroese": "fao_Latn", + "Fijian": "fij_Latn", + "Finnish": "fin_Latn", + "Fon": "fon_Latn", + "French": "fra_Latn", + "Friulian": "fur_Latn", + "Nigerian Fulfulde": "fuv_Latn", + "Scottish Gaelic": "gla_Latn", + "Irish": "gle_Latn", + "Galician": "glg_Latn", + "Guarani": "grn_Latn", + "Gujarati": "guj_Gujr", + "Haitian Creole": "hat_Latn", + "Hausa": "hau_Latn", + 
"Hebrew": "heb_Hebr", + "Hindi": "hin_Deva", + "Chhattisgarhi": "hne_Deva", + "Croatian": "hrv_Latn", + "Hungarian": "hun_Latn", + "Armenian": "hye_Armn", + "Igbo": "ibo_Latn", + "Ilocano": "ilo_Latn", + "Indonesian": "ind_Latn", + "Icelandic": "isl_Latn", + "Italian": "ita_Latn", + "Javanese": "jav_Latn", + "Japanese": "jpn_Jpan", + "Kabyle": "kab_Latn", + "Jingpho": "kac_Latn", + "Kamba": "kam_Latn", + "Kannada": "kan_Knda", + "Kashmiri Arabic": "kas_Arab", + "Kashmiri Devanagari": "kas_Deva", + "Georgian": "kat_Geor", + "Central Kanuri Arabic": "knc_Arab", + "Central Kanuri Latin": "knc_Latn", + "Kazakh": "kaz_Cyrl", + "Kabiyè": "kbp_Latn", + "Kabuverdianu": "kea_Latn", + "Khmer": "khm_Khmr", + "Kikuyu": "kik_Latn", + "Kinyarwanda": "kin_Latn", + "Kyrgyz": "kir_Cyrl", + "Kimbundu": "kmb_Latn", + "Northern Kurdish": "kmr_Latn", + "Kikongo": "kon_Latn", + "Korean": "kor_Hang", + "Lao": "lao_Laoo", + "Ligurian": "lij_Latn", + "Limburgish": "lim_Latn", + "Lingala": "lin_Latn", + "Lithuanian": "lit_Latn", + "Lombard": "lmo_Latn", + "Latgalian": "ltg_Latn", + "Luxembourgish": "ltz_Latn", + "Luba-Kasai": "lua_Latn", + "Ganda": "lug_Latn", + "Luo": "luo_Latn", + "Mizo": "lus_Latn", + "Standard Latvian": "lvs_Latn", + "Magahi": "mag_Deva", + "Maithili": "mai_Deva", + "Malayalam": "mal_Mlym", + "Marathi": "mar_Deva", + "Minangkabau Arabic": "min_Arab", + "Minangkabau Latin": "min_Latn", + "Macedonian": "mkd_Cyrl", + "Plateau Malagasy": "plt_Latn", + "Maltese": "mlt_Latn", + "Meitei Bengali": "mni_Beng", + "Halh Mongolian": "khk_Cyrl", + "Mossi": "mos_Latn", + "Maori": "mri_Latn", + "Burmese": "mya_Mymr", + "Dutch": "nld_Latn", + "Norwegian Nynorsk": "nno_Latn", + "Norwegian Bokmål": "nob_Latn", + "Nepali": "npi_Deva", + "Northern Sotho": "nso_Latn", + "Nuer": "nus_Latn", + "Nyanja": "nya_Latn", + "Occitan": "oci_Latn", + "West Central Oromo": "gaz_Latn", + "Odia": "ory_Orya", + "Pangasinan": "pag_Latn", + "Eastern Panjabi": "pan_Guru", + "Papiamento": "pap_Latn", + 
"Western Persian": "pes_Arab", + "Polish": "pol_Latn", + "Portuguese": "por_Latn", + "Dari": "prs_Arab", + "Southern Pashto": "pbt_Arab", + "Ayacucho Quechua": "quy_Latn", + "Romanian": "ron_Latn", + "Rundi": "run_Latn", + "Russian": "rus_Cyrl", + "Sango": "sag_Latn", + "Sanskrit": "san_Deva", + "Santali": "sat_Olck", + "Sicilian": "scn_Latn", + "Shan": "shn_Mymr", + "Sinhala": "sin_Sinh", + "Slovak": "slk_Latn", + "Slovenian": "slv_Latn", + "Samoan": "smo_Latn", + "Shona": "sna_Latn", + "Sindhi": "snd_Arab", + "Somali": "som_Latn", + "Southern Sotho": "sot_Latn", + "Spanish": "spa_Latn", + "Tosk Albanian": "als_Latn", + "Sardinian": "srd_Latn", + "Serbian": "srp_Cyrl", + "Swati": "ssw_Latn", + "Sundanese": "sun_Latn", + "Swedish": "swe_Latn", + "Swahili": "swh_Latn", + "Silesian": "szl_Latn", + "Tamil": "tam_Taml", + "Tatar": "tat_Cyrl", + "Telugu": "tel_Telu", + "Tajik": "tgk_Cyrl", + "Tagalog": "tgl_Latn", + "Thai": "tha_Thai", + "Tigrinya": "tir_Ethi", + "Tamasheq Latin": "taq_Latn", + "Tamasheq Tifinagh": "taq_Tfng", + "Tok Pisin": "tpi_Latn", + "Tswana": "tsn_Latn", + "Tsonga": "tso_Latn", + "Turkmen": "tuk_Latn", + "Tumbuka": "tum_Latn", + "Turkish": "tur_Latn", + "Twi": "twi_Latn", + "Central Atlas Tamazight": "tzm_Tfng", + "Uyghur": "uig_Arab", + "Ukrainian": "ukr_Cyrl", + "Umbundu": "umb_Latn", + "Urdu": "urd_Arab", + "Northern Uzbek": "uzn_Latn", + "Venetian": "vec_Latn", + "Vietnamese": "vie_Latn", + "Waray": "war_Latn", + "Wolof": "wol_Latn", + "Xhosa": "xho_Latn", + "Eastern Yiddish": "ydd_Hebr", + "Yoruba": "yor_Latn", + "Yue Chinese": "yue_Hant", + "Chinese Simplified": "zho_Hans", + "Chinese Traditional": "zho_Hant", + "Standard Malay": "zsm_Latn", + "Zulu": "zul_Latn", +} + + +class TranslationTool(PipelineTool): + """ + Example: + + ```py + from transformers.agents import TranslationTool + + translator = TranslationTool() + translator("This is a super nice API!", src_lang="English", tgt_lang="French") + ``` + """ + + lang_to_code = LANGUAGE_CODES 
+ default_checkpoint = "facebook/nllb-200-distilled-600M" + description = ( + "This is a tool that translates text from one language to another. " + f"Both `src_lang` and `tgt_lang` should belong to this list of languages: {list(lang_to_code.keys())}." + ) + name = "translator" + pre_processor_class = AutoTokenizer + model_class = AutoModelForSeq2SeqLM + + inputs = { + "text": {"type": "string", "description": "The text to translate"}, + "src_lang": { + "type": "string", + "description": "The language of the text to translate. Written in plain English, such as 'Romanian', or 'Albanian'", + }, + "tgt_lang": { + "type": "string", + "description": "The desired output language. Written in plain English, such as 'Romanian', or 'Albanian'", + }, + } + output_type = "string" + + def encode(self, text, src_lang, tgt_lang): + if src_lang not in self.lang_to_code: + raise ValueError(f"{src_lang} is not a supported language.") + if tgt_lang not in self.lang_to_code: + raise ValueError(f"{tgt_lang} is not a supported language.") + src_lang = self.lang_to_code[src_lang] + tgt_lang = self.lang_to_code[tgt_lang] + return self.pre_processor._build_translation_inputs( + text, return_tensors="pt", src_lang=src_lang, tgt_lang=tgt_lang + ) + + def forward(self, inputs): + return self.model.generate(**inputs) + + def decode(self, outputs): + return self.post_processor.decode(outputs[0].tolist(), skip_special_tokens=True) diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000..8879933 --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,19 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +SOURCEDIR = source +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". 
+help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) \ No newline at end of file diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..4c08929 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,267 @@ + + +# Generating the documentation + +To generate the documentation, you first have to build it. Several packages are necessary to build the docs; +you can install them with the following command, at the root of the code repository: + +```bash +pip install -e ".[docs]" +``` + +Then you need to install our special tool that builds the documentation: + +```bash +pip install git+https://github.com/huggingface/doc-builder +``` + +--- +**NOTE** + +You only need to generate the documentation to inspect it locally (if you're planning changes and want to +check how they look before committing, for instance). You don't have to commit the built documentation. + +--- + +## Building the documentation + +Once you have set up the `doc-builder` and additional packages, you can generate the documentation by +typing the following command: + +```bash +doc-builder build accelerate docs/source/ --build_dir ~/tmp/test-build +``` + +You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate +the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite +Markdown editor. 
+ +## Previewing the documentation + +To preview the docs, first install the `watchdog` module with: + +```bash +pip install watchdog +``` + +Then run the following command: + +```bash +doc-builder preview {package_name} {path_to_docs} +``` + +For example: + +```bash +doc-builder preview accelerate docs/source/ +``` + +The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. A bot will add a comment with a link to where the documentation with your changes lives. + +--- +**NOTE** + +The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` and restart the `preview` command (`ctrl-c` to stop it, then call `doc-builder preview ...` again). + +--- + +## Adding a new element to the navigation bar + +Accepted files are Markdown (.md). + +Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting +the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/accelerate/blob/main/docs/source/_toctree.yml) file. + +## Renaming section headers and moving sections + +It helps to keep the old links working when renaming a section header and/or moving sections from one document to another. Old links are likely to be used in issues, forums, and social media, and the user experience is much better if people reading those months later can still easily navigate to the originally intended information. + +Therefore, we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor. 
+ +So if you renamed a section from: "Section A" to "Section B", then you can add at the end of the file: + +``` +Sections that were moved: + +[ Section A ] +``` +and of course, if you moved it to another file, then: + +``` +Sections that were moved: + +[ Section A ] +``` + +Use the relative style to link to the new file so that the versioned docs continue to work. + + +## Writing Documentation - Specification + +The `huggingface/accelerate` documentation follows the +[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style for docstrings, +although we can write them directly in Markdown. + +### Adding a new tutorial + +Adding a new tutorial or section is done in two steps: + +- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md). +- Link that file in `./source/_toctree.yml` on the correct toc-tree. + +Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so +depending on the intended targets (beginners, more advanced users, or researchers) it should go in sections two, three, or +four. + +### Writing source documentation + +Values that should be put in `code` should either be surrounded by backticks: \`like so\`. Note that argument names +and objects like True, None, or any strings should usually be put in `code`. + +When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool +adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or +function to be in the main package. + +If you want to create a link to some internal class or function, you need to +provide its path. For instance: \[\`utils.gather\`\]. This will be converted into a link with +`utils.gather` in the description. 
To get rid of the path and only keep the name of the object you are +linking to in the description, add a ~: \[\`~utils.gather\`\] will generate a link with `gather` in the description. + +The same works for methods so you can either use \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\]. + +#### Defining arguments in a method + +Arguments should be defined with the `Args:` (or `Arguments:` or `Parameters:`) prefix, followed by a line return and +an indentation. The argument should be followed by its type, with its shape if it is a tensor, a colon, and its +description: + +``` + Args: + n_layers (`int`): The number of layers of the model. +``` + +If the description is too long to fit in one line (more than 119 characters in total), another indentation is necessary +before writing the description after the argument. + +Finally, to maintain uniformity, if any *one* description is too long to fit on one line, the +rest of the parameters should follow suit and have an indentation before their description. + +Here's an example showcasing everything so far: + +``` + Args: + gradient_accumulation_steps (`int`, *optional*, defaults to 1): + The number of steps that should pass before gradients are accumulated. A number > 1 should be combined with `Accelerator.accumulate`. + cpu (`bool`, *optional*): + Whether or not to force the script to execute on CPU. Will ignore any available GPU if set to `True` and force the execution on one process only. +``` + +For optional arguments or arguments with defaults, we use the following syntax. Imagine we have a function with the +following signature: + +``` +def my_function(x: str = None, a: float = 1): +``` + +then its documentation should look like this: + +``` + Args: + x (`str`, *optional*): + This argument controls ... and has a description longer than 119 chars. + a (`float`, *optional*, defaults to 1): + This argument is used to ... and has a description longer than 119 chars. 
+``` + +Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even +if the first line describing your argument type and its default gets long, you can't break it across several lines. You can +however write as many lines as you want in the indented description (see the examples above). + +#### Writing a multi-line code block + +Multi-line code blocks can be useful for displaying examples. They are done between two lines of three backticks as usual in Markdown: + + +```` +```python +# first line of code +# second line +# etc +``` +```` + +#### Writing a return block + +The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation. +The first line should be the type of the return, followed by a line return. No need to indent further for the elements +building the return. + +Here's an example of a single value return: + +``` + Returns: + `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token. +``` + +Here's an example of a tuple return, comprising several objects: + +``` + Returns: + `tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs: + - **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` -- + Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss. + - **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- + Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). 
+``` + +## Styling the docstring + +We have an automatic script running with the `make style` command that will make sure that: +- the docstrings fully take advantage of the line width +- all code examples are formatted using black, like the code of the Transformers library + +This script may fail in unexpected ways if you made a syntax mistake or if you uncover a bug. Therefore, it's +recommended to commit your changes before running `make style`, so you can revert the changes done by that script +easily. + +## Writing documentation examples + +The syntax for Example docstrings can look as follows: + +``` + Example: + + ```python + >>> import time + >>> from accelerate import Accelerator + >>> accelerator = Accelerator() + >>> if accelerator.is_main_process: + ... time.sleep(2) + ... else: + ... print("I'm waiting for the main process to finish its sleep...") + >>> accelerator.wait_for_everyone() + >>> # Should print on every process at the same time + >>> print("Everyone is here") + ``` +``` + +The docstring should give a minimal, clear example of how the respective function +is to be used in inference and also include the expected (ideally sensible) +output. +Often, readers will try out the example before even going through the function +or class definitions. Therefore, it is of utmost importance that the example +works as expected. \ No newline at end of file diff --git a/docs/source/_config.py b/docs/source/_config.py new file mode 100644 index 0000000..f49e4e4 --- /dev/null +++ b/docs/source/_config.py @@ -0,0 +1,14 @@ +# docstyle-ignore +INSTALL_CONTENT = """ +# Transformers installation +! pip install transformers datasets evaluate accelerate +# To install from source instead of the last release, comment the command above and uncomment the following one. +# ! 
pip install git+https://github.com/huggingface/transformers.git +""" + +notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}] +black_avoid_patterns = { + "{processor_class}": "FakeProcessorClass", + "{model_class}": "FakeModelClass", + "{object_class}": "FakeObjectClass", +} diff --git a/docs/source/_redirects.yml b/docs/source/_redirects.yml new file mode 100644 index 0000000..ff70547 --- /dev/null +++ b/docs/source/_redirects.yml @@ -0,0 +1,5 @@ +# Optimizing inference + +perf_infer_gpu_many: perf_infer_gpu_one +transformers_agents: agents +quantization: quantization/overview diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml new file mode 100644 index 0000000..6e325e4 --- /dev/null +++ b/docs/source/_toctree.yml @@ -0,0 +1,984 @@ +- sections: + - local: index + title: ๐Ÿค— Transformers + - local: quicktour + title: Quick tour + - local: installation + title: Installation + - local: add_new_model + title: Adding a new model to `transformers` + title: Get started +- sections: + - local: pipeline_tutorial + title: Run inference with pipelines + - local: autoclass_tutorial + title: Write portable code with AutoClass + - local: preprocessing + title: Preprocess data + - local: training + title: Fine-tune a pretrained model + - local: run_scripts + title: Train with a script + - local: accelerate + title: Set up distributed training with ๐Ÿค— Accelerate + - local: peft + title: Load and train adapters with ๐Ÿค— PEFT + - local: model_sharing + title: Share your model + - local: agents + title: Agents 101 + - local: agents_advanced + title: Agents, supercharged - Multi-agents, External tools, and more + - local: llm_tutorial + title: Generation with LLMs + - local: conversations + title: Chatting with Transformers + title: Tutorials +- sections: + - isExpanded: false + sections: + - local: tasks/sequence_classification + title: Text classification + - local: tasks/token_classification + title: Token classification + - local: 
tasks/question_answering + title: Question answering + - local: tasks/language_modeling + title: Causal language modeling + - local: tasks/masked_language_modeling + title: Masked language modeling + - local: tasks/translation + title: Translation + - local: tasks/summarization + title: Summarization + - local: tasks/multiple_choice + title: Multiple choice + title: Natural Language Processing + - isExpanded: false + sections: + - local: tasks/audio_classification + title: Audio classification + - local: tasks/asr + title: Automatic speech recognition + title: Audio + - isExpanded: false + sections: + - local: tasks/image_classification + title: Image classification + - local: tasks/semantic_segmentation + title: Image segmentation + - local: tasks/video_classification + title: Video classification + - local: tasks/object_detection + title: Object detection + - local: tasks/zero_shot_object_detection + title: Zero-shot object detection + - local: tasks/zero_shot_image_classification + title: Zero-shot image classification + - local: tasks/monocular_depth_estimation + title: Depth estimation + - local: tasks/image_to_image + title: Image-to-Image + - local: tasks/image_feature_extraction + title: Image Feature Extraction + - local: tasks/mask_generation + title: Mask Generation + - local: tasks/keypoint_detection + title: Keypoint Detection + - local: tasks/knowledge_distillation_for_image_classification + title: Knowledge Distillation for Computer Vision + title: Computer Vision + - isExpanded: false + sections: + - local: tasks/image_captioning + title: Image captioning + - local: tasks/document_question_answering + title: Document Question Answering + - local: tasks/visual_question_answering + title: Visual Question Answering + - local: tasks/text-to-speech + title: Text to speech + - local: tasks/image_text_to_text + title: Image-text-to-text + - local: tasks/video_text_to_text + title: Video-text-to-text + title: Multimodal + - isExpanded: false + sections: + - 
local: generation_strategies + title: Customize the generation strategy + - local: kv_cache + title: Best Practices for Generation with Cache + title: Generation + - isExpanded: false + sections: + - local: tasks/idefics + title: Image tasks with IDEFICS + - local: tasks/prompting + title: LLM prompting guide + title: Prompting + title: Task Guides +- sections: + - local: fast_tokenizers + title: Use fast tokenizers from ๐Ÿค— Tokenizers + - local: multilingual + title: Run inference with multilingual models + - local: create_a_model + title: Use model-specific APIs + - local: custom_models + title: Share a custom model + - local: chat_templating + title: Chat templates + - local: trainer + title: Trainer + - local: sagemaker + title: Run training on Amazon SageMaker + - local: serialization + title: Export to ONNX + - local: tflite + title: Export to TFLite + - local: torchscript + title: Export to TorchScript + - local: benchmarks + title: Benchmarks + - local: notebooks + title: Notebooks with examples + - local: community + title: Community resources + - local: troubleshooting + title: Troubleshoot + - local: gguf + title: Interoperability with GGUF files + - local: tiktoken + title: Interoperability with TikToken files + - local: modular_transformers + title: Modularity in `transformers` + - local: how_to_hack_models + title: Model Hacking (overwriting a class to your usage) + title: Developer guides +- sections: + - local: quantization/overview + title: Getting started + - local: quantization/bitsandbytes + title: bitsandbytes + - local: quantization/gptq + title: GPTQ + - local: quantization/awq + title: AWQ + - local: quantization/aqlm + title: AQLM + - local: quantization/quanto + title: Quanto + - local: quantization/eetq + title: EETQ + - local: quantization/hqq + title: HQQ + - local: quantization/fbgemm_fp8 + title: FBGEMM_FP8 + - local: quantization/optimum + title: Optimum + - local: quantization/torchao + title: TorchAO + - local: quantization/bitnet 
+ title: BitNet + - local: quantization/compressed_tensors + title: compressed-tensors + - local: quantization/contribute + title: Contribute new quantization method + title: Quantization Methods +- sections: + - local: performance + title: Overview + - local: llm_optims + title: LLM inference optimization + - sections: + - local: perf_train_gpu_one + title: Methods and tools for efficient training on a single GPU + - local: perf_train_gpu_many + title: Multiple GPUs and parallelism + - local: fsdp + title: Fully Sharded Data Parallel + - local: deepspeed + title: DeepSpeed + - local: perf_train_cpu + title: Efficient training on CPU + - local: perf_train_cpu_many + title: Distributed CPU training + - local: perf_train_tpu_tf + title: Training on TPU with TensorFlow + - local: perf_train_special + title: PyTorch training on Apple silicon + - local: perf_hardware + title: Custom hardware for training + - local: hpo_train + title: Hyperparameter Search using Trainer API + title: Efficient training techniques + - sections: + - local: perf_infer_cpu + title: CPU inference + - local: perf_infer_gpu_one + title: GPU inference + - local: perf_infer_gpu_multi + title: Multi-GPU inference + title: Optimizing inference + - local: big_models + title: Instantiate a big model + - local: debugging + title: Debugging + - local: tf_xla + title: XLA Integration for TensorFlow Models + - local: perf_torch_compile + title: Optimize inference using `torch.compile()` + title: Performance and scalability +- sections: + - local: contributing + title: How to contribute to ๐Ÿค— Transformers? + - local: add_new_model + title: How to add a model to ๐Ÿค— Transformers? + - local: add_new_pipeline + title: How to add a pipeline to ๐Ÿค— Transformers? 
+ - local: testing + title: Testing + - local: pr_checks + title: Checks on a Pull Request + title: Contribute +- sections: + - local: philosophy + title: Philosophy + - local: glossary + title: Glossary + - local: task_summary + title: What ๐Ÿค— Transformers can do + - local: tasks_explained + title: How ๐Ÿค— Transformers solve tasks + - local: model_summary + title: The Transformer model family + - local: tokenizer_summary + title: Summary of the tokenizers + - local: attention + title: Attention mechanisms + - local: pad_truncation + title: Padding and truncation + - local: bertology + title: BERTology + - local: perplexity + title: Perplexity of fixed-length models + - local: pipeline_webserver + title: Pipelines for webserver inference + - local: model_memory_anatomy + title: Model training anatomy + - local: llm_tutorial_optimization + title: Getting the most out of LLMs + title: Conceptual guides +- sections: + - sections: + - local: main_classes/agent + title: Agents and Tools + - local: model_doc/auto + title: Auto Classes + - local: main_classes/backbones + title: Backbones + - local: main_classes/callback + title: Callbacks + - local: main_classes/configuration + title: Configuration + - local: main_classes/data_collator + title: Data Collator + - local: main_classes/keras_callbacks + title: Keras callbacks + - local: main_classes/logging + title: Logging + - local: main_classes/model + title: Models + - local: main_classes/text_generation + title: Text Generation + - local: main_classes/onnx + title: ONNX + - local: main_classes/optimizer_schedules + title: Optimization + - local: main_classes/output + title: Model outputs + - local: main_classes/pipelines + title: Pipelines + - local: main_classes/processors + title: Processors + - local: main_classes/quantization + title: Quantization + - local: main_classes/tokenizer + title: Tokenizer + - local: main_classes/trainer + title: Trainer + - local: main_classes/deepspeed + title: DeepSpeed + - local: 
main_classes/executorch + title: ExecuTorch + - local: main_classes/feature_extractor + title: Feature Extractor + - local: main_classes/image_processor + title: Image Processor + title: Main Classes + - sections: + - isExpanded: false + sections: + - local: model_doc/albert + title: ALBERT + - local: model_doc/bart + title: BART + - local: model_doc/barthez + title: BARThez + - local: model_doc/bartpho + title: BARTpho + - local: model_doc/bert + title: BERT + - local: model_doc/bert-generation + title: BertGeneration + - local: model_doc/bert-japanese + title: BertJapanese + - local: model_doc/bertweet + title: Bertweet + - local: model_doc/big_bird + title: BigBird + - local: model_doc/bigbird_pegasus + title: BigBirdPegasus + - local: model_doc/biogpt + title: BioGpt + - local: model_doc/blenderbot + title: Blenderbot + - local: model_doc/blenderbot-small + title: Blenderbot Small + - local: model_doc/bloom + title: BLOOM + - local: model_doc/bort + title: BORT + - local: model_doc/byt5 + title: ByT5 + - local: model_doc/camembert + title: CamemBERT + - local: model_doc/canine + title: CANINE + - local: model_doc/codegen + title: CodeGen + - local: model_doc/code_llama + title: CodeLlama + - local: model_doc/cohere + title: Cohere + - local: model_doc/convbert + title: ConvBERT + - local: model_doc/cpm + title: CPM + - local: model_doc/cpmant + title: CPMANT + - local: model_doc/ctrl + title: CTRL + - local: model_doc/dbrx + title: DBRX + - local: model_doc/deberta + title: DeBERTa + - local: model_doc/deberta-v2 + title: DeBERTa-v2 + - local: model_doc/dialogpt + title: DialoGPT + - local: model_doc/distilbert + title: DistilBERT + - local: model_doc/dpr + title: DPR + - local: model_doc/electra + title: ELECTRA + - local: model_doc/encoder-decoder + title: Encoder Decoder Models + - local: model_doc/ernie + title: ERNIE + - local: model_doc/ernie_m + title: ErnieM + - local: model_doc/esm + title: ESM + - local: model_doc/falcon + title: Falcon + - local: 
model_doc/falcon_mamba + title: FalconMamba + - local: model_doc/fastspeech2_conformer + title: FastSpeech2Conformer + - local: model_doc/flan-t5 + title: FLAN-T5 + - local: model_doc/flan-ul2 + title: FLAN-UL2 + - local: model_doc/flaubert + title: FlauBERT + - local: model_doc/fnet + title: FNet + - local: model_doc/fsmt + title: FSMT + - local: model_doc/funnel + title: Funnel Transformer + - local: model_doc/fuyu + title: Fuyu + - local: model_doc/gemma + title: Gemma + - local: model_doc/gemma2 + title: Gemma2 + - local: model_doc/glm + title: GLM + - local: model_doc/openai-gpt + title: GPT + - local: model_doc/gpt_neo + title: GPT Neo + - local: model_doc/gpt_neox + title: GPT NeoX + - local: model_doc/gpt_neox_japanese + title: GPT NeoX Japanese + - local: model_doc/gptj + title: GPT-J + - local: model_doc/gpt2 + title: GPT2 + - local: model_doc/gpt_bigcode + title: GPTBigCode + - local: model_doc/gptsan-japanese + title: GPTSAN Japanese + - local: model_doc/gpt-sw3 + title: GPTSw3 + - local: model_doc/granite + title: Granite + - local: model_doc/granitemoe + title: GraniteMoe + - local: model_doc/herbert + title: HerBERT + - local: model_doc/ibert + title: I-BERT + - local: model_doc/jamba + title: Jamba + - local: model_doc/jetmoe + title: JetMoe + - local: model_doc/jukebox + title: Jukebox + - local: model_doc/led + title: LED + - local: model_doc/llama + title: LLaMA + - local: model_doc/llama2 + title: Llama2 + - local: model_doc/llama3 + title: Llama3 + - local: model_doc/longformer + title: Longformer + - local: model_doc/longt5 + title: LongT5 + - local: model_doc/luke + title: LUKE + - local: model_doc/m2m_100 + title: M2M100 + - local: model_doc/madlad-400 + title: MADLAD-400 + - local: model_doc/mamba + title: Mamba + - local: model_doc/mamba2 + title: mamba2 + - local: model_doc/marian + title: MarianMT + - local: model_doc/markuplm + title: MarkupLM + - local: model_doc/mbart + title: MBart and MBart-50 + - local: model_doc/mega + title: MEGA 
+ - local: model_doc/megatron-bert + title: MegatronBERT + - local: model_doc/megatron_gpt2 + title: MegatronGPT2 + - local: model_doc/mistral + title: Mistral + - local: model_doc/mixtral + title: Mixtral + - local: model_doc/mluke + title: mLUKE + - local: model_doc/mobilebert + title: MobileBERT + - local: model_doc/mpnet + title: MPNet + - local: model_doc/mpt + title: MPT + - local: model_doc/mra + title: MRA + - local: model_doc/mt5 + title: MT5 + - local: model_doc/mvp + title: MVP + - local: model_doc/myt5 + title: myt5 + - local: model_doc/nemotron + title: Nemotron + - local: model_doc/nezha + title: NEZHA + - local: model_doc/nllb + title: NLLB + - local: model_doc/nllb-moe + title: NLLB-MoE + - local: model_doc/nystromformer + title: Nystrรถmformer + - local: model_doc/olmo + title: OLMo + - local: model_doc/olmo2 + title: OLMo2 + - local: model_doc/olmoe + title: OLMoE + - local: model_doc/open-llama + title: Open-Llama + - local: model_doc/opt + title: OPT + - local: model_doc/pegasus + title: Pegasus + - local: model_doc/pegasus_x + title: PEGASUS-X + - local: model_doc/persimmon + title: Persimmon + - local: model_doc/phi + title: Phi + - local: model_doc/phi3 + title: Phi-3 + - local: model_doc/phimoe + title: PhiMoE + - local: model_doc/phobert + title: PhoBERT + - local: model_doc/plbart + title: PLBart + - local: model_doc/prophetnet + title: ProphetNet + - local: model_doc/qdqbert + title: QDQBert + - local: model_doc/qwen2 + title: Qwen2 + - local: model_doc/qwen2_moe + title: Qwen2MoE + - local: model_doc/rag + title: RAG + - local: model_doc/realm + title: REALM + - local: model_doc/recurrent_gemma + title: RecurrentGemma + - local: model_doc/reformer + title: Reformer + - local: model_doc/rembert + title: RemBERT + - local: model_doc/retribert + title: RetriBERT + - local: model_doc/roberta + title: RoBERTa + - local: model_doc/roberta-prelayernorm + title: RoBERTa-PreLayerNorm + - local: model_doc/roc_bert + title: RoCBert + - local: 
model_doc/roformer + title: RoFormer + - local: model_doc/rwkv + title: RWKV + - local: model_doc/splinter + title: Splinter + - local: model_doc/squeezebert + title: SqueezeBERT + - local: model_doc/stablelm + title: StableLm + - local: model_doc/starcoder2 + title: Starcoder2 + - local: model_doc/switch_transformers + title: SwitchTransformers + - local: model_doc/t5 + title: T5 + - local: model_doc/t5v1.1 + title: T5v1.1 + - local: model_doc/tapex + title: TAPEX + - local: model_doc/transfo-xl + title: Transformer XL + - local: model_doc/ul2 + title: UL2 + - local: model_doc/umt5 + title: UMT5 + - local: model_doc/xmod + title: X-MOD + - local: model_doc/xglm + title: XGLM + - local: model_doc/xlm + title: XLM + - local: model_doc/xlm-prophetnet + title: XLM-ProphetNet + - local: model_doc/xlm-roberta + title: XLM-RoBERTa + - local: model_doc/xlm-roberta-xl + title: XLM-RoBERTa-XL + - local: model_doc/xlm-v + title: XLM-V + - local: model_doc/xlnet + title: XLNet + - local: model_doc/yoso + title: YOSO + - local: model_doc/zamba + title: Zamba + title: Text models + - isExpanded: false + sections: + - local: model_doc/beit + title: BEiT + - local: model_doc/bit + title: BiT + - local: model_doc/conditional_detr + title: Conditional DETR + - local: model_doc/convnext + title: ConvNeXT + - local: model_doc/convnextv2 + title: ConvNeXTV2 + - local: model_doc/cvt + title: CvT + - local: model_doc/deformable_detr + title: Deformable DETR + - local: model_doc/deit + title: DeiT + - local: model_doc/depth_anything + title: Depth Anything + - local: model_doc/depth_anything_v2 + title: Depth Anything V2 + - local: model_doc/deta + title: DETA + - local: model_doc/detr + title: DETR + - local: model_doc/dinat + title: DiNAT + - local: model_doc/dinov2 + title: DINOV2 + - local: model_doc/dit + title: DiT + - local: model_doc/dpt + title: DPT + - local: model_doc/efficientformer + title: EfficientFormer + - local: model_doc/efficientnet + title: EfficientNet + - local: 
model_doc/focalnet + title: FocalNet + - local: model_doc/glpn + title: GLPN + - local: model_doc/hiera + title: Hiera + - local: model_doc/ijepa + title: I-JEPA + - local: model_doc/imagegpt + title: ImageGPT + - local: model_doc/levit + title: LeViT + - local: model_doc/mask2former + title: Mask2Former + - local: model_doc/maskformer + title: MaskFormer + - local: model_doc/mobilenet_v1 + title: MobileNetV1 + - local: model_doc/mobilenet_v2 + title: MobileNetV2 + - local: model_doc/mobilevit + title: MobileViT + - local: model_doc/mobilevitv2 + title: MobileViTV2 + - local: model_doc/nat + title: NAT + - local: model_doc/poolformer + title: PoolFormer + - local: model_doc/pvt + title: Pyramid Vision Transformer (PVT) + - local: model_doc/pvt_v2 + title: Pyramid Vision Transformer v2 (PVTv2) + - local: model_doc/regnet + title: RegNet + - local: model_doc/resnet + title: ResNet + - local: model_doc/rt_detr + title: RT-DETR + - local: model_doc/segformer + title: SegFormer + - local: model_doc/seggpt + title: SegGpt + - local: model_doc/superpoint + title: SuperPoint + - local: model_doc/swiftformer + title: SwiftFormer + - local: model_doc/swin + title: Swin Transformer + - local: model_doc/swinv2 + title: Swin Transformer V2 + - local: model_doc/swin2sr + title: Swin2SR + - local: model_doc/table-transformer + title: Table Transformer + - local: model_doc/upernet + title: UperNet + - local: model_doc/van + title: VAN + - local: model_doc/vit + title: Vision Transformer (ViT) + - local: model_doc/vit_hybrid + title: ViT Hybrid + - local: model_doc/vitdet + title: ViTDet + - local: model_doc/vit_mae + title: ViTMAE + - local: model_doc/vitmatte + title: ViTMatte + - local: model_doc/vit_msn + title: ViTMSN + - local: model_doc/yolos + title: YOLOS + - local: model_doc/zoedepth + title: ZoeDepth + title: Vision models + - isExpanded: false + sections: + - local: model_doc/audio-spectrogram-transformer + title: Audio Spectrogram Transformer + - local: model_doc/bark 
+ title: Bark + - local: model_doc/clap + title: CLAP + - local: model_doc/dac + title: dac + - local: model_doc/encodec + title: EnCodec + - local: model_doc/hiera + title: Hiera + - local: model_doc/hubert + title: Hubert + - local: model_doc/mctct + title: MCTCT + - local: model_doc/mimi + title: Mimi + - local: model_doc/mms + title: MMS + - local: model_doc/moshi + title: Moshi + - local: model_doc/musicgen + title: MusicGen + - local: model_doc/musicgen_melody + title: MusicGen Melody + - local: model_doc/pop2piano + title: Pop2Piano + - local: model_doc/seamless_m4t + title: Seamless-M4T + - local: model_doc/seamless_m4t_v2 + title: SeamlessM4T-v2 + - local: model_doc/sew + title: SEW + - local: model_doc/sew-d + title: SEW-D + - local: model_doc/speech_to_text + title: Speech2Text + - local: model_doc/speech_to_text_2 + title: Speech2Text2 + - local: model_doc/speecht5 + title: SpeechT5 + - local: model_doc/unispeech + title: UniSpeech + - local: model_doc/unispeech-sat + title: UniSpeech-SAT + - local: model_doc/univnet + title: UnivNet + - local: model_doc/vits + title: VITS + - local: model_doc/wav2vec2 + title: Wav2Vec2 + - local: model_doc/wav2vec2-bert + title: Wav2Vec2-BERT + - local: model_doc/wav2vec2-conformer + title: Wav2Vec2-Conformer + - local: model_doc/wav2vec2_phoneme + title: Wav2Vec2Phoneme + - local: model_doc/wavlm + title: WavLM + - local: model_doc/whisper + title: Whisper + - local: model_doc/xls_r + title: XLS-R + - local: model_doc/xlsr_wav2vec2 + title: XLSR-Wav2Vec2 + title: Audio models + - isExpanded: false + sections: + - local: model_doc/timesformer + title: TimeSformer + - local: model_doc/videomae + title: VideoMAE + - local: model_doc/vivit + title: ViViT + title: Video models + - isExpanded: false + sections: + - local: model_doc/align + title: ALIGN + - local: model_doc/altclip + title: AltCLIP + - local: model_doc/aria + title: Aria + - local: model_doc/blip + title: BLIP + - local: model_doc/blip-2 + title: BLIP-2 + - 
local: model_doc/bridgetower + title: BridgeTower + - local: model_doc/bros + title: BROS + - local: model_doc/chameleon + title: Chameleon + - local: model_doc/chinese_clip + title: Chinese-CLIP + - local: model_doc/clip + title: CLIP + - local: model_doc/clipseg + title: CLIPSeg + - local: model_doc/clvp + title: CLVP + - local: model_doc/data2vec + title: Data2Vec + - local: model_doc/deplot + title: DePlot + - local: model_doc/donut + title: Donut + - local: model_doc/flava + title: FLAVA + - local: model_doc/git + title: GIT + - local: model_doc/grounding-dino + title: Grounding DINO + - local: model_doc/groupvit + title: GroupViT + - local: model_doc/idefics + title: IDEFICS + - local: model_doc/idefics2 + title: Idefics2 + - local: model_doc/idefics3 + title: Idefics3 + - local: model_doc/instructblip + title: InstructBLIP + - local: model_doc/instructblipvideo + title: InstructBlipVideo + - local: model_doc/kosmos-2 + title: KOSMOS-2 + - local: model_doc/layoutlm + title: LayoutLM + - local: model_doc/layoutlmv2 + title: LayoutLMV2 + - local: model_doc/layoutlmv3 + title: LayoutLMV3 + - local: model_doc/layoutxlm + title: LayoutXLM + - local: model_doc/lilt + title: LiLT + - local: model_doc/llava + title: Llava + - local: model_doc/llava_next + title: LLaVA-NeXT + - local: model_doc/llava_next_video + title: LLaVa-NeXT-Video + - local: model_doc/llava_onevision + title: LLaVA-Onevision + - local: model_doc/lxmert + title: LXMERT + - local: model_doc/matcha + title: MatCha + - local: model_doc/mgp-str + title: MGP-STR + - local: model_doc/mllama + title: mllama + - local: model_doc/nougat + title: Nougat + - local: model_doc/omdet-turbo + title: OmDet-Turbo + - local: model_doc/oneformer + title: OneFormer + - local: model_doc/owlvit + title: OWL-ViT + - local: model_doc/owlv2 + title: OWLv2 + - local: model_doc/paligemma + title: PaliGemma + - local: model_doc/perceiver + title: Perceiver + - local: model_doc/pix2struct + title: Pix2Struct + - local: 
model_doc/pixtral + title: Pixtral + - local: model_doc/qwen2_audio + title: Qwen2Audio + - local: model_doc/qwen2_vl + title: Qwen2VL + - local: model_doc/sam + title: Segment Anything + - local: model_doc/siglip + title: SigLIP + - local: model_doc/speech-encoder-decoder + title: Speech Encoder Decoder Models + - local: model_doc/tapas + title: TAPAS + - local: model_doc/trocr + title: TrOCR + - local: model_doc/tvlt + title: TVLT + - local: model_doc/tvp + title: TVP + - local: model_doc/udop + title: UDOP + - local: model_doc/video_llava + title: VideoLlava + - local: model_doc/vilt + title: ViLT + - local: model_doc/vipllava + title: VipLlava + - local: model_doc/vision-encoder-decoder + title: Vision Encoder Decoder Models + - local: model_doc/vision-text-dual-encoder + title: Vision Text Dual Encoder + - local: model_doc/visual_bert + title: VisualBERT + - local: model_doc/xclip + title: X-CLIP + title: Multimodal models + - isExpanded: false + sections: + - local: model_doc/decision_transformer + title: Decision Transformer + - local: model_doc/trajectory_transformer + title: Trajectory Transformer + title: Reinforcement learning models + - isExpanded: false + sections: + - local: model_doc/autoformer + title: Autoformer + - local: model_doc/informer + title: Informer + - local: model_doc/patchtsmixer + title: PatchTSMixer + - local: model_doc/patchtst + title: PatchTST + - local: model_doc/time_series_transformer + title: Time Series Transformer + title: Time series models + - isExpanded: false + sections: + - local: model_doc/graphormer + title: Graphormer + title: Graph models + title: Models + - sections: + - local: internal/modeling_utils + title: Custom Layers and Utilities + - local: internal/pipelines_utils + title: Utilities for pipelines + - local: internal/tokenization_utils + title: Utilities for Tokenizers + - local: internal/trainer_utils + title: Utilities for Trainer + - local: internal/generation_utils + title: Utilities for Generation + - 
local: internal/image_processing_utils + title: Utilities for Image Processors + - local: internal/audio_utils + title: Utilities for Audio processing + - local: internal/file_utils + title: General Utilities + - local: internal/time_series_utils + title: Utilities for Time Series + title: Internal Helpers + title: API diff --git a/docs/source/agents.md b/docs/source/agents.md new file mode 100644 index 0000000..56c9184 --- /dev/null +++ b/docs/source/agents.md @@ -0,0 +1,431 @@ + +# Agents and tools + +[[open-in-colab]] + +### What is an agent? + +Large Language Models (LLMs) trained to perform [causal language modeling](./tasks/language_modeling) can tackle a wide range of tasks, but they often struggle with basic tasks like logic, calculation, and search. When prompted in domains in which they do not perform well, they often fail to generate the answer we expect them to. + +One approach to overcome this weakness is to create an *agent*. + +An agent is a system that uses an LLM as its engine, and it has access to functions called *tools*. + +These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them. + +The agent can be programmed to: +- devise a series of actions/tools and run them all at once, like the [`CodeAgent`] +- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one, like the [`ReactJsonAgent`] + +### Types of agents + +#### Code agent + +This agent has a planning step, then generates python code to execute all its actions at once. It natively handles different input and output types for its tools, thus it is the recommended choice for multimodal tasks. + +#### React agents + +This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations. 
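To make the think/act/observe cycle concrete, here is a minimal, library-independent sketch of a ReAct-style loop. It is only an illustration under stated assumptions: `TOOLS`, `scripted_llm`, and `run_react_loop` are hypothetical names invented for this example and are not part of `transformers`, and the "LLM" is a scripted stand-in that returns canned JSON actions.

```python
import json

# Hypothetical tool registry: plain Python callables keyed by name.
TOOLS = {
    "subtract": lambda a, b: a - b,
}


def scripted_llm(history):
    """Stand-in for a real LLM: picks a canned JSON action from the history."""
    observations = [m for m in history if m["role"] == "observation"]
    if not observations:
        # First step: no observation yet, so call a tool.
        return json.dumps(
            {"thought": "I need the difference.", "tool": "subtract", "args": {"a": 12, "b": 6}}
        )
    # Later step: use the last observation as the final answer.
    return json.dumps(
        {
            "thought": "I have the result.",
            "tool": "final_answer",
            "args": {"answer": observations[-1]["content"]},
        }
    )


def run_react_loop(task, llm, tools, max_steps=5):
    """Ask the LLM for one action at a time and feed each result back to it."""
    history = [{"role": "task", "content": task}]
    for _ in range(max_steps):
        action = json.loads(llm(history))
        if action["tool"] == "final_answer":
            return action["args"]["answer"]
        result = tools[action["tool"]](**action["args"])
        # The observation becomes part of the input for the next step.
        history.append({"role": "observation", "content": result})
    raise RuntimeError("No final answer within max_steps")


answer = run_react_loop("How many more blocks in A than in B?", scripted_llm, TOOLS)
print(answer)
```

The point of the design is that each action is chosen only after the previous observation has been appended to the history, which is what distinguishes this step-by-step style from running all actions at once.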
+ +We implement two versions of ReactJsonAgent: +- [`ReactJsonAgent`] generates tool calls as a JSON in its output. +- [`ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance. + +> [!TIP] +> Read [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more about ReAct agents. + +
+ + +
+ +![Framework of a React Agent](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open-source-llms-as-agents/ReAct.png) + +For example, here is how a ReAct Code agent would work its way through the following question. + +```py3 +>>> agent.run( +... "How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?", +... ) +=====New task===== +How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need? +====Agent is executing the code below: +bert_blocks = search(query="number of blocks in BERT base encoder") +print("BERT blocks:", bert_blocks) +==== +Print outputs: +BERT blocks: twelve encoder blocks + +====Agent is executing the code below: +attention_layer = search(query="number of layers in Attention is All You Need") +print("Attention layers:", attention_layer) +==== +Print outputs: +Attention layers: Encoder: The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position- 2 Page 3 Figure 1: The Transformer - model architecture. + +====Agent is executing the code below: +bert_blocks = 12 +attention_layers = 6 +diff = bert_blocks - attention_layers +print("Difference in blocks:", diff) +final_answer(diff) +==== + +Print outputs: +Difference in blocks: 6 + +Final answer: 6 +``` + +### How can I build an agent? + +To initialize an agent, you need these arguments: + +- an LLM to power your agent - the agent is not exactly the LLM, it's more like the agent is a program that uses an LLM as its engine. 
+- a system prompt: what the LLM engine will be prompted with to generate its output +- a toolbox from which the agent picks tools to execute +- a parser to extract from the LLM output which tools to call and with which arguments + +Upon initialization of the agent system, the tool attributes are used to generate a tool description, which is then baked into the agent's `system_prompt` to let it know which tools it can use and why. + +To start with, please install the `agents` extras in order to install all default dependencies. + +```bash +pip install transformers[agents] +``` + +Build your LLM engine by defining a `llm_engine` method which accepts a list of [messages](./chat_templating) and returns text. This callable also needs to accept a `stop_sequences` argument that indicates when to stop generating. + +```python +from huggingface_hub import login, InferenceClient + +login("") + +client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct") + +def llm_engine(messages, stop_sequences=["Task"]) -> str: + response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000) + answer = response.choices[0].message.content + return answer +``` + +You could use any `llm_engine` method as long as: +1. it follows the [messages format](./chat_templating) (`List[Dict[str, str]]`) for its input `messages`, and it returns a `str`. +2. it stops generating outputs at the sequences passed in the argument `stop_sequences` + +Additionally, `llm_engine` can also take a `grammar` argument. If you specify a `grammar` upon agent initialization, it is passed along to every call to `llm_engine` to allow [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance) in order to force properly-formatted agent outputs. + +You will also need a `tools` argument which accepts a list of `Tools` - it can be an empty list. 
You can also add the default toolbox on top of your `tools` list by defining the optional argument `add_base_tools=True`. + +Now you can create an agent, like [`CodeAgent`], and run it. You can also create a [`TransformersEngine`] with a pre-initialized pipeline to run inference on your local machine using `transformers`. +For convenience, since agentic behaviours generally require stronger models such as `Llama-3.1-70B-Instruct` that are harder to run locally for now, we also provide the [`HfApiEngine`] class that initializes a `huggingface_hub.InferenceClient` under the hood. + +```python +from transformers import CodeAgent, HfApiEngine + +llm_engine = HfApiEngine(model="meta-llama/Meta-Llama-3-70B-Instruct") +agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True) + +agent.run( + "Could you translate this sentence from French, say it out loud and return the audio.", + sentence="Où est la boulangerie la plus proche?", +) +``` + +This will be handy in case of emergency baguette need! +You can even leave the argument `llm_engine` undefined, and an [`HfApiEngine`] will be created by default. + +```python +from transformers import CodeAgent + +agent = CodeAgent(tools=[], add_base_tools=True) + +agent.run( + "Could you translate this sentence from French, say it out loud and give me the audio.", + sentence="Où est la boulangerie la plus proche?", +) +``` + +Note that we used an additional `sentence` argument: you can pass text as additional arguments to the model. 
+ +You can also use this to indicate the path to local or remote files for the model to use: + +```py +from transformers import ReactCodeAgent + +agent = ReactCodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True) + +agent.run("Why does Mike not know many people in New York?", audio="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3") +``` + + +The prompt and output parser were automatically defined, but you can easily inspect the prompt through the `system_prompt_template` attribute of your agent. + +```python +print(agent.system_prompt_template) +``` + +It's important to explain the task you want to perform as clearly as possible. +Every [`~Agent.run`] operation is independent, and since an agent is powered by an LLM, minor variations in your prompt might yield completely different results. +You can also run an agent consecutively for different tasks: each time the attributes `agent.task` and `agent.logs` will be re-initialized. + + +#### Code execution + +A Python interpreter executes the code on a set of inputs passed along with your tools. +This should be safe because the only functions that can be called are the tools you provided (especially if it's only tools by Hugging Face) and the print function, so you're already limited in what can be executed. + +The Python interpreter also doesn't allow imports by default outside of a safe list, so all the most obvious attacks shouldn't be an issue. +You can still authorize additional imports by passing the authorized modules as a list of strings in the `additional_authorized_imports` argument upon initialization of your [`ReactCodeAgent`] or [`CodeAgent`]: + +```py +>>> from transformers import ReactCodeAgent + +>>> agent = ReactCodeAgent(tools=[], additional_authorized_imports=['requests', 'bs4']) +>>> agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?") + +(...)
+'Hugging Face – Blog' +``` + +The execution will stop at any code trying to perform an illegal operation or if there is a regular Python error with the code generated by the agent. + +> [!WARNING] +> The LLM can generate arbitrary code that will then be executed: do not add any unsafe imports! + +### The system prompt + +An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the [`ReactCodeAgent`] (the version below is slightly simplified). + +```text +You will be given a task to solve as best you can. +You have access to the following tools: +<> + +To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences. + +At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task, then the tools that you want to use. +Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '/End code' sequence. +During each intermediate step, you can use 'print()' to save whatever important information you will then need. +These print outputs will then be available in the 'Observation:' field, for using this information as input for the next step. + +In the end you have to return a final answer using the `final_answer` tool. + +Here are a few examples using notional tools: +--- +{examples} + +Above example were using notional tools that might not exist for you. You only have acces to those tools: +<> +You also can perform computations in the python code you generate. + +Always provide a 'Thought:' and a 'Code:\n```py' sequence ending with '```' sequence. You MUST provide at least the 'Code:' sequence to move forward. + +Remember to not perform too many operations in a single code block! You should split the task into intermediate code blocks.
+Print results at the end of each step to save the intermediate results. Then use final_answer() to return the final result. + +Remember to make sure that variables you use are all defined. + +Now Begin! +``` + +The system prompt includes: +- An *introduction* that explains how the agent should behave and what tools are. +- A description of all the tools that is defined by a `<>` token that is dynamically replaced at runtime with the tools defined/chosen by the user. + - The tool description comes from the tool attributes, `name`, `description`, `inputs` and `output_type`, and a simple `jinja2` template that you can refine. +- The expected output format. + +You could improve the system prompt, for example, by adding an explanation of the output format. + +For maximum flexibility, you can overwrite the whole system prompt template by passing your custom prompt as an argument to the `system_prompt` parameter. + +```python +from transformers import ReactJsonAgent +from transformers.agents import PythonInterpreterTool + +agent = ReactJsonAgent(tools=[PythonInterpreterTool()], system_prompt="{your_custom_prompt}") +``` + +> [!WARNING] +> Please make sure to define the `<>` string somewhere in the `template` so the agent is aware +of the available tools. + + +### Inspecting an agent run + +Here are a few useful attributes to inspect what happened after a run: +- `agent.logs` stores the fine-grained logs of the agent. At every step of the agent's run, everything gets stored in a dictionary that then is appended to `agent.logs`. +- Running `agent.write_inner_memory_from_logs()` creates an inner memory of the agent's logs for the LLM to view, as a list of chat messages. This method goes over each step of the log and only stores what it's interested in as a message: for instance, it will save the system prompt and task in separate messages, then for each step it will store the LLM output as a message, and the tool call output as another message. 
Use this if you want a higher-level view of what has happened - but not every log will be transcribed by this method. + +## Tools + +A tool is an atomic function to be used by an agent. + +You can for instance check the [`PythonInterpreterTool`]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action. + +When the agent is initialized, the tool attributes are used to generate a tool description which is baked into the agent's system prompt. This lets the agent know which tools it can use and why. + +### Default toolbox + +Transformers comes with a default toolbox for empowering agents, which you can add to your agent upon initialization with the argument `add_base_tools=True`: + +- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](./model_doc/donut)) +- **Image question answering**: given an image, answer a question on this image ([VILT](./model_doc/vilt)) +- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper)) +- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5)) +- **Translation**: translates a given sentence from source language to target language. +- **DuckDuckGo search**: performs a web search using DuckDuckGo. +- **Python code interpreter**: runs your LLM-generated Python code in a secure environment. This tool will only be added to [`ReactJsonAgent`] if you initialize it with `add_base_tools=True`, since code-based agents can already natively execute Python code. + + +You can manually use a tool by calling the [`load_tool`] function with the name of the task to perform, then calling the returned tool on your input. + + +```python +from transformers import load_tool + +tool = load_tool("text-to-speech") +audio = tool("This is a text to speech tool") +``` + + +### Create a new tool + +You can create your own tool for use cases not covered by the default tools from Hugging Face.
+For example, let's create a tool that returns the most downloaded model for a given task from the Hub. + +You'll start with the code below. + +```python +from huggingface_hub import list_models + +task = "text-classification" + +model = next(iter(list_models(filter=task, sort="downloads", direction=-1))) +print(model.id) +``` + +This code can quickly be converted into a tool, just by wrapping it in a function and adding the `tool` decorator: + + +```py +from huggingface_hub import list_models +from transformers import tool + +@tool +def model_download_tool(task: str) -> str: +    """ +    This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. +    It returns the name of the checkpoint. + +    Args: +        task: The task for which to get the most downloaded model. +    """ +    # Filter on the `task` argument rather than a hard-coded task name. +    model = next(iter(list_models(filter=task, sort="downloads", direction=-1))) +    return model.id +``` + +The function needs: +- A clear name. The name usually describes what the tool does. Since the code returns the model with the most downloads for a task, let's name it `model_download_tool`. +- Type hints on both inputs and output. +- A description that includes an 'Args:' part where each argument is described (without a type indication this time, since it will be pulled from the type hint). +All of these will be automatically baked into the agent's system prompt upon initialization, so strive to make them as clear as possible! + +> [!TIP] +> This definition format is the same as tool schemas used in `apply_chat_template`, the only difference is the added `tool` decorator: read more on our tool use API [here](https://huggingface.co/blog/unified-tool-use#passing-tools-to-a-chat-template). + +Then you can directly initialize your agent: +```py +from transformers import CodeAgent +agent = CodeAgent(tools=[model_download_tool], llm_engine=llm_engine) +agent.run( + "Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
+) +``` + +You get the following: +```text +======== New task ======== +Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub? +==== Agent is executing the code below: +most_downloaded_model = model_download_tool(task="text-to-video") +print(f"The most downloaded model for the 'text-to-video' task is {most_downloaded_model}.") +==== +``` + +And the output: +`"The most downloaded model for the 'text-to-video' task is ByteDance/AnimateDiff-Lightning."` + +### Manage your agent's toolbox + +If you have already initialized an agent, it is inconvenient to reinitialize it from scratch with a tool you want to use. With Transformers, you can manage an agent's toolbox by adding or replacing a tool. + +Let's add the `model_download_tool` to an existing agent initialized with only the default toolbox. + +```python +from transformers import CodeAgent + +agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True) +agent.toolbox.add_tool(model_download_tool) +``` +Now we can leverage both the new tool and the previous text-to-speech tool: + +```python +agent.run( + "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub and return the audio?" +) +``` + + +| **Audio** | +|------------------------------------------------------------------------------------------------------------------------------------------------------| +|