Support third-party Inference providers in `HfApiModel` (#422)

* Add `provider` param to `HfApiModel`, update guided_tour.md

---------

Co-authored-by: Aymeric <aymeric.roucher@gmail.com>
Julien Chaumond, 2025-01-30 01:03:09 +01:00 (committed by GitHub)
parent b1742ed06c
commit d3912c70cf
9 changed files with 27 additions and 10 deletions


@@ -25,18 +25,18 @@ To initialize a minimal agent, you need at least these two arguments:
 - `model`, a text-generation model to power your agent - because the agent is different from a simple LLM, it is a system that uses an LLM as its engine. You can use any of these options:
     - [`TransformersModel`] takes a pre-initialized `transformers` pipeline to run inference on your local machine using `transformers`.
-    - [`HfApiModel`] leverages a `huggingface_hub.InferenceClient` under the hood.
-    - [`LiteLLMModel`] lets you call 100+ different models through [LiteLLM](https://docs.litellm.ai/)!
+    - [`HfApiModel`] leverages a `huggingface_hub.InferenceClient` under the hood and supports all Inference Providers on the Hub.
+    - [`LiteLLMModel`] similarly lets you call 100+ different models and providers through [LiteLLM](https://docs.litellm.ai/)!
     - [`AzureOpenAIServerModel`] allows you to use OpenAI models deployed in [Azure](https://azure.microsoft.com/en-us/products/ai-services/openai-service).
 - `tools`, a list of `Tools` that the agent can use to solve the task. It can be an empty list. You can also add the default toolbox on top of your `tools` list by defining the optional argument `add_base_tools=True`.

-Once you have these two arguments, `tools` and `model`, you can create an agent and run it. You can use any LLM you'd like, either through [Hugging Face API](https://huggingface.co/docs/api-inference/en/index), [transformers](https://github.com/huggingface/transformers/), [ollama](https://ollama.com/), [LiteLLM](https://www.litellm.ai/), or [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service).
+Once you have these two arguments, `tools` and `model`, you can create an agent and run it. You can use any LLM you'd like, either through [Inference Providers](https://huggingface.co/blog/inference-providers), [transformers](https://github.com/huggingface/transformers/), [ollama](https://ollama.com/), [LiteLLM](https://www.litellm.ai/), or [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service).

 <hfoptions id="Pick a LLM">
-<hfoption id="Hugging Face API">
+<hfoption id="HF Inference API">

-Hugging Face API is free to use without a token, but then it will have a rate limitation.
+HF Inference API is free to use without a token, but it is then rate-limited.

 To access gated models or raise your rate limits with a PRO account, you need to set the environment variable `HF_TOKEN` or pass a `token` upon initialization of `HfApiModel`. You can get your token from your [settings page](https://huggingface.co/settings/tokens).
@@ -46,6 +46,7 @@ from smolagents import CodeAgent, HfApiModel
 model_id = "meta-llama/Llama-3.3-70B-Instruct"

 model = HfApiModel(model_id=model_id, token="<YOUR_HUGGINGFACEHUB_API_TOKEN>")  # You can choose to not pass any model_id to HfApiModel to use a default free model
+# you can also specify a particular provider, e.g. provider="together" or provider="sambanova"
 agent = CodeAgent(tools=[], model=model, add_base_tools=True)

 agent.run(
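Read end to end, a minimal sketch of what the updated snippet enables (the task string is illustrative, and "together" is just one of the provider names the new docstring lists further down):

```python
from smolagents import CodeAgent, HfApiModel

# provider="together" routes inference through Together AI instead of the
# default HF Inference API; the model is still addressed by its Hub ID.
model = HfApiModel(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    provider="together",
    token="<YOUR_HUGGINGFACEHUB_API_TOKEN>",
)
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run("Could you give me the 118th number in the Fibonacci sequence?")
```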


@@ -25,7 +25,7 @@ This library offers:
 ✨ **Simplicity**: the logic for agents fits in ~1,000 lines of code. We kept abstractions to their minimal shape above raw code!

-🌐 **Support for any LLM**: it supports models hosted on the Hub loaded in their `transformers` version or through our Inference API, but also models from OpenAI, Anthropic... it's really easy to power an agent with any LLM.
+🌐 **Support for any LLM**: it supports models hosted on the Hub loaded in their `transformers` version or through our Inference API and Inference Providers, but also models from OpenAI, Anthropic... it's really easy to power an agent with any LLM.

 🧑‍💻 **First-class support for Code Agents**, i.e. agents that write their actions in code (as opposed to "agents being used to write code"), [read more here](tutorials/secure_code_execution).


@@ -74,7 +74,7 @@ print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
 ### HfApiModel

-The `HfApiModel` wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM.
+The `HfApiModel` wraps huggingface_hub's [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) for the execution of the LLM. It supports both HF's own [Inference API](https://huggingface.co/docs/api-inference/index) and all [Inference Providers](https://huggingface.co/blog/inference-providers) available on the Hub.

 ```python
 from smolagents import HfApiModel
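The hunk cuts the snippet off here; a sketch of how the reworked class might be exercised against a third-party provider, assuming `HF_TOKEN` is set in the environment (provider choice and message are illustrative):

```python
from smolagents import HfApiModel

# Omitting provider falls back to the default HF Inference API;
# provider="sambanova" switches the backend without changing the model ID.
model = HfApiModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    provider="sambanova",
)
messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
print(model(messages))
```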


@@ -12,7 +12,7 @@ authors = [
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
-  "huggingface-hub>=0.24.0",
+  "huggingface-hub>=0.28.0",
   "requests>=2.32.3",
   "rich>=13.9.4",
   "pandas>=2.2.3",


@@ -21,6 +21,8 @@ from .default_tools import *
 from .e2b_executor import *
 from .gradio_ui import *
 from .local_python_executor import *
+from .logger import *
+from .memory import *
 from .models import *
 from .monitoring import *
 from .prompts import *


@@ -338,6 +338,9 @@ class HfApiModel(Model):
     Parameters:
         model_id (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
             The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Can be `"replicate"`, `"together"`, `"fal-ai"`, `"sambanova"`, or `"hf-inference"`.
+            Defaults to `"hf-inference"` (HF Inference API).
         token (`str`, *optional*):
             Token used by the Hugging Face API for authentication. This token needs the 'Make calls to the serverless Inference API' permission.
             If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'.
@@ -368,15 +371,17 @@ class HfApiModel(Model):
     def __init__(
         self,
         model_id: str = "Qwen/Qwen2.5-Coder-32B-Instruct",
+        provider: Optional[str] = None,
         token: Optional[str] = None,
         timeout: Optional[int] = 120,
         **kwargs,
     ):
         super().__init__(**kwargs)
         self.model_id = model_id
+        self.provider = provider
         if token is None:
             token = os.getenv("HF_TOKEN")
-        self.client = InferenceClient(self.model_id, token=token, timeout=timeout)
+        self.client = InferenceClient(self.model_id, provider=provider, token=token, timeout=timeout)

     def __call__(
         self,
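Since the diff shows `provider` being forwarded verbatim to `InferenceClient`, the routing the constructor sets up is roughly the following, stripped of the smolagents wrapper (a sketch of equivalent huggingface_hub usage, not code from the commit):

```python
import os

from huggingface_hub import InferenceClient

# Equivalent of HfApiModel(provider="together"): InferenceClient sends the
# chat completion through the chosen backend while keeping the Hub model ID.
# Requires huggingface_hub>=0.28.0, matching the pyproject bump above.
client = InferenceClient(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    provider="together",
    token=os.getenv("HF_TOKEN"),
    timeout=120,
)
response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```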


@@ -474,7 +474,7 @@ class AgentTests(unittest.TestCase):
         with agent.logger.console.capture() as capture:
             agent.run("Count to 3")
         str_output = capture.get()
-        assert "Consider passing said import under" in str_output.replace("\n", "")
+        assert "`additional_authorized_imports`" in str_output.replace("\n", "")

     def test_multiagents(self):
         class FakeModelMultiagentsManagerAgent:
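The updated assertion tracks a reworded executor hint: when generated code imports a module outside the allowlist, the log now names the `additional_authorized_imports` parameter. A sketch of the remedy the hint points to (module list and task are illustrative):

```python
from smolagents import CodeAgent, HfApiModel

# Authorizing extra imports up front avoids the error whose message the
# test above asserts on.
agent = CodeAgent(
    tools=[],
    model=HfApiModel(),
    additional_authorized_imports=["datetime", "numpy"],
)
agent.run("What day of the week was 2025-01-30?")
```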


@@ -78,6 +78,7 @@ class DocCodeExtractor:
         return tmp_file


+@pytest.mark.skipif(not os.getenv("RUN_ALL"), reason="RUN_ALL environment variable not set")
 class TestDocs:
     """Test case for documentation code testing."""


@@ -13,10 +13,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import json
+import os
 import unittest
 from pathlib import Path
 from typing import Optional

+import pytest
 from transformers.testing_utils import get_tests_dir

 from smolagents import ChatMessage, HfApiModel, TransformersModel, models, tool
@@ -51,6 +53,12 @@ class ModelTests(unittest.TestCase):
         messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
         model(messages, stop_sequences=["great"])

+    @pytest.mark.skipif(not os.getenv("RUN_ALL"), reason="RUN_ALL environment variable not set")
+    def test_get_hfapi_message_no_tool_external_provider(self):
+        model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together", max_tokens=10)
+        messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
+        model(messages, stop_sequences=["great"])
+
     def test_transformers_message_no_tool(self):
         model = TransformersModel(
             model_id="HuggingFaceTB/SmolLM2-135M-Instruct",