Support third-party Inference providers in `HfApiModel` (#422)
* Add `provider` param to `HfApiModel`, update guided_tour.md

Co-authored-by: Aymeric <aymeric.roucher@gmail.com>
commit d3912c70cf
parent b1742ed06c
@@ -25,18 +25,18 @@ To initialize a minimal agent, you need at least these two arguments:
 - `model`, a text-generation model to power your agent - because the agent is different from a simple LLM, it is a system that uses a LLM as its engine. You can use any of these options:
     - [`TransformersModel`] takes a pre-initialized `transformers` pipeline to run inference on your local machine using `transformers`.
-    - [`HfApiModel`] leverages a `huggingface_hub.InferenceClient` under the hood.
-    - [`LiteLLMModel`] lets you call 100+ different models through [LiteLLM](https://docs.litellm.ai/)!
+    - [`HfApiModel`] leverages a `huggingface_hub.InferenceClient` under the hood and supports all Inference Providers on the Hub.
+    - [`LiteLLMModel`] similarly lets you call 100+ different models and providers through [LiteLLM](https://docs.litellm.ai/)!
     - [`AzureOpenAIServerModel`] allows you to use OpenAI models deployed in [Azure](https://azure.microsoft.com/en-us/products/ai-services/openai-service).

 - `tools`, a list of `Tools` that the agent can use to solve the task. It can be an empty list. You can also add the default toolbox on top of your `tools` list by defining the optional argument `add_base_tools=True`.

-Once you have these two arguments, `tools` and `model`, you can create an agent and run it. You can use any LLM you'd like, either through [Hugging Face API](https://huggingface.co/docs/api-inference/en/index), [transformers](https://github.com/huggingface/transformers/), [ollama](https://ollama.com/), [LiteLLM](https://www.litellm.ai/), or [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service).
+Once you have these two arguments, `tools` and `model`, you can create an agent and run it. You can use any LLM you'd like, either through [Inference Providers](https://huggingface.co/blog/inference-providers), [transformers](https://github.com/huggingface/transformers/), [ollama](https://ollama.com/), [LiteLLM](https://www.litellm.ai/), or [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service).

 <hfoptions id="Pick a LLM">
-<hfoption id="Hugging Face API">
+<hfoption id="HF Inference API">

-Hugging Face API is free to use without a token, but then it will have a rate limitation.
+HF Inference API is free to use without a token, but then it will have a rate limit.

 To access gated models or raise your rate limits with a PRO account, you need to set the environment variable `HF_TOKEN` or pass the `token` variable upon initialization of `HfApiModel`. You can get your token from your [settings page](https://huggingface.co/settings/tokens).
@@ -46,6 +46,7 @@ from smolagents import CodeAgent, HfApiModel
 model_id = "meta-llama/Llama-3.3-70B-Instruct"

 model = HfApiModel(model_id=model_id, token="<YOUR_HUGGINGFACEHUB_API_TOKEN>") # You can choose to not pass any model_id to HfApiModel to use a default free model
+# You can also specify a particular provider, e.g. provider="together" or provider="sambanova"
 agent = CodeAgent(tools=[], model=model, add_base_tools=True)

 agent.run(
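For illustration, the updated guided-tour flow assembled end to end, with an explicit third-party provider. This is a sketch only: the agent prompt is illustrative, and the token placeholder must be replaced with a real value.

```python
from smolagents import CodeAgent, HfApiModel

# Same setup as above, but inference is routed through a third-party
# provider rather than the default HF Inference API; provider=None keeps
# the default "hf-inference" route.
model = HfApiModel(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    provider="together",  # or e.g. "sambanova", per the comment added above
    token="<YOUR_HUGGINGFACEHUB_API_TOKEN>",
)
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run("Could you give me the 118th number in the Fibonacci sequence?")
```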
@@ -25,7 +25,7 @@ This library offers:
 ✨ **Simplicity**: the logic for agents fits in ~thousand lines of code. We kept abstractions to their minimal shape above raw code!

-🌐 **Support for any LLM**: it supports models hosted on the Hub loaded in their `transformers` version or through our inference API, but also models from OpenAI, Anthropic... it's really easy to power an agent with any LLM.
+🌐 **Support for any LLM**: it supports models hosted on the Hub loaded in their `transformers` version or through our inference API and Inference Providers, but also models from OpenAI, Anthropic... it's really easy to power an agent with any LLM.

 🧑‍💻 **First-class support for Code Agents**, i.e. agents that write their actions in code (as opposed to "agents being used to write code"), [read more here](tutorials/secure_code_execution).
@@ -74,7 +74,7 @@ print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
 ### HfApiModel

-The `HfApiModel` wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM.
+The `HfApiModel` wraps huggingface_hub's [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) for the execution of the LLM. It supports both HF's own [Inference API](https://huggingface.co/docs/api-inference/index) and all [Inference Providers](https://huggingface.co/blog/inference-providers) available on the Hub.

 ```python
 from smolagents import HfApiModel
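To make the reference entry concrete, a short sketch contrasting the two routes. The model ID and provider name are taken from elsewhere in this diff; the message payload mirrors the surrounding reference examples.

```python
from smolagents import HfApiModel

# Default route: HF's own serverless Inference API ("hf-inference").
hf_model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

# Provider route: the same model served by a third-party Inference Provider.
together_model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")

messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
print(together_model(messages, stop_sequences=["great"]))
```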
@@ -12,7 +12,7 @@ authors = [
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
-  "huggingface-hub>=0.24.0",
+  "huggingface-hub>=0.28.0",
   "requests>=2.32.3",
   "rich>=13.9.4",
   "pandas>=2.2.3",
@@ -21,6 +21,8 @@ from .default_tools import *
 from .e2b_executor import *
 from .gradio_ui import *
 from .local_python_executor import *
+from .logger import *
+from .memory import *
 from .models import *
 from .monitoring import *
 from .prompts import *
@@ -338,6 +338,9 @@ class HfApiModel(Model):
     Parameters:
         model_id (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
             The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Can be `"replicate"`, `"together"`, `"fal-ai"`, `"sambanova"` or `"hf-inference"`.
+            Defaults to `"hf-inference"` (HF Inference API).
         token (`str`, *optional*):
             Token used by the Hugging Face API for authentication. This token needs to be authorized for 'Make calls to the serverless Inference API'.
             If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'.
@@ -368,15 +371,17 @@ class HfApiModel(Model):
     def __init__(
         self,
         model_id: str = "Qwen/Qwen2.5-Coder-32B-Instruct",
+        provider: Optional[str] = None,
         token: Optional[str] = None,
         timeout: Optional[int] = 120,
         **kwargs,
     ):
         super().__init__(**kwargs)
         self.model_id = model_id
+        self.provider = provider
         if token is None:
             token = os.getenv("HF_TOKEN")
-        self.client = InferenceClient(self.model_id, token=token, timeout=timeout)
+        self.client = InferenceClient(self.model_id, provider=provider, token=token, timeout=timeout)

     def __call__(
         self,
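The constructor change reduces to forwarding `provider` into `InferenceClient`. A rough standalone equivalent of the client `HfApiModel` now builds, assuming `huggingface-hub>=0.28.0` (the floor raised in `pyproject.toml` above); the token placeholder is illustrative:

```python
from huggingface_hub import InferenceClient

# Equivalent of the client constructed in __init__: pinned to one model and,
# optionally, routed through a named provider.
client = InferenceClient(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    provider="together",  # None falls back to the default HF Inference API
    token="<YOUR_HUGGINGFACEHUB_API_TOKEN>",
    timeout=120,
)
response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```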
@@ -474,7 +474,7 @@ class AgentTests(unittest.TestCase):
         with agent.logger.console.capture() as capture:
             agent.run("Count to 3")
         str_output = capture.get()
-        assert "Consider passing said import under" in str_output.replace("\n", "")
+        assert "`additional_authorized_imports`" in str_output.replace("\n", "")

     def test_multiagents(self):
         class FakeModelMultiagentsManagerAgent:
@@ -78,6 +78,7 @@ class DocCodeExtractor:
         return tmp_file


+@pytest.mark.skipif(not os.getenv("RUN_ALL"), reason="RUN_ALL environment variable not set")
 class TestDocs:
     """Test case for documentation code testing."""

@@ -13,10 +13,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import json
+import os
 import unittest
 from pathlib import Path
 from typing import Optional

+import pytest
 from transformers.testing_utils import get_tests_dir

 from smolagents import ChatMessage, HfApiModel, TransformersModel, models, tool
@@ -51,6 +53,12 @@ class ModelTests(unittest.TestCase):
         messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
         model(messages, stop_sequences=["great"])

+    @pytest.mark.skipif(not os.getenv("RUN_ALL"), reason="RUN_ALL environment variable not set")
+    def test_get_hfapi_message_no_tool_external_provider(self):
+        model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together", max_tokens=10)
+        messages = [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}]
+        model(messages, stop_sequences=["great"])
+
     def test_transformers_message_no_tool(self):
         model = TransformersModel(
             model_id="HuggingFaceTB/SmolLM2-135M-Instruct",