Merge branch 'huggingface:main' into add-device-parameter

Izaak Curry 2025-01-02 21:55:05 -08:00 committed by GitHub
commit 6f87aee50e
2 changed files with 10 additions and 13 deletions


@@ -26,12 +26,12 @@ In this guide, we're going to see best practices for building agents.
 ### The best agentic systems are the simplest: simplify the workflow as much as you can
-Giving an LLM some agency in your workflow introducessome risk of errors.
-Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct their mistake. But to reduce the risk of LLM error to the maximum, you should simplify your worklow!
+Giving an LLM some agency in your workflow introduces some risk of errors.
+Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct its mistakes. But to minimize the risk of LLM error, you should simplify your workflow!
 Let's take again the example from [intro_agents]: a bot that answers user queries for a surf trip company.
-Instead of letting the agent do 2 different calls for "travel distance API" and "weather API" each time they are asked about a new surf spot, you could just make one unified tool "return_spot_information", a functions that calls both APIs at once and returns their concatenated outputs to the user.
+Instead of letting the agent make 2 different calls to the "travel distance API" and the "weather API" each time it is asked about a new surf spot, you could just make one unified tool, "return_spot_information", a function that calls both APIs at once and returns their concatenated outputs to the user.
 This will reduce costs, latency, and error risk!
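To make the unified-tool idea concrete, here is a minimal sketch of what `return_spot_information` could look like as a smolagents tool. The `get_travel_distance` and `get_weather` helpers are hypothetical stand-ins for the two real API clients:

```python
from smolagents import tool


def get_travel_distance(spot_name: str) -> str:
    # Hypothetical stand-in for the travel distance API client.
    return f"Travel distance to {spot_name}: 120 km"


def get_weather(spot_name: str) -> str:
    # Hypothetical stand-in for the weather API client.
    return f"Weather at {spot_name}: sunny, 1.5 m swell"


@tool
def return_spot_information(spot_name: str) -> str:
    """Returns travel distance and weather for a surf spot, combined in one call.

    Args:
        spot_name: Name of the surf spot to look up.
    """
    # One tool call instead of two: both APIs are hit at once and the
    # outputs are concatenated, reducing cost, latency, and error risk.
    return f"{get_travel_distance(spot_name)}\n{get_weather(spot_name)}"
```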
@@ -168,7 +168,7 @@ Final answer:
 /var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
 ```
 The user sees a path returned to them instead of an image.
-It could look like a bug from the system, but actually the agentic system didn't cause the error: it's just that the LLM engine tid the mistake of not saving the image output into a variable.
+It could look like a bug in the system, but actually the agentic system didn't cause the error: the LLM engine simply made the mistake of not saving the image output into a variable.
 Thus it cannot access the image again except through the path that was logged while saving it, so it returns the path instead of an image.
 The first step to debugging your agent is thus "Use a more powerful LLM". Alternatives like `Qwen2.5-72B-Instruct` wouldn't have made that mistake.
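As a sketch of that first debugging step, swapping in a larger model is a one-line change; the specific model_id below is illustrative, not prescribed by the guide:

```python
from smolagents import CodeAgent, HfApiModel

# A more capable model is often the quickest fix for mistakes like the
# unsaved-variable error above; pick any sufficiently strong chat model.
model = HfApiModel(model_id="Qwen/Qwen2.5-72B-Instruct")
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
```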
@@ -177,9 +177,9 @@ The first step to debugging your agent is thus "Use a more powerful LLM". Altern
 Then you can also use less powerful models but guide them better.
-Put yourself in the shoes if your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + task formulation + tool description) ?
-Would you need some added claritications ?
+Put yourself in the shoes of your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + task formulation + tool description)?
+Would you need some added clarifications?
 To provide extra information, we do not recommend changing the system prompt right away: the default system prompt has many adjustments that you do not want to mess up unless you understand the prompt very well.
 Better ways to guide your LLM engine are:
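The hunk ends before the list itself, but one such approach is to pack clarifications into the task text rather than editing the system prompt. A minimal sketch, where the bridge length and cheetah speed are rough figures added for illustration:

```python
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(tools=[], model=HfApiModel(), add_base_tools=True)

# Extra guidance lives in the task itself, so the carefully tuned default
# system prompt stays untouched.
result = agent.run(
    "How long would a cheetah at full speed take to run the length of "
    "Pont Alexandre III? Useful context: the bridge is about 160 m long "
    "and a cheetah's top speed is about 28 m/s. Answer in seconds."
)
```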
@@ -217,4 +217,4 @@ agent = CodeAgent(
 result = agent.run(
     "How long would a cheetah at full speed take to run the length of Pont Alexandre III?",
 )
 ```
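The hunk shows only the `agent.run(...)` call; the constructor is cut off at the hunk boundary. A hedged reconstruction of a complete setup follows, where the search tool and model choice are assumptions rather than what the guide actually uses:

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# Assumed setup: a web search tool lets the agent look up the bridge
# length and the cheetah's top speed on its own.
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
)
result = agent.run(
    "How long would a cheetah at full speed take to run the length of Pont Alexandre III?",
)
print(result)
```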


@@ -189,14 +189,12 @@ class HfApiModel(Model):
     This engine allows you to communicate with Hugging Face's models using the Inference API. It can be used in serverless mode or with a dedicated endpoint, supporting features like stop sequences and grammar customization.
     Parameters:
-        model (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
+        model_id (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
             The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
         token (`str`, *optional*):
             Token used by the Hugging Face API for authentication. This token needs to be authorized for 'Make calls to the serverless Inference API'.
             If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'.
             If not provided, the class will try to use the environment variable 'HF_TOKEN', else use the token stored in the Hugging Face CLI configuration.
-        max_tokens (`int`, *optional*, defaults to 1500):
-            The maximum number of tokens allowed in the output.
         timeout (`int`, *optional*, defaults to 120):
             Timeout for the API request, in seconds.
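A short sketch of the two authentication paths the docstring describes; the token values are placeholders:

```python
import os

from smolagents import HfApiModel

# Option 1: pass the token explicitly.
model = HfApiModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    token="hf_your_token_here",  # placeholder
    timeout=120,
)

# Option 2: rely on the documented fallback order: the HF_TOKEN environment
# variable, then the token cached by `huggingface-cli login`.
os.environ["HF_TOKEN"] = "hf_your_token_here"  # placeholder
model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")
```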
@@ -207,12 +205,11 @@ class HfApiModel(Model):
     Example:
     ```python
     >>> engine = HfApiModel(
-    ...     model="Qwen/Qwen2.5-Coder-32B-Instruct",
+    ...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
     ...     token="your_hf_token_here",
-    ...     max_tokens=2000
     ... )
     >>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
-    >>> response = engine(messages, stop_sequences=["END"])
+    >>> response = engine(messages, stop_sequences=["END"], max_tokens=1500)
     >>> print(response)
     "Quantum mechanics is the branch of physics that studies..."
     ```