Merge branch 'huggingface:main' into add-device-parameter
Commit: 6f87aee50e
@@ -26,12 +26,12 @@ In this guide, we're going to see best practices for building agents.

### The best agentic systems are the simplest: simplify the workflow as much as you can

-Giving an LLM some agency in your workflow introducessome risk of errors.
+Giving an LLM some agency in your workflow introduces some risk of errors.

-Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct their mistake. But to reduce the risk of LLM error to the maximum, you should simplify your worklow!
+Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct their mistake. But to reduce the risk of LLM error to the maximum, you should simplify your workflow!

Let's take again the example from [intro_agents]: a bot that answers user queries on a surf trip company.

-Instead of letting the agent do 2 different calls for "travel distance API" and "weather API" each time they are asked about a new surf spot, you could just make one unified tool "return_spot_information", a functions that calls both APIs at once and returns their concatenated outputs to the user.
+Instead of letting the agent do 2 different calls for "travel distance API" and "weather API" each time they are asked about a new surf spot, you could just make one unified tool "return_spot_information", a function that calls both APIs at once and returns their concatenated outputs to the user.

This will reduce costs, latency, and error risk!
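To make the idea concrete, here is a minimal sketch of such a unified tool in smolagents style. The two helpers (`get_travel_distance`, `get_weather`) are hypothetical stand-ins for the real travel distance and weather APIs:

```python
from smolagents import tool


def get_travel_distance(spot_name: str) -> str:
    # Hypothetical stand-in for the "travel distance API" call.
    return "120 km from your location"


def get_weather(spot_name: str) -> str:
    # Hypothetical stand-in for the "weather API" call.
    return "1.5m swell, light offshore wind"


@tool
def return_spot_information(spot_name: str) -> str:
    """Returns travel distance and weather information for a surf spot.

    Args:
        spot_name: Name of the surf spot to look up.
    """
    # One tool call covers both APIs, so the LLM takes one step instead of two.
    return f"Travel distance: {get_travel_distance(spot_name)}\nWeather: {get_weather(spot_name)}"
```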
@@ -168,7 +168,7 @@ Final answer:

/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
```

The user sees, instead of an image being returned, a path being returned to them.
-It could look like a bug from the system, but actually the agentic system didn't cause the error: it's just that the LLM engine tid the mistake of not saving the image output into a variable.
+It could look like a bug from the system, but actually the agentic system didn't cause the error: it's just that the LLM engine did the mistake of not saving the image output into a variable.
Thus it cannot access the image again except by leveraging the path that was logged while saving the image, so it returns the path instead of an image.

The first step to debugging your agent is thus "Use a more powerful LLM". Alternatives like `Qwen2.5-72B-Instruct` wouldn't have made that mistake.
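As a minimal sketch (assuming the smolagents defaults), that first debugging step is a one-argument change to the agent's engine:

```python
from smolagents import CodeAgent, HfApiModel

# Same agent, larger model: less likely to forget to store the image
# output in a variable before returning it.
agent = CodeAgent(
    tools=[],
    model=HfApiModel(model_id="Qwen/Qwen2.5-72B-Instruct"),
    add_base_tools=True,
)
```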
@@ -177,9 +177,9 @@ The first step to debugging your agent is thus "Use a more powerful LLM". Altern

Then you can also use less powerful models but guide them better.

-Put yourself in the shoes if your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + task formulation + tool description) ?
+Put yourself in the shoes of your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + task formulation + tool description) ?

-Would you need some added claritications ?
+Would you need some added clarifications?

To provide extra information, we do not recommend to change the system prompt right away: the default system prompt has many adjustments that you do not want to mess up except if you understand the prompt very well.
Better ways to guide your LLM engine are:
@@ -217,4 +217,4 @@ agent = CodeAgent(
result = agent.run(
    "How long would a cheetah at full speed take to run the length of Pont Alexandre III?",
)
```
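One way to hand the model extra facts like these without editing the system prompt is the `additional_args` parameter of `agent.run()`, sketched here with illustrative placeholder values:

```python
# Sketch: inject task-specific context as variables the agent can read directly.
result = agent.run(
    "How long would a cheetah at full speed take to run the length of Pont Alexandre III?",
    additional_args={
        "cheetah_top_speed_km_h": 112,  # illustrative value
        "bridge_length_m": 160,  # illustrative value
    },
)
```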
@@ -189,14 +189,12 @@ class HfApiModel(Model):
This engine allows you to communicate with Hugging Face's models using the Inference API. It can be used in both serverless mode or with a dedicated endpoint, supporting features like stop sequences and grammar customization.

Parameters:
-    model (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
+    model_id (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
        The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
    token (`str`, *optional*):
        Token used by the Hugging Face API for authentication. This token need to be authorized 'Make calls to the serverless Inference API'.
        If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'.
        If not provided, the class will try to use environment variable 'HF_TOKEN', else use the token stored in the Hugging Face CLI configuration.
    max_tokens (`int`, *optional*, defaults to 1500):
        The maximum number of tokens allowed in the output.
    timeout (`int`, *optional*, defaults to 120):
        Timeout for the API request, in seconds.
@@ -207,12 +205,11 @@ class HfApiModel(Model):
Example:
```python
>>> engine = HfApiModel(
-...     model="Qwen/Qwen2.5-Coder-32B-Instruct",
+...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     token="your_hf_token_here",
...     max_tokens=2000
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
->>> response = engine(messages, stop_sequences=["END"])
+>>> response = engine(messages, stop_sequences=["END"], max_tokens=1500)
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
```
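For completeness, a minimal sketch of the renamed `model_id` parameter in use, relying on the documented fallback to the `HF_TOKEN` environment variable instead of passing `token` explicitly:

```python
import os

from smolagents import HfApiModel

os.environ.setdefault("HF_TOKEN", "your_hf_token_here")  # placeholder token

# All keyword arguments below are the ones documented in the Parameters section.
engine = HfApiModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    max_tokens=1500,
    timeout=120,
)
```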