
Over the past few weeks, I have been studying AI agents, especially the Hermes agent framework, which in my opinion is one of the best setups. The great thing about AI agents is that they are not just question-and-answer machines: they can carry out actual tasks, from writing e-books and generating social media posts or YouTube thumbnails to 3D printing.
On the other hand, the problem with AI agents is that the costs for API calls can explode. I watched tons of tutorials and reviews on YouTube and was surprised that a lot of people running AI agents were using Claude Opus 4.8, which is probably the best model but also by far the most expensive to run. When you run agents, they don't give you a straight answer; they think things through, look at their skills and instructions, and only then generate the output you want. This means that you have no idea in advance how many tokens a process will take. If your agent gets stuck in a loop, it can get expensive quite quickly, and you might end up with API bills that go far beyond what you pay for a monthly subscription.
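One way to keep a looping agent from running up the bill is a hard budget per run. The following is a minimal sketch, not something Hermes provides: `RunBudget` and `BudgetExceeded` are hypothetical names, and it assumes you can read the token count of each model call from the API response.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past its token or step limit."""


@dataclass
class RunBudget:
    max_tokens: int        # hard cap on total tokens for one agent run
    max_steps: int         # hard cap on the number of model calls
    used_tokens: int = 0
    steps: int = 0

    def charge(self, tokens: int) -> None:
        """Record one model call and stop the run if a limit is crossed."""
        self.steps += 1
        self.used_tokens += tokens
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        if self.used_tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit of {self.max_tokens} reached")
```

The idea is to call `budget.charge(tokens)` after every model call inside the agent loop. A stuck agent then fails loudly after a known maximum spend instead of running all night.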
This was the main reason why I hesitated for a long time to get into AI agents. Another thing I don't really appreciate is how people in these tutorials handle API keys and MCP access. If keys are not well hidden and protected, a prompt injection could expose them. If your API key is compromised and you don't realise it, somebody else can use it and you pay the bill. These are the reasons why I started with AI agents at a very slow pace, trying to eliminate risks and problems.
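As a small illustration of more careful key handling, the sketch below reads the key from an environment variable instead of pasting it into a script or prompt, and scrubs it from any text before that text is logged or handed to a model. The variable name `LLM_API_KEY` and both helper functions are assumptions made for this example. This reduces the exposure, it does not make prompt injection impossible.

```python
import os


def load_api_key(var_name: str = "LLM_API_KEY") -> str:
    """Read the key from the environment instead of hardcoding it."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key


def redact(text: str, secrets: list[str]) -> str:
    """Replace every known secret in text before it is logged or sent to a model."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "[REDACTED]")
    return text
```

Setting a spending limit on the key in the provider's dashboard, where one is offered, is a useful second line of defence if the key leaks anyway.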
How to limit costs?
For most processes it is overkill to use Claude Opus, the best model. To scrape a website, a much lighter and cheaper model is more than enough. I have a Claude Pro account, but the subscription can't be used with Hermes or OpenClaw. So I use Claude to generate the planning and the PRDs for my projects, and then I feed these to my manager agent in Hermes. That's where the costs could explode if no guardrails are in place. I explored several different setups:
1- API from Anthropic or OpenAI
2- API from services like OpenRouter or other free or cheap services
3- Open source models on a local machine
To be honest, solution 1 doesn't make any sense to me. It's simply too expensive. While learning how to work with Hermes, I don't want to rack up thousands of dollars in API costs just for running a pseudo swarm of agents. Solution 3 seems to me the long-term holy grail. There are very powerful open source models available that you can download and run locally on your computer. The advantage: no more API costs for AI! The issue with this solution is that you need a computer powerful enough to run such models. At first I thought that only NVIDIA cards could handle this, but I soon realised that you need several cards to get enough memory to run these models. There are also working solutions from Apple and AMD that are not only slightly cheaper than NVIDIA but also use far less power.
My long-term goal is to go for that, but only once I have learned how to work with agents and can see that there is real utility in it for me.
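To get a feel for why memory is the bottleneck for local models, here is a rough back-of-the-envelope estimate. The weights of a model need about (number of parameters) x (bits per weight) / 8 bytes. The 1.2 overhead factor for context cache and runtime buffers is my own assumption, not a measured value, and real usage depends on the runtime and context length.

```python
def weights_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def estimated_memory_gb(params_billion: float, bits_per_weight: int,
                        overhead: float = 1.2) -> float:
    """Weights plus a rough allowance for cache and buffers (assumed factor)."""
    return weights_memory_gb(params_billion, bits_per_weight) * overhead


# Example: a 31B-parameter model at different precisions
for bits in (16, 8, 4):
    print(f"31B model at {bits}-bit: about {estimated_memory_gb(31, bits):.0f} GB")
```

Under these assumptions a 31B model needs roughly 74 GB at 16-bit and roughly 19 GB when quantized to 4-bit, which shows both why a single graphics card quickly runs out of memory and why quantized models are popular for local use.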
My free setup
So to test things out, I decided to run Hermes on my old desktop computer and use the only free API model that I have found, which is Gemma 4. This model is part of the Google family and you can get a free API key on Google AI Studio. I first ran Gemini 3.5 Flash but wasn't happy with the results and hit the rate limits fast. Then I set up Gemma 4 31B and it was actually much better than Gemini for agentic work. I have to say that Gemma 4 has become one of my favourite models to run. This setup is basically free to run, but sometimes the servers are too busy and my agent stops. Another limitation is that my computer needs to be on for this to work, so I can't really rely on cron jobs, and the Telegram or Discord connection only works while my computer is up and running. I'm using the Hermes desktop app, which runs quite well on Windows and is good enough to experiment, create agents and test them out.
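Rate limits and busy servers on a free tier are usually temporary, so a common workaround is to retry with exponential backoff instead of letting the agent stop. The sketch below is generic and not tied to Hermes or to the Google API: `ServiceBusy` is a hypothetical exception standing in for whatever error your client raises on a rate-limit or overload response.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceBusy(Exception):
    """Hypothetical error for a rate-limit or overloaded-server response."""


def call_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call() on ServiceBusy, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ServiceBusy:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            delay = base_delay * 2 ** (attempt - 1)
            # a little random jitter avoids retrying in lockstep with others
            sleep(delay + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

With the defaults, the waits are roughly 2, 4, 8 and 16 seconds before the fifth and last attempt. Keeping `max_attempts` small matters here too, since endless retries are just another kind of loop.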

Hermes desktop app
Looking at the hardware side, a computer capable of running big models costs between $4,000 and $5,000, which is actually not that much if you consider that you will never need to pay API costs again. The other big advantage is that your data is not sent to API providers; it stays on your own computer. A framework like Hermes lets you define what each agent is allowed to do, which MCPs and which APIs it can access, and which model it uses. With a local infrastructure, you become model agnostic: you can download the models you want, and they are loaded into RAM depending on which one is being used.
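To make the idea of per-agent permissions concrete, here is a minimal sketch of what such a policy could look like. This is not the Hermes configuration format: `AgentPolicy`, `POLICIES`, `authorize`, and the agent, model and tool names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """What one agent may use. All names here are illustrative."""
    model: str
    allowed_tools: frozenset[str] = field(default_factory=frozenset)

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


# Hypothetical setup: a bigger model for planning, a small one for scraping
POLICIES = {
    "manager": AgentPolicy(model="large-local-model",
                           allowed_tools=frozenset({"filesystem", "web_search"})),
    "scraper": AgentPolicy(model="small-local-model",
                           allowed_tools=frozenset({"web_fetch"})),
}


def authorize(agent: str, tool: str) -> AgentPolicy:
    """Return the agent's policy, or raise if the agent or tool is not allowed."""
    policy = POLICIES.get(agent)
    if policy is None:
        raise PermissionError(f"unknown agent: {agent}")
    if not policy.can_use(tool):
        raise PermissionError(f"{agent} may not use {tool}")
    return policy
```

The point of the design is deny by default: an agent gets only the tools listed for it, and the cheap model is assigned to the simple job, which is the same cost logic as using a lighter model for scraping.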

A small selection of open source models on Ollama to host locally
For me, this setup with locally hosted models is the only solution for extensive work with AI agents. What do you think about open source models?
With @ph1102, I'm running the @liotes project.
