Introduction
Golem is a durable computing platform that makes it simple to build and deploy highly reliable distributed systems.
Golem scales horizontally by partitioning agents across many worker executor nodes, each of which runs a different subset of agents.
Although partitioning agents across many nodes provides the benefit of horizontal scalability, it makes it more difficult to know which node is executing a particular agent.
Even if you knew which node was executing an agent, interacting with that node directly would be inconvenient: the node could fail, and you would have to implement logic that detects the failure and waits until the agent is recovered on a new node before retrying the invocation.
To address these issues, Golem has a Worker Gateway service, which is effectively stateless and scales independently of the worker executor nodes.
The primary functions of the Worker Gateway are as follows:
- Identify the node that is responsible for executing the agent being invoked, and route the invocation request to that node.
- Transparently handle executor node failures by detecting failure, awaiting recovery, and retrying the invocation.
- Support the execution of custom APIs that satisfy arbitrary business and technical requirements.
To learn more about how the Worker Gateway supports custom APIs, read the API definitions page.
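
The first two gateway functions listed above (routing and transparent retry) can be illustrated with a minimal Python sketch. This is not Golem's implementation or API: the `Executor`, `Gateway`, and `NodeDown` names, the hash-based node selection, and the `max_attempts` parameter are all assumptions made for illustration. In particular, "awaiting recovery" is simplified here to immediately re-resolving the agent onto another node.

```python
import hashlib


class NodeDown(Exception):
    """Raised by a (hypothetical) executor node that has failed."""


class Executor:
    """Stand-in for a worker executor node; not a real Golem type."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def invoke(self, agent_id, request):
        if not self.alive:
            raise NodeDown(self.name)
        return f"{self.name} handled {request!r} for {agent_id}"


class Gateway:
    """Toy gateway: resolves an agent to a node, retries on node failure."""

    def __init__(self, executors, max_attempts=3):
        self.executors = list(executors)
        self.max_attempts = max_attempts
        self.suspected = set()  # names of nodes this gateway has seen fail

    def owner(self, agent_id):
        # Assumption: a stable hash of the agent ID selects one of the
        # nodes that the gateway still believes to be healthy.
        candidates = [e for e in self.executors if e.name not in self.suspected]
        if not candidates:
            raise RuntimeError("no healthy executor nodes")
        digest = hashlib.sha256(agent_id.encode()).digest()
        return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]

    def invoke(self, agent_id, request):
        for _ in range(self.max_attempts):
            node = self.owner(agent_id)
            try:
                return node.invoke(agent_id, request)
            except NodeDown:
                # Failure detected: stop routing to this node and retry,
                # so the caller never sees the failure.
                self.suspected.add(node.name)
        raise RuntimeError(f"invocation of {agent_id} failed after retries")
```

A caller would only ever use `Gateway.invoke("agent-1", "ping")`. If the node that currently owns `agent-1` has failed, the sketch marks it as suspected, resolves the agent to another node, and retries, which mirrors the behavior described in the list above at a very small scale.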