Adding Resource Quotas to an Agent (Scala)
Golem provides a distributed resource quota system via golem.host.QuotaApi. Quotas let you define limited resources (API call rates, storage capacity, connection concurrency) and enforce consumption limits across all agents in a deployment.
1. Define Resources in the Application Manifest
Add resource definitions under resourceDefaults in golem.yaml, scoped per environment:
resourceDefaults:
  prod:
    api-calls:
      limit:
        type: Rate
        value: 100
        period: minute
        max: 1000
      enforcementAction: reject
      unit: request
      units: requests
    storage:
      limit:
        type: Capacity
        value: 1073741824 # 1 GB
      enforcementAction: reject
      unit: byte
      units: bytes
    connections:
      limit:
        type: Concurrency
        value: 50
      enforcementAction: throttle
      unit: connection
      units: connections
Limit Types
- Rate: refills value tokens every period (second/minute/hour/day), capped at max. Use for rate-limiting API calls.
- Capacity: fixed pool of value tokens. Once consumed, never refilled. Use for storage budgets.
- Concurrency: pool of value tokens that are returned when released. Use for limiting parallel connections.
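The difference between the three limit types can be sketched with a small in-memory model. The class names (RatePool, CapacityPool, ConcurrencyPool) and their methods are hypothetical and exist only for illustration; they are not part of golem.host.QuotaApi, and the real accounting happens inside Golem.

```scala
// Hypothetical model of the three limit types; not part of golem.host.QuotaApi.

// Rate: `value` tokens are added each period, never exceeding `max`.
final case class RatePool(value: BigInt, max: BigInt, available: BigInt) {
  def consume(n: BigInt): Option[RatePool] =
    if (n <= available) Some(copy(available = available - n)) else None

  // Called once per elapsed period.
  def refill: RatePool = copy(available = (available + value).min(max))
}

// Capacity: a fixed pool that is never refilled.
final case class CapacityPool(available: BigInt) {
  def consume(n: BigInt): Option[CapacityPool] =
    if (n <= available) Some(CapacityPool(available - n)) else None
}

// Concurrency: tokens are handed out and come back on release.
final case class ConcurrencyPool(value: BigInt, inUse: BigInt) {
  def acquire(n: BigInt): Option[ConcurrencyPool] =
    if (inUse + n <= value) Some(copy(inUse = inUse + n)) else None

  def release(n: BigInt): ConcurrencyPool =
    copy(inUse = (inUse - n).max(BigInt(0)))
}

object LimitTypesDemo {
  def main(args: Array[String]): Unit = {
    // 950 + 100 would be 1050, but the refill is capped at max = 1000.
    val rate = RatePool(value = BigInt(100), max = BigInt(1000), available = BigInt(950))
    println(rate.refill.available)

    // A full concurrency pool accepts new work only after a release.
    val full = ConcurrencyPool(value = BigInt(2), inUse = BigInt(2))
    println(full.acquire(BigInt(1)).isDefined)
    println(full.release(BigInt(1)).acquire(BigInt(1)).isDefined)
  }
}
```

In this model only RatePool has a refill step and only ConcurrencyPool has a release step, which is the practical distinction to keep in mind when choosing a type in golem.yaml.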
Enforcement Actions
- reject: returns Left(FailedReservation) with an optional estimated wait time. The agent must handle the error.
- throttle: Golem suspends the agent until capacity is available. Fully automatic, no code needed.
- terminate: kills the agent with a failure message.
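Because reject is the only action that surfaces in agent code, it can help to decide up front what a Left should turn into. The sketch below is a self-contained model: FailedReservationModel, its estimatedWait field, and the Outcome type are hypothetical stand-ins, and the real FailedReservation may expose the wait time under a different name or type.

```scala
import scala.concurrent.duration._

object RejectHandling {
  // Hypothetical stand-in for the failure value returned under `reject`.
  final case class FailedReservationModel(estimatedWait: Option[FiniteDuration])

  // What the agent decides to do with a reservation result.
  sealed trait Outcome[+T]
  final case class Done[T](value: T) extends Outcome[T]
  final case class RetryAfter(delay: FiniteDuration) extends Outcome[Nothing]

  // Use the estimated wait when one is provided, otherwise a fixed fallback delay.
  def handle[T](
      result: Either[FailedReservationModel, T],
      fallback: FiniteDuration = 1.second
  ): Outcome[T] =
    result match {
      case Right(value) => Done(value)
      case Left(failed) => RetryAfter(failed.estimatedWait.getOrElse(fallback))
    }
}
```

With throttle or terminate there is no equivalent branch to write, since the agent is either suspended or stopped by Golem.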
2. Acquire a QuotaToken
Acquire a QuotaToken once per resource, typically in the agent constructor:
import golem.host.QuotaApi._
val token = QuotaToken("api-calls", BigInt(1))
The second parameter is the expected amount per reservation, used for fair scheduling. For simple 1-call = 1-token rate limiting, use BigInt(1).
3. Simple Rate Limiting with withReservation
Use withReservation to reserve tokens, run code, and commit actual usage:
val result = withReservation(token, BigInt(1)) { reservation =>
  callSimpleApi().map { response =>
    (BigInt(1), response)
  }
}
The callback returns Future[(BigInt, T)], where the first element is the actual usage. If the actual usage is less than the reserved amount, the unused capacity returns to the pool.
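The reserve, run, commit cycle described above can be modeled with a small synchronous pool. PoolModel and withReservationModel are hypothetical names used only in this sketch; the real withReservation is asynchronous and backed by Golem's distributed accounting. Clamping usage above the reservation is an assumption of the sketch, not documented behavior.

```scala
// Hypothetical in-memory pool; not the Golem implementation.
final class PoolModel(private var available: BigInt) {
  def remaining: BigInt = available

  // Reserve up front, run the body, then refund whatever was not used.
  def withReservationModel[T](reserved: BigInt)(body: => (BigInt, T)): Either[String, T] =
    if (reserved > available)
      Left(s"only $available available, $reserved requested")
    else {
      available -= reserved
      val (actual, value) = body
      // Assumption of this sketch: usage above the reservation is clamped,
      // so the refund is never negative.
      val unused = (reserved - actual).max(BigInt(0))
      available += unused
      Right(value)
    }
}

object ReservationDemo {
  def main(args: Array[String]): Unit = {
    val pool = new PoolModel(BigInt(10000))
    // Reserve 4000, use only 1250: the other 2750 go back to the pool.
    val result = pool.withReservationModel(BigInt(4000)) {
      (BigInt(1250), "response")
    }
    println(result)
    println(pool.remaining)
  }
}
```

Reserving an upper bound and committing the real figure keeps the pool from being overdrawn while the work runs, at the cost of temporarily holding capacity that other agents could have used.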
4. Variable-Cost Reservations (e.g., LLM Tokens)
Reserve the maximum expected cost, then commit actual usage:
val result = withReservation(token, BigInt(4000)) { reservation =>
  callLlm(prompt, maxTokens = 4000).map { response =>
    (BigInt(response.tokensUsed), response)
  }
}
5. Manual Reserve / Commit
For finer control, use reserve and commit directly:
token.reserve(BigInt(100)) match {
  case Right(reservation) =>
    val result = doWork()
    reservation.commit(BigInt(result.actualUsage))
  case Left(failed) =>
    println(s"Quota unavailable: $failed")
}
6. Splitting Tokens for Agent-to-Agent RPC
Split a portion of your quota to pass to a child agent:
val childToken: QuotaToken = token.split(BigInt(200))
for {
  childAgent <- SummarizerAgent.newPhantom()
  summary <- childAgent.summarize(text, childToken)
} yield summary
The child agent receives the QuotaToken as a method parameter and uses it for its own reservations. Merge returned tokens back:
token.merge(returnedToken)
7. Dynamic Resource Updates via CLI
Modify resource limits at runtime — changes affect running agents immediately:
golem resource update api-calls --limit '{"type":"rate","value":200,"period":"minute","max":2000}' --environment prod
Key Constraints
- Acquire QuotaToken once and reuse it; do not create a new one per call
- All quota amounts are BigInt values
- split traps if childExpectedUse exceeds the parent's current expected use
- merge traps if the tokens refer to different resources
- withReservation returns Future[Either[FailedReservation, T]]: Left occurs only with reject enforcement; throttle suspends transparently
- Resource names in code must match the names in golem.yaml resourceDefaults
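The split and merge constraints listed above can be illustrated with a small immutable model. TokenModel is a hypothetical type, not the real QuotaToken: it assumes a token is just a resource name plus an expected-use amount, that splitting subtracts the child's share from the parent, and it uses require (which throws IllegalArgumentException) as a stand-in for a trap.

```scala
// Hypothetical model of a quota token; not the real QuotaToken.
final case class TokenModel(resource: String, expectedUse: BigInt) {

  // Returns (remaining parent, child). The child's share may not exceed
  // what the parent currently holds.
  def split(childExpectedUse: BigInt): (TokenModel, TokenModel) = {
    require(childExpectedUse <= expectedUse, "child expected use exceeds the parent's")
    (copy(expectedUse = expectedUse - childExpectedUse),
     copy(expectedUse = childExpectedUse))
  }

  // Only tokens for the same resource can be merged.
  def merge(other: TokenModel): TokenModel = {
    require(other.resource == resource, "tokens refer to different resources")
    copy(expectedUse = expectedUse + other.expectedUse)
  }
}

object SplitMergeDemo {
  def main(args: Array[String]): Unit = {
    val (parent, child) = TokenModel("api-calls", BigInt(500)).split(BigInt(200))
    println(parent.expectedUse)              // parent keeps 300
    println(child.expectedUse)               // child receives 200
    println(parent.merge(child).expectedUse) // merging restores 500
  }
}
```

Unlike this model, the real split shown earlier returns only the child token, so the parent is presumably updated in place.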