Configuring Semantic Retry Policies (Scala)

Golem provides a composable, per-environment retry policy system. Policies are evaluated against error context properties and can be defined in the application manifest, managed via CLI, or created/overridden at runtime from agent code using the SDK.

1. Define Retry Policies in the Application Manifest

Add retry policy definitions under retryPolicyDefaults in golem.yaml, scoped per environment:

    retryPolicyDefaults:
      prod:
        http-transient:
          priority: 10
          predicate:
            and:
              - propEq: { property: "error-type", value: "transient" }
              - propEq: { property: "uri-scheme", value: "https" }
          policy:
            countBox:
              maxRetries: 5
              inner:
                jitter:
                  factor: 0.15
                  inner:
                    clamp:
                      minDelay: "100ms"
                      maxDelay: "5s"
                      inner:
                        exponential:
                          baseDelay: "200ms"
                          factor: 2.0
        catch-all:
          priority: 0
          predicate: true
          policy:
            countBox:
              maxRetries: 3
              inner:
                exponential:
                  baseDelay: "100ms"
                  factor: 3.0

Policy Evaluation Order

When an error occurs, policies are evaluated in descending priority order. The first matching predicate’s policy is applied. If no user-defined policy matches, the built-in default policy (3 retries, exponential backoff, clamped to [100ms, 1s], 15% jitter) is used.
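The evaluation order described above can be sketched in plain Scala. This is an illustrative model only: `NamedPolicy`, `Props`, and `select` are hypothetical names made up for this sketch and are not part of the Golem SDK, and properties are simplified to string values.

```scala
// Illustrative model only; none of these types come from the Golem SDK.
object PolicySelection {
  // Error context properties, simplified to string key/value pairs.
  type Props = Map[String, String]

  final case class NamedPolicy(name: String, priority: Int, matches: Props => Boolean)

  // Evaluate in descending priority order; the first matching predicate wins.
  def select(policies: Seq[NamedPolicy], props: Props): Option[NamedPolicy] =
    policies.sortBy(p => -p.priority).find(_.matches(props))

  // The two policies from the manifest example, reduced to name, priority, predicate.
  val httpTransient: NamedPolicy = NamedPolicy(
    "http-transient",
    priority = 10,
    matches = p =>
      p.get("error-type").contains("transient") && p.get("uri-scheme").contains("https")
  )
  val catchAll: NamedPolicy = NamedPolicy("catch-all", priority = 0, matches = _ => true)
}
```

In this model a result of `None` corresponds to the case where no user-defined policy matches and the built-in default would apply.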

Base Policies

Policy        Description
periodic      Fixed delay between each attempt
exponential   baseDelay × factor^attempt (exponentially growing delays)
fibonacci     Delays follow the Fibonacci sequence starting from first and second
immediate     Retry immediately (zero delay)
never         Never retry; give up on the first failure
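To make the delay formulas concrete, here is a minimal sketch that computes the delay for a given attempt under the periodic, exponential, and fibonacci base policies. It assumes the attempt index starts at 0, which the table does not state; the object and function names are hypothetical and this is not Golem's implementation.

```scala
import scala.concurrent.duration._

// Illustrative model only; attempt indices are assumed to be 0-based.
object BaseDelays {
  // periodic: the same delay for every attempt.
  def periodic(delay: FiniteDuration)(attempt: Int): FiniteDuration = delay

  // exponential: baseDelay × factor^attempt, rounded to whole milliseconds.
  def exponential(base: FiniteDuration, factor: Double)(attempt: Int): FiniteDuration =
    (base.toMillis * math.pow(factor, attempt.toDouble)).round.millis

  // fibonacci: each delay is the sum of the previous two, seeded by first and second.
  def fibonacci(first: FiniteDuration, second: FiniteDuration)(attempt: Int): FiniteDuration = {
    @annotation.tailrec
    def go(a: FiniteDuration, b: FiniteDuration, n: Int): FiniteDuration =
      if (n == 0) a else go(b, a + b, n - 1)
    go(first, second, attempt)
  }
}
```

Under these assumptions, `exponential(200.millis, 2.0)` yields 200ms, 400ms, 800ms, 1600ms for attempts 0 to 3, and `fibonacci(100.millis, 200.millis)` yields 100ms, 200ms, 300ms, 500ms, 800ms for attempts 0 to 4.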

Combinators

Combinator    Description
countBox      Limits the total number of retry attempts
timeBox       Limits retries to a wall-clock duration
clamp         Clamps computed delay to a [minDelay, maxDelay] range
addDelay      Adds a constant offset on top of the computed delay
jitter        Adds random noise (±factor × delay) to avoid thundering herds
filteredOn    Apply the inner policy only when a predicate matches; otherwise give up
andThen       Run the first policy until it gives up, then switch to the second
union         Retry if either sub-policy wants to; pick the shorter delay
intersect     Retry only while both sub-policies want to; pick the longer delay
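The way combinators wrap an inner policy can be sketched by modelling a policy as a function from a 0-based attempt index to either a delay or "give up". This is a simplified model under that assumption, not the SDK's representation; all names are hypothetical, and jitter, timeBox, and andThen are left out because they depend on randomness, wall-clock time, or state.

```scala
import scala.concurrent.duration._

// Illustrative model only: Some(delay) means "retry after delay", None means "give up".
object Combinators {
  type Policy = Int => Option[FiniteDuration]

  def periodic(delay: FiniteDuration): Policy = _ => Some(delay)

  def exponential(base: FiniteDuration, factor: Double): Policy =
    attempt => Some((base.toMillis * math.pow(factor, attempt.toDouble)).round.millis)

  // countBox: stop once maxRetries attempts have been used.
  def countBox(maxRetries: Int, inner: Policy): Policy =
    attempt => if (attempt < maxRetries) inner(attempt) else None

  // clamp: force the inner delay into [lo, hi].
  def clamp(lo: FiniteDuration, hi: FiniteDuration, inner: Policy): Policy =
    attempt => inner(attempt).map(d => d.max(lo).min(hi))

  // union: retry if either side wants to; shorter delay when both do.
  def union(a: Policy, b: Policy): Policy = attempt =>
    (a(attempt), b(attempt)) match {
      case (Some(x), Some(y)) => Some(x.min(y))
      case (x, y)             => x.orElse(y)
    }

  // intersect: retry only while both sides want to; longer delay.
  def intersect(a: Policy, b: Policy): Policy = attempt =>
    for { x <- a(attempt); y <- b(attempt) } yield x.max(y)

  // The manifest's http-transient policy without its jitter layer.
  val httpTransient: Policy =
    countBox(5, clamp(100.millis, 5.seconds, exponential(200.millis, 2.0)))
}
```

In this model `httpTransient` produces 200ms, 400ms, 800ms, 1600ms, 3200ms for attempts 0 to 4 and gives up at attempt 5; the clamp would only take effect from a sixth attempt onward, where the unclamped delay of 6400ms exceeds 5s.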

Predicates

Predicates are boolean expressions evaluated against error context properties. Compose with and, or, not:

  • true / false — always/never match
  • propEq — property equals a value
  • propIn — property is one of a set of values
  • propGte / propLt — numeric comparisons
  • and / or / not — logical composition

Available Properties

Every retry decision happens in a specific context (an outgoing HTTP request, an HTTP response, a worker-to-worker RPC call, a trap from inside the guest, etc.). Each context only populates a subset of the property vocabulary below — a policy keyed on a property that is not present in the current context is silently skipped for that decision (it cannot apply there by definition).

Common to every context:

  • verb — operation verb (HTTP method, RDBMS verb, RPC verb, or "trap" in the trap context)
  • noun-uri — the resource URI (https://..., worker://..., kv://..., blobstore://..., dns://..., wasm://<function> for traps, golem://api, …)
  • uri-scheme, uri-host, uri-port, uri-path — decomposed from noun-uri

Context-specific properties:

Property              Populated in
status-code           outgoing HTTP response only
error-type            outgoing HTTP response only
function              worker-to-worker RPC call
target-component-id   worker-to-worker RPC call
target-agent-type     worker-to-worker RPC call (when the agent ID parses)
db-type               RDBMS operations (e.g. postgres, mysql)
trap-type             guest WASM trap (transient-error, unknown, …)

Practical consequence. A status-code-keyed policy (predicate: status-code in [...]) only fires for HTTP responses. The trap path does not see status-code and silently skips that policy — it does not error out. Likewise, a trap-type-keyed policy only fires from the trap path. Design one policy per context (or use or/and to make a policy explicitly match multiple contexts) rather than expecting a single policy to apply everywhere.
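The effect of context-specific properties can be illustrated with a small predicate evaluator. This is a sketch under two assumptions that the text above does not confirm in detail: property values are treated as plain strings, and a comparison against an absent property simply does not match. The type names are hypothetical and do not correspond to SDK classes.

```scala
// Illustrative model only; not the Golem predicate engine.
object Predicates {
  type Props = Map[String, String]

  sealed trait Pred
  case object Always extends Pred
  final case class PropEq(property: String, value: String) extends Pred
  final case class PropIn(property: String, values: Set[String]) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred
  final case class Or(left: Pred, right: Pred) extends Pred

  // Assumption: a leaf comparison on a property missing from the context is false.
  def eval(pred: Pred, props: Props): Boolean = pred match {
    case Always         => true
    case PropEq(k, v)   => props.get(k).contains(v)
    case PropIn(k, vs)  => props.get(k).exists(vs.contains)
    case And(l, r)      => eval(l, props) && eval(r, props)
    case Or(l, r)       => eval(l, props) || eval(r, props)
  }

  // Example contexts: an HTTP response and a guest trap.
  val httpCtx: Props = Map("verb" -> "GET", "uri-scheme" -> "https", "status-code" -> "503")
  val trapCtx: Props = Map("verb" -> "trap", "uri-scheme" -> "wasm", "trap-type" -> "transient-error")

  // Keyed on status-code only: can match HTTP responses, never traps.
  val statusOnly: Pred = PropIn("status-code", Set("500", "502", "503", "504"))

  // Explicitly covers both contexts by combining them with Or.
  val bothContexts: Pred = Or(statusOnly, PropEq("trap-type", "transient-error"))
}
```

Here `statusOnly` matches `httpCtx` but not `trapCtx`, while `bothContexts` matches either, which is the "one policy per context, or an explicit or" design the paragraph recommends.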

error-type values

  • transient — transient transport failure (e.g. WASI HTTP error code, transient RDBMS error)
  • http-status — HTTP response with a status code that matched a status-code-keyed policy

Status-code retries (opt-in)

Outgoing HTTP responses now flow through the retry-policy machinery: when the response arrives, its status-code is exposed to predicates. A policy is only considered for status-code retries if its predicate (or the predicate inside a nested FilteredOn) explicitly references the status-code property. Catch-all policies — including the synthesized default and any user-defined “matches all” policy — are intentionally excluded so status-based retries remain strictly opt-in.

When a matching policy decides to retry, the rejected response is dropped, the request body is reconstructed from the oplog, and the request is re-sent.

Eligibility rules (these mirror the rules for inline transport retries). A status-code retry is attempted only when all of the following hold:

  • live execution (not replay/snapshot/PersistNothing)
  • request body and trailers are reconstructible
  • the HTTP method is idempotent, or assume_idempotence was set on the outgoing request
  • not inside an atomically(...) block — in v1 status retries are skipped inside atomic regions; the user-land throw still triggers atomic-region replay, which gives equivalent end-to-end behavior
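The eligibility rules above amount to a conjunction of four checks, which can be sketched as follows. `RequestInfo` and its fields are hypothetical names for this illustration, not runtime types, and the set of idempotent methods is taken from RFC 9110; whether Golem uses exactly that set is an assumption.

```scala
// Illustrative model only; field names are made up for this sketch.
object StatusRetryEligibility {
  final case class RequestInfo(
    method: String,
    liveExecution: Boolean,        // not replay, snapshot, or PersistNothing
    bodyReconstructible: Boolean,  // request body and trailers can be rebuilt
    assumeIdempotence: Boolean,    // assume_idempotence set on the request
    insideAtomicRegion: Boolean    // currently inside an atomically(...) block
  )

  // Methods defined as idempotent by RFC 9110 (assumed to match Golem's notion).
  private val idempotentMethods = Set("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE")

  // All four rules must hold for a status-code retry to be attempted.
  def eligible(r: RequestInfo): Boolean =
    r.liveExecution &&
      r.bodyReconstructible &&
      (idempotentMethods.contains(r.method.toUpperCase) || r.assumeIdempotence) &&
      !r.insideAtomicRegion
}
```

In this sketch a `POST` is eligible only when `assumeIdempotence` is set, and any request inside an atomic region is rejected regardless of method.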

Example status-code policy:

    http-5xx-retry:
      priority: 20
      predicate:
        and:
          - propIn: { property: "status-code", values: [500, 502, 503, 504] }
          - propEq: { property: "uri-scheme", value: "https" }
      policy:
        countBox:
          maxRetries: 3
          inner:
            exponential:
              baseDelay: "200ms"
              factor: 2.0

2. SDK: Build and Apply Retry Policies at Runtime

Use golem.Guards._ and golem.host.Retry._ to construct and apply retry policies from agent code:

    import golem.Guards._
    import golem.host.Retry._
    import scala.concurrent.duration._

    val policy = named(
      "http-transient",
      Policy.exponential(200.millis, 2.0)
        .clamp(100.millis, 5.seconds)
        .withJitter(0.15)
        .onlyWhen(Props.errorType.eq("transient"))
        .maxRetries(5)
    ).priority(10)
      .appliesWhen(Props.uriScheme.eq("https"))

Scoped Usage with withRetryPolicy

Apply a policy for a block of code — the previous policy is restored when the Future completes:

    withRetryPolicy(policy) {
      Future {
        // HTTP calls in this block use the custom retry policy
        makeHttpRequest()
      }
    }

Policy Builder Methods

Build policies fluently from base policies:

    // Exponential backoff clamped with jitter and max retries
    Policy.exponential(200.millis, 2.0)
      .clamp(100.millis, 5.seconds)
      .withJitter(0.15)
      .maxRetries(5)

    // Periodic with time limit
    Policy.periodic(1.second)
      .timeBox(60.seconds)

    // Immediate retries, then fall back to exponential
    Policy.immediate
      .maxRetries(3)
      .andThen(
        Policy.exponential(1.second, 2.0)
          .maxRetries(5)
      )

    // Never retry (fail immediately)
    Policy.never

Predicate Builder Methods

    // Match transient host-level failures
    Props.errorType.eq("transient")

    // Match a property value
    Props.uriScheme.eq("https")

    // Combine predicates
    Props.errorType.eq("transient") && Props.uriScheme.eq("https")

3. Querying Retry Policies at Runtime

Use the query API to inspect active policies from agent code:

    import golem.host.RetryApi

    // List all active policies
    val policies = RetryApi.getRetryPolicies()
    policies.foreach { p =>
      println(s"Policy '${p.name}' priority=${p.priority}")
    }

    // Get a specific policy by name
    RetryApi.getRetryPolicyByName("http-transient").foreach { policy =>
      println(s"Found policy with priority ${policy.priority}")
    }

The returned JsNamedRetryPolicy has fields: name (String), priority (Int), predicate, policy. Use RetryApi.getNamedPolicies() for high-level Retry.NamedPolicy wrappers.

4. Live-Editing Policies via CLI

Retry policies can be managed at runtime without redeployment:

    # Create a new policy
    golem retry-policy create http-transient \
      --priority 10 \
      --predicate '{ "and": [{ "propEq": { "property": "error-type", "value": "transient" } }, { "propEq": { "property": "uri-scheme", "value": "https" } }] }' \
      --policy '{ "countBox": { "maxRetries": 5, "inner": { "exponential": { "baseDelay": "200ms", "factor": 2.0 } } } }'

    # List all policies in the current environment
    golem retry-policy list

    # Get a specific policy by name
    golem retry-policy get http-transient

    # Update an existing policy
    golem retry-policy update http-transient --priority 15

    # Delete a policy
    golem retry-policy delete http-transient

5. Default Retry Policy

When no user-defined retry policies are set, Golem activates a default catch-all:

  • Name: default
  • Priority: 0
  • Predicate: true (matches everything)
  • Policy: Up to 3 retries, exponential backoff (factor 3.0), delays clamped to [100ms, 1s], 15% jitter

Key Constraints

  • Policies are defined per-environment — different environments can have different retry behaviors
  • Policy names must be unique within an environment
  • Higher priority policies are evaluated first; the first matching predicate wins
  • withRetryPolicy is scoped: the previous policy is restored when the Future returned by the block completes
  • Inline retries (automatic transparent retries for transient network errors) happen before the policy system kicks in
  • Changes made via CLI or REST API take effect immediately for running agents