Worker deployment and performance

This document outlines best practices for deploying and optimizing Workers to ensure high performance, reliability, and scalability. It covers deployment strategies, scaling techniques, tuning recommendations, and monitoring approaches to help you get the most out of your Temporal Workers.

We also provide a reference application, the Order Management System (OMS), that demonstrates these deployment best practices in action. You can find the OMS codebase on GitHub.

Deployment and lifecycle management

Well-designed Worker deployment ensures resilience, observability, and maintainability. A Worker should be treated as a long-running service that can be deployed, upgraded, and scaled in a controlled way.

Package and configure Workers for flexibility

Workers should be artifacts produced by a CI/CD pipeline. Inject all required parameters for connecting to Temporal Cloud or a self-hosted Temporal Service at runtime via environment variables, configuration files, or command-line parameters. This keeps the artifact environment-agnostic and makes Workers easier to test, upgrade, scale, and isolate.
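
For example, a Go Worker entry point can read its connection settings from the environment at startup. The following is a minimal sketch, separate from the reference app; the variable names TEMPORAL_ADDRESS and TEMPORAL_NAMESPACE and the Task Queue name are illustrative:

package main

import (
    "log"
    "os"

    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/worker"
)

func main() {
    // Connection parameters are injected at deploy time rather than baked
    // into the artifact.
    c, err := client.Dial(client.Options{
        HostPort:  os.Getenv("TEMPORAL_ADDRESS"),
        Namespace: os.Getenv("TEMPORAL_NAMESPACE"),
    })
    if err != nil {
        log.Fatalf("unable to create Temporal client: %v", err)
    }
    defer c.Close()

    w := worker.New(c, "example-task-queue", worker.Options{})
    // Register Workflows and Activities here before starting the Worker.
    if err := w.Run(worker.InterruptCh()); err != nil {
        log.Fatalf("worker exited: %v", err)
    }
}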

In the order management reference app, Workers are packaged as Docker images with configuration provided via environment variables and mounted configuration files. The following Dockerfile uses a multi-stage build to create a minimal, production-ready Worker image:

FROM golang:1.23.8 AS oms-builder

WORKDIR /usr/src/oms

COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go mod download

COPY app ./app
COPY cmd ./cmd

RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -v -o /usr/local/bin/oms ./cmd/oms

FROM busybox AS oms-worker

COPY --from=oms-builder /usr/local/bin/oms /usr/local/bin/oms

ENTRYPOINT ["/usr/local/bin/oms", "worker"]

This Dockerfile uses a multi-stage build pattern with two stages:

  1. oms-builder stage: compiles the Worker binary.

    1. Copies dependency files and downloads dependencies, using BuildKit cache mounts to speed up subsequent builds.
    2. Copies the application code and builds a statically linked binary that doesn't require external libraries at runtime.
  2. oms-worker stage: creates a minimal final image.

    1. Copies only the compiled binary from the oms-builder stage.
    2. Sets the entrypoint to run the Worker process.

The entrypoint, oms worker, starts the Worker process, which reads its configuration from environment variables at runtime. For example, the Billing Worker Deployment in Kubernetes uses command-line arguments and environment variables to configure the Worker:

spec:
  containers:
    - args:
        - -k
        - supersecretkey
        - -s
        - billing
      env:
        - name: FRAUD_API_URL
          value: http://billing-api:8084
        - name: TEMPORAL_ADDRESS
          value: temporal-frontend.temporal:7233
      image: ghcr.io/temporalio/reference-app-orders-go-worker:latest
      name: billing-worker
      imagePullPolicy: Always
  enableServiceLinks: false

Separate Task Queues logically

Use separate Task Queues for distinct workloads. This isolation lets you rate limit and prioritize workloads independently and prevents one workload from starving another. For availability, configure at least two Workers to poll each Task Queue.

In the order management reference app, each microservice has its own Task Queue. For example, the Billing Worker polls the billing Task Queue, while the Order Worker polls the order Task Queue. This separation allows each service to scale independently based on its workload.

Diagram showing separate Task Queues for different Workers

The following code snippet shows how the Billing Worker is set up to poll its Task Queue. The default value for TaskQueue comes from the api.go configuration file and is set to billing.

func RunWorker(ctx context.Context, config config.AppConfig, client client.Client) error {
    w := worker.New(client, TaskQueue, worker.Options{
        MaxConcurrentWorkflowTaskPollers: 8,
        MaxConcurrentActivityTaskPollers: 8,
    })

    w.RegisterWorkflow(Charge)
    w.RegisterActivity(&Activities{FraudCheckURL: config.FraudURL})

    return w.Run(temporalutil.WorkerInterruptFromContext(ctx))
}

Version Workflows safely

Use Worker Versioning to deploy new Workflow code without breaking running Executions.
Worker Versioning lets you map each Workflow Execution to a specific Worker Deployment Version identified by a build ID. This guarantees that pinned Workflows always run on the same Worker version where they started.
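
The exact configuration depends on your SDK version. The following sketch shows one way to attach a build ID to a Go Worker using the SDK's build-ID-based versioning fields; treat it as an illustration rather than the definitive API, and consult the guide linked below for the current options:

import (
    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/worker"
)

func newVersionedWorker(c client.Client) worker.Worker {
    return worker.New(c, "billing", worker.Options{
        // BuildID identifies this Worker build; pinned Workflows keep
        // running on Workers that advertise the same build ID. Newer SDK
        // releases express this through Worker Deployment options instead.
        BuildID:                 "2025.01.15-a1b2c3d",
        UseBuildIDForVersioning: true,
    })
}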

To learn more about versioning Workflows, see the Workflow Versioning guide.

Manage Workflow History growth

The maximum Event History is 50,000 events for a given Workflow. Workflows that accumulate large histories can experience replay delays. For long-running Workflows, track the Event History size and use ContinueAsNew to reset the Event History before it degrades performance. All Temporal SDKs provide a way to check whether continue-as-new is suggested based on the current history size. For example, the Python SDK's is_continue_as_new_suggested() function returns a boolean indicating whether to use continue-as-new.

If you don't want to rely on the built-in function, aim to keep the Event History to fewer than 20,000 events to maintain optimal performance.
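
Recent versions of the Go SDK expose the same suggestion through the Workflow's Info. The following is a minimal sketch; State and processBatch are hypothetical placeholders for your own progress type and batch logic:

import (
    "go.temporal.io/sdk/workflow"
)

// State is a hypothetical carrier for the Workflow's progress.
type State struct {
    Processed int
}

// processBatch is a hypothetical helper that runs one batch of Activities.
func processBatch(ctx workflow.Context, state *State) error {
    // ... execute Activities for the next batch of work ...
    state.Processed++
    return nil
}

// LongRunningWorkflow processes work in batches and resets its Event
// History with ContinueAsNew when the server suggests it.
func LongRunningWorkflow(ctx workflow.Context, state State) error {
    for {
        if err := processBatch(ctx, &state); err != nil {
            return err
        }
        // Reset the Event History once the server suggests it, carrying
        // the current state into the new Workflow Execution.
        if workflow.GetInfo(ctx).GetContinueAsNewSuggested() {
            return workflow.NewContinueAsNewError(ctx, LongRunningWorkflow, state)
        }
    }
}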

Scaling, monitoring, and tuning

Scaling and tuning are critical to Worker performance and cost efficiency. The goal is to balance concurrency, throughput, and resource utilization while maintaining low task latency.

Interpret metrics as a whole

No single metric tells the full story. The following are some of the most useful Worker-related metrics to include on your Worker monitoring dashboard (the sketch after this list shows one way to export the SDK-emitted ones from a Go Worker). When you observe anomalies, correlate across multiple metrics to identify the root cause.

  • Worker CPU and memory utilization
  • workflow_task_schedule_to_start_latency and activity_task_schedule_to_start_latency
  • worker_task_slots_available
  • temporal_long_request_failure and temporal_long_request_latency
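
Worker CPU and memory come from your infrastructure monitoring, while the remaining metrics are emitted by the SDK and must be exported from the Worker process. The following sketch mirrors the Prometheus and tally setup used in the Temporal Go samples; the listen address and function names are illustrative:

import (
    "log"
    "time"

    prom "github.com/prometheus/client_golang/prometheus"
    "github.com/uber-go/tally/v4"
    "github.com/uber-go/tally/v4/prometheus"
    "go.temporal.io/sdk/client"
    sdktally "go.temporal.io/sdk/contrib/tally"
)

// newPrometheusScope builds a tally scope whose metrics Prometheus can
// scrape from the given listen address.
func newPrometheusScope(listenAddress string) tally.Scope {
    cfg := prometheus.Configuration{
        ListenAddress: listenAddress,
        TimerType:     "histogram",
    }
    reporter, err := cfg.NewReporter(prometheus.ConfigurationOptions{
        Registry: prom.NewRegistry(),
        OnError:  func(err error) { log.Println("prometheus reporter error:", err) },
    })
    if err != nil {
        log.Fatalf("error creating prometheus reporter: %v", err)
    }
    scope, _ := tally.NewRootScope(tally.ScopeOptions{
        CachedReporter: reporter,
        Separator:      prometheus.DefaultSeparator,
    }, time.Second)
    return sdktally.NewPrometheusNamingScope(scope)
}

// newMonitoredClient returns a client that exports the SDK-emitted Worker
// metrics listed above for scraping.
func newMonitoredClient() (client.Client, error) {
    return client.Dial(client.Options{
        MetricsHandler: sdktally.NewMetricsHandler(newPrometheusScope("0.0.0.0:9090")),
    })
}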

Schedule-to-Start latency measures how long a Task waits in the queue before a Worker starts it. High latency means your Workers or pollers can’t keep up with incoming Tasks, but the root cause depends on your resource metrics:

  • High latency and high CPU/memory: Workers are saturated. Add more Worker processes or replicas.
  • High latency and low CPU/memory: Workers are underutilized. Increase the number of pollers, executor slots, or both (see the sketch below). If this is accompanied by high temporal_long_request_latency or temporal_long_request_failure, your Workers are struggling to reach the Temporal Service. Refer to Troubleshooting for guidance.
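
In the Go SDK, poller and executor-slot counts are controlled through worker.Options. The values below are illustrative starting points to adjust against your own metrics, not recommendations:

import (
    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/worker"
)

func newTunedWorker(c client.Client, taskQueue string) worker.Worker {
    return worker.New(c, taskQueue, worker.Options{
        // More pollers pick Tasks up from the Task Queue sooner.
        MaxConcurrentWorkflowTaskPollers: 16,
        MaxConcurrentActivityTaskPollers: 16,
        // More executor slots let more Tasks run at the same time.
        MaxConcurrentWorkflowTaskExecutionSize: 200,
        MaxConcurrentActivityExecutionSize:     200,
    })
}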

Optimize Worker cache

Workers keep a cache of Workflow Executions to improve performance by reducing replay overhead. However, larger caches consume more memory. The temporal_sticky_cache_size metric tracks the size of this cache. If you observe high memory usage for your Workers together with a high temporal_sticky_cache_size, you can be reasonably sure the cache is contributing to memory pressure.

Having a high temporal_sticky_cache_size by itself isn't necessarily an issue, but if your Workers are memory-bound, consider reducing the cache size to allow more concurrent executions. We recommend you experiment with different cache sizes in a staging environment to find the optimal setting for your Workflows. Refer to Troubleshooting - Caching for more details on how to interpret the different cache-related metrics.
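
In the Go SDK, the sticky Workflow cache is shared by all Workers in a process and is sized with a package-level setting that must be applied before any Worker starts. A minimal sketch; the value shown is illustrative, not a recommendation:

import "go.temporal.io/sdk/worker"

func init() {
    // Applies to every Worker in this process and must be called before
    // any Worker starts. 2048 is an illustrative value to tune in staging.
    worker.SetStickyWorkflowCacheSize(2048)
}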

Manage scale-down safely

Before shutting down a Worker, check how many Tasks it is actively executing. This is especially relevant if your Workers are handling long-running, expensive Activities.

If worker_task_slots_available is at or near zero, the Worker is busy with active Tasks, and shutting it down could trigger expensive retries or timeouts for long-running Activities.
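
One way to soften the impact, sketched here for the Go SDK, is to give the Worker time to drain in-flight Tasks after it receives a stop signal and to align that window with your orchestrator's termination grace period:

import (
    "time"

    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/worker"
)

func newDrainingWorker(c client.Client, taskQueue string) worker.Worker {
    return worker.New(c, taskQueue, worker.Options{
        // After a stop signal, wait up to five minutes for running Tasks
        // to complete before forcing shutdown. Match this with, for
        // example, terminationGracePeriodSeconds in Kubernetes.
        WorkerStopTimeout: 5 * time.Minute,
    })
}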