Work request scheduling

Lifecycle of a user-submitted work request

When a work request gets submitted by a user, Debusine records it with a pending status and with no worker assigned. The list of pending work requests constitutes the queue of work requests that are waiting to be processed.

When the Debusine scheduler finds a suitable worker (it must be idle and must match the requirements defined by the planned task), the work request is assigned to the worker and the worker is notified of the availability of a new work request to process. When the worker starts to process the work request, the status is updated to running.

Note

At any point in time, there is at most one (pending or running) work request assigned to a given worker.

When the worker has finished processing the work request, sent back the results and uploaded the generated artifacts, the status is updated to completed.

The aborted status is a special case: it can only be set by the submitter, by an administrator, or by the scheduler when the prerequisites are not met. It is the official way to cancel a work request.

A failed work request can be retried. In that case, a new work request is created that supersedes the old one. Its dynamic task data is recomputed with new lookups, so that references to artifacts such as build environments are updated, and dependencies on the previous work request are updated to point to the new one, effectively replacing it. The superseded work request is kept for inspection.

Lifecycle of a work request inside a workflow

When a workflow creates work requests, it will typically create dependencies between them. When a work request has a dependency on a work request that is not yet completed, it is put in the blocked status.

The scheduler moves the work request to the pending status only when all the work requests it depends on have successfully completed. If one of them has failed (and was not allowed to fail), the work request is marked as aborted.
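The rule for leaving the blocked status can be sketched as a small function. This is only an illustration, not Debusine's implementation: the Status enum, the resolve_blocked name and the tuple layout are hypothetical, and the sketch assumes that a dependency which failed but was allowed to fail no longer blocks the work request.

```python
from enum import Enum


class Status(Enum):
    BLOCKED = "blocked"
    PENDING = "pending"
    ABORTED = "aborted"


def resolve_blocked(dependencies):
    """Return the next status of a blocked work request.

    `dependencies` holds one (finished, succeeded, allow_failure) tuple
    per work request that the blocked one depends on (hypothetical layout).
    """
    # A dependency that failed and was not allowed to fail aborts the request.
    for finished, succeeded, allow_failure in dependencies:
        if finished and not succeeded and not allow_failure:
            return Status.ABORTED
    # Assumption: once every dependency has finished, the request is unblocked.
    if all(finished for finished, _, _ in dependencies):
        return Status.PENDING
    # At least one dependency is still in progress.
    return Status.BLOCKED
```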

Priorities

Work requests have a base priority and a priority adjustment. The former is set automatically, and the latter by administrators. The effective priority of a work request is the sum of its base priority and its priority adjustment. The scheduler considers eligible tasks in descending order of effective priority.

The base priority of a work request is normally set by a workflow template or a workflow orchestrator. Failing that, it defaults to the effective priority of the parent work request (computed at creation time). If there is no parent work request, it defaults to 0.
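The arithmetic described above is simple enough to write down. The following is a minimal sketch, assuming a hypothetical WorkRequest dataclass rather than Debusine's actual model; only the sum and the two defaulting rules are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkRequest:
    """Hypothetical stand-in holding only the two priority fields."""

    base_priority: int = 0
    priority_adjustment: int = 0

    @property
    def effective_priority(self) -> int:
        # Effective priority is the sum of both components.
        return self.base_priority + self.priority_adjustment


def default_base_priority(parent: Optional[WorkRequest]) -> int:
    """Fallback used when no template or orchestrator sets a base priority."""
    # Inherit the parent's effective priority, or use 0 without a parent.
    return parent.effective_priority if parent is not None else 0


parent = WorkRequest(base_priority=-10, priority_adjustment=5)
child = WorkRequest(base_priority=default_base_priority(parent))
```

Here the child starts with a base priority of -5, the parent's effective priority at creation time. Later changes to the parent's adjustment would not propagate to the child in this sketch.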

Workflow templates have a priority, which is used as the initial base priority for work requests created from that template. This can be set by administrators, and it is expected that workflows used by automated QA tasks would be given a negative priority.

When workflow orchestrators lay out an execution plan, they may adjust the base priority of each resulting work request relative to the parent work request’s effective priority. For example, an orchestrator planning several different kinds of tasks might choose to give quicker static analysis tasks a slightly higher base priority than slower dynamic analysis tasks.

Separating the priority adjustment from the base priority allows us to tell more easily when effective priorities have been adjusted manually.

Users with the db.change_workrequest permission (including superusers) can use debusine manage-work-request --set-priority-adjustment ADJUSTMENT WORK_REQUEST_ID to adjust a work request’s priority.

Ordering of work requests

The scheduler handles the queue of pending work requests in descending order of effective priority, breaking ties by chronological order of creation (first in, first out).

Note

This doesn’t mean that all work requests will be processed in that priority order because they can have different requirements for workers. If work request N has no suitable worker available, but N+1 has one worker available, then N+1 will start before N.
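The ordering and the caveat in the note can be illustrated together. This is a sketch under stated assumptions: work requests are plain dictionaries with hypothetical priority, created_at and arch keys, and worker suitability is reduced to a caller-supplied predicate.

```python
def queue_order(pending):
    """Sort by descending effective priority, then oldest first."""
    return sorted(pending, key=lambda wr: (-wr["priority"], wr["created_at"]))


def next_to_start(pending, has_suitable_worker):
    """Return the first work request in queue order that can actually run."""
    for wr in queue_order(pending):
        if has_suitable_worker(wr):
            return wr
    return None


pending = [
    {"id": 1, "priority": 0, "created_at": 1, "arch": "arm64"},
    {"id": 2, "priority": 0, "created_at": 2, "arch": "amd64"},
    {"id": 3, "priority": 5, "created_at": 3, "arch": "arm64"},
]
# Hypothetical situation: only an amd64 worker is currently idle.
idle_architectures = {"amd64"}
chosen = next_to_start(pending, lambda wr: wr["arch"] in idle_architectures)
```

The queue order is 3, 1, 2, yet work request 2 is the one that starts, because it is the only one with a suitable idle worker.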

Matching of workers and work requests

Every time that a worker completes a work request, the scheduler kicks in and tries to find a suitable next work request for that worker.

The scheduler builds a dictionary-based description of that worker by combining static metadata (set by the administrators) and dynamic metadata (returned by the worker itself). The key/value pairs from the static metadata take precedence over those provided by the dynamic metadata.
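In Python terms, this precedence rule corresponds to a dictionary merge in which the static values are applied last. The function name and the metadata keys below are hypothetical and serve only to illustrate the rule.

```python
def worker_description(static_metadata, dynamic_metadata):
    """Merge both sources; static values win on conflicting keys."""
    # Later unpacking overrides earlier unpacking, so static goes last.
    return {**dynamic_metadata, **static_metadata}


# Hypothetical keys, chosen only for the example.
dynamic = {"example:cpu_count": 8, "example:label": "reported-by-worker"}
static = {"example:label": "set-by-admin"}
description = worker_description(static, dynamic)
```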

A first filter is made by:

  • excluding work requests whose task_name is listed in the tasks_denylist metadata

  • selecting work requests whose task_name is listed in the tasks_allowlist metadata
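The first filter can be sketched as follows. The text does not say what happens when a list is absent or how the two lists interact when both are set, so this sketch makes two assumptions: a missing list imposes no restriction, and both lists are applied when present. The first_filter name is hypothetical.

```python
def first_filter(work_requests, worker_metadata):
    """Keep the work requests that pass the worker's deny and allow lists."""
    denylist = worker_metadata.get("tasks_denylist")
    allowlist = worker_metadata.get("tasks_allowlist")
    kept = []
    for wr in work_requests:
        name = wr["task_name"]
        # Assumption: an absent list means "no restriction".
        if denylist is not None and name in denylist:
            continue
        if allowlist is not None and name not in allowlist:
            continue
        kept.append(wr)
    return kept


# Task names here are placeholders for the example.
requests = [{"task_name": "task-a"}, {"task_name": "task-b"}, {"task_name": "task-c"}]
only_denied = first_filter(requests, {"tasks_denylist": ["task-b"]})
only_allowed = first_filter(requests, {"tasks_allowlist": ["task-c"]})
```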

The scheduler then applies a second filter that is specific to each work request.

Workers and work requests can both provide and require scheduler tags. To tell them apart easily, tags provided by workers are prefixed with worker: while tags provided by tasks are prefixed with task:.

A work request can be executed on a given worker only if the following two conditions are met: all the scheduler tags required by the work request must be provided by the worker, and all the scheduler tags required by the worker must be provided by the work request.

This matching is performed using scheduler tags stored in the database, allowing efficient filtering of candidate work requests.
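The two conditions amount to two subset checks. The sketch below uses in-memory Python sets purely to show the logic; the actual matching happens in the database, and the function name and the task:example tag are hypothetical.

```python
def can_run_on(wr_required, wr_provided, worker_required, worker_provided):
    """Return True when both sides' required tags are provided by the other."""
    # Everything the work request requires must be provided by the worker.
    worker_satisfies_task = set(wr_required) <= set(worker_provided)
    # Everything the worker requires must be provided by the work request.
    task_satisfies_worker = set(worker_required) <= set(wr_provided)
    return worker_satisfies_task and task_satisfies_worker


matching = can_run_on(
    wr_required={"worker:build-arch:amd64"},
    wr_provided={"task:example"},
    worker_required={"task:example"},
    worker_provided={"worker:build-arch:amd64", "worker:build-arch:i386"},
)
mismatching = can_run_on(
    wr_required={"worker:build-arch:arm64"},
    wr_provided={"task:example"},
    worker_required=set(),
    worker_provided={"worker:build-arch:amd64"},
)
```

The first call yields True because each side provides everything the other requires. The second yields False because the worker does not provide the arm64 tag.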

If there are remaining work requests, then they are deemed suitable for that worker. The scheduler considers them in descending priority order, using age as a tie-breaker when multiple work requests have the same priority (the oldest work request is selected).

For information on configuring worker metadata and scheduler tags, see Configure and manage a worker.

Management of architecture-specific tasks

Many work requests have to run on workers supporting a specific architecture (or on workers that are compatible with that architecture).

Architecture compatibility is represented through scheduler tags such as worker:build-arch:amd64.

Workers automatically advertise architecture tags for architectures supported by the host system. Additional architecture tags may be configured in worker metadata when using emulation technologies such as qemu-user-static.

By default, workers advertise the host architecture (as returned by dpkg --print-architecture) together with any compatible architectures supported by the system.

See Configure the list of compatible architectures for detailed instructions on how to configure the appropriate metadata.

Dynamic worker provisioning

When Dynamic Worker Pools are available, workers can be spun up in response to demand. An estimated execution latency is calculated for pending tasks, and if it exceeds the configured target_latency_seconds limit, then additional dynamic workers will be provisioned.

Once the queue is exhausted and dynamic workers have been sitting idle for max_idle_seconds, they will be terminated.
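The two thresholds can be read as a pair of simple decisions. This is a rough sketch only: how the execution latency is estimated is not covered here, so it is taken as an input, and both function names are hypothetical. Only target_latency_seconds and max_idle_seconds come from the text.

```python
def should_provision(estimated_latency_seconds, target_latency_seconds):
    """Provision more dynamic workers when the estimate exceeds the target."""
    return estimated_latency_seconds > target_latency_seconds


def should_terminate(pending_count, idle_seconds, max_idle_seconds):
    """Terminate a dynamic worker once the queue is empty and it idled long enough."""
    return pending_count == 0 and idle_seconds >= max_idle_seconds
```

For example, with a target of 600 seconds, an estimate of 900 seconds would trigger provisioning, while an idle worker would be kept as long as work requests are still pending.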

See Add a new Cloud Worker Pool for detailed instructions on how to configure dynamic workers.