06 - Workqueues
In this lab you will learn how to defer work from one execution context to another using the
kernel's workqueue infrastructure. Workqueues are the standard mechanism a driver uses when
it needs to run code that cannot run where the event arrived — typically because the current
context cannot sleep (an interrupt handler, a softirq, a spinlock-held section) but the work
it must do can: allocating memory with GFP_KERNEL, taking a mutex, issuing a blocking I/O
request, or waiting for a completion. The Rust abstraction is in the
kernel::workqueue module and
wraps the C struct workqueue_struct and struct work_struct APIs into a small set of safe,
pin-aware Rust types.
By the end of the lab you will have extended the misc device from labs 03 and 04 with a button-style ioctl that schedules deferred work on a system workqueue, and you will have explored the various ways the kernel lets you compose multiple work items, delay them, and share state with the enqueueing context.
Objectives
- Understand what kernel workqueues are, when to use them, and which system queues are available out of the box.
- Embed a Work field in a driver struct and connect it to a callback through the WorkItem trait, the impl_has_work! macro, and the new_work! initializer macro.
- Use Queue::enqueue on a system workqueue and reason about what happens when the same item is enqueued more than once.
- Defer work with DelayedWork and Queue::enqueue_delayed, and host multiple independent work items in the same struct through the const ID generic.
What workqueues are
A workqueue is a thread pool managed by the kernel. You submit a small descriptor — a
struct work_struct — that points at a callback. A pool of kernel threads (the "workers")
picks descriptors off the queue and runs the callbacks one at a time. Because the workers
are full kernel threads, the callbacks run in process context and may sleep, allocate
with GFP_KERNEL, take mutexes, and call any other API that requires a sleepable context.
Workqueues are the canonical way to push deferred work out of contexts where sleeping is forbidden. A common pattern is:
- An interrupt handler runs (atomic, cannot sleep).
- It acknowledges the hardware and enqueues a work_struct onto a workqueue.
- The IRQ handler returns; a worker thread picks the descriptor up some time later and does the heavy lifting (parsing the device's DMA buffer, signalling a userspace waiter, etc.) in process context.
The same pattern applies whenever the originating context is constrained: a softirq, a spinlock-protected critical section, a timer callback, or even another work item that wants to split itself.
The reference for the underlying C subsystem is
Concurrency Managed Workqueue (cmwq).
The Rust bindings are intentionally narrow: they expose what you need to enqueue work and
nothing more. There is at present no Rust API for flush_work, cancel_work_sync,
drain_workqueue, or for allocating private workqueues; if you need those you must either
work around it (by polling a shared flag from inside run) or drop into raw bindings::
calls. This is worth keeping in mind as you design your drivers.
The system workqueues
The kernel pre-creates several global workqueues at boot. The Rust bindings expose them as
free functions returning &'static Queue:
| Function | Backing C queue | When to use |
|---|---|---|
| workqueue::system() | system_wq | The default. Short, latency-tolerant items. Backs schedule_work() in C. |
| workqueue::system_highpri() | system_highpri_wq | Like system() but workers run at higher scheduling priority. |
| workqueue::system_long() | system_long_wq | Items that may take a long time. Flushing this queue may take a while. |
| workqueue::system_unbound() | system_unbound_wq | Workers not bound to any CPU; not concurrency-managed. Use when items are CPU-intensive and locality does not matter. |
| workqueue::system_freezable() | system_freezable_wq | Drained during system suspend; new submissions wait for thaw. Use for work that must pause across suspend. |
| workqueue::system_power_efficient() | system_power_efficient_wq | Becomes unbound when the kernel command line sets workqueue.power_efficient; otherwise like system(). |
| workqueue::system_bh() | system_bh_wq | Runs in softirq (BH) context. Cannot sleep — no GFP_KERNEL, no mutexes. |
For everything you will write in this lab, prefer workqueue::system(). Reach for one of
the specialised queues only when you have a concrete reason.
The Rust bindings do not currently provide a safe equivalent of alloc_workqueue(). You
cannot create a private workqueue from Rust without falling back to raw bindings. In
practice the system queues cover almost every driver's needs.
Defining a work item
A work item is a struct that owns a Work
field, implements WorkItem,
and is reachable through a smart pointer that the kernel can store while the item is queued.
Four pieces have to fit together:
- The struct itself, marked with #[pin_data] because Work must not move once it has been initialised in place.
- A #[pin]-annotated Work<Self> field embedded inside the struct.
- An impl_has_work! invocation that tells the runtime how to reach that field from a raw *mut Self (and back).
- An impl WorkItem block that declares which smart pointer type owns the struct and defines the callback that the worker thread will eventually run.
The canonical example, taken from the module documentation:
use kernel::prelude::*;
use kernel::sync::Arc;
use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
#[pin_data]
struct MyStruct {
value: i32,
#[pin]
work: Work<MyStruct>,
}
impl_has_work! {
impl HasWork<Self> for MyStruct { self.work }
}
impl MyStruct {
fn new(value: i32) -> Result<Arc<Self>> {
Arc::pin_init(
pin_init!(MyStruct {
value,
work <- new_work!("MyStruct::work"),
}),
GFP_KERNEL,
)
}
}
impl WorkItem for MyStruct {
type Pointer = Arc<MyStruct>;
fn run(this: Arc<MyStruct>) {
pr_info!("the value is: {}\n", this.value);
}
}
fn print_later(val: Arc<MyStruct>) {
let _ = workqueue::system().enqueue(val);
}
The shape is small, but every line carries weight. The sections below take each piece in turn.
The Work<T, ID> field
Work<T, ID> is a
#[repr(transparent)] wrapper around the kernel's struct work_struct. It is Send + Sync
unconditionally, but it is not movable once initialised — every embedded Work must live
behind a pin. That is why the surrounding struct is #[pin_data] and the field is
#[pin]-annotated.
The type parameter T is the struct that owns the Work. The const generic ID (default
0) lets a single struct embed multiple independent Work fields:
#[pin_data]
struct TwoJobs {
#[pin]
short: Work<TwoJobs, 1>,
#[pin]
long: Work<TwoJobs, 2>,
}
Each field is wired up with its own impl_has_work! and its own WorkItem<ID> impl. The
ID is purely a compile-time tag; it never appears in the final binary. Any two distinct
u64 values are fine.
The impl_has_work! macro
impl_has_work!
produces an unsafe impl of the
HasWork trait that
teaches the runtime two things: how to compute *mut Work<T, ID> given *mut Self, and
how to recover *mut Self given a pointer to the embedded Work. The second direction is
what lets the C-side callback locate your struct via container_of when the worker fires.
The macro accepts several forms:
// Simplest case — one Work field at the default ID 0.
impl_has_work! {
impl HasWork<Self> for MyStruct { self.work }
}
// Multiple work fields with explicit IDs.
impl_has_work! {
impl HasWork<Self, 1> for TwoJobs { self.short }
impl HasWork<Self, 2> for TwoJobs { self.long }
}
// With generics on the struct.
impl_has_work! {
impl{T} HasWork<Generic<T>, 0> for Generic<T> { self.work }
}
The macro will only compile if the named field actually has type Work<T, ID>. That static
check is what justifies the otherwise-unsafe impl.
The WorkItem trait
WorkItem<ID> has two
items:
pub trait WorkItem<const ID: u64 = 0> {
type Pointer: WorkItemPointer<ID>;
fn run(this: Self::Pointer);
}
type Pointer is the smart pointer the kernel will hold onto while the work is queued.
Three concrete pointer types are supported out of the box by the bindings: Arc<Self>,
Pin<KBox<Self>>, and ARef<Self>. In practice you pick one based on whether the work
item shares state with other code:
- Arc<Self> — the enqueueing side can keep its own clone of the Arc, so the work item can read and update shared state visible to the rest of the driver. This is the common choice for device-bound work.
- Pin<KBox<Self>> — the work item owns its allocation exclusively. Enqueue consumes the box; nothing else holds a reference. Use this for one-shot deferred actions whose state is private to the callback.
- ARef<T> — for types that carry their own refcount (e.g. some device-derived structures). Rare in driver-level code.
fn run(this: Self::Pointer) takes the smart pointer by value. When run returns the
pointer is dropped: an Arc releases one reference, a KBox is freed. If your driver
needs the work item to stay alive after run (for example, to re-enqueue itself) it must
keep an extra Arc somewhere outside the work.
The new_work! initializer
new_work! returns
an impl PinInit<Work<T, ID>> suitable for use on the right-hand side of <- inside a
pin_init! or try_pin_init! block. The macro takes an optional string literal that
becomes the lockdep name for the work item — used in error reports and
/sys/kernel/debug/lockdep output:
work <- new_work!("MyStruct::work"),
Pass a descriptive name. If you omit the argument (new_work!()) the macro uses a
file:line auto-generated name, which is harder to identify in lock traces but otherwise
equivalent.
Constructing the owner
For the Arc<Self> case, the construction idiom is
Arc::pin_init:
Arc::pin_init(
pin_init!(MyStruct {
value,
work <- new_work!("MyStruct::work"),
}),
GFP_KERNEL,
)
For Pin<KBox<Self>> use
KBox::pin_init:
KBox::pin_init(
pin_init!(MyStruct {
value,
work <- new_work!("MyStruct::work"),
}),
GFP_KERNEL,
)
The two helpers run the same in-place initialiser; they differ only in the allocator used
and in whether the result can be cheaply cloned (Arc) or not (KBox).
Enqueueing
Once you have a smart pointer to a properly initialised struct, submit it with
Queue::enqueue:
let _ = workqueue::system().enqueue(arc_val);
The signature is:
pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
where
W: RawWorkItem<ID> + Send + 'static,
A few things follow from it:
- The pointer is consumed. The kernel needs to keep it alive until the worker has run, so it leaks the underlying allocation into the C side. For Arc<T> you can still keep an Arc of your own by .clone()-ing before the call; for KBox you cannot, by design.
- The associated EnqueueOutput is Result<(), Self> for Arc<T> and ARef<T>, and () for Pin<KBox<T>>. With Arc, an Err(self) means "this work was already enqueued somewhere and your submission was a no-op"; you get your Arc back. With KBox, the ownership model statically rules out that case.
- W: Send + 'static is enforced at the call site. This typically means the inner T must be Send + Sync. The Work field itself is unconditionally so; restrictions usually come from the surrounding state.
What happens if you enqueue twice
Every work_struct carries a "currently queued" flag. Submitting a work_struct that is
already pending is a no-op at the C level — it is not an error. The Rust API surfaces this:
let arc = MyStruct::new(42)?;
let again = arc.clone();
match workqueue::system().enqueue(arc) {
Ok(()) => pr_info!("queued\n"),
Err(_) => pr_info!("already queued — dropped this submission\n"),
}
// This is a no-op until the first run completes. The Arc comes back in Err.
let _ = workqueue::system().enqueue(again);
After the worker has run, the same item can be enqueued again from inside run or from
anywhere else. This is how self-rescheduling work items are written.
The closure shortcut: try_spawn
For the trivial case of "run this closure on the system workqueue, I don't need a struct",
the bindings provide
Queue::try_spawn:
workqueue::system().try_spawn(GFP_KERNEL, || {
pr_info!("hello from a worker thread\n");
})?;
The closure must be FnOnce + Send + 'static. Internally try_spawn allocates a KBox
that holds the closure and a Work, then enqueues it. It returns Err(AllocError) if the
allocation fails. Use it for fire-and-forget deferred work; for anything that needs to
share state with the rest of the driver, define a proper struct.
Delayed work
DelayedWork is
the analogue of Work for items that should run after a delay rather than as soon as
possible. The structural differences are minor:
use kernel::workqueue::{impl_has_delayed_work, new_delayed_work, DelayedWork};
#[pin_data]
struct Blinker {
led_on: bool,
#[pin]
work: DelayedWork<Blinker>,
}
impl_has_delayed_work! {
impl HasDelayedWork<Self> for Blinker { self.work }
}
impl WorkItem for Blinker {
type Pointer = Arc<Blinker>;
fn run(this: Arc<Blinker>) {
pr_info!("blink\n");
// re-schedule one second in the future
let _ = workqueue::system().enqueue_delayed(this, kernel::time::HZ);
}
}
The macro impl_has_delayed_work!
generates both a HasWork and a HasDelayedWork impl, so a delayed work item satisfies
both halves of the API. Initialise the field with
new_delayed_work!,
which is to DelayedWork what new_work! is to Work.
Submit with
Queue::enqueue_delayed:
pub fn enqueue_delayed<W, const ID: u64>(&self, w: W, delay: Jiffies) -> W::EnqueueOutput
The delay is a Jiffies
value — convert from milliseconds with
kernel::time::msecs_to_jiffies
or use the kernel::time::HZ constant (one second's worth of jiffies) for whole-second
delays. A delay of 0 is permitted and is equivalent to enqueue — useful for code paths
that decide at runtime whether to delay.
You can also submit a DelayedWork-backed struct through the plain enqueue API; the
result is the same as enqueue_delayed with a delay of zero.
What is missing
The Rust bindings deliberately keep the workqueue surface small. The following operations exist in the C API but are not yet exposed to Rust:
- flush_work / flush_workqueue / drain_workqueue — wait for outstanding items to finish. There is no safe way to call these from Rust today.
- cancel_work_sync / cancel_delayed_work[_sync] — remove a not-yet-running item from the queue or wait for a running one to finish.
- alloc_workqueue — create a private workqueue with custom flags. Only the system queues are reachable from Rust.
The practical consequence is that when your module is being unloaded you must arrange for
in-flight work to finish on its own. The simplest pattern is to keep an Arc<MyStruct>
inside your module struct, set an AtomicBool "stop" flag on drop, and have run check
the flag and exit early if set. Because enqueue consumes a strong reference, the work
will eventually drain itself once you stop enqueueing new items.
Until cancellation is exposed, do not write work items whose run body holds resources
(file handles, mappings, IRQs) that the rest of the module needs in order to unload. A
work item that is currently running cannot be cancelled, and rmmod will simply wait for
it to complete.
A worked example: a deferred-action ioctl
The exercises below build a driver that defers work in response to ioctl commands. The
device skeleton is the same misc device from lab 03 — an Arc<Mutex<...>> shared between
open instances — with an extra Work field that holds the deferred action.
The skeleton looks like this. The device struct is shared (one Arc<DeviceState> per
misc device, cloned into each open file's private data) and contains the work item:
use kernel::{
alloc::KBox,
c_str,
fs::File,
miscdevice::{MiscDevice, MiscDeviceOptions, MiscDeviceRegistration},
prelude::*,
sync::{Arc, Mutex, new_mutex},
workqueue::{self, impl_has_work, new_work, Work, WorkItem},
};
#[pin_data]
struct DeviceState {
#[pin]
counter: Mutex<u32>,
#[pin]
work: Work<DeviceState>,
}
impl_has_work! {
impl HasWork<Self> for DeviceState { self.work }
}
impl WorkItem for DeviceState {
type Pointer = Arc<DeviceState>;
fn run(this: Arc<DeviceState>) {
let mut counter = this.counter.lock();
*counter += 1;
pr_info!("worker tick: counter = {}\n", *counter);
}
}
impl DeviceState {
fn new() -> Result<Arc<Self>> {
Arc::pin_init(
pin_init!(DeviceState {
counter <- new_mutex!(0u32),
work <- new_work!("DeviceState::work"),
}),
GFP_KERNEL,
)
}
}
struct Instance {
state: Arc<DeviceState>,
}
#[vtable]
impl MiscDevice for Instance {
type Ptr = KBox<Self>;
fn open(_file: &File, _reg: &MiscDeviceRegistration<Self>) -> Result<KBox<Self>> {
let state = DeviceState::new()?;
Ok(KBox::new(Instance { state }, GFP_KERNEL)?)
}
fn ioctl(
device: <KBox<Self> as kernel::types::ForeignOwnable>::Borrowed<'_>,
_file: &File,
cmd: u32,
_arg: usize,
) -> Result<isize> {
match cmd {
KICK => {
let _ = workqueue::system().enqueue(device.state.clone());
Ok(0)
}
_ => Err(ENOTTY),
}
}
}
Notice that the Arc<DeviceState> is held by every open file and leaked into the kernel
each time work is enqueued. The Arc keeps the allocation alive across all those owners;
when the last file is closed and any pending work has finished, the allocation is freed.
Exercises
The exercises start by adding a workqueue to the misc device skeleton from lab 03 and
progressively extend it. All exercises can be tested with the ioctl userspace utility
introduced in the previous labs.
Exercise 1 — Schedule a print from an ioctl
Define an argumentless ioctl KICK with magic byte 'J' and number 10 using
_IO. When userspace calls it,
your driver should enqueue a Work item that prints a message via
pr_info!.
Verify:
ioctl /dev/mydevice -t J -n 10 -d none
dmesg | tail
# should show your "worker ran" message
Hint
Embed a Work<DeviceState> field in your shared device state and wire it up with
impl_has_work!. Implement WorkItem so that run calls pr_info!. In the ioctl
handler, clone the Arc<DeviceState> and pass it to workqueue::system().enqueue(...).
Exercise 2 — Observe the "already queued" behaviour
Issue the KICK ioctl twice in quick succession. Inspect the return value of
workqueue::system().enqueue(...) and print whether the submission was accepted or
discarded.
ioctl /dev/mydevice -t J -n 10 -d none
ioctl /dev/mydevice -t J -n 10 -d none
dmesg | tail
Explain why the second submission may be silently dropped. What does the worker thread see if you keep firing the ioctl while it is running?
Hint
enqueue for an Arc<T> returns Result<(), Arc<T>>. Err means the work was already
pending. Log both arms with pr_info! to make the behaviour visible.
Exercise 3 — Pass a value into the worker
Add a Mutex<u32> to the device state. Define a second ioctl KICK_WITH_VALUE using
_IOW::<u32> with sequence
number 11. The handler should read a u32 from userspace, store it under the mutex,
and enqueue the work item. Inside run, lock the mutex and print the stored value.
ioctl /dev/mydevice -t J -n 11 -d write -s 4 -v 42
dmesg | tail
# should show: worker received: 42
Hint
Use the same UserSlice pattern from lab 03 to copy the u32 out of userspace. Storing
it in a Mutex<u32> inside Arc<DeviceState> means both the ioctl and the worker can
reach it.
Exercise 4 — Self-rescheduling work
Modify run so that, after printing, it re-enqueues the same Arc<DeviceState> onto
workqueue::system(). Submit the work once from userspace and observe what happens.
ioctl /dev/mydevice -t J -n 10 -d none
dmesg -w
Add a counter that limits the number of re-enqueues. After the counter reaches some threshold, stop re-enqueueing and observe that the kernel log stops growing.
Without the counter, the work item will re-enqueue itself forever, keeping a worker busy indefinitely. Make sure the limit is in place before you load the module.
Hint
Re-enqueueing from inside run is fine: by the time run is invoked, the work_struct
is no longer marked pending, so the next enqueue will succeed.
To re-enqueue, call let _ = workqueue::system().enqueue(this.clone()); from inside
run. Use an AtomicUsize field for the counter so the check does not need the mutex.
Exercise 5 — Two workers in one struct
Add a second Work field to your device state, this time with ID = 1:
#[pin_data]
struct DeviceState {
// ...
#[pin]
quick: Work<DeviceState, 0>,
#[pin]
slow: Work<DeviceState, 1>,
}
Implement WorkItem<0> and WorkItem<1> independently — the first should print "quick"
and the second should print "slow" after sleeping for a few jiffies (use
kernel::time::Delta or a
simple busy-loop for the lab). Define two ioctls (KICK_QUICK and KICK_SLOW) that
enqueue each.
When you submit both ioctls quickly, do the workers run in order? Concurrently? Explain what you observe.
Hint
You will need two impl_has_work! invocations (one per ID) and two WorkItem impls. When
enqueueing, the inferred ID is 0; for the second worker write
workqueue::system().enqueue::<Arc<DeviceState>, 1>(state.clone()) to disambiguate.
Exercise 6 — Delayed work
Replace the Work field used by KICK with a
DelayedWork.
Use impl_has_delayed_work!
to wire it up and new_delayed_work!
to initialise it. Schedule it with
enqueue_delayed
and a delay of two seconds:
let _ = workqueue::system().enqueue_delayed(
state.clone(),
2 * kernel::time::HZ,
);
Verify with a wall-clock observation that the message appears two seconds after the ioctl returns.
Hint
kernel::time::HZ is the number of jiffies in one second. Multiply for whole-second
delays. For sub-second delays, convert from milliseconds with
msecs_to_jiffies.
Exercise 7 — try_spawn for ad-hoc work
Use Queue::try_spawn
to enqueue a closure that does not require a dedicated struct. The closure should capture
a counter value (passed in by ioctl arg) and print it from the worker thread.
let val = arg as u32;
workqueue::system().try_spawn(GFP_KERNEL, move || {
pr_info!("spawned worker saw {}\n", val);
})?;
When would you prefer try_spawn over the struct-based approach? When would you not?
Hint
try_spawn allocates a KBox for the closure and consumes it. It is convenient for
one-shot fire-and-forget work, but every submission is a fresh allocation and there is no
way to share state between successive spawns. The struct-based approach lets you keep
shared mutable state behind an Arc<Mutex<...>>.
Exercise 8 — Graceful shutdown
Because the Rust bindings do not expose cancel_work_sync or flush_work, you must
arrange your code so that in-flight work items finish on their own and stop being
re-enqueued before your module unloads.
Add an
AtomicBool
stopping flag to your device state, defaulted to false. In the self-rescheduling
work item from exercise 4, check the flag at the top of run and return early if it is
set. In your module's
PinnedDrop impl, set the
flag to true before letting the misc registration drop.
What scenarios does this design not cover? (For example: a long-running work item that
is currently inside run when the flag is set will still complete its current pass.)
Hint
Setting an AtomicBool with Release ordering from drop and reading it with Acquire
from run is sufficient. The Arc<DeviceState> strong-count goes to zero only after
every queued submission has either run-and-returned or never made it onto the queue, so
once you stop enqueueing the allocation drains itself naturally.
Exercise 9 — Pick the right queue
Read the per-queue documentation linked from
kernel::workqueue
and decide which system queue is appropriate for each of the following hypothetical
workloads:
- A handler that processes a 50 MB DMA buffer and may take several hundred milliseconds.
- A heartbeat that prints a single line every 10 seconds and must continue across suspend/resume.
- A latency-sensitive notification that should run as soon as possible after a high-priority interrupt.
- A long-running compression task that should not steal CPU from latency-sensitive work.
Justify each choice in one sentence.
Exercise 10 — Verify with dmesg timestamps
Configure dmesg to show timestamps:
dmesg -T -w
Run a sequence of KICK, KICK_SLOW, and KICK_DELAYED ioctls and confirm from the
timestamps that:
- Work items run on a worker thread, not in the calling process's context — the message timestamp may be slightly after the ioctl returns.
- Delayed work runs after the requested delay, not before.
- Self-rescheduling work, once stopped via the stopping flag, prints exactly one final message and then nothing more.