06 - Workqueues
In this lab you will learn how to defer work from one execution context to another using the
kernel's workqueue infrastructure. Workqueues are the standard mechanism a driver uses when
it needs to run code that cannot run where the event arrived — typically because the current
context cannot sleep (an interrupt handler, a softirq, a spinlock-held section) but the work
it must do can: allocating memory with GFP_KERNEL, taking a mutex, issuing a blocking I/O
request, or waiting for a completion. The Rust abstraction is in the
kernel::workqueue module and
wraps the C struct workqueue_struct and struct work_struct APIs into a small set of safe,
pin-aware Rust types.
By the end of the lab you will have extended the misc device from labs 03 and 04 with a button-style ioctl that schedules deferred work on a system workqueue, and you will have explored the various ways the kernel lets you compose multiple work items, delay them, and share state with the enqueueing context.
Objectives
- Understand what kernel workqueues are, when to use them, and which system queues are available out of the box.
- Embed a Work field in a driver struct and connect it to a callback through the WorkItem trait, the impl_has_work! macro, and the new_work! initializer macro.
- Use Queue::enqueue on a system workqueue and reason about what happens when the same item is enqueued more than once.
- Defer work with DelayedWork and Queue::enqueue_delayed, and host multiple independent work items in the same struct through the const ID generic.
What workqueues are
A workqueue is a thread pool managed by the kernel. You submit a small descriptor — a
struct work_struct — that points at a callback. A pool of kernel threads (the "workers")
picks descriptors off the queue and runs the callbacks one at a time. Because the workers
are full kernel threads, the callbacks run in process context and may sleep, allocate
with GFP_KERNEL, take mutexes, and call any other API that requires a sleepable context.
Workqueues are the canonical way to push deferred work out of contexts where sleeping is forbidden. A common pattern is:
- An interrupt handler runs (atomic, cannot sleep).
- It acknowledges the hardware and enqueues a work_struct onto a workqueue.
- The IRQ handler returns; a worker thread picks the descriptor up some time later and does the heavy lifting (parsing the device's DMA buffer, signalling a userspace waiter, etc.) in process context.
The same pattern applies whenever the originating context is constrained: a softirq, a spinlock-protected critical section, a timer callback, or even another work item that wants to split itself.
The reference for the underlying C subsystem is
Concurrency Managed Workqueue (cmwq).
The Rust bindings are intentionally narrow: they expose what you need to enqueue work and
nothing more. There is at present no Rust API for flush_work, cancel_work_sync,
drain_workqueue, or for allocating private workqueues; if you need those you must either
work around it (by polling a shared flag from inside run) or drop into raw bindings::
calls. This is worth keeping in mind as you design your drivers.
The system workqueues
The kernel pre-creates several global workqueues at boot. The Rust bindings expose them as
free functions returning &'static Queue:
| Function | Backing C queue | When to use |
|---|---|---|
| workqueue::system() | system_wq | The default. Short, latency-tolerant items. Backs schedule_work() in C. |
| workqueue::system_highpri() | system_highpri_wq | Like system() but workers run at higher scheduling priority. |
| workqueue::system_long() | system_long_wq | Items that may take a long time. Flushing this queue may take a while. |
| workqueue::system_unbound() | system_unbound_wq | Workers not bound to any CPU; not concurrency-managed. Use when items are CPU-intensive and locality does not matter. |
| workqueue::system_freezable() | system_freezable_wq | Drained during system suspend; new submissions wait for thaw. Use for work that must pause across suspend. |
| workqueue::system_power_efficient() | system_power_efficient_wq | Becomes unbound when the kernel command line sets workqueue.power_efficient; otherwise like system(). |
| workqueue::system_bh() | system_bh_wq | Runs in softirq (BH) context. Cannot sleep — no GFP_KERNEL, no mutexes. |
For everything you will write in this lab, prefer workqueue::system(). Reach for one of
the specialised queues only when you have a concrete reason.
The Rust bindings do not currently provide a safe equivalent of alloc_workqueue(). You
cannot create a private workqueue from Rust without falling back to raw bindings. In
practice the system queues cover almost every driver's needs.
Defining a work item
A work item is a struct that owns a Work
field, implements WorkItem,
and is reachable through a smart pointer that the kernel can store while the item is queued.
Four pieces have to fit together:
- The struct itself, marked with #[pin_data] because Work must not move once it has been initialised in place.
- A #[pin]-annotated Work<Self> field embedded inside the struct.
- An impl_has_work! invocation that tells the runtime how to reach that field from a raw *mut Self (and back).
- An impl WorkItem block that declares which smart pointer type owns the struct and defines the callback that the worker thread will eventually run.
The canonical example, taken from the module documentation:
use kernel::prelude::*;
use kernel::sync::Arc;
use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
#[pin_data]
struct MyStruct {
value: i32,
#[pin]
work: Work<MyStruct>,
}
impl_has_work! {
impl HasWork<Self> for MyStruct { self.work }
}
impl MyStruct {
fn new(value: i32) -> Result<Arc<Self>> {
Arc::pin_init(
pin_init!(MyStruct {
value,
work <- new_work!("MyStruct::work"),
}),
GFP_KERNEL,
)
}
}
impl WorkItem for MyStruct {
type Pointer = Arc<MyStruct>;
fn run(this: Arc<MyStruct>) {
pr_info!("the value is: {}\n", this.value);
}
}
fn print_later(val: Arc<MyStruct>) {
let _ = workqueue::system().enqueue(val);
}
The shape is small, but every line carries weight. The sections below take each piece in turn.
The Work<T, ID> field
Work<T, ID> is a
#[repr(transparent)] wrapper around the kernel's struct work_struct. It is Send + Sync
unconditionally, but it is not movable once initialised — every embedded Work must live
behind a pin. That is why the surrounding struct is #[pin_data] and the field is
#[pin]-annotated.
The type parameter T is the struct that owns the Work. The const generic ID (default
0) lets a single struct embed multiple independent Work fields:
#[pin_data]
struct TwoJobs {
#[pin]
short: Work<TwoJobs, 1>,
#[pin]
long: Work<TwoJobs, 2>,
}
Each field is wired up with its own impl_has_work! and its own WorkItem<ID> impl. The
ID is purely a compile-time tag; it never appears in the final binary. Any two distinct
u64 values are fine.
The impl_has_work! macro
impl_has_work!
produces an unsafe impl of the
HasWork trait that
teaches the runtime two things: how to compute *mut Work<T, ID> given *mut Self, and
how to recover *mut Self given a pointer to the embedded Work. The second direction is
what lets the C-side callback locate your struct via container_of when the worker fires.
The macro accepts several forms:
// Simplest case — one Work field at the default ID 0.
impl_has_work! {
impl HasWork<Self> for MyStruct { self.work }
}
// Multiple work fields with explicit IDs.
impl_has_work! {
impl HasWork<Self, 1> for TwoJobs { self.short }
impl HasWork<Self, 2> for TwoJobs { self.long }
}
// With generics on the struct.
impl_has_work! {
impl{T} HasWork<Generic<T>, 0> for Generic<T> { self.work }
}
The macro will only compile if the named field actually has type Work<T, ID>. That static
check is what justifies the otherwise-unsafe impl.
The WorkItem trait
WorkItem<ID> has two
items:
pub trait WorkItem<const ID: u64 = 0> {
type Pointer: WorkItemPointer<ID>;
fn run(this: Self::Pointer);
}
type Pointer is the smart pointer the kernel will hold onto while the work is queued.
Three concrete pointer types are supported out of the box by the bindings: Arc<Self>,
Pin<KBox<Self>>, and ARef<Self>. In practice you pick one based on whether the work
item shares state with other code:
- Arc<Self> — the enqueueing side can keep its own clone of the Arc, so the work item can read and update shared state visible to the rest of the driver. This is the common choice for device-bound work.
- Pin<KBox<Self>> — the work item owns its allocation exclusively. Enqueue consumes the box; nothing else holds a reference. Use this for one-shot deferred actions whose state is private to the callback.
- ARef<T> — for types that carry their own refcount (e.g. some device-derived structures). Rare in driver-level code.
fn run(this: Self::Pointer) takes the smart pointer by value. When run returns the
pointer is dropped: an Arc releases one reference, a KBox is freed. If your driver
needs the work item to stay alive after run (for example, to re-enqueue itself) it must
keep an extra Arc somewhere outside the work.
The new_work! initializer
new_work! returns
an impl PinInit<Work<T, ID>> suitable for use on the right-hand side of <- inside a
pin_init! or try_pin_init! block. The macro takes an optional string literal that
becomes the lockdep name for the work item — used in error reports and
/sys/kernel/debug/lockdep output:
work <- new_work!("MyStruct::work"),
Pass a descriptive name. If you omit the argument (new_work!()) the macro uses a
file:line auto-generated name, which is harder to identify in lock traces but otherwise
equivalent.
Constructing the owner
For the Arc<Self> case, the construction idiom is
Arc::pin_init:
Arc::pin_init(
pin_init!(MyStruct {
value,
work <- new_work!("MyStruct::work"),
}),
GFP_KERNEL,
)
For Pin<KBox<Self>> use
KBox::pin_init:
KBox::pin_init(
pin_init!(MyStruct {
value,
work <- new_work!("MyStruct::work"),
}),
GFP_KERNEL,
)
The two helpers run the same in-place initialiser; they differ only in the allocator used
and in whether the result can be cheaply cloned (Arc) or not (KBox).
Enqueueing
Once you have a smart pointer to a properly initialised struct, submit it with
Queue::enqueue:
let _ = workqueue::system().enqueue(arc_val);
The signature is:
pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
where
W: RawWorkItem<ID> + Send + 'static,
A few things follow from it:
- The pointer is consumed. The kernel needs to keep it alive until the worker has run, so it leaks the underlying allocation into the C side. For Arc<T> you can still keep an Arc of your own by .clone()-ing before the call; for KBox you cannot, by design.
- The associated EnqueueOutput is Result<(), Self> for Arc<T> and ARef<T>, and () for Pin<KBox<T>>. With Arc, an Err(self) means "this work was already enqueued somewhere and your submission was a no-op"; you get your Arc back. With KBox, the ownership model statically rules out that case.
- W: Send + 'static is enforced at the call site. This typically means the inner T must be Send + Sync. The Work field itself is unconditionally so; restrictions usually come from the surrounding state.
What happens if you enqueue twice
Every work_struct carries a "currently queued" flag. Submitting a work_struct that is
already pending is a no-op at the C level — it is not an error. The Rust API surfaces this:
let arc = MyStruct::new(42)?;
let again = arc.clone();
match workqueue::system().enqueue(arc) {
Ok(()) => pr_info!("queued\n"),
Err(_) => pr_info!("already queued — dropped this submission\n"),
}
// This is a no-op until the first run completes. The Arc comes back in Err.
let _ = workqueue::system().enqueue(again);
After the worker has run, the same item can be enqueued again from inside run or from
anywhere else. This is how self-rescheduling work items are written.
The closure shortcut: try_spawn
For the trivial case of "run this closure on the system workqueue, I don't need a struct",
the bindings provide
Queue::try_spawn:
workqueue::system().try_spawn(GFP_KERNEL, || {
pr_info!("hello from a worker thread\n");
})?;
The closure must be FnOnce + Send + 'static. Internally try_spawn allocates a KBox
that holds the closure and a Work, then enqueues it. It returns Err(AllocError) if the
allocation fails. Use it for fire-and-forget deferred work; for anything that needs to
share state with the rest of the driver, define a proper struct.
Delayed work
DelayedWork is
the analogue of Work for items that should run after a delay rather than as soon as
possible. The structural differences are minor:
use kernel::workqueue::{impl_has_delayed_work, new_delayed_work, DelayedWork};
#[pin_data]
struct Blinker {
led_on: bool,
#[pin]
work: DelayedWork<Blinker>,
}
impl_has_delayed_work! {
impl HasDelayedWork<Self> for Blinker { self.work }
}
impl WorkItem for Blinker {
type Pointer = Arc<Blinker>;
fn run(this: Arc<Blinker>) {
pr_info!("blink\n");
// re-schedule one second in the future
let _ = workqueue::system().enqueue_delayed(this, kernel::time::HZ);
}
}
The macro impl_has_delayed_work!
generates both a HasWork and a HasDelayedWork impl, so a delayed work item satisfies
both halves of the API. Initialise the field with
new_delayed_work!,
which is to DelayedWork what new_work! is to Work.
Submit with
Queue::enqueue_delayed:
pub fn enqueue_delayed<W, const ID: u64>(&self, w: W, delay: Jiffies) -> W::EnqueueOutput
The delay is a Jiffies
value — convert from milliseconds with
kernel::time::msecs_to_jiffies
or use the kernel::time::HZ constant (one second's worth of jiffies) for whole-second
delays. A delay of 0 is permitted and is equivalent to enqueue — useful for code paths
that decide at runtime whether to delay.
You can also submit a DelayedWork-backed struct through the plain enqueue API; the
result is the same as enqueue_delayed with a delay of zero.
What is missing
The Rust bindings deliberately keep the workqueue surface small. The following operations exist in the C API but are not yet exposed to Rust:
- flush_work / flush_workqueue / drain_workqueue — wait for outstanding items to finish. There is no safe way to call these from Rust today.
- cancel_work_sync / cancel_delayed_work[_sync] — remove a not-yet-running item from the queue or wait for a running one to finish.
- alloc_workqueue — create a private workqueue with custom flags. Only the system queues are reachable from Rust.
The practical consequence is that when your module is being unloaded you must arrange for
in-flight work to finish on its own. The simplest pattern is to keep an Arc<MyStruct>
inside your module struct, set an AtomicBool "stop" flag on drop, and have run check
the flag and exit early if set. Because enqueue consumes a strong reference, the work
will eventually drain itself once you stop enqueueing new items.
Until cancellation is exposed, do not write work items whose run body holds resources
(file handles, mappings, IRQs) that the rest of the module needs in order to unload. A
work item that is currently running cannot be cancelled, and rmmod will simply wait for
it to complete.
A worked example: a deferred-action ioctl
The exercises below build a driver that defers work in response to ioctl commands. The
device skeleton is the same misc device from lab 03 — an Arc<Mutex<...>> shared between
open instances — with an extra Work field that holds the deferred action.
The skeleton looks like this. The device struct is shared (one Arc<DeviceState> per
misc device, cloned into each open file's private data) and contains the work item:
use kernel::{
alloc::KBox,
c_str,
fs::File,
miscdevice::{MiscDevice, MiscDeviceOptions, MiscDeviceRegistration},
prelude::*,
sync::{Arc, Mutex, new_mutex},
workqueue::{self, impl_has_work, new_work, Work, WorkItem},
};
#[pin_data]
struct DeviceState {
#[pin]
counter: Mutex<u32>,
#[pin]
work: Work<DeviceState>,
}
impl_has_work! {
impl HasWork<Self> for DeviceState { self.work }
}
impl WorkItem for DeviceState {
type Pointer = Arc<DeviceState>;
fn run(this: Arc<DeviceState>) {
let mut counter = this.counter.lock();
*counter += 1;
pr_info!("worker tick: counter = {}\n", *counter);
}
}
impl DeviceState {
fn new() -> Result<Arc<Self>> {
Arc::pin_init(
pin_init!(DeviceState {
counter <- new_mutex!(0u32),
work <- new_work!("DeviceState::work"),
}),
GFP_KERNEL,
)
}
}
struct Instance {
state: Arc<DeviceState>,
}
#[vtable]
impl MiscDevice for Instance {
type Ptr = KBox<Self>;
fn open(_file: &File, _reg: &MiscDeviceRegistration<Self>) -> Result<KBox<Self>> {
let state = DeviceState::new()?;
Ok(KBox::new(Instance { state }, GFP_KERNEL)?)
}
fn ioctl(
device: <KBox<Self> as kernel::types::ForeignOwnable>::Borrowed<'_>,
_file: &File,
cmd: u32,
_arg: usize,
) -> Result<isize> {
match cmd {
KICK => {
let _ = workqueue::system().enqueue(device.state.clone());
Ok(0)
}
_ => Err(ENOTTY),
}
}
}
Notice that the Arc<DeviceState> is held by every open file and leaked into the kernel
each time work is enqueued. The Arc keeps the allocation alive across all those owners;
when the last file is closed and any pending work has finished, the allocation is freed.
Exercises
The exercises start by adding a workqueue to the misc device skeleton from lab 03 and
progressively extend it. All exercises can be tested with the ioctl userspace utility
introduced in the previous labs.
Exercise 1 — Schedule a print from an ioctl
Define an argumentless ioctl KICK with magic byte 'J' and number 10 using
_IO. When userspace calls it,
your driver should enqueue a Work item that prints a message via
pr_info!.
Verify:
ioctl /dev/mydevice -t J -n 10 -d none
dmesg | tail
# should show your "worker ran" message
Hint
Embed a Work<DeviceState> field in your shared device state and wire it up with
impl_has_work!. Implement WorkItem so that run calls pr_info!. In the ioctl
handler, clone the Arc<DeviceState> and pass it to workqueue::system().enqueue(...).
Exercise 2 — Observe the "already queued" behaviour
Issue the KICK ioctl twice in quick succession. Inspect the return value of
workqueue::system().enqueue(...) and print whether the submission was accepted or
discarded.
ioctl /dev/mydevice -t J -n 10 -d none
ioctl /dev/mydevice -t J -n 10 -d none
dmesg | tail
Explain why the second submission may be silently dropped. What does the worker thread see if you keep firing the ioctl while it is running?
Hint
enqueue for an Arc<T> returns Result<(), Arc<T>>. Err means the work was already
pending. Log both arms with pr_info! to make the behaviour visible.
Exercise 3 — Pass a value into the worker
Add a Mutex<u32> to the device state. Define a second ioctl KICK_WITH_VALUE using
_IOW::<u32> with sequence
number 11. The handler should read a u32 from userspace, store it under the mutex,
and enqueue the work item. Inside run, lock the mutex and print the stored value.
ioctl /dev/mydevice -t J -n 11 -d write -s 4 -v 42
dmesg | tail
# should show: worker received: 42
Hint
Use the same UserSlice pattern from lab 03 to copy the u32 out of userspace. Storing
it in a Mutex<u32> inside Arc<DeviceState> means both the ioctl and the worker can
reach it.
Exercise 4 — Self-rescheduling work
Modify run so that, after printing, it re-enqueues the same Arc<DeviceState> onto
workqueue::system(). Submit the work once from userspace and observe what happens.
ioctl /dev/mydevice -t J -n 10 -d none
dmesg -w
Add a counter that limits the number of re-enqueues. After the counter reaches some threshold, stop re-enqueueing and observe that the kernel log stops growing.
Without the counter, the work item will re-enqueue itself forever, keeping a worker busy indefinitely. Make sure the limit is in place before you load the module.
Hint
Re-enqueueing from inside run is fine: by the time run is invoked, the work_struct
is no longer marked pending, so the next enqueue will succeed.
To re-enqueue, call let _ = workqueue::system().enqueue(this.clone()); from inside
run. Use an AtomicUsize field for the counter so the check does not need the mutex.
Exercise 5 — Two workers in one struct
Add a second Work field to your device state, this time with ID = 1:
#[pin_data]
struct DeviceState {
// ...
#[pin]
quick: Work<DeviceState, 0>,
#[pin]
slow: Work<DeviceState, 1>,
}
Implement WorkItem<0> and WorkItem<1> independently — the first should print "quick"
and the second should print "slow" after sleeping for a few jiffies (use
kernel::time::Delta or a
simple busy-loop for the lab). Define two ioctls (KICK_QUICK and KICK_SLOW) that
enqueue each.
When you submit both ioctls quickly, do the workers run in order? Concurrently? Explain what you observe.
Hint
You will need two impl_has_work! invocations (one per ID) and two WorkItem impls. When
enqueueing, the inferred ID is 0; for the second worker write
workqueue::system().enqueue::<Arc<DeviceState>, 1>(state.clone()) to disambiguate.
Exercise 6 — Delayed work
Replace the Work field used by KICK with a
DelayedWork.
Use impl_has_delayed_work!
to wire it up and new_delayed_work!
to initialise it. Schedule it with
enqueue_delayed
and a delay of two seconds:
let _ = workqueue::system().enqueue_delayed(
state.clone(),
2 * kernel::time::HZ,
);
Verify with a wall-clock observation that the message appears two seconds after the ioctl returns.
Hint
kernel::time::HZ is the number of jiffies in one second. Multiply for whole-second
delays. For sub-second delays, convert from milliseconds with
msecs_to_jiffies.
Exercise 7 — try_spawn for ad-hoc work
Use Queue::try_spawn
to enqueue a closure that does not require a dedicated struct. The closure should capture
a counter value (passed in by ioctl arg) and print it from the worker thread.
let val = arg as u32;
workqueue::system().try_spawn(GFP_KERNEL, move || {
pr_info!("spawned worker saw {}\n", val);
})?;
When would you prefer try_spawn over the struct-based approach? When would you not?
Hint
try_spawn allocates a KBox for the closure and consumes it. It is convenient for
one-shot fire-and-forget work, but every submission is a fresh allocation and there is no
way to share state between successive spawns. The struct-based approach lets you keep
shared mutable state behind an Arc<Mutex<...>>.
Exercise 8 — Graceful shutdown
Because the Rust bindings do not expose cancel_work_sync or flush_work, you must
arrange your code so that in-flight work items finish on their own and stop being
re-enqueued before your module unloads.
Add an
AtomicBool
stopping flag to your device state, defaulted to false. In the self-rescheduling
work item from exercise 4, check the flag at the top of run and return early if it is
set. In your module's
PinnedDrop impl, set the
flag to true before letting the misc registration drop.
What scenarios does this design not cover? (For example: a long-running work item that
is currently inside run when the flag is set will still complete its current pass.)
Hint
Setting an AtomicBool with Release ordering from drop and reading it with Acquire
from run is sufficient. The Arc<DeviceState> strong-count goes to zero only after
every queued submission has either run-and-returned or never made it onto the queue, so
once you stop enqueueing the allocation drains itself naturally.
Exercise 9 — Pick the right queue
Read the per-queue documentation linked from
kernel::workqueue
and decide which system queue is appropriate for each of the following hypothetical
workloads:
- A handler that processes a 50 MB DMA buffer and may take several hundred milliseconds.
- A heartbeat that prints a single line every 10 seconds and must continue across suspend/resume.
- A latency-sensitive notification that should run as soon as possible after a high-priority interrupt.
- A long-running compression task that should not steal CPU from latency-sensitive work.
Justify each choice in one sentence.
Exercise 10 — Verify with dmesg timestamps
Configure dmesg to show timestamps:
dmesg -T -w
Run a sequence of KICK, KICK_SLOW, and KICK_DELAYED ioctls and confirm from the
timestamps that:
- Work items run on a worker thread, not in the calling process's context — the message timestamp may be slightly after the ioctl returns.
- Delayed work runs after the requested delay, not before.
- Self-rescheduling work, once stopped via the stopping flag, prints exactly one final message and then nothing more.