This is a guest post by P99 CONF speaker Johannes Bechberger (OpenJDK developer at SAP SE). Johannes and Jake Hillion (Software Engineer at Meta) will be covering “Concurrency Testing Using Custom Linux Schedulers” in depth at P99 CONF 2025 (free + virtual). Their session description:
New features of the Linux kernel allow us to develop our own Linux schedulers. This talk shows how anyone can write a basic Linux scheduler and use it, for example, to fuzz for concurrency bugs or optimize for specific workloads.
Join P99 CONF 2025 (free + virtual)
This article was originally posted on Johannes’s blog, Mostly nerdless.
***
Welcome back to my hello-ebpf series. I was at FOSDEM earlier this year and gave three(ish) talks. This blog post covers one of them, which is related to concurrency testing.
Backstory
The whole story started in the summer of last year when I thought about the potential use cases for custom Linux schedulers. The only two prominent use cases at the time were:
- Improve performance on servers and heterogeneous systems
- Allow people to write their own schedulers for educational purposes
Of course, the latter is also important for the hello-ebpf project, which allows Java developers to create their own schedulers. But I had another idea back then: why can’t we use custom Linux schedulers for concurrency testing? I presented this idea in my keynote at the eBPF Summit in September, which you can watch here.
I ended up giving this talk because of a certain Bill Mulligan, who liked the idea of showing my eBPF SDK in Java and sched-ext in a single talk. For whatever reason, I foolishly agreed to give the talk and spent a whole weekend in Oslo before JavaZone frantically implementing sched-ext support in hello-ebpf.
My idea then was that one could use a custom scheduler to test specific scheduling orders to find (and possibly reproduce) erroneous behavior of concurrent code. One of the examples in my talk was deadlocks:
Task-Control
The result of this idea was my blog post Hello eBPF: Control task scheduling with a custom scheduler written in Java (16) and the accompanying code:
The scheduler checks on every scheduling decision to determine whether a task can be scheduled. Coupled with a proper API, this can already be useful.
Initial Idea
But then FOSDEM came up, and I wanted to submit a talk on this topic to the Testing and Continuous Delivery track. With the help of Jake Hillion, I crafted the following idea: a scheduler that can fuzz concurrent programs. Taking the task-control code one step further, implementing the randomization of scheduling orders directly into a scheduler:
Imagine you’re hunting a concurrency bug that requires threads to run in a specific order. Wouldn’t it be great if you could stop and start threads at random? Prevent them from being scheduled onto the CPU? And the best part: without the application being able to prevent this, as it could with the POSIX SIGSTOP and SIGCONT signals? Enter the scheduler extensions for the Linux kernel. Introduced in version 6.12, they allow you to quickly write your own schedulers with eBPF and can be the base for simple libraries that let you start and stop threads directly in the Linux kernel. This opens up the possibility of creating complex scheduling scenarios on a whim.
In this talk, we’ll show you a prototypical sched_ext-based library for concurrency testing.
As you can read in any operating systems textbook (and Wikipedia), scheduler developers typically have the following goals in mind:
- maximizing throughput (the total amount of work completed per time unit);
- minimizing wait time (time from work becoming ready until the first point it begins execution);
- minimizing latency or response time (time from work becoming ready until it is finished, in the case of batch activity, or until the system responds and hands the first output to the user, in the case of interactive activity);
- maximizing fairness (equal CPU time to each process, or more generally appropriate times according to the priority and workload of each process).
WIKIPEDIA
This means that all commonly known schedulers exhibit somewhat predictable behavior. So, the main idea of the proposed talk was to create a scheduler with erratic behavior – a scheduler that ultimately ignores goals like fairness or minimizing wait time. With sched-ext, implementing such a scheduler became possible.
To our surprise, the talk was accepted. Remember, this was the only sched-ext talk at FOSDEM, given to an audience from outside the kernel and eBPF communities.
The Talk
And even more surprising was the full room at FOSDEM, with people apparently unable to get in because there was so much interest. Sadly, despite preparing the talk and writing the code, I couldn’t attend because my train connection to Brussels got derailed (only figuratively). Jake told me later that he was still sleepy when he received my message at 8 am, but that changed rapidly when he realized he would have to give the 12 pm talk alone. Luckily, the talk went well.
I’m happy that Jake Hillion rose to the challenge and delivered a really good talk (“Even my 17-year-old daughter understood it”, a colleague from SAP later told me).
Even better, it caught the attention of Daroc Alden, who wrote a great article on LWN, covering the talk and the topic better than I ever could:
Hillion and Bechberger’s scheduler does this deliberately: when a thread is ready to run, instead of assigning it to a CPU right away, their scheduler will put it to sleep for a random amount of time based on some configurable parameters. This gives threads that would ordinarily not have any problem keeping up with each other a chance to let one thread race ahead and overwhelm the other. It also ensures that threads run in a random order, which can help expose bugs as well. In a way, it is the fact that Linux’s default scheduler is so good that makes these problems hard to reproduce, Hillion explained. EEVDF is too fair, and only actually lets one thread overwhelm the other when the machine is already overloaded, and even then rarely.
The idea of running threads in a random order is not new. There are plenty of specialized concurrency testing tools that do something similar. But concurrency-fuzz-scheduler is not nearly as fine-grained as such tools usually are. It won’t help deterministically find violations of a particular memory model, for example. It will find the kinds of gross logic errors that are common in real-life applications, however.
Exposing concurrency bugs with a custom scheduler
Code
The tool I developed is in an early state; it’s more of a proof of concept that helps to grasp the idea and its possibilities.
You can find the code on GitHub, written, of course, in Java. I’ll cover the eBPF part of the code in the following sections, skipping the boilerplate CLI portion.
First, we define a few data structures:
Every task that we care about (related to the program under test) can have one of three states:
- `START`: the task just came into the system; we have to set up everything for it
- `RUNNING`: the task can currently be scheduled and, therefore, is able to run
- `SLEEPING`: the task can’t be scheduled

We switch between `RUNNING` and `SLEEPING` whenever the time allowed in the current state (`timeAllowedInState`, stored in the `TaskContext`) is up. So, we essentially have a simple state machine:
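The eBPF definitions themselves aren’t reproduced here, but as a rough plain-Java sketch of the shapes involved (apart from `timeAllowedInState` and `TaskContext`, which the post names, the field names are my assumptions):

```java
// Plain-Java sketch of the data structures behind the state machine.
// The real code is eBPF written with hello-ebpf; this is illustrative only.
public class TaskShapes {
    /** The three states a task under test can be in. */
    enum TaskState { START, RUNNING, SLEEPING }

    /** Per-task bookkeeping, one instance per task id. */
    static class TaskContext {
        TaskState state = TaskState.START; // every task starts in START
        long stateStartNs;                 // when the task entered its current state
        long timeAllowedInState;           // how long it may stay in RUNNING/SLEEPING
    }
}
```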
We have to store the context somewhere, so we also need some maps:
We cache the information on whether a task is related to the program under test in a map, so we don’t always have to recompute it:
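In the real scheduler these are BPF hash maps; as a plain-Java stand-in for the two maps and the caching behavior (the names and the `IntPredicate` parameter are illustrative):

```java
// Sketch of the two maps: task id -> per-task context, plus a cache of
// whether a task belongs to the program under test.
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

public class TaskMaps {
    final Map<Integer, Object> taskContexts = new HashMap<>();
    final Map<Integer, Boolean> isScriptRelated = new HashMap<>();

    /** Run the (possibly expensive) relatedness check once per task id,
     *  then serve the answer from the cache. */
    boolean isRelated(int tid, IntPredicate expensiveCheck) {
        return isScriptRelated.computeIfAbsent(tid, expensiveCheck::test);
    }
}
```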
Of course, we then need to define some basic methods to get a random number in a specific range and to initialize task contexts:
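As a sketch of the random-range helper (with `ThreadLocalRandom` standing in for the in-kernel PRNG; the method name is an assumption):

```java
// Sketch of the bounded-random helper used to pick sleep and run durations.
import java.util.concurrent.ThreadLocalRandom;

public class RandomHelpers {
    /** Random value in [min, max). */
    static long randomInRange(long min, long max) {
        return ThreadLocalRandom.current().nextLong(min, max);
    }
}
```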
The initialization, depending on the randomly chosen start state of relevant new tasks, is done by different methods, which are also used when a task changes its state:
The actual state change (and the check whether one is needed) is done in the `updateState` method:
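Since the eBPF body isn’t reproduced here, a plain-Java sketch of what `updateState` plausibly does under the state machine described above (the constants and field names beyond `timeAllowedInState` are my assumptions):

```java
// Sketch of the updateState logic: once a task has spent its allowed time
// in RUNNING or SLEEPING, flip it to the other state and draw a fresh
// random time budget.
import java.util.concurrent.ThreadLocalRandom;

public class UpdateState {
    enum TaskState { RUNNING, SLEEPING }

    static class TaskContext {
        TaskState state = TaskState.RUNNING;
        long stateStartNs;
        long timeAllowedInState;
    }

    static final long MIN_STATE_NS = 1_000_000;   // 1 ms, illustrative
    static final long MAX_STATE_NS = 50_000_000;  // 50 ms, illustrative

    /** Returns true if the task's state actually changed. */
    static boolean updateState(TaskContext ctx, long nowNs) {
        if (nowNs - ctx.stateStartNs < ctx.timeAllowedInState) {
            return false; // time in the current state is not up yet
        }
        ctx.state = ctx.state == TaskState.RUNNING ? TaskState.SLEEPING
                                                   : TaskState.RUNNING;
        ctx.stateStartNs = nowNs;
        ctx.timeAllowedInState =
                ThreadLocalRandom.current().nextLong(MIN_STATE_NS, MAX_STATE_NS);
        return true;
    }
}
```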
The actual scheduling is now pretty simple. We start with the `init` and `enqueue` methods, which are implemented as always for the FIFO scheduler (see Hello eBPF: Writing a Linux scheduler in Java with eBPF (15)):
We must note that we schedule all tasks to the global queue, even if we don’t want to schedule them onto a CPU because of their current state. This could, and should, be improved by creating a secondary queue for tasks in the `SLEEPING` state.
Now to the dispatching, which takes care of only scheduling tasks that are not in the `SLEEPING` state and ignores tasks that have any CPU constraints:
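As a rough illustration of the dispatch rule, with plain Java standing in for the eBPF dispatch callback (the requeue-to-the-back behavior and names are assumptions of this sketch; the CPU-constraint check is omitted):

```java
// Sketch of the dispatch rule: only hand a task to a CPU if it is not
// SLEEPING; sleeping tasks go back to the queue, as the single global
// queue holds tasks of every state.
import java.util.Deque;
import java.util.Map;

public class Dispatch {
    enum TaskState { RUNNING, SLEEPING }

    /** Pop tasks until one is dispatchable; requeue the sleeping ones. */
    static Integer pickNext(Deque<Integer> queue, Map<Integer, TaskState> states) {
        for (int i = queue.size(); i > 0; i--) {
            Integer tid = queue.pollFirst();
            if (states.getOrDefault(tid, TaskState.RUNNING) != TaskState.SLEEPING) {
                return tid; // dispatch this task
            }
            queue.addLast(tid); // still sleeping: push to the back
        }
        return null; // nothing runnable right now
    }
}
```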
We also need to update the start and stop information in the context of every relevant task:
And remove tasks from the `isScriptRelated` map whenever they exit, as task IDs can be reused:
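The shape of that exit hook, again as a plain-Java sketch (the method and map names are illustrative):

```java
// Sketch of the exit hook: when a task exits, drop its cached entries so a
// recycled task id doesn't inherit stale state.
import java.util.HashMap;
import java.util.Map;

public class ExitHook {
    final Map<Integer, Boolean> isScriptRelated = new HashMap<>();
    final Map<Integer, Object> taskContexts = new HashMap<>();

    /** Called when a task exits; task ids can be reused, so clean up. */
    void onTaskExit(int tid) {
        isScriptRelated.remove(tid);
        taskContexts.remove(tid);
    }
}
```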
This is the relevant part of the code. Yes, it can be improved in several ways, but this is for another day.
Usage
After you build the tool via `mvn package`, you can configure the `SchedulerSettings` object:
More on this in the README, but you essentially run `./scheduler.sh PROGRAM` to run a program in the context of the fuzzer.
Example Program
In the talk, I used a toy example that is related to my work on a new profiler for Java:
This example involves a producer that produces items with a best-before date; each item should be used within a specific time frame. The consumer regularly consumes the items and crashes if they are too old. The items are passed via a concurrent queue. This is a simplification of the fairly typical scenario in which items are not valid indefinitely but contain invalid information after some state change.
You can find the code of the toy example on GitHub.
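The sample on GitHub is the authoritative version; here is only a minimal sketch of the pattern it exercises (the names, and returning a boolean where the real sample crashes, are my simplifications):

```java
// Minimal sketch of the toy example: a producer puts items with a
// best-before time on a concurrent queue, and the consumer rejects items
// that are past it.
import java.util.concurrent.ConcurrentLinkedQueue;

public class BestBeforeQueue {
    record Item(long bestBeforeNs) {}

    final ConcurrentLinkedQueue<Item> queue = new ConcurrentLinkedQueue<>();

    void produce(long nowNs, long shelfLifeNs) {
        queue.add(new Item(nowNs + shelfLifeNs));
    }

    /** Returns true if the consumed item was still valid (or the queue was
     *  empty); the real sample crashes on an expired item instead. */
    boolean consume(long nowNs) {
        Item item = queue.poll();
        if (item == null) {
            return true; // nothing to do
        }
        return nowNs <= item.bestBeforeNs();
    }
}
```

If a fuzzing scheduler pauses the consumer thread long enough, the item it eventually polls is past its best-before time, which is exactly the bug the demo provokes.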
To build it, use `samples/build_queue.sh`. Then you can run it using the tool:
This erratic scheduler caused the consumer thread to be paused randomly and, in one instance, processed an invalid item. Of course, this toy example is constructed to make the error occur quickly so that it can be used in a demo. But the same technique should work with proper applications, too, just a lot slower. Fuzzing takes time.
Improvements
There are two main areas of improvement: making the fuzzing more targeted and improving the scheduler itself. For the latter, we could use multiple queues and potentially a more advanced base scheduler than FIFO. To improve the fuzzing, we could, for example, make the sleeps after scheduler yields of random length; such yields can happen when a task doesn’t get the lock it asks for and has to wait, or, of course, when the task sleeps. We could gather even more knowledge about the tasks by instrumenting the application. This is quite easy in Java with bytecode instrumentation and would allow us to push more information dynamically into our fuzzing scheduler.
There are many more ways to improve the fuzzer; the only issue is finding the time to implement and test them.
Conclusion
I want to conclude with the words of Daroc Alden:
Hillion and Bechberger’s concurrency-fuzz-scheduler is a promising example of a sched_ext scheduler that does something more than act as a testbed for scheduling ideas. It is a practical tool for shaking more bugs out of applications — something that is always helpful, in a world where software only grows more complex.
And, of course, it’s a testament to the versatility of hello-ebpf. The tool is in its early stages, but given enough time, it can hopefully mature into a helpful tool for finding bugs in OpenJDK and more.
Thank you for joining me on this journey to bring eBPF and sched-ext to the Java world.
P.S.: This wasn’t my only talk at FOSDEM; I also gave a talk on sched-ext in the eBPF track and participated in a talk on the future of Java profiling in the Free Java track. See my FOSDEM speaker page for the details.
P.P.S.: Jake Hillion created a version of this scheduler in the scx repository and used it to find a kernel bug.
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to Jake for helping with both the talk and this blog post.
***
Bonus: For a sneak peek of the bug that Johannes and Jake will be discussing at P99 CONF, see Jake’s blog post, Fixing a Kernel Driver Scheduling Bug with scx_chaos