CFS: the completely *fair* Linux scheduler
20 Jan 2017
3 minutes read


For the operating system course I have this semester, our teacher asked us to search more about the CFS for the exams. Here is a compilation of things I found.


A scheduler is a part of the kernel that control the scheduling of multiples processes and threads. There is several ways to define which and when a process will run on the processor. I will explain CFS way of scheduling.

Linux scheduling

Before Linux 2.6 (back when I was still a kid), the scheduler was the O(1) scheduler which use a Round Robin with a given time quantum. The name O(1) come from the big O notation. The scheduler choose his next target with a constant time. With Linux 2.6.x, a new scheduler was merged and used, the Completely Fair Scheduler.

Another thing specific to Linux is the handling of processes and threads. For Linux, threads are a lightweight process with shared data with others processes. This means Linux doesn’t make any difference in the scheduling between a process and a thread.


Let’s now talk about the more important piece, the CFS. We will see that the scheduler is fair in his way.

Each task has an information about the time already consummed on the CPU (with a nanoseconds precision!) and CFS use this information to analyse which task will be the next. The idea is to check the start time and the already consummed time of task to analyse if this task deserve being on the CPU.

The advantage of this approach over the O(1) scheduler occurs when an interactive process waiting for input gets the CPU. As this process was waiting the input, the ratio of used CPU time is very low compared to others tasks which means the task will have more time on the CPU, increasing the user percieved responsiveness.

The fairness of the scheduler is therefore not only for ready-to-be-run tasks but also for waiting tasks.

Let’s add some nice!

In Linux, nice value acts as priority (the lower the nice, the higher the priority). CFS uses a weight for each task based on the niceness. We can calculate the weight as weight = 1024 / (1.25^nice). The scheduler calculate the proportion of CPU given to a task with his weight divided by the weight of all current tasks.

If a task arrives with a slower nice than the current running task, the running task will be preempted.


The runqueue for the task is implemented as a red-black tree. Before analysing why the runqueue is implemented like that, let’s analyse the red-black tree data structures.

A red-black tree is a type of balanced tree where each node has a color (red or black). The color of a node has to be different than the color of his parent and his children. It gives the tree the property that insert, lookup and delete can be done in O(log n) instead of O(n) for most of the balanced tree in worst case scenario.

We can thus see why the red-black tree has been choosen over another implementation of balanced tree. The O(1) scheduler in another hand is using an array for the runqueue but it suffers from artifacts created to counterbalance the simple data structure.

In the red-black tree, tasks are organized by the execution time. When the scheduler pick a task, it always takes the left-most from the tree meaning the task picking is done in O(1) time. When a task need to re-enter the tree, the task obviously go to the right of the tree giving a chance for others tasks to run.

Final words

It sums up the things I could collected on the CFS without going into too much details. This post gives a good first understanding on how the Linux scheduler works.


Back to posts