====================
Icebreakers:
* When was the kthread primitive introduced and who introduced it?
  What was the original purpose of the kthreads API?
* What was the first try_to_freeze() kernel user?
* Do *you* run into issues with the kthread freezer?
* We're addressing such woes with filesystems first
Motivation
----------
The kernel kthread freezer API is rather loose, and even after many years of
evolution in *how* to properly freeze kthreads, issues still crop up. One goal
suggested long ago by Jiri Kosina was to *not* have to call try_to_freeze() on
kthreads all over the kernel, and instead to replace it with more appropriate
per-subsystem infrastructure.
Long term we want to address this throughout the kernel; however, we'll start
by focusing on filesystems. Every other subsystem will have to address this on
its own, but perhaps each can take some ideas from the filesystems work.
Example of a modern kthread freezer issue
-----------------------------------------
A regression was detected on XFS with suspend: a hibernation stress test
failed after 48 rounds of cycling.
> Reverting 18f1df4e00ce ("xfs: Make xfsaild freezeable again")
> would break the proper form of the kthread for it to be freezable.
> This "form" is not defined formally, and sadly its just a form
> learned throughout years over different kthreads in the kernel.
Dave Chinner later noted:
"Suspend on journalling filesystems has been broken for a long time
(i.e since I first realised the scope of the problem back in 2005)"
"IOWs, suspend of filesystems has been broken forever, and we've been
slapping bandaids on it in XFS forever."
https://lkml.kernel.org/r/20171114212538.GC4094@dastard
Components to understand the issue
----------------------------------
* refrigerator()
* try_to_freeze()
* kthread_freezable_should_stop()
* kthread_run()
The core of the issue:
If a freezable kernel thread fails to call try_to_freeze() after the freezer
has initiated a freezing operation, the freezing of tasks will fail and the
entire hibernation or suspend operation will be cancelled.
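The failure mode above can be illustrated with a small userspace model in C.
This is only a sketch: the model_* names and MODEL_* flags are hypothetical
and are not kernel APIs or the kernel's PF_* values. A single-threaded loop
stands in for the scheduler, and a bounded number of rounds stands in for the
freezer eventually giving up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model flags; these are NOT the kernel's PF_* values. */
enum {
	MODEL_FREEZABLE = 1 << 0,	/* task opted in to freezing */
	MODEL_POLLS     = 1 << 1,	/* task's loop calls try_to_freeze() */
	MODEL_FROZEN    = 1 << 2,	/* task is parked in the refrigerator */
};

struct model_task {
	const char *name;
	unsigned int flags;
};

/* One pass of a task's main loop; only polling tasks notice the freeze. */
void model_task_iterate(struct model_task *t, bool freezing)
{
	if (freezing && (t->flags & MODEL_FREEZABLE) && (t->flags & MODEL_POLLS))
		t->flags |= MODEL_FROZEN;	/* stands in for try_to_freeze() */
}

/* Number of freezable tasks that have not parked themselves yet. */
size_t model_count_unfrozen(const struct model_task *tasks, size_t n)
{
	size_t todo = 0;

	for (size_t i = 0; i < n; i++)
		if ((tasks[i].flags & MODEL_FREEZABLE) &&
		    !(tasks[i].flags & MODEL_FROZEN))
			todo++;
	return todo;
}

/*
 * Model of the freezer: give every task up to max_rounds loop passes to
 * park itself. Returns the number of freezable tasks still running, so 0
 * means the freeze succeeded and anything else means it failed.
 */
size_t model_freeze_tasks(struct model_task *tasks, size_t n,
			  unsigned int max_rounds)
{
	assert(tasks != NULL || n == 0);

	for (unsigned int round = 0; round < max_rounds; round++) {
		for (size_t i = 0; i < n; i++)
			model_task_iterate(&tasks[i], true);
		if (model_count_unfrozen(tasks, n) == 0)
			break;
	}
	return model_count_unfrozen(tasks, n);
}
```

In this model, one freezable task that never polls is enough to make
model_freeze_tasks() return non-zero, no matter how many rounds it is given.
That corresponds to the suspend or hibernation being cancelled.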
refrigerator()
--------------
Commit 542f96a52 ("[PATCH] suspend-to-{RAM,disk}") by
Pavel Machek <pavel@ucw.cz>, merged in v2.5.18, added suspend-to-RAM/disk
support and, as part of it, the refrigerator(). It carried heavy
warnings, for good reason:
/Documentation/swsusp.txt
BIG FAT WARNING
If you have unsupported (*) devices using DMA...
...say goodbye to your data.
If you touch anything on disk between suspend and resume...
...kiss your data goodbye.
If your disk driver does not support suspend... (IDE does)
...you'd better find out how to get along
without your data.
(*) pm interface support is needed to make it safe.
# refrigerator() in a nutshell
void refrigerator(unsigned long flag)
{
	...
	while (current->flags & PF_FROZEN)
		schedule();
	...
}
kthread_run()
-------------
When was the kthread primitive introduced and who introduced it?
Rusty Russell <rusty@rustcorp.com.au>, via linux-history commit
933ba10234f68 ("[PATCH] kthread primitive"), in the v2.6.4 release.
The original motivation was to enable CPU hotplug: managing tasks properly
in light of CPU hotplug is hard, and the kthread primitive helps with this.
It uses kernel_thread() behind the scenes, the kernel's equivalent of fork().
Don't freeze kthreads on try_to_freeze_tasks()
----------------------------------------------
We don't want kernel threads to be frozen in unexpected places, so we allow
them to block freeze_processes(), or to set PF_NOFREEZE if needed.
KTW_FREEZABLE exists to enable kthread work freezing but no users exist.
static void create_kthread(struct kthread_create_info *create)
{
	...
	/* We want our own signal handler (we take no signals by default). */
	pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
	...
}
Also:
bool freeze_task(struct task_struct *p)
{
	...
	if (!(p->flags & PF_KTHREAD))
		fake_signal_wake_up(p);
	else
		wake_up_state(p, TASK_INTERRUPTIBLE);
	...
}
Since kthreads take no signals by default, we don't wake them with the fake
signal above; the explicit PF_KTHREAD check makes sure they are only woken
from interruptible sleep instead. kthreads need to have control over how they
are frozen.
freezer_do_not_count() - don't freeze current
---------------------------------------------
Keep in mind it's not just kthreads: they are not the only tasks that avoid
the general freeze_processes(). freezer_do_not_count() sets PF_FREEZER_SKIP,
and processes with that flag set are also skipped by the general freeze.
Currently used in:
* do_fork() on wait_for_vfork_done()
* do_coredump() on coredump_wait()
* binder_thread_read() on binder_wait_for_work()
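The effect of freezer_do_not_count() around a blocking wait can be sketched
with a small userspace C model. The skip_* names and SKIP_* flags are
hypothetical, not kernel definitions. The model also assumes that the matching
freezer_count() call clears the flag and then tries to freeze right away; that
is our reading of the kernel's freezer.h and should be checked against the
tree you are working on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model flags; these are NOT the kernel's PF_* values. */
enum {
	SKIP_NOT_COUNTED = 1 << 0,	/* freezer must not wait for this task */
	SKIP_FROZEN      = 1 << 1,	/* task is parked in the refrigerator */
};

struct skip_task {
	unsigned int flags;
};

/* Called before a long blocking wait: hide the task from the freezer. */
void skip_freezer_do_not_count(struct skip_task *t)
{
	assert(t != NULL);
	t->flags |= SKIP_NOT_COUNTED;
}

/*
 * Called after the wait returns: become visible again and, if a freeze
 * started in the meantime, park immediately (assumed behaviour).
 */
void skip_freezer_count(struct skip_task *t, bool freezing)
{
	assert(t != NULL);
	t->flags &= ~SKIP_NOT_COUNTED;
	if (freezing)
		t->flags |= SKIP_FROZEN;
}

/* Would the freezer still have to wait for this task? */
bool skip_freezer_waits_for(const struct skip_task *t)
{
	assert(t != NULL);
	return !(t->flags & (SKIP_NOT_COUNTED | SKIP_FROZEN));
}
```

The point of the model is that a task blocked inside such a wait does not hold
up the freeze, yet it cannot slip back into normal execution while a freeze is
in progress, because it parks itself as soon as the wait returns.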
First try_to_freeze()
---------------------
Commit 54820fb26 ("[PATCH] swsusp: try_to_freeze() to make freezing hooks
nicer") by Pavel Machek <pavel@ucw.cz>, merged in v2.6.11, added the
try_to_freeze() API for the kernel scheduler.
# First try_to_freeze() kernel user
What was the first try_to_freeze() user? It was on the x86 Intel IO-APIC
kernel IRQ balancer:
static int __init balanced_irq_init(void)
{
	...
	printk(KERN_INFO "Starting balanced_irq\n");
	if (kernel_thread(balanced_irq, NULL, CLONE_KERNEL) >= 0)
		return 0;
	else
		printk(KERN_ERR "balanced_irq_init: failed to spawn balanced_irq");
	...
}
It was later modified to use the kthread API; after commit f26d6a2bbcf38
("[PATCH] i386: convert to the kthread API"), merged in v2.6.22, it looked like:
static int __init balanced_irq_init(void)
{
	...
	printk(KERN_INFO "Starting balanced_irq\n");
	if (!IS_ERR(kthread_run(balanced_irq, NULL, "kirqd")))
		return 0;
	printk(KERN_ERR "balanced_irq_init: failed to spawn balanced_irq");
	...
}
Anyway, this IRQ balancer was later removed via commit 8b8e8c1bf7275e ("x86:
remove irqbalance in kernel for 32 bit"), merged in v2.6.28, since the
userspace irqbalance daemon had made it obsolete.
Early try_to_freeze() users
---------------------------
Code all around the kernel that used similar semantics was simplified to use
try_to_freeze() via commit f9adcf4ea1599 ("[PATCH] swsusp: refrigerator
cleanups"), added in v2.6.11 by Pavel Machek <pavel@ucw.cz>.
Who were the early try_to_freeze() users? They can be determined using
linux-history with:
git checkout -b linux-2.6.11 v2.6.11
git grep try_to_freeze
* architecture do_signal() calls -- for example on x86:
@@ -24,7 +24,6 @@
 #include <linux/stddef.h>
 #include <linux/personality.h>
 #include <linux/compiler.h>
-#include <linux/suspend.h>
 #include <asm/ucontext.h>
 #include <asm/uaccess.h>
 #include <asm/i387.h>
@@ -423,10 +422,8 @@ int do_signal(struct pt_regs *regs, sigset_t *oldset)
 		return 1;
 	}
-	if (current->flags & PF_FREEZE) {
-		refrigerator(0);
+	if (try_to_freeze(0))
 		goto no_signal;
-	}
 	if (!oldset)
 		oldset = &current->blocked;