diff mbox series

[v2] mm/compaction: Disable compact_unevictable_allowed on RT

Message ID 20200302173516.iysuejilava37psk@linutronix.de (mailing list archive)
State New, archived
Headers show
Series [v2] mm/compaction: Disable compact_unevictable_allowed on RT | expand

Commit Message

Sebastian Andrzej Siewior March 2, 2020, 5:35 p.m. UTC
Since commit
    5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

it is allowed to examine mlocked pages and compact them by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.

Make compact_unevictable_allowed = 0 default and RO on RT.

Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: - Make the proc file RO instead removing it.
       - Mention this change in Documentation/…/vm.rst.

 Documentation/admin-guide/sysctl/vm.rst | 1 +
 kernel/sysctl.c                         | 4 ++++
 mm/compaction.c                         | 4 ++++
 3 files changed, 9 insertions(+)

Comments

Andrew Morton March 2, 2020, 9:25 p.m. UTC | #1
On Mon, 2 Mar 2020 18:35:16 +0100 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.
> 
> Make compact_unevictable_allowed = 0 default and RO on RT.

hm, that's a bit sad but I guess it's tolerable.

> ...
>
> index 64aeee1009cab..bbfa59d25eec3 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
>  This should be used on systems where stalls for minor page faults are an
>  acceptable trade for large contiguous free memory.  Set to 0 to prevent
>  compaction from moving pages that are unevictable.  Default value is 1.
> +On CONFIG_PREEMPT_RT the default value is 0.

This doesn't mention that the file is unwritable on -rt, and it doesn't
explain *why* -rt has different behaviour.

> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = {
>  		.procname	= "compact_unevictable_allowed",
>  		.data		= &sysctl_compact_unevictable_allowed,
>  		.maxlen		= sizeof(int),
> +#ifdef CONFIG_PREEMPT_RT
> +		.mode		= 0444,
> +#else
>  		.mode		= 0644,
> +#endif

This is non-backward-compatible and introduces a possibility that
tested-on-non-rt userspace will fail on -rt kernels.  It might be
better to accept the writes, but to ignore them.  Probably with a
pr_warn_once() to let people know what we did.

But do we really need to take the option away from -rt users?  Perhaps
someone wants this feature and can accept the latency hit.  How about
switching the default and otherwise leaving the kernel behaviour as-is
and simply emitting a warning letting -rt users know that they might
not want to enable this?
Sebastian Andrzej Siewior March 3, 2020, 5:59 p.m. UTC | #2
On 2020-03-02 13:25:31 [-0800], Andrew Morton wrote:
> > index 64aeee1009cab..bbfa59d25eec3 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
> >  This should be used on systems where stalls for minor page faults are an
> >  acceptable trade for large contiguous free memory.  Set to 0 to prevent
> >  compaction from moving pages that are unevictable.  Default value is 1.
> > +On CONFIG_PREEMPT_RT the default value is 0.
> 
> This doesn't mention that the file is unwritable on -rt, and it doesn't
> explain *why* -rt has different behaviour.

I updated this bit.

> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = {
> >  		.procname	= "compact_unevictable_allowed",
> >  		.data		= &sysctl_compact_unevictable_allowed,
> >  		.maxlen		= sizeof(int),
> > +#ifdef CONFIG_PREEMPT_RT
> > +		.mode		= 0444,
> > +#else
> >  		.mode		= 0644,
> > +#endif
> 
> This is non-backward-compatible and introduces a possibility that
> tested-on-non-rt userspace will fail on -rt kernels.  It might be
> better to accept the writes, but to ignore them.  Probably with a
> pr_warn_once() to let people know what we did.

Hmm.

> But do we really need to take the option away from -rt users?  Perhaps
> someone wants this feature and can accept the latency hit.  How about
> switching the default and otherwise leaving the kernel behaviour as-is
> and simply emitting a warning letting -rt users know that they might
> not want to enable this?

I don't think that RT people can live with the latency spike. The
problem is that it is not deterministic in terms *when* it happens and
*how*long* does it need to complete. Also it is not visible so you end
up with additional 100us and you have no idea why.
compaction is "okay" in the setup / configuration phase when the mlock()
pages aren't around / the RT task is disabled. So it does not disturb
the RT load.

Allowing the user to change the knob and spitting a warning is probably
good. So we have a preferred default and the user is aware if it is
changed with or without his knowledge.
Let me send a patch in a bit…

Sebastian
diff mbox series

Patch

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009cab..bbfa59d25eec3 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -128,6 +128,7 @@  allowed to examine the unevictable lru (mlocked pages) for pages to compact.
 This should be used on systems where stalls for minor page faults are an
 acceptable trade for large contiguous free memory.  Set to 0 to prevent
 compaction from moving pages that are unevictable.  Default value is 1.
+On CONFIG_PREEMPT_RT the default value is 0.
 
 
 dirty_background_bytes
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ad5b88a53c5a8..f113e31d0b0b6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1483,7 +1483,11 @@  static struct ctl_table vm_table[] = {
 		.procname	= "compact_unevictable_allowed",
 		.data		= &sysctl_compact_unevictable_allowed,
 		.maxlen		= sizeof(int),
+#ifdef CONFIG_PREEMPT_RT
+		.mode		= 0444,
+#else
 		.mode		= 0644,
+#endif
 		.proc_handler	= proc_dointvec,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6abf..ba77809a1666e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1590,7 +1590,11 @@  typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
+#ifdef CONFIG_PREEMPT_RT
+int sysctl_compact_unevictable_allowed __read_mostly = 0;
+#else
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
+#endif
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)