diff mbox series

fs/proc: fix softlockup in __read_vmcore

Message ID 20240507091858.36ff767f@imladris.surriel.com (mailing list archive)
State New, archived
Series fs/proc: fix softlockup in __read_vmcore

Commit Message

Rik van Riel May 7, 2024, 1:18 p.m. UTC
While taking a kernel core dump with makedumpfile on a larger system,
softlockup messages often appear.

While softlockup warnings can be harmless, they can also interfere
with things like RCU freeing memory, which can be problematic when
the kdump kexec image is configured with as little memory as possible.

Avoid the softlockup, and give things like work items and RCU a
chance to do their thing during __read_vmcore by adding a cond_resched.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 fs/proc/vmcore.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Baoquan He May 9, 2024, 3:52 a.m. UTC | #1
Hi,

On 05/07/24 at 09:18am, Rik van Riel wrote:
> While taking a kernel core dump with makedumpfile on a larger system,
> softlockup messages often appear.
> 
> While softlockup warnings can be harmless, they can also interfere
> with things like RCU freeing memory, which can be problematic when
> the kdump kexec image is configured with as little memory as possible.
> 
> Avoid the softlockup, and give things like work items and RCU a
> chance to do their thing during __read_vmcore by adding a cond_resched.

Thanks for fixing this.

By the way, is it easy to reproduce? And should we add some trace of the
softlockup into log so that people can search for it and confirm when
encountering it?

Thanks
Baoquan

> ---
>  fs/proc/vmcore.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 1fb213f379a5..d06607a1f137 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -383,6 +383,8 @@ static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
>  		/* leave now if filled buffer already */
>  		if (!iov_iter_count(iter))
>  			return acc;
> +
> +		cond_resched();
>  	}
>  
>  	list_for_each_entry(m, &vmcore_list, list) {
> -- 
> 2.42.0
Rik van Riel May 9, 2024, 1:41 p.m. UTC | #2
On Thu, 2024-05-09 at 11:52 +0800, Baoquan He wrote:
> Hi,
> 
> On 05/07/24 at 09:18am, Rik van Riel wrote:
> > While taking a kernel core dump with makedumpfile on a larger
> > system,
> > softlockup messages often appear.
> > 
> > While softlockup warnings can be harmless, they can also interfere
> > with things like RCU freeing memory, which can be problematic when
> > the kdump kexec image is configured with as little memory as
> > possible.
> > 
> > Avoid the softlockup, and give things like work items and RCU a
> > chance to do their thing during __read_vmcore by adding a
> > cond_resched.
> 
> Thanks for fixing this.
> 
> By the way, is it easy to reproduce? And should we add some trace of
> the
> softlockup into log so that people can search for it and confirm when
> encountering it?

It is pretty easy to reproduce, but it does not happen all the time.
With millions of systems, even rare errors are common :)

However, we have been running with this fix for long enough (we
deployed it in order to test it) that I don't think we have the
warning stored any more. Those logs were rotated out long ago.

kind regards,

Rik
Baoquan He May 9, 2024, 3:30 p.m. UTC | #3
On 05/09/24 at 09:41am, Rik van Riel wrote:
> On Thu, 2024-05-09 at 11:52 +0800, Baoquan He wrote:
> > Hi,
> > 
> > On 05/07/24 at 09:18am, Rik van Riel wrote:
> > > While taking a kernel core dump with makedumpfile on a larger
> > > system,
> > > softlockup messages often appear.
> > > 
> > > While softlockup warnings can be harmless, they can also interfere
> > > with things like RCU freeing memory, which can be problematic when
> > > the kdump kexec image is configured with as little memory as
> > > possible.
> > > 
> > > Avoid the softlockup, and give things like work items and RCU a
> > > chance to do their thing during __read_vmcore by adding a
> > > cond_resched.
> > 
> > Thanks for fixing this.
> > 
> > By the way, is it easy to reproduce? And should we add some trace of
> > the
> > softlockup into log so that people can search for it and confirm when
> > encountering it?
> 
> It is pretty easy to reproduce, but it does not happen all the time.
> With millions of systems, even rare errors are common :)
> 
> However, we have been running with this fix for long enough (we
> deployed it in order to test it) that I don't think we have the
> warning stored any more. Those logs were rotated out long ago.

OK, thanks for the explanation.

Acked-by: Baoquan He <bhe@redhat.com>

Patch

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 1fb213f379a5..d06607a1f137 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -383,6 +383,8 @@ static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
 		/* leave now if filled buffer already */
 		if (!iov_iter_count(iter))
 			return acc;
+
+		cond_resched();
 	}
 
 	list_for_each_entry(m, &vmcore_list, list) {
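
Illustration (not part of the patch)

The change above places a voluntary scheduling point at the end of each
iteration of a long-running copy loop. The following is a minimal user-space
sketch of that pattern, not kernel code. It assumes sched_yield() as a rough
stand-in for cond_resched(); the two are not equivalent, since cond_resched()
only reschedules when the kernel has flagged that a reschedule is needed.
The names copy_with_yield and CHUNK_SIZE are hypothetical, and the counter is
called acc only to echo the variable visible in the patch context.

```c
#include <sched.h>   /* sched_yield() */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy() */

/* Hypothetical chunk size, chosen only for this illustration. */
#define CHUNK_SIZE 4096

/*
 * Copy len bytes from src to dst one chunk at a time, offering the CPU
 * to other runnable threads after every chunk. Returns the number of
 * bytes copied.
 */
static size_t copy_with_yield(char *dst, const char *src, size_t len)
{
	size_t acc = 0;

	while (acc < len) {
		size_t n = len - acc;

		if (n > CHUNK_SIZE)
			n = CHUNK_SIZE;

		memcpy(dst + acc, src + acc, n);
		acc += n;

		/* User-space stand-in for cond_resched(): a voluntary yield point. */
		sched_yield();
	}

	return acc;
}
```

Yielding once per chunk rather than once per byte keeps the overhead small
while still bounding how long the loop can run without giving other work a
chance to execute, which is the same trade-off the one-line kernel change
makes per loop iteration.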