fs/proc: fix softlockup in __read_vmcore

Message ID

20240507091858.36ff767f@imladris.surriel.com (mailing list archive)

State

New, archived

Headers

Date: Tue, 7 May 2024 09:18:58 -0400
From: Rik van Riel <riel@surriel.com>
To: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>, Dave Young <dyoung@redhat.com>,
 kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH] fs/proc: fix softlockup in __read_vmcore
Message-ID: <20240507091858.36ff767f@imladris.surriel.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: riel@surriel.com

Series

fs/proc: fix softlockup in __read_vmcore

Commit Message

Rik van Riel May 7, 2024, 1:18 p.m. UTC

While taking a kernel core dump with makedumpfile on a larger system,
softlockup messages often appear.

While softlockup warnings can be harmless, they can also interfere
with things like RCU freeing memory, which can be problematic when
the kdump kexec image is configured with as little memory as possible.

Avoid the softlockup, and give things like work items and RCU a
chance to do their thing during __read_vmcore by adding a cond_resched.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 fs/proc/vmcore.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Baoquan He May 9, 2024, 3:52 a.m. UTC | #1

Hi,

On 05/07/24 at 09:18am, Rik van Riel wrote:
> While taking a kernel core dump with makedumpfile on a larger system,
> softlockup messages often appear.
> 
> While softlockup warnings can be harmless, they can also interfere
> with things like RCU freeing memory, which can be problematic when
> the kdump kexec image is configured with as little memory as possible.
> 
> Avoid the softlockup, and give things like work items and RCU a
> chance to do their thing during __read_vmcore by adding a cond_resched.

Thanks for fixing this.

By the way, is it easy to reproduce? And should we add some trace of the
softlockup into log so that people can search for it and confirm when
encountering it?

Thanks
Baoquan

> ---
>  fs/proc/vmcore.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 1fb213f379a5..d06607a1f137 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -383,6 +383,8 @@ static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
>  		/* leave now if filled buffer already */
>  		if (!iov_iter_count(iter))
>  			return acc;
> +
> +		cond_resched();
>  	}
>  
>  	list_for_each_entry(m, &vmcore_list, list) {
> -- 
> 2.42.0

Rik van Riel May 9, 2024, 1:41 p.m. UTC | #2

On Thu, 2024-05-09 at 11:52 +0800, Baoquan He wrote:
> Hi,
> 
> On 05/07/24 at 09:18am, Rik van Riel wrote:
> > While taking a kernel core dump with makedumpfile on a larger
> > system,
> > softlockup messages often appear.
> > 
> > While softlockup warnings can be harmless, they can also interfere
> > with things like RCU freeing memory, which can be problematic when
> > the kdump kexec image is configured with as little memory as
> > possible.
> > 
> > Avoid the softlockup, and give things like work items and RCU a
> > chance to do their thing during __read_vmcore by adding a
> > cond_resched.
> 
> Thanks for fixing this.
> 
> By the way, is it easy to reproduce? And should we add some trace of
> the
> softlockup into log so that people can search for it and confirm when
> encountering it?

It is pretty easy to reproduce, but it does not happen all the time.
With millions of systems, even rare errors are common :)

However, we have been running with this fix for long enough (we
deployed it in order to test it) that I don't think we have the 
warning stored any more. Those logs were rotated out long ago.

kind regards,

Rik

Baoquan He May 9, 2024, 3:30 p.m. UTC | #3

On 05/09/24 at 09:41am, Rik van Riel wrote:
> On Thu, 2024-05-09 at 11:52 +0800, Baoquan He wrote:
> > Hi,
> > 
> > On 05/07/24 at 09:18am, Rik van Riel wrote:
> > > While taking a kernel core dump with makedumpfile on a larger
> > > system,
> > > softlockup messages often appear.
> > > 
> > > While softlockup warnings can be harmless, they can also interfere
> > > with things like RCU freeing memory, which can be problematic when
> > > the kdump kexec image is configured with as little memory as
> > > possible.
> > > 
> > > Avoid the softlockup, and give things like work items and RCU a
> > > chance to do their thing during __read_vmcore by adding a
> > > cond_resched.
> > 
> > Thanks for fixing this.
> > 
> > By the way, is it easy to reproduce? And should we add some trace of
> > the
> > softlockup into log so that people can search for it and confirm when
> > encountering it?
> 
> It is pretty easy to reproduce, but it does not happen all the time.
> With millions of systems, even rare errors are common :)
> 
> However, we have been running with this fix for long enough (we
> deployed it in order to test it) that I don't think we have the 
> warning stored any more. Those logs were rotated out long ago.

OK, thanks for the explanation.

Acked-by: Baoquan He <bhe@redhat.com>

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 1fb213f379a5..d06607a1f137 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -383,6 +383,8 @@  static ssize_t __read_vmcore(struct iov_iter *iter, loff_t *fpos)
 		/* leave now if filled buffer already */
 		if (!iov_iter_count(iter))
 			return acc;
+
+		cond_resched();
 	}
 
 	list_for_each_entry(m, &vmcore_list, list) {
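
For readers unfamiliar with the pattern, the idea behind the patch can be
sketched in userspace: a long-running copy loop gives the scheduler a chance
to run other work once per iteration. This is only an illustration, not the
kernel code. copy_with_yield() and CHUNK_SIZE are hypothetical names, and
sched_yield() is a stand-in for cond_resched(). Unlike cond_resched(), which
only reschedules when a reschedule is pending, sched_yield() yields
unconditionally.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size, loosely analogous to copying one page at a time. */
#define CHUNK_SIZE 4096

/*
 * Copy len bytes from src to dst in CHUNK_SIZE pieces, yielding the CPU
 * after each piece. Returns the number of bytes copied.
 */
size_t copy_with_yield(char *dst, const char *src, size_t len)
{
	size_t acc = 0;

	assert(dst != NULL && src != NULL);

	while (acc < len) {
		size_t n = len - acc;

		if (n > CHUNK_SIZE)
			n = CHUNK_SIZE;

		memcpy(dst + acc, src + acc, n);
		acc += n;

		/* Userspace stand-in for cond_resched(): let other tasks run. */
		sched_yield();
	}

	return acc;
}
```

Yielding once per chunk keeps the added overhead small relative to the copy
itself while bounding how long the loop can hold the CPU without a break,
which is the same trade-off the two-line patch makes.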
