[v2,0/2] Fix generic/390 failure due to quota release after freeze

Message ID 20241121123855.645335-1-ojaswin@linux.ibm.com (mailing list archive)

Message

Ojaswin Mujoo Nov. 21, 2024, 12:38 p.m. UTC
Changes since v1:

 * Patch 1: Move flush_delayed_work() to start of function
 * Patch 2: Guard ext4_release_dquot against freeze

Regarding patch 2: as far as I understand the journalling code,
ext4_release_dquot() can currently only be called from the
quota_release_work workqueue, so ideally it should never have a journal
handle open. To future-proof it, we still make sure no journal handle
is open before calling sb_start_intwrite().
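
To illustrate the guard described above, here is a minimal userspace
sketch. It is not the patch itself: struct model_sb and every model_*
function are hypothetical stand-ins, and model_start_intwrite() fails
where the real sb_start_intwrite() would sleep until the filesystem is
thawed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a superblock's freeze state. */
struct model_sb {
	bool frozen;        /* true once the filesystem is frozen */
	int  int_writers;   /* internal writers holding freeze protection */
	int  transactions;  /* transactions started so far */
};

/* Non-blocking stand-in for sb_start_intwrite(): fails instead of sleeping. */
static bool model_start_intwrite(struct model_sb *sb)
{
	if (sb->frozen)
		return false;
	sb->int_writers++;
	return true;
}

/* Stand-in for sb_end_intwrite(): drop freeze protection. */
static void model_end_intwrite(struct model_sb *sb)
{
	assert(sb->int_writers > 0);
	sb->int_writers--;
}

/* A freeze can only complete once no internal writer is active. */
static bool model_freeze(struct model_sb *sb)
{
	if (sb->int_writers > 0)
		return false;
	sb->frozen = true;
	return true;
}

/*
 * Model of the release path: take freeze protection only when the caller
 * does not already hold a journal handle (assumption: a caller with an
 * open handle is already protected against freezing).
 */
static int model_release_dquot(struct model_sb *sb, bool handle_open)
{
	bool freeze_protected = false;

	if (!handle_open) {
		if (!model_start_intwrite(sb))
			return -1;	/* the kernel would sleep here instead */
		freeze_protected = true;
	}

	sb->transactions++;	/* stands in for starting a transaction */

	if (freeze_protected)
		model_end_intwrite(sb);
	return 0;
}
```

In this model a release attempted after model_freeze() starts no
transaction, which is the property the guard is meant to provide.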

** Original Cover **

Recently we noticed generic/390 failing on powerpc systems. This test
basically does a freeze-unfreeze loop in parallel with fsstress on the
FS to detect any races in the code paths.

We noticed that the test started failing due to kernel WARN_ONs because
the quota_release_work workqueue started executing while the FS was
frozen, which led to new transactions being created in
ext4_release_dquot().
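
The race, and the effect of flushing the deferred work before the
freeze completes, can be sketched with a small userspace model. This is
an illustration only: struct model_fs and the model_* functions are
hypothetical, and it assumes that quota writeback runs as part of the
sync that precedes a freeze.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the filesystem state relevant to the race. */
struct model_fs {
	bool frozen;
	bool release_pending;      /* deferred quota release is queued */
	int  writes_while_frozen;  /* each one would be a WARN_ON in the kernel */
};

/* Run the deferred release work if it is queued. */
static void model_run_release_work(struct model_fs *fs)
{
	if (!fs->release_pending)
		return;
	fs->release_pending = false;
	if (fs->frozen)
		fs->writes_while_frozen++;	/* transaction on a frozen FS */
}

/* Stand-in for quota writeback; optionally drains the deferred work first. */
static void model_writeback(struct model_fs *fs, bool flush_first)
{
	if (flush_first)
		model_run_release_work(fs);	/* role of flush_delayed_work() */
}

/* Freeze: sync (including quota writeback), then mark the FS frozen. */
static void model_freeze(struct model_fs *fs, bool flush_first)
{
	model_writeback(fs, flush_first);
	fs->frozen = true;
}
```

Without the flush, work queued before the freeze can still run after
it; with the flush, the work is drained before the frozen flag is set.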

Most of the details are in the bug; however, I'd just like to add that
I'm completely new to the quota code, so the patch, although it fixes
the issue, might not be logically the right thing to do. Reviews and
suggestions are welcome.

Also, I can only reliably reproduce this race on one of my machines; it
does not appear on the others. I've tested with fstests -g quota and
don't see any new failures.

Ojaswin Mujoo (2):
  quota: flush quota_release_work upon quota writeback
  ext4: protect ext4_release_dquot against freezing

 fs/ext4/super.c  | 17 +++++++++++++++++
 fs/quota/dquot.c |  2 ++
 2 files changed, 19 insertions(+)

Comments

Ojaswin Mujoo Nov. 21, 2024, 12:48 p.m. UTC | #1
On Thu, Nov 21, 2024 at 06:08:53PM +0530, Ojaswin Mujoo wrote:
> Changes since v1:

Forgot to link v1:

https://lore.kernel.org/linux-ext4/20241115183449.2058590-1-ojaswin@linux.ibm.com/T/#t
