diff mbox series

mdadm: Incremental mode creates file for udev rules

Message ID ded3b88e-c0e8-4a66-89f3-43bc6bb9664a@redhat.com (mailing list archive)
State New
Headers show
Series mdadm: Incremental mode creates file for udev rules | expand

Checks

Context Check Description
mdraidci/vmtest-md-6_14-PR fail merge-conflict

Commit Message

Nigel Croxon March 25, 2025, 7:18 p.m. UTC
Mounting an md device may fail during boot from mdadm's claim
on the device not being released before systemd attempts to mount.

While mdadm is still constructing the array (mdadm --incremental
that is called from within /usr/lib/udev/rules.d/64-md-raid-assembly.rules),
there is an attempt to mount the md device, but there is not a creation
of "/run/mdadm/creating-xxx" file when in incremental mode that
the rule is looking for.  Therefore the device is not marked
as SYSTEMD_READY=0  in
"/usr/lib/udev/rules.d/01-md-raid-creating.rules" and missing
synchronization using the "/run/mdadm/creating-xxx" file.

Enable creating the "/run/mdadm/creating-xxx" file during
incremental mode.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
---
  Incremental.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

  		if (mdfd < 0)
  			goto out_unlock;
@@ -599,6 +600,7 @@ int Incremental(struct mddev_dev *devlist, struct 
context *c,
  		rv = 0;
  	}
  out:
+	udev_unblock();
  	free(avail);
  	if (dfd >= 0)
  		close(dfd);

Comments

Xiao Ni March 26, 2025, 2:31 a.m. UTC | #1
On Wed, Mar 26, 2025 at 3:18 AM Nigel Croxon <ncroxon@redhat.com> wrote:
>
>
> Mounting an md device may fail during boot from mdadm's claim
> on the device not being released before systemd attempts to mount.
>
> While mdadm is still constructing the array (mdadm --incremental
> that is called from within /usr/lib/udev/rules.d/64-md-raid-assembly.rules),
> there is an attempt to mount the md device, but there is not a creation
> of "/run/mdadm/creating-xxx" file when in incremental mode that
> the rule is looking for.  Therefore the device is not marked
> as SYSTEMD_READY=0  in
> "/usr/lib/udev/rules.d/01-md-raid-creating.rules" and missing
> synchronization using the "/run/mdadm/creating-xxx" file.
>
> Enable creating the "/run/mdadm/creating-xxx" file during
> incremental mode.
>
> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
> ---
>   Incremental.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/Incremental.c b/Incremental.c
> index 228d2bdd..e0d3fce7 100644
> --- a/Incremental.c
> +++ b/Incremental.c
> @@ -30,6 +30,7 @@
>
>   #include      "mdadm.h"
>   #include      "xmalloc.h"
> +#include       "udev.h"
>
>   #include      <sys/wait.h>
>   #include      <dirent.h>
> @@ -286,7 +287,7 @@ int Incremental(struct mddev_dev *devlist, struct
> context *c,
>
>                 /* Couldn't find an existing array, maybe make a new one */
>                 mdfd = create_mddev(match ? match->devname : NULL, name_to_use,
> trustworthy,
> -                                   chosen_name, 0);
> +                                   chosen_name, 1);
>
>                 if (mdfd < 0)
>                         goto out_unlock;
> @@ -599,6 +600,7 @@ int Incremental(struct mddev_dev *devlist, struct
> context *c,
>                 rv = 0;
>         }
>   out:
> +       udev_unblock();
>         free(avail);
>         if (dfd >= 0)
>                 close(dfd);
> --
> 2.31.1
>

Hi Nigel

The incremental-assemble tries to assemble the array. One array has
more than one device. This patch calls udev_unblock and
/run/mdadm/creating-xxx is removed when the array is not ready. So it
needs to choose the right time when all devices are added to the array
and then calls udev_unblock.

Regards
Xiao
Paul Menzel March 26, 2025, 8 a.m. UTC | #2
Dear Nigel,


Thank you for your patch. It’d be great if you used imperative mood for 
the summary/title.

Am 25.03.25 um 20:18 schrieb Nigel Croxon:
> 

Just a note, that there is a blank line at the top of your commit 
message body.

> Mounting an md device may fail during boot from mdadm's claim
> on the device not being released before systemd attempts to mount.
> 
> While mdadm is still constructing the array (mdadm --incremental
> that is called from within /usr/lib/udev/rules.d/64-md-raid-assembly.rules),
> there is an attempt to mount the md device, but there is not a creation
> of "/run/mdadm/creating-xxx" file when in incremental mode that
> the rule is looking for.  Therefore the device is not marked
> as SYSTEMD_READY=0  in
> "/usr/lib/udev/rules.d/01-md-raid-creating.rules" and missing
> synchronization using the "/run/mdadm/creating-xxx" file.
> 
> Enable creating the "/run/mdadm/creating-xxx" file during
> incremental mode.
> 
> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
> ---
>   Incremental.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Incremental.c b/Incremental.c
> index 228d2bdd..e0d3fce7 100644
> --- a/Incremental.c
> +++ b/Incremental.c
> @@ -30,6 +30,7 @@
> 
>   #include    "mdadm.h"
>   #include    "xmalloc.h"
> +#include    "udev.h"
> 
>   #include    <sys/wait.h>
>   #include    <dirent.h>
> @@ -286,7 +287,7 @@ int Incremental(struct mddev_dev *devlist, struct context *c,
> 
>           /* Couldn't find an existing array, maybe make a new one */
>           mdfd = create_mddev(match ? match->devname : NULL, name_to_use, trustworthy,
> -                    chosen_name, 0);
> +                    chosen_name, 1);
> 
>           if (mdfd < 0)
>               goto out_unlock;
> @@ -599,6 +600,7 @@ int Incremental(struct mddev_dev *devlist, struct context *c,
>           rv = 0;
>       }
>   out:
> +    udev_unblock();
>       free(avail);
>       if (dfd >= 0)
>           close(dfd);


Kind regards,

Paul
Paul Menzel March 26, 2025, 9:15 a.m. UTC | #3
[Cc: Update Mariusz address]

Am 26.03.25 um 09:00 schrieb Paul Menzel:
> Dear Nigel,
> 
> 
> Thank you for your patch. It’d be great if you used imperative mood for 
> the summary/title.
> 
> Am 25.03.25 um 20:18 schrieb Nigel Croxon:
>>
> 
> Just a note, that there is a blank line at the top of your commit 
> message body.
> 
>> Mounting an md device may fail during boot from mdadm's claim
>> on the device not being released before systemd attempts to mount.
>>
>> While mdadm is still constructing the array (mdadm --incremental
>> that is called from within /usr/lib/udev/rules.d/64-md-raid- 
>> assembly.rules),
>> there is an attempt to mount the md device, but there is not a creation
>> of "/run/mdadm/creating-xxx" file when in incremental mode that
>> the rule is looking for.  Therefore the device is not marked
>> as SYSTEMD_READY=0  in
>> "/usr/lib/udev/rules.d/01-md-raid-creating.rules" and missing
>> synchronization using the "/run/mdadm/creating-xxx" file.
>>
>> Enable creating the "/run/mdadm/creating-xxx" file during
>> incremental mode.
>>
>> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
>> ---
>>   Incremental.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/Incremental.c b/Incremental.c
>> index 228d2bdd..e0d3fce7 100644
>> --- a/Incremental.c
>> +++ b/Incremental.c
>> @@ -30,6 +30,7 @@
>>
>>   #include    "mdadm.h"
>>   #include    "xmalloc.h"
>> +#include    "udev.h"
>>
>>   #include    <sys/wait.h>
>>   #include    <dirent.h>
>> @@ -286,7 +287,7 @@ int Incremental(struct mddev_dev *devlist, struct context *c,
>>
>>           /* Couldn't find an existing array, maybe make a new one */
>>           mdfd = create_mddev(match ? match->devname : NULL,  name_to_use, trustworthy,
>> -                    chosen_name, 0);
>> +                    chosen_name, 1);
>>
>>           if (mdfd < 0)
>>               goto out_unlock;
>> @@ -599,6 +600,7 @@ int Incremental(struct mddev_dev *devlist, struct context *c,
>>           rv = 0;
>>       }
>>   out:
>> +    udev_unblock();
>>       free(avail);
>>       if (dfd >= 0)
>>           close(dfd);
> 
> 
> Kind regards,
> 
> Paul
Mariusz Tkaczyk March 26, 2025, 12:48 p.m. UTC | #4
On Tue, 25 Mar 2025 15:18:00 -0400
Nigel Croxon <ncroxon@redhat.com> wrote:

> Mounting an md device may fail during boot from mdadm's claim
> on the device not being released before systemd attempts to mount.
> 
> While mdadm is still constructing the array (mdadm --incremental
> that is called from within
> /usr/lib/udev/rules.d/64-md-raid-assembly.rules), there is an attempt
> to mount the md device, but there is not a creation of
> "/run/mdadm/creating-xxx" file when in incremental mode that the rule
> is looking for.  Therefore the device is not marked as
> SYSTEMD_READY=0  in "/usr/lib/udev/rules.d/01-md-raid-creating.rules"
> and missing synchronization using the "/run/mdadm/creating-xxx" file.
> 
> Enable creating the "/run/mdadm/creating-xxx" file during
> incremental mode.

Hi Nigel,
The code is rather simple but the change is big. Before I will consider
it is safe to merge please describe the particular scenario you are
fixing.
It is known, persistent issue? Is is sporadic issue?

In the commit message please add why you think it is safe change.

This will affect every environment. We need to be certain that it will
not bring regression in booting flow. It might be hard to debug and
have big impact on users.

Thanks,
Mariusz

> 
> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
> ---
>   Incremental.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
Nigel Croxon March 26, 2025, 5:36 p.m. UTC | #5
On 3/26/25 8:48 AM, Mariusz Tkaczyk wrote:
> On Tue, 25 Mar 2025 15:18:00 -0400
> Nigel Croxon <ncroxon@redhat.com> wrote:
> 
>> Mounting an md device may fail during boot from mdadm's claim
>> on the device not being released before systemd attempts to mount.
>>
>> While mdadm is still constructing the array (mdadm --incremental
>> that is called from within
>> /usr/lib/udev/rules.d/64-md-raid-assembly.rules), there is an attempt
>> to mount the md device, but there is not a creation of
>> "/run/mdadm/creating-xxx" file when in incremental mode that the rule
>> is looking for.  Therefore the device is not marked as
>> SYSTEMD_READY=0  in "/usr/lib/udev/rules.d/01-md-raid-creating.rules"
>> and missing synchronization using the "/run/mdadm/creating-xxx" file.
>>
>> Enable creating the "/run/mdadm/creating-xxx" file during
>> incremental mode.
> 
> Hi Nigel,
> The code is rather simple but the change is big. Before I will consider
> it is safe to merge please describe the particular scenario you are
> fixing.
> It is known, persistent issue? Is is sporadic issue?
> 
> In the commit message please add why you think it is safe change.
> 
> This will affect every environment. We need to be certain that it will
> not bring regression in booting flow. It might be hard to debug and
> have big impact on users.
> 
> Thanks,
> Mariusz
> 
>>
>> Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
>> ---
>>    Incremental.c | 4 +++-
>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>
> 

From: Nigel Croxon <ncroxon@redhat.com>
Date: Wed, 26 Mar 2025 13:34:01 -0400
Subject: [PATCH] mdadm: enable sync file for udev rules

Mounting an md device may fail during boot from mdadm's claim
on the device not being released before systemd attempts to mount.

In this case it was found that essentially there is a race condition
occurring in which the mount cannot happen without some kind of delay
being added BEFORE the mount itself triggers, or manual intervention
after a timeout.

The findings:
the inode was for a tmp block node made by mdadm for md0.

crash> detailedsearch ff1b0c398ff28380
ff1b0c398f079720: ff1b0c398ff28380 slab:filp state:alloc 
obj:ff1b0c398f079700 size:256
ff1b0c398ff284f8: ff1b0c398ff28380 slab:shmem_inode_cache state:alloc 
obj:ff1b0c398ff28308 size:768

crash> struct file.f_inode,f_path ff1b0c398f079700
f_inode = 0xff1b0c398ff28380,
f_path = {
mnt = 0xff1b0c594aecc7a0,
dentry = 0xff1b0c3a8c614f00
},
crash> struct dentry.d_name 0xff1b0c3a8c614f00
d_name = {
{
{ hash = 3714992780, len = 16 },
hash_len = 72434469516
},
name = 0xff1b0c3a8c614f38 ".tmp.md.1454:9:0"
},

For the race condition, mdadm and udev have some infrastructure for making
the device be ignored while under construction. e.g.

$ cat lib/udev/rules.d/01-md-raid-creating.rules

do not edit this file, it will be overwritten on update
While mdadm is creating an array, it creates a file
/run/mdadm/creating-mdXXX. If that file exists, then
the array is not "ready" and we should make sure the
content is ignored.
KERNEL=="md*", TEST=="/run/mdadm/creating-$kernel", ENV{SYSTEMD_READY}="0"

However, this feature currently is only used by the mdadm create command.
(See calls to udev_block/udev_unblock in the mdadm code as to where and when
this behavior is used.) Any md array being started by incremental or
normal assemble commands does not use this udev integration. So assembly
of an existing array does not look to have any explicit protection from
systemd/udev seeing an array as in a usable state before an mdadm instance
with O_EXCL closes its file handle.
This is for the sake of showing the use case for such an option and why
it would be helpful to delay the mount itself.

While mdadm is still constructing the array (mdadm --incremental
that is called from within /usr/lib/udev/rules.d/64-md-raid-assembly.rules),
there is an attempt to mount the md device, but there is not a creation
of "/run/mdadm/creating-xxx" file when in incremental mode that
the rule is looking for.  Therefore the device is not marked
as SYSTEMD_READY=0  in
"/usr/lib/udev/rules.d/01-md-raid-creating.rules" and missing
synchronization using the "/run/mdadm/creating-xxx" file.

As to this change affecting containers or IMSM...
(container's array state is inactive all the time)

Even if the "array_state" reports "inactive" when previous components
are added, the mdadm call for the very last array component that makes
it usable/ready, still needs to be synced properly - mdadm needs to drop
the claim first (calling "close"), then delete the 
"/run/mdadm/creating-xxx".
Then lets the udev know it is clear to act now (the "udev_unblock" in
mdadm code that generates a synthetic udev event so the rules are
reevalutated). It's this processing of the very last array compoment
that is the issue here (which is not IO error, but it is that trying to
open the dev returns -EBUSY because of the exclusive claim that mdadm
still holds while the mdadm device is being processed already by udev in
parallel - and that is what the /run/mdadm/creating-xxx should prevent 
exactly).

The patch to Incremental.c is to enable creating the 
"/run/mdadm/creating-xxx"
file during incremental mode.

For the change to Create.c, the unlink is called right before dropping
the exculusive claim for the device. This should be the other way round
to avoid the race 100%. That is, if there's a "close" call and 
"udev_unblock"
call, the "close" should go first, then followed by "udev_unblock".

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
---
  Create.c      | 2 +-
  Incremental.c | 4 +++-
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Create.c b/Create.c
index fd6c9215..656de583 100644
--- a/Create.c
+++ b/Create.c
@@ -1318,8 +1318,8 @@ int Create(struct supertype *st, struct 
mddev_ident *ident, int subdevs,
  	} else {
  		pr_err("not starting array - not enough devices.\n");
  	}
-	udev_unblock();
  	close(mdfd);
+	udev_unblock();
  	sysfs_uevent(&info, "change");
  	dev_policy_free(custom_pols);

diff --git a/Incremental.c b/Incremental.c
index 228d2bdd..95de9598 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -30,6 +30,7 @@

  #include	"mdadm.h"
  #include	"xmalloc.h"
+#include	"udev.h"

  #include	<sys/wait.h>
  #include	<dirent.h>
@@ -286,7 +287,7 @@ int Incremental(struct mddev_dev *devlist, struct 
context *c,

  		/* Couldn't find an existing array, maybe make a new one */
  		mdfd = create_mddev(match ? match->devname : NULL, name_to_use, 
trustworthy,
-				    chosen_name, 0);
+				    chosen_name, 1);

  		if (mdfd < 0)
  			goto out_unlock;
@@ -606,6 +607,7 @@ out:
  		close(mdfd);
  	if (policy)
  		dev_policy_free(policy);
+	udev_unblock();
  	sysfs_free(sra);
  	return rv;
  out_unlock:
diff mbox series

Patch

diff --git a/Incremental.c b/Incremental.c
index 228d2bdd..e0d3fce7 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -30,6 +30,7 @@ 

  #include	"mdadm.h"
  #include	"xmalloc.h"
+#include	"udev.h"

  #include	<sys/wait.h>
  #include	<dirent.h>
@@ -286,7 +287,7 @@  int Incremental(struct mddev_dev *devlist, struct 
context *c,

  		/* Couldn't find an existing array, maybe make a new one */
  		mdfd = create_mddev(match ? match->devname : NULL, name_to_use, 
trustworthy,
-				    chosen_name, 0);
+				    chosen_name, 1);