diff mbox series

[ndctl,3/3] test/cxl-security.sh: avoid intermittent failures due to sasync probe

Message ID 20230217-coverity-fixes-v1-3-043fac896a40@intel.com
State Accepted
Commit 82a0cda56e10c1167e7e19a651d2dce7debc3b7e
Headers show
Series cxl/monitor: coverity and misc other fixes | expand

Commit Message

Verma, Vishal L Feb. 18, 2023, 12:40 a.m. UTC
This test failed intermittently because the ndctl-list operation right
after a 'modprobe cxl_test' could race the actual nmem devices getting
loaded.

Since CXL device probes are asynchronous, and cxl_acpi might've kicked
off a cxl_bus_rescan(), a cxl_flush() (via cxl_wait_probe()) can ensure
everything is loaded.

Add a plain cxl-list right after the modprobe to allow for a flush/wait
cycle.

Cc: Dave Jiang <dave.jiang@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 test/security.sh | 1 +
 1 file changed, 1 insertion(+)

Comments

Dave Jiang Feb. 21, 2023, 4:45 p.m. UTC | #1
On 2/17/23 5:40 PM, Vishal Verma wrote:
> This test failed intermittently because the ndctl-list operation right
> after a 'modprobe cxl_test' could race the actual nmem devices getting
> loaded.
> 
> Since CXL device probes are asynchronous, and cxl_acpi might've kicked
> off a cxl_bus_rescan(), a cxl_flush() (via cxl_wait_probe()) can ensure
> everything is loaded.
> 
> Add a plain cxl-list right after the modprobe to allow for a flush/wait
> cycle.
> 
> Cc: Dave Jiang <dave.jiang@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>   test/security.sh | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/test/security.sh b/test/security.sh
> index 04f630e..fb04aa6 100755
> --- a/test/security.sh
> +++ b/test/security.sh
> @@ -225,6 +225,7 @@ if [ "$uid" -ne 0 ]; then
>   fi
>   
>   modprobe "$KMOD_TEST"
> +cxl list
>   setup
>   check_prereq "keyctl"
>   rc=1
>
Alison Schofield Feb. 21, 2023, 5:22 p.m. UTC | #2
On Fri, Feb 17, 2023 at 05:40:24PM -0700, Vishal Verma wrote:
> This test failed intermittently because the ndctl-list operation right
> after a 'modprobe cxl_test' could race the actual nmem devices getting
> loaded.
> 
> Since CXL device probes are asynchronous, and cxl_acpi might've kicked
> off a cxl_bus_rescan(), a cxl_flush() (via cxl_wait_probe()) can ensure
> everything is loaded.
> 
> Add a plain cxl-list right after the modprobe to allow for a flush/wait
> cycle.

Is this the preferred method to 'settle', instead of udevadm settle?

> 
> Cc: Dave Jiang <dave.jiang@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  test/security.sh | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/test/security.sh b/test/security.sh
> index 04f630e..fb04aa6 100755
> --- a/test/security.sh
> +++ b/test/security.sh
> @@ -225,6 +225,7 @@ if [ "$uid" -ne 0 ]; then
>  fi
>  
>  modprobe "$KMOD_TEST"
> +cxl list
>  setup
>  check_prereq "keyctl"
>  rc=1
> 
> -- 
> 2.39.1
> 
>
Dan Williams Feb. 21, 2023, 6:08 p.m. UTC | #3
Alison Schofield wrote:
> On Fri, Feb 17, 2023 at 05:40:24PM -0700, Vishal Verma wrote:
> > This test failed intermittently because the ndctl-list operation right
> > after a 'modprobe cxl_test' could race the actual nmem devices getting
> > loaded.
> > 
> > Since CXL device probes are asynchronous, and cxl_acpi might've kicked
> > off a cxl_bus_rescan(), a cxl_flush() (via cxl_wait_probe()) can ensure
> > everything is loaded.
> > 
> > Add a plain cxl-list right after the modprobe to allow for a flush/wait
> > cycle.
> 
> Is this the preferred method to 'settle', instead of udevadm settle?

In general, 'udevadm settle' is only the first phase of flushing
subsystem setup work in that it can only flush the udev event queue.
I.e. a device arriving kicks off a KOBJ_ADD event, and once that event
is processed without kicking off more KOBJ_* events the queue is
settled. For CXL this means that the cxl_acpi and cxl_pci modules are
loaded, but the asynchronous work they kick off to probe devices and
rescan the bus is invisible to 'udevadm settle'.

Internally 'cxl list' is performing a 'udevadm settle' and then flushing
the follow on async work.
Verma, Vishal L Feb. 21, 2023, 6:13 p.m. UTC | #4
On Tue, 2023-02-21 at 09:22 -0800, Alison Schofield wrote:
> On Fri, Feb 17, 2023 at 05:40:24PM -0700, Vishal Verma wrote:
> > This test failed intermittently because the ndctl-list operation right
> > after a 'modprobe cxl_test' could race the actual nmem devices getting
> > loaded.
> > 
> > Since CXL device probes are asynchronous, and cxl_acpi might've kicked
> > off a cxl_bus_rescan(), a cxl_flush() (via cxl_wait_probe()) can ensure
> > everything is loaded.
> > 
> > Add a plain cxl-list right after the modprobe to allow for a flush/wait
> > cycle.
> 
> Is this the preferred method to 'settle', instead of udevadm settle?

Generally, no. Usually cxl tests would use cxl-cli commands, which now
have the necessary waits via cxl_wait_probe(), so even a 'udevadm
settle' shouldn't be needed.

In this case, the first thing we run is ndctl list, which waits for
nvdimm things to 'settle', but we were racing with cxl_test coming up,
which it (ndctl) knows nothing about.

> 
> > 
> > Cc: Dave Jiang <dave.jiang@intel.com>
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  test/security.sh | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/test/security.sh b/test/security.sh
> > index 04f630e..fb04aa6 100755
> > --- a/test/security.sh
> > +++ b/test/security.sh
> > @@ -225,6 +225,7 @@ if [ "$uid" -ne 0 ]; then
> >  fi
> >  
> >  modprobe "$KMOD_TEST"
> > +cxl list
> >  setup
> >  check_prereq "keyctl"
> >  rc=1
> > 
> > -- 
> > 2.39.1
> > 
> >
Alison Schofield Feb. 21, 2023, 6:34 p.m. UTC | #5
On Tue, Feb 21, 2023 at 10:13:16AM -0800, Vishal Verma wrote:
> On Tue, 2023-02-21 at 09:22 -0800, Alison Schofield wrote:
> > On Fri, Feb 17, 2023 at 05:40:24PM -0700, Vishal Verma wrote:
> > > This test failed intermittently because the ndctl-list operation right
> > > after a 'modprobe cxl_test' could race the actual nmem devices getting
> > > loaded.
> > >
> > > Since CXL device probes are asynchronous, and cxl_acpi might've kicked
> > > off a cxl_bus_rescan(), a cxl_flush() (via cxl_wait_probe()) can ensure
> > > everything is loaded.
> > >
> > > Add a plain cxl-list right after the modprobe to allow for a flush/wait
> > > cycle.
> >
> > Is this the preferred method to 'settle', instead of udevadm settle?
> 
> Generally, no. Usually cxl tests would use cxl-cli commands, which now
> have the necessary waits via cxl_wait_probe(), so even a 'udevadm
> settle' shouldn't be needed.
> 
> In this case, the first thing we run is ndctl list, which waits for
> nvdimm things to 'settle', but we were racing with cxl_test coming up,
> which it (ndctl) knows nothing about.

OK - I'll stop doing the udevadm settle, since what I'm doing is pure
'cxl' and modprobe is always followed by cxl-cli commands. 

> 
> >
> > >
> > > Cc: Dave Jiang <dave.jiang@intel.com>
> > > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > > ---
> > >  test/security.sh | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/test/security.sh b/test/security.sh
> > > index 04f630e..fb04aa6 100755
> > > --- a/test/security.sh
> > > +++ b/test/security.sh
> > > @@ -225,6 +225,7 @@ if [ "$uid" -ne 0 ]; then
> > >  fi
> > >
> > >  modprobe "$KMOD_TEST"
> > > +cxl list
> > >  setup
> > >  check_prereq "keyctl"
> > >  rc=1
> > >
> > > --
> > > 2.39.1
> > >
> > >
>
diff mbox series

Patch

diff --git a/test/security.sh b/test/security.sh
index 04f630e..fb04aa6 100755
--- a/test/security.sh
+++ b/test/security.sh
@@ -225,6 +225,7 @@  if [ "$uid" -ne 0 ]; then
 fi
 
 modprobe "$KMOD_TEST"
+cxl list
 setup
 check_prereq "keyctl"
 rc=1