diff mbox series

i915/selftest/igt_mmap: let mmap tests run in kthread

Message ID mqzn3acyfarzlst3tt3mh5r4bvz4ntjkz5a66pip7qmm6hslb2@qc7g7j7q4z3y (mailing list archive)
State New
Headers show
Series i915/selftest/igt_mmap: let mmap tests run in kthread | expand

Commit Message

Mikolaj Wasiak Feb. 25, 2025, 10:41 a.m. UTC
When driver is loaded on system with numa nodes it might be run in kthread.
This makes it impossible to use current->mm in selftests which currently
creates null pointer exception.
This patch allows selftest to use current->mm by using active_mm.

Signed-off-by: Mikolaj Wasiak <mikolaj.wasiak@intel.com>
---

This patch is mostly damage control. By using active_mm we expose our
test to foreign memory mapping, which sometimes makes the test fail.
That is still better than just having null pointer exception in driver
code.

 .../drm/i915/gem/selftests/i915_gem_mman.c    | 61 ++++++++++++++-----
 drivers/gpu/drm/i915/selftests/igt_mmap.c     | 19 ++++++
 drivers/gpu/drm/i915/selftests/igt_mmap.h     |  3 +
 3 files changed, 67 insertions(+), 16 deletions(-)

Comments

Eugene Kobyak Feb. 26, 2025, 9:21 a.m. UTC | #1
> When driver is loaded on system with numa nodes it might be run in kthread.
> This makes it impossible to use current->mm in selftests which currently
> creates null pointer exception.
> This patch allows selftest to use current->mm by using active_mm.
> 
> Signed-off-by: Mikolaj Wasiak <mikolaj.wasiak@intel.com>

Hi, LGTM
Reviewed-by: Eugene Kobyak <eugene.kobyak@intel.com>

Best regards
Eugene
Krzysztof Karas Feb. 26, 2025, 9:41 a.m. UTC | #2
Hi Mikolaj,

> This makes it impossible to use current->mm in selftests which currently
> creates null pointer exception.
Did you see this happening locally on your setup or is this
already documented in an open issue? If latter is the case, then
you could write here something like this:
"Using current->mm may cause a null pointer exception in
selftests as seen in issue <issue id>".

> ---
> 
> This patch is mostly damage control. By using active_mm we expose our
> test to foreign memory mapping, which sometimes makes the test fail.
> That is still better than just having null pointer exception in driver
> code.
We talked about this offline a bit, but I think we should not
expose the foreign memory mapping, if that "sometimes makes the
test fail" (we have no way to determine the outcome of the test
and subjecting the testing suites to sporadical failures that we
have no idea when would happen makes the test lose its value).
Skipping this test on setups with NUMA nodes would let us avoid
the null pointer exceptions and give us time to figure out how
to run it safely on such setups. 

[...]
> +int igt_mmap_enable_current(void)
> +{
> +	if (current->flags & PF_KTHREAD) {
> +		if (!current->active_mm) {
Insetad of trying to pull foreign memory, we could return early
whenever we detect a kthread.

I also wander if this is the best place to insert the code
responsible for detecting these problematic conditions. Here we
need to return an error, which then causes the SUBTEST to fail,
but we cannot really do anything about it until we figure out
a way to run these tests safely everywhere. I think
`i915_subtest tests[]` could have two versions:
 1) one containing all the current SUBTEST entries for when we
  are sure it is safe to run all of them
 2) and the second that does not contain SUBTESTs that you
  modify in this patch, which would be chosen if we'd detect
  conditions that could cause these null pointer exceptions.
> +			pr_info("Couldn't get userspace mm in kthread\n");
> +			return -ENODATA;
> +		}
> +		kthread_use_mm(current->active_mm);

Best Regards,
Krzysztof
Andi Shyti Feb. 26, 2025, 10:59 a.m. UTC | #3
Hi Mikolaj,

On Tue, Feb 25, 2025 at 11:41:56AM +0100, Mikolaj Wasiak wrote:
> When driver is loaded on system with numa nodes it might be run in kthread.

nit: please, don't forget articles:

*the* driver
*the* system

:-)

> This makes it impossible to use current->mm in selftests which currently
> creates null pointer exception.
> This patch allows selftest to use current->mm by using active_mm.

What is the failure you are getting? I'm not sure this might be
the right solution, but it might make sense, though... if there
is a valid reason.

It looks a bit drastic to me that we want to force current->mm to
point to the active_mm without a real reason, just to avoid a
NULL pointer dereference.

I would first ask myself why are we getting a NULL pointer
dereference? Why do we need to use current->mm? Which memory's
should current->mm point to? Is it a kernel thread or a user
thread? Why are we peaking inside current->mm?

Remember that for kernel threads is normally fine to have
current->mm equal to NULL.

> Signed-off-by: Mikolaj Wasiak <mikolaj.wasiak@intel.com>
> ---
>
> This patch is mostly damage control. By using active_mm we expose our
> test to foreign memory mapping, which sometimes makes the test fail.
> That is still better than just having null pointer exception in driver
> code.

If we are hiding a bug with another bug then this patch is not
OK, even if the second bug is less serious. So, please, don't do
it.

>  .../drm/i915/gem/selftests/i915_gem_mman.c    | 61 ++++++++++++++-----
>  drivers/gpu/drm/i915/selftests/igt_mmap.c     | 19 ++++++
>  drivers/gpu/drm/i915/selftests/igt_mmap.h     |  3 +
>  3 files changed, 67 insertions(+), 16 deletions(-)

...

>  int i915_gem_mman_live_selftests(struct drm_i915_private *i915)
> @@ -1847,6 +1877,5 @@ int i915_gem_mman_live_selftests(struct drm_i915_private *i915)
>               SUBTEST(igt_mmap_revoke),
>               SUBTEST(igt_mmap_gpu),
>       };
> -

This change is out of context.

>       return i915_live_subtests(tests, i915);
>  }
> diff --git a/drivers/gpu/drm/i915/selftests/igt_mmap.c b/drivers/gpu/drm/i915/selftests/igt_mmap.c
> index e920a461bd36..5c63622879a2 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_mmap.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_mmap.c
> @@ -50,3 +50,22 @@ unsigned long igt_mmap_offset(struct drm_i915_private *i915,
>       fput(file);
>       return addr;
>  }
> +
> +

you have two blank lines here.

> +int igt_mmap_enable_current(void)
> +{
> +     if (current->flags & PF_KTHREAD) {
> +             if (!current->active_mm) {
> +                     pr_info("Couldn't get userspace mm in kthread\n");
> +                     return -ENODATA;

if we get here, then something really bad has happened: we have a
task without memory. Most probably, if this was true, we wouldn't
even reach this point. I might have used a GEM_BUG_ON() here.

> +             }
> +             kthread_use_mm(current->active_mm);
> +     }
> +     return 0;
> +}
> +
> +void igt_mmap_disable_current(void)
> +{
> +     if (current->flags & PF_KTHREAD)
> +             kthread_unuse_mm(current->active_mm);
> +}
> diff --git a/drivers/gpu/drm/i915/selftests/igt_mmap.h b/drivers/gpu/drm/i915/selftests/igt_mmap.h
> index acbe34d81a6d..58582396bdd7 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_mmap.h
> +++ b/drivers/gpu/drm/i915/selftests/igt_mmap.h
> @@ -18,4 +18,7 @@ unsigned long igt_mmap_offset(struct drm_i915_private *i915,
>                             unsigned long prot,
>                             unsigned long flags);
>
> +int igt_mmap_enable_current(void);
> +void igt_mmap_disable_current(void);

Please, don't make it a library call, it's just used by a few
functions in a file. No need to expose them.

Thanks,
Andi
Krzysztof Niemiec Feb. 26, 2025, 12:41 p.m. UTC | #4
On 2025-02-25 at 11:41:56 GMT, Mikolaj Wasiak wrote:
> When driver is loaded on system with numa nodes it might be run in kthread.
> This makes it impossible to use current->mm in selftests which currently
> creates null pointer exception.
> This patch allows selftest to use current->mm by using active_mm.
> 
> Signed-off-by: Mikolaj Wasiak <mikolaj.wasiak@intel.com>
> ---
> 
> This patch is mostly damage control. By using active_mm we expose our
> test to foreign memory mapping, which sometimes makes the test fail.
> That is still better than just having null pointer exception in driver
> code.
> 

We talked about this offline a bit, I'll recount what we determined here
just for the record.

On NUMA systems, the driver might be probed via a kthread consuming a
workqueue. This shouldn't be a problem as long as we don't rely on
current->mm (i915 usually doesn't, unless it's ioctls where it's fair
game) - but this test wasn't written with that in mind, hence the
derefs.

We can't just use current->active_mm in place of current->mm inside of
the test code, because one of the functions that the test uses
(vma_lookup to be exact) ends up using current->mm. That's too deep
into the kernel to ever touch it with such a patch.

The problem is, we also can't just set current->mm of the kthread
to current->active_mm, because God knows what that value can be (again,
we are executing in some random kthread that just happened to work on
the probe).

So I don't think this patch is a good idea; it IS better than just
having a NULL deref, but since active_mm is being so unreliable here,
that kind of defeats the point of running the test. Also, since this is
an mmap test and it does stuff like user pointers, it does not really
make much sense to run it in a kthread (unless it's explicitly forked
from a context, current->mm of which we have control over, but that's
not what's happening here).

I have two suggestions for a different fix:

1. Disable the test on systems with NUMA and/or if it's running in a
   kthread, on the premise that it doesn't make sense to run this
   specific test in a kthread. This is the easier option.

2. Find a way to pass an argument to the selftest, so we can pass a
   known mm to the test thread. Then set it as current->mm for the
   duration of the test like you're doing in your patch. We could pass
   the IGT process's mm to "emulate" having it being the initializer of
   the test task; this way we're being a good citizen and we don't mess
   with some other process memory. I figure this is the harder option.

I'd try 2 and if it doesn't work then just resort to 1 if there's no
better idea floating around.

Thanks
Krzysztof
Krzysztof Niemiec Feb. 26, 2025, 1:19 p.m. UTC | #5
On 2025-02-26 at 13:41:34 GMT, Krzysztof Niemiec wrote:
> On 2025-02-25 at 11:41:56 GMT, Mikolaj Wasiak wrote:
> > When driver is loaded on system with numa nodes it might be run in kthread.
> > This makes it impossible to use current->mm in selftests which currently
> > creates null pointer exception.
> > This patch allows selftest to use current->mm by using active_mm.
> > 
> > Signed-off-by: Mikolaj Wasiak <mikolaj.wasiak@intel.com>
> > ---
> > 
> > This patch is mostly damage control. By using active_mm we expose our
> > test to foreign memory mapping, which sometimes makes the test fail.
> > That is still better than just having null pointer exception in driver
> > code.
> > 
> 
> We talked about this offline a bit, I'll recount what we determined here
> just for the record.
> 
> On NUMA systems, the driver might be probed via a kthread consuming a
> workqueue. This shouldn't be a problem as long as we don't rely on
> current->mm (i915 usually doesn't, unless it's ioctls where it's fair
> game) - but this test wasn't written with that in mind, hence the
> derefs.
> 
> We can't just use current->active_mm in place of current->mm inside of
> the test code, because one of the functions that the test uses
> (vma_lookup to be exact)

I meant vm_mmap() (called by igt_mmap_offset() in the test). Slip of
tongue :')
Mikolaj Wasiak Feb. 26, 2025, 4:39 p.m. UTC | #6
Hi Andi,

> > This makes it impossible to use current->mm in selftests which currently
> > creates null pointer exception.
> > This patch allows selftest to use current->mm by using active_mm.
> 
> What is the failure you are getting? I'm not sure this might be
> the right solution, but it might make sense, though... if there
> is a valid reason.

current->mm is both used by the test and by functions it uses. It is used
as argument of vm_lookup() and inside of vm_mmap(), hence we cannot use
any other pointer.

> I would first ask myself why are we getting a NULL pointer
> dereference?

It's an expected behaviour when the system has multiple NUMA nodes.
The probe is called via work_on_cpu (pci_call_probe()) which makes it
run in separate workqueue on another cpu.
The workqueue is run in kthread and therefore we get NULL pointer.

> Why do we need to use current->mm? Which memory's
> should current->mm point to? Is it a kernel thread or a user
> thread? Why are we peaking inside current->mm?

This test is supposed to simulate user controling the graphics card -
get and set values in mapped /dev/dri/card0 file as a user.
It needs a mapping that the simulated user has access to. I don't know
how to let the user access the memory other way than creating a mapping
via vm_mmap() which uses current->mm.

> If we are hiding a bug with another bug then this patch is not
> OK, even if the second bug is less serious. So, please, don't do
> it.

Ok, I'll work on another solution.

MikoĊ‚aj
Janusz Krzysztofik Feb. 26, 2025, 6:01 p.m. UTC | #7
Hi,

On Wednesday, 26 February 2025 13:41:34 CET Krzysztof Niemiec wrote:
> On 2025-02-25 at 11:41:56 GMT, Mikolaj Wasiak wrote:
> > When driver is loaded on system with numa nodes it might be run in 
kthread.
> > This makes it impossible to use current->mm in selftests which currently
> > creates null pointer exception.
> > This patch allows selftest to use current->mm by using active_mm.
> > 
> > Signed-off-by: Mikolaj Wasiak <mikolaj.wasiak@intel.com>
> > ---
> > 
> > This patch is mostly damage control. By using active_mm we expose our
> > test to foreign memory mapping, which sometimes makes the test fail.
> > That is still better than just having null pointer exception in driver
> > code.
> > 
> 
> We talked about this offline a bit, I'll recount what we determined here
> just for the record.
> 
> On NUMA systems, the driver might be probed via a kthread consuming a
> workqueue. This shouldn't be a problem as long as we don't rely on
> current->mm (i915 usually doesn't, unless it's ioctls where it's fair
> game) - but this test wasn't written with that in mind, hence the
> derefs.
> 
> We can't just use current->active_mm in place of current->mm inside of
> the test code, because one of the functions that the test uses
> (vma_lookup to be exact) ends up using current->mm. That's too deep
> into the kernel to ever touch it with such a patch.
> 
> The problem is, we also can't just set current->mm of the kthread
> to current->active_mm, because God knows what that value can be (again,
> we are executing in some random kthread that just happened to work on
> the probe).
> 
> So I don't think this patch is a good idea; it IS better than just
> having a NULL deref, but since active_mm is being so unreliable here,
> that kind of defeats the point of running the test. Also, since this is
> an mmap test and it does stuff like user pointers, it does not really
> make much sense to run it in a kthread (unless it's explicitly forked
> from a context, current->mm of which we have control over, but that's
> not what's happening here).
> 
> I have two suggestions for a different fix:
> 
> 1. Disable the test on systems with NUMA and/or if it's running in a
>    kthread, on the premise that it doesn't make sense to run this
>    specific test in a kthread. This is the easier option.
> 
> 2. Find a way to pass an argument to the selftest, so we can pass a
>    known mm to the test thread. Then set it as current->mm for the
>    duration of the test like you're doing in your patch. We could pass
>    the IGT process's mm to "emulate" having it being the initializer of
>    the test task; this way we're being a good citizen and we don't mess
>    with some other process memory. I figure this is the harder option.
> 
> I'd try 2 and if it doesn't work then just resort to 1 if there's no
> better idea floating around.

I agree with both Andi and Krzysztof comments.

If the issue is tracked in our bug tracker then please provide a link to its 
record in a Link: or even Closes: tag.  Do you have call traces on hand?  
Probably yes, so please consider adding a concise excerpt to your description.

While looking for similar cases, I've found commit 51104c19d857 ("kunit: test: 
Add vm_mmap() allocation resource manager") that seems to have resolved a 
similar issue for then newly added kunit tests accessing current->mm.  Maybe 
the approach used there is worth of reusing it for i915 selftests.

Thanks,
Janusz


> 
> Thanks
> Krzysztof
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 804f74084bd4..34d22a99da65 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -974,6 +974,11 @@  static int igt_mmap(void *arg)
 	struct drm_i915_private *i915 = arg;
 	struct intel_memory_region *mr;
 	enum intel_region_id id;
+	int err;
+
+	err = igt_mmap_enable_current();
+	if (err)
+		return err;
 
 	for_each_memory_region(mr, i915, id) {
 		unsigned long sizes[] = {
@@ -988,7 +993,6 @@  static int igt_mmap(void *arg)
 
 		for (i = 0; i < ARRAY_SIZE(sizes); i++) {
 			struct drm_i915_gem_object *obj;
-			int err;
 
 			obj = __i915_gem_object_create_user(i915, sizes[i], &mr, 1);
 			if (obj == ERR_PTR(-ENODEV))
@@ -1005,11 +1009,13 @@  static int igt_mmap(void *arg)
 
 			i915_gem_object_put(obj);
 			if (err)
-				return err;
+				goto out_disable_current;
 		}
 	}
 
-	return 0;
+out_disable_current:
+	igt_mmap_disable_current();
+	return err;
 }
 
 static void igt_close_objects(struct drm_i915_private *i915,
@@ -1310,13 +1316,17 @@  static int igt_mmap_migrate(void *arg)
 	struct intel_memory_region *system = i915->mm.regions[INTEL_REGION_SMEM];
 	struct intel_memory_region *mr;
 	enum intel_region_id id;
+	int err;
+
+	err = igt_mmap_enable_current();
+	if (err)
+		return err;
 
 	for_each_memory_region(mr, i915, id) {
 		struct intel_memory_region *mixed[] = { mr, system };
 		struct intel_memory_region *single[] = { mr };
 		struct ttm_resource_manager *man = mr->region_private;
 		struct resource saved_io;
-		int err;
 
 		if (mr->private)
 			continue;
@@ -1400,10 +1410,12 @@  static int igt_mmap_migrate(void *arg)
 		i915_ttm_buddy_man_force_visible_size(man,
 						      resource_size(&mr->io) >> PAGE_SHIFT);
 		if (err)
-			return err;
+			goto out_disable_current;
 	}
 
-	return 0;
+out_disable_current:
+	igt_mmap_disable_current();
+	return err;
 }
 
 static const char *repr_mmap_type(enum i915_mmap_type type)
@@ -1506,10 +1518,14 @@  static int igt_mmap_access(void *arg)
 	struct drm_i915_private *i915 = arg;
 	struct intel_memory_region *mr;
 	enum intel_region_id id;
+	int err;
+
+	err = igt_mmap_enable_current();
+	if (err)
+		return err;
 
 	for_each_memory_region(mr, i915, id) {
 		struct drm_i915_gem_object *obj;
-		int err;
 
 		if (mr->private)
 			continue;
@@ -1533,10 +1549,12 @@  static int igt_mmap_access(void *arg)
 
 		i915_gem_object_put(obj);
 		if (err)
-			return err;
+			goto out_disable_current;
 	}
 
-	return 0;
+out_disable_current:
+	igt_mmap_disable_current();
+	return err;
 }
 
 static int __igt_mmap_gpu(struct drm_i915_private *i915,
@@ -1652,10 +1670,14 @@  static int igt_mmap_gpu(void *arg)
 	struct drm_i915_private *i915 = arg;
 	struct intel_memory_region *mr;
 	enum intel_region_id id;
+	int err;
+
+	err = igt_mmap_enable_current();
+	if (err)
+		return err;
 
 	for_each_memory_region(mr, i915, id) {
 		struct drm_i915_gem_object *obj;
-		int err;
 
 		if (mr->private)
 			continue;
@@ -1675,10 +1697,12 @@  static int igt_mmap_gpu(void *arg)
 
 		i915_gem_object_put(obj);
 		if (err)
-			return err;
+			goto out_disable_current;
 	}
 
-	return 0;
+out_disable_current:
+	igt_mmap_disable_current();
+	return err;
 }
 
 static int check_present_pte(pte_t *pte, unsigned long addr, void *data)
@@ -1806,10 +1830,14 @@  static int igt_mmap_revoke(void *arg)
 	struct drm_i915_private *i915 = arg;
 	struct intel_memory_region *mr;
 	enum intel_region_id id;
+	int err;
+
+	err = igt_mmap_enable_current();
+	if (err)
+		return err;
 
 	for_each_memory_region(mr, i915, id) {
 		struct drm_i915_gem_object *obj;
-		int err;
 
 		if (mr->private)
 			continue;
@@ -1829,10 +1857,12 @@  static int igt_mmap_revoke(void *arg)
 
 		i915_gem_object_put(obj);
 		if (err)
-			return err;
+			goto out_disable_current;
 	}
 
-	return 0;
+out_disable_current:
+	igt_mmap_disable_current();
+	return err;
 }
 
 int i915_gem_mman_live_selftests(struct drm_i915_private *i915)
@@ -1847,6 +1877,5 @@  int i915_gem_mman_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_mmap_revoke),
 		SUBTEST(igt_mmap_gpu),
 	};
-
 	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/igt_mmap.c b/drivers/gpu/drm/i915/selftests/igt_mmap.c
index e920a461bd36..5c63622879a2 100644
--- a/drivers/gpu/drm/i915/selftests/igt_mmap.c
+++ b/drivers/gpu/drm/i915/selftests/igt_mmap.c
@@ -50,3 +50,22 @@  unsigned long igt_mmap_offset(struct drm_i915_private *i915,
 	fput(file);
 	return addr;
 }
+
+
+int igt_mmap_enable_current(void)
+{
+	if (current->flags & PF_KTHREAD) {
+		if (!current->active_mm) {
+			pr_info("Couldn't get userspace mm in kthread\n");
+			return -ENODATA;
+		}
+		kthread_use_mm(current->active_mm);
+	}
+	return 0;
+}
+
+void igt_mmap_disable_current(void)
+{
+	if (current->flags & PF_KTHREAD)
+		kthread_unuse_mm(current->active_mm);
+}
diff --git a/drivers/gpu/drm/i915/selftests/igt_mmap.h b/drivers/gpu/drm/i915/selftests/igt_mmap.h
index acbe34d81a6d..58582396bdd7 100644
--- a/drivers/gpu/drm/i915/selftests/igt_mmap.h
+++ b/drivers/gpu/drm/i915/selftests/igt_mmap.h
@@ -18,4 +18,7 @@  unsigned long igt_mmap_offset(struct drm_i915_private *i915,
 			      unsigned long prot,
 			      unsigned long flags);
 
+int igt_mmap_enable_current(void);
+void igt_mmap_disable_current(void);
+
 #endif /* IGT_MMAP_H */