diff mbox series

[1/2] drm/i915/gem: Return -EINVAL instead of '0'

Message ID 20240616070349.250899-2-andi.shyti@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Sparse errors on the i915_gem_stolen | expand

Commit Message

Andi Shyti June 16, 2024, 7:03 a.m. UTC
Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
supposed to return a pointer to the intel_memory_region
structure.

Sparse complains with the following message:

>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
   Using plain integer as NULL pointer

The caller checks for errors, and if no error is returned, it
stores the address of the stolen memory. Therefore, we can't
return NULL. Since we are handling a case of out-of-bounds, it's
appropriate to treat the "lmem_size < dsm_base" case as an error.

Return -EINVAL embedded in a pointer instead of '0' (or NULL).

This way, we avoid a potential NULL pointer dereference.

Since we are returning an error, it makes sense to print an error
message with drm_err() instead of a debug message using
drm_dbg().

Fixes: 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup warning")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Lucas De Marchi June 17, 2024, 12:55 p.m. UTC | #1
On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote:
>Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
>warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
>supposed to return a pointer to the intel_memory_region
>structure.
>
>Sparse complains with the following message:
>
>>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
>   Using plain integer as NULL pointer
>
>The caller checks for errors, and if no error is returned, it
>stores the address of the stolen memory. Therefore, we can't
>return NULL. Since we are handling a case of out-of-bounds, it's
>appropriate to treat the "lmem_size < dsm_base" case as an error.

which completely invalidates the point of the commit that introduced this
regression. That was commit was supposed to do "let's continue, just
disabling stolen".  Apparently we should just revert that patch instead
since it introduced 2 new issues and didn't solve what it was supposed
to... for probe failures, we are completely fine leaving the warning
there.


Lucas De Marchi

>
>Return -EINVAL embedded in a pointer instead of '0' (or NULL).
>
>This way, we avoid a potential NULL pointer dereference.
>
>Since we are returning an error, it makes sense to print an error
>message with drm_err() instead of a debug message using
>drm_dbg().
>
>Fixes: 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup warning")
>Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>index 004471f60117..bd774ce713cf 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>@@ -937,10 +937,10 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
> 		/* Use DSM base address instead for stolen memory */
> 		dsm_base = intel_uncore_read64(uncore, GEN6_DSMBASE) & GEN11_BDSM_MASK;
> 		if (lmem_size < dsm_base) {
>-			drm_dbg(&i915->drm,
>+			drm_err(&i915->drm,
> 				"Disabling stolen memory support due to OOB placement: lmem_size = %lli vs dsm_base = %lli\n",
> 				lmem_size, dsm_base);
>-			return 0;
>+			return ERR_PTR(-EINVAL);
> 		}
> 		dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M);
> 	}
>-- 
>2.45.1
>
Andi Shyti June 17, 2024, 2:22 p.m. UTC | #2
On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote:
> On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote:
> > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
> > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
> > supposed to return a pointer to the intel_memory_region
> > structure.
> > 
> > Sparse complains with the following message:
> > 
> > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
> >   Using plain integer as NULL pointer
> > 
> > The caller checks for errors, and if no error is returned, it
> > stores the address of the stolen memory. Therefore, we can't
> > return NULL. Since we are handling a case of out-of-bounds, it's
> > appropriate to treat the "lmem_size < dsm_base" case as an error.
> 
> which completely invalidates the point of the commit that introduced this
> regression. That was commit was supposed to do "let's continue, just
> disabling stolen". 

Yes, correct, I missed the point while fixing stuff. But patch 2
is still valid.

Thanks,
Andi
Lucas De Marchi June 17, 2024, 3:46 p.m. UTC | #3
On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote:
>On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote:
>> On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote:
>> > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
>> > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
>> > supposed to return a pointer to the intel_memory_region
>> > structure.
>> >
>> > Sparse complains with the following message:
>> >
>> > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
>> >   Using plain integer as NULL pointer
>> >
>> > The caller checks for errors, and if no error is returned, it
>> > stores the address of the stolen memory. Therefore, we can't
>> > return NULL. Since we are handling a case of out-of-bounds, it's
>> > appropriate to treat the "lmem_size < dsm_base" case as an error.
>>
>> which completely invalidates the point of the commit that introduced this
>> regression. That was commit was supposed to do "let's continue, just
>> disabling stolen".
>
>Yes, correct, I missed the point while fixing stuff. But patch 2
>is still valid.

no, it's not. It's introduced by the same commit. I went to look into
this exactly because of the second issue: it broke 32b build in xe and
all the CI.Hooks in xe are failing.

Lucas De Marchi


>
>Thanks,
>Andi
Andi Shyti June 17, 2024, 6:38 p.m. UTC | #4
On Mon, Jun 17, 2024 at 10:46:07AM -0500, Lucas De Marchi wrote:
> On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote:
> > On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote:
> > > On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote:
> > > > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
> > > > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
> > > > supposed to return a pointer to the intel_memory_region
> > > > structure.
> > > >
> > > > Sparse complains with the following message:
> > > >
> > > > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
> > > >   Using plain integer as NULL pointer
> > > >
> > > > The caller checks for errors, and if no error is returned, it
> > > > stores the address of the stolen memory. Therefore, we can't
> > > > return NULL. Since we are handling a case of out-of-bounds, it's
> > > > appropriate to treat the "lmem_size < dsm_base" case as an error.
> > > 
> > > which completely invalidates the point of the commit that introduced this
> > > regression. That was commit was supposed to do "let's continue, just
> > > disabling stolen".
> > 
> > Yes, correct, I missed the point while fixing stuff. But patch 2
> > is still valid.
> 
> no, it's not. It's introduced by the same commit. I went to look into
> this exactly because of the second issue: it broke 32b build in xe and
> all the CI.Hooks in xe are failing.

yes, it's broken because it's using %lli, right? In 32b it should
be %li.

Patch 2 is replacing %lli with %pa which should fix the 32b
build.

I'm sending a new series now.

Andi

> Lucas De Marchi
> 
> 
> > 
> > Thanks,
> > Andi
Lucas De Marchi June 17, 2024, 10:29 p.m. UTC | #5
On Mon, Jun 17, 2024 at 08:38:20PM GMT, Andi Shyti wrote:
>On Mon, Jun 17, 2024 at 10:46:07AM -0500, Lucas De Marchi wrote:
>> On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote:
>> > On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote:
>> > > On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote:
>> > > > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
>> > > > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
>> > > > supposed to return a pointer to the intel_memory_region
>> > > > structure.
>> > > >
>> > > > Sparse complains with the following message:
>> > > >
>> > > > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
>> > > >   Using plain integer as NULL pointer
>> > > >
>> > > > The caller checks for errors, and if no error is returned, it
>> > > > stores the address of the stolen memory. Therefore, we can't
>> > > > return NULL. Since we are handling a case of out-of-bounds, it's
>> > > > appropriate to treat the "lmem_size < dsm_base" case as an error.
>> > >
>> > > which completely invalidates the point of the commit that introduced this
>> > > regression. That was commit was supposed to do "let's continue, just
>> > > disabling stolen".
>> >
>> > Yes, correct, I missed the point while fixing stuff. But patch 2
>> > is still valid.
>>
>> no, it's not. It's introduced by the same commit. I went to look into
>> this exactly because of the second issue: it broke 32b build in xe and
>> all the CI.Hooks in xe are failing.
>
>yes, it's broken because it's using %lli, right? In 32b it should
>be %li.
>
>Patch 2 is replacing %lli with %pa which should fix the 32b
>build.
>
>I'm sending a new series now.

wait... but instead of reverting you are sending a new series changing
the first patch to return NULL. However in your commit message you said
for this version:

	The caller checks for errors, and if no error is returned, it
	stores the address of the stolen memory. Therefore, we can't
	return NULL. Since we are handling a case of out-of-bounds, it's
	appropriate to treat the "lmem_size < dsm_base" case as an error.

	Return -EINVAL embedded in a pointer instead of '0' (or NULL).

	This way, we avoid a potential NULL pointer dereference.

So... what's it?  Can we return NULL or not? Is this tested on that
scenario with with small BAR or does the module
just fail to load later and explode?

Lucas De Marchi

>
>Andi
>
>> Lucas De Marchi
>>
>>
>> >
>> > Thanks,
>> > Andi
Andi Shyti June 18, 2024, 7:26 a.m. UTC | #6
Hi Lucas,

On Mon, Jun 17, 2024 at 05:29:24PM -0500, Lucas De Marchi wrote:
> On Mon, Jun 17, 2024 at 08:38:20PM GMT, Andi Shyti wrote:
> > On Mon, Jun 17, 2024 at 10:46:07AM -0500, Lucas De Marchi wrote:
> > > On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote:
> > > > On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote:
> > > > > On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote:
> > > > > > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
> > > > > > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
> > > > > > supposed to return a pointer to the intel_memory_region
> > > > > > structure.
> > > > > >
> > > > > > Sparse complains with the following message:
> > > > > >
> > > > > > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
> > > > > >   Using plain integer as NULL pointer
> > > > > >
> > > > > > The caller checks for errors, and if no error is returned, it
> > > > > > stores the address of the stolen memory. Therefore, we can't
> > > > > > return NULL. Since we are handling a case of out-of-bounds, it's
> > > > > > appropriate to treat the "lmem_size < dsm_base" case as an error.
> > > > >
> > > > > which completely invalidates the point of the commit that introduced this
> > > > > regression. That was commit was supposed to do "let's continue, just
> > > > > disabling stolen".
> > > >
> > > > Yes, correct, I missed the point while fixing stuff. But patch 2
> > > > is still valid.
> > > 
> > > no, it's not. It's introduced by the same commit. I went to look into
> > > this exactly because of the second issue: it broke 32b build in xe and
> > > all the CI.Hooks in xe are failing.
> > 
> > yes, it's broken because it's using %lli, right? In 32b it should
> > be %li.
> > 
> > Patch 2 is replacing %lli with %pa which should fix the 32b
> > build.
> > 
> > I'm sending a new series now.
> 
> wait... but instead of reverting you are sending a new series changing
> the first patch to return NULL. However in your commit message you said
> for this version:

this series of two patches is not making any logical change, it's
just fixing sparse errors (along with an i386 build warning).

> 	The caller checks for errors, and if no error is returned, it
> 	stores the address of the stolen memory. Therefore, we can't
> 	return NULL. Since we are handling a case of out-of-bounds, it's
> 	appropriate to treat the "lmem_size < dsm_base" case as an error.
> 
> 	Return -EINVAL embedded in a pointer instead of '0' (or NULL).
> 
> 	This way, we avoid a potential NULL pointer dereference.
> 
> So... what's it?  Can we return NULL or not? Is this tested on that
> scenario with with small BAR or does the module
> just fail to load later and explode?

Originally the patch just replaced '0' with NULL, but then before
sending I checked again, misread the code and changed the patch.
It's perfectly safe to return NULL (as I wrote in the cover
letter of v2), it just disables the stolen memory.

Jonathan's original patch is right. We also discussed it offline
with last European night.

Please, then, ignore this v1 and consider only v2.

Andi
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 004471f60117..bd774ce713cf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -937,10 +937,10 @@  i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
 		/* Use DSM base address instead for stolen memory */
 		dsm_base = intel_uncore_read64(uncore, GEN6_DSMBASE) & GEN11_BDSM_MASK;
 		if (lmem_size < dsm_base) {
-			drm_dbg(&i915->drm,
+			drm_err(&i915->drm,
 				"Disabling stolen memory support due to OOB placement: lmem_size = %lli vs dsm_base = %lli\n",
 				lmem_size, dsm_base);
-			return 0;
+			return ERR_PTR(-EINVAL);
 		}
 		dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M);
 	}