Message ID | 20240616070349.250899-2-andi.shyti@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Sparse errors on the i915_gem_stolen | expand |
On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote: >Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup >warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's >supposed to return a pointer to the intel_memory_region >structure. > >Sparse complains with the following message: > >>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse: > Using plain integer as NULL pointer > >The caller checks for errors, and if no error is returned, it >stores the address of the stolen memory. Therefore, we can't >return NULL. Since we are handling a case of out-of-bounds, it's >appropriate to treat the "lmem_size < dsm_base" case as an error. which completely invalidates the point of the commit that introduced this regression. That was commit was supposed to do "let's continue, just disabling stolen". Apparently we should just revert that patch instead since it introduced 2 new issues and didn't solve what it was supposed to... for probe failures, we are completely fine leaving the warning there. Lucas De Marchi > >Return -EINVAL embedded in a pointer instead of '0' (or NULL). > >This way, we avoid a potential NULL pointer dereference. > >Since we are returning an error, it makes sense to print an error >message with drm_err() instead of a debug message using >drm_dbg(). > >Fixes: 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup warning") >Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> >Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> >--- > drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c >index 004471f60117..bd774ce713cf 100644 >--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c >+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c >@@ -937,10 +937,10 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, > /* Use DSM base address instead for stolen memory */ > dsm_base = intel_uncore_read64(uncore, GEN6_DSMBASE) & GEN11_BDSM_MASK; > if (lmem_size < dsm_base) { >- drm_dbg(&i915->drm, >+ drm_err(&i915->drm, > "Disabling stolen memory support due to OOB placement: lmem_size = %lli vs dsm_base = %lli\n", > lmem_size, dsm_base); >- return 0; >+ return ERR_PTR(-EINVAL); > } > dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M); > } >-- >2.45.1 >
On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote: > On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote: > > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup > > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's > > supposed to return a pointer to the intel_memory_region > > structure. > > > > Sparse complains with the following message: > > > > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse: > > Using plain integer as NULL pointer > > > > The caller checks for errors, and if no error is returned, it > > stores the address of the stolen memory. Therefore, we can't > > return NULL. Since we are handling a case of out-of-bounds, it's > > appropriate to treat the "lmem_size < dsm_base" case as an error. > > which completely invalidates the point of the commit that introduced this > regression. That was commit was supposed to do "let's continue, just > disabling stolen". Yes, correct, I missed the point while fixing stuff. But patch 2 is still valid. Thanks, Andi
On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote: >On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote: >> On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote: >> > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup >> > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's >> > supposed to return a pointer to the intel_memory_region >> > structure. >> > >> > Sparse complains with the following message: >> > >> > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse: >> > Using plain integer as NULL pointer >> > >> > The caller checks for errors, and if no error is returned, it >> > stores the address of the stolen memory. Therefore, we can't >> > return NULL. Since we are handling a case of out-of-bounds, it's >> > appropriate to treat the "lmem_size < dsm_base" case as an error. >> >> which completely invalidates the point of the commit that introduced this >> regression. That was commit was supposed to do "let's continue, just >> disabling stolen". > >Yes, correct, I missed the point while fixing stuff. But patch 2 >is still valid. no, it's not. It's introduced by the same commit. I went to look into this exactly because of the second issue: it broke 32b build in xe and all the CI.Hooks in xe are failing. Lucas De Marchi > >Thanks, >Andi
On Mon, Jun 17, 2024 at 10:46:07AM -0500, Lucas De Marchi wrote: > On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote: > > On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote: > > > On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote: > > > > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup > > > > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's > > > > supposed to return a pointer to the intel_memory_region > > > > structure. > > > > > > > > Sparse complains with the following message: > > > > > > > > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse: > > > > Using plain integer as NULL pointer > > > > > > > > The caller checks for errors, and if no error is returned, it > > > > stores the address of the stolen memory. Therefore, we can't > > > > return NULL. Since we are handling a case of out-of-bounds, it's > > > > appropriate to treat the "lmem_size < dsm_base" case as an error. > > > > > > which completely invalidates the point of the commit that introduced this > > > regression. That was commit was supposed to do "let's continue, just > > > disabling stolen". > > > > Yes, correct, I missed the point while fixing stuff. But patch 2 > > is still valid. > > no, it's not. It's introduced by the same commit. I went to look into > this exactly because of the second issue: it broke 32b build in xe and > all the CI.Hooks in xe are failing. yes, it's broken because it's using %lli, right? In 32b it should be %li. Patch 2 is replacing %lli with %pa which should fix the 32b build. I'm sending a new series now. Andi > Lucas De Marchi > > > > > > Thanks, > > Andi
On Mon, Jun 17, 2024 at 08:38:20PM GMT, Andi Shyti wrote: >On Mon, Jun 17, 2024 at 10:46:07AM -0500, Lucas De Marchi wrote: >> On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote: >> > On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote: >> > > On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote: >> > > > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup >> > > > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's >> > > > supposed to return a pointer to the intel_memory_region >> > > > structure. >> > > > >> > > > Sparse complains with the following message: >> > > > >> > > > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse: >> > > > Using plain integer as NULL pointer >> > > > >> > > > The caller checks for errors, and if no error is returned, it >> > > > stores the address of the stolen memory. Therefore, we can't >> > > > return NULL. Since we are handling a case of out-of-bounds, it's >> > > > appropriate to treat the "lmem_size < dsm_base" case as an error. >> > > >> > > which completely invalidates the point of the commit that introduced this >> > > regression. That was commit was supposed to do "let's continue, just >> > > disabling stolen". >> > >> > Yes, correct, I missed the point while fixing stuff. But patch 2 >> > is still valid. >> >> no, it's not. It's introduced by the same commit. I went to look into >> this exactly because of the second issue: it broke 32b build in xe and >> all the CI.Hooks in xe are failing. > >yes, it's broken because it's using %lli, right? In 32b it should >be %li. > >Patch 2 is replacing %lli with %pa which should fix the 32b >build. > >I'm sending a new series now. wait... but instead of reverting you are sending a new series changing the first patch to return NULL. However in your commit message you said for this version: The caller checks for errors, and if no error is returned, it stores the address of the stolen memory. Therefore, we can't return NULL. Since we are handling a case of out-of-bounds, it's appropriate to treat the "lmem_size < dsm_base" case as an error. Return -EINVAL embedded in a pointer instead of '0' (or NULL). This way, we avoid a potential NULL pointer dereference. So... what's it? Can we return NULL or not? Is this tested on that scenario with with small BAR or does the module just fail to load later and explode? Lucas De Marchi > >Andi > >> Lucas De Marchi >> >> >> > >> > Thanks, >> > Andi
Hi Lucas, On Mon, Jun 17, 2024 at 05:29:24PM -0500, Lucas De Marchi wrote: > On Mon, Jun 17, 2024 at 08:38:20PM GMT, Andi Shyti wrote: > > On Mon, Jun 17, 2024 at 10:46:07AM -0500, Lucas De Marchi wrote: > > > On Mon, Jun 17, 2024 at 04:22:11PM GMT, Andi Shyti wrote: > > > > On Mon, Jun 17, 2024 at 07:55:10AM -0500, Lucas De Marchi wrote: > > > > > On Sun, Jun 16, 2024 at 09:03:48AM GMT, Andi Shyti wrote: > > > > > > Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup > > > > > > warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's > > > > > > supposed to return a pointer to the intel_memory_region > > > > > > structure. > > > > > > > > > > > > Sparse complains with the following message: > > > > > > > > > > > > > > drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse: > > > > > > Using plain integer as NULL pointer > > > > > > > > > > > > The caller checks for errors, and if no error is returned, it > > > > > > stores the address of the stolen memory. Therefore, we can't > > > > > > return NULL. Since we are handling a case of out-of-bounds, it's > > > > > > appropriate to treat the "lmem_size < dsm_base" case as an error. > > > > > > > > > > which completely invalidates the point of the commit that introduced this > > > > > regression. That was commit was supposed to do "let's continue, just > > > > > disabling stolen". > > > > > > > > Yes, correct, I missed the point while fixing stuff. But patch 2 > > > > is still valid. > > > > > > no, it's not. It's introduced by the same commit. I went to look into > > > this exactly because of the second issue: it broke 32b build in xe and > > > all the CI.Hooks in xe are failing. > > > > yes, it's broken because it's using %lli, right? In 32b it should > > be %li. > > > > Patch 2 is replacing %lli with %pa which should fix the 32b > > build. > > > > I'm sending a new series now. > > wait... but instead of reverting you are sending a new series changing > the first patch to return NULL. However in your commit message you said > for this version: this series of two patches is not making any logical change, it's just fixing sparse errors (along with an i386 build warning). > The caller checks for errors, and if no error is returned, it > stores the address of the stolen memory. Therefore, we can't > return NULL. Since we are handling a case of out-of-bounds, it's > appropriate to treat the "lmem_size < dsm_base" case as an error. > > Return -EINVAL embedded in a pointer instead of '0' (or NULL). > > This way, we avoid a potential NULL pointer dereference. > > So... what's it? Can we return NULL or not? Is this tested on that > scenario with with small BAR or does the module > just fail to load later and explode? Originally the patch just replaced '0' with NULL, but then before sending I checked again, misread the code and changed the patch. It's perfectly safe to return NULL (as I wrote in the cover letter of v2), it just disables the stolen memory. Jonathan's original patch is right. We also discussed it offline with last European night. Please, then, ignore this v1 and consider only v2. Andi
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 004471f60117..bd774ce713cf 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -937,10 +937,10 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, /* Use DSM base address instead for stolen memory */ dsm_base = intel_uncore_read64(uncore, GEN6_DSMBASE) & GEN11_BDSM_MASK; if (lmem_size < dsm_base) { - drm_dbg(&i915->drm, + drm_err(&i915->drm, "Disabling stolen memory support due to OOB placement: lmem_size = %lli vs dsm_base = %lli\n", lmem_size, dsm_base); - return 0; + return ERR_PTR(-EINVAL); } dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M); }