Message ID | 20200721171936.81563-1-michael.j.ruhl@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] io-mapping: Indicate mapping failure |
On Tue, 21 Jul 2020 13:19:36 -0400 "Michael J. Ruhl" <michael.j.ruhl@intel.com> wrote:

> The !ATOMIC_IOMAP version of io_mapping_init_wc will always return
> success, even when the ioremap fails.
>
> Since the ATOMIC_IOMAP version returns NULL when the init fails, and
> callers check for a NULL return on error, this is unexpected.
>
> During a device probe, where the ioremap failed, a crash can look
> like this:
>
>     BUG: unable to handle page fault for address: 0000000000210000
>     #PF: supervisor write access in kernel mode
>     #PF: error_code(0x0002) - not-present page
>     Oops: 0002 [#1] PREEMPT SMP
>     CPU: 0 PID: 177 Comm:
>     RIP: 0010:fill_page_dma [i915]
>     gen8_ppgtt_create [i915]
>     i915_ppgtt_create [i915]
>     intel_gt_init [i915]
>     i915_gem_init [i915]
>     i915_driver_probe [i915]
>     pci_device_probe
>     really_probe
>     driver_probe_device
>
> The remap failure occurred much earlier in the probe.  If it had
> been propagated, the driver would have exited with an error.
>
> Return NULL on ioremap failure.
>
> ...
>
> --- a/include/linux/io-mapping.h
> +++ b/include/linux/io-mapping.h
> @@ -118,7 +118,7 @@ io_mapping_init_wc(struct io_mapping *iomap,
>  	iomap->prot = pgprot_noncached(PAGE_KERNEL);
>  #endif
>
> -	return iomap;
> +	return iomap->iomem ? iomap : NULL;
>  }
>
>  static inline void

LGTM.  However I do think it would be stylistically better/typical to
detect and handle the error early, rather than to blunder on,
pointlessly initializing things?

--- a/include/linux/io-mapping.h~io-mapping-indicate-mapping-failure-fix
+++ a/include/linux/io-mapping.h
@@ -107,9 +107,12 @@ io_mapping_init_wc(struct io_mapping *io
 		   resource_size_t base,
 		   unsigned long size)
 {
+	iomap->iomem = ioremap_wc(base, size);
+	if (!iomap->iomem)
+		return NULL;
+
 	iomap->base = base;
 	iomap->size = size;
-	iomap->iomem = ioremap_wc(base, size);
 #if defined(pgprot_noncached_wc) /* archs can't agree on a name ... */
 	iomap->prot = pgprot_noncached_wc(PAGE_KERNEL);
 #elif defined(pgprot_writecombine)
@@ -118,7 +121,7 @@ io_mapping_init_wc(struct io_mapping *io
 	iomap->prot = pgprot_noncached(PAGE_KERNEL);
 #endif
 
-	return iomap->iomem ? iomap : NULL;
+	return iomap;
 }
 
 static inline void

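For readers who want to try the "fail early" shape outside the kernel, here is a minimal userspace sketch of it. It is not the kernel code: struct io_mapping is cut down to three fields, fake_ioremap_wc() and the fake_ioremap_should_fail knob are hypothetical stand-ins for ioremap_wc(), and every sketch_* name is invented for illustration. The caller returning -EIO is likewise only an assumption about how a driver might propagate the NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the kernel's struct io_mapping. */
struct io_mapping {
	unsigned long base;
	unsigned long size;
	void *iomem;
};

/* Hypothetical test knob: when nonzero, the fake mapping call fails. */
static int fake_ioremap_should_fail;

/* Stand-in for ioremap_wc(): returns NULL on failure. */
static void *fake_ioremap_wc(unsigned long base, unsigned long size)
{
	(void)base;
	if (fake_ioremap_should_fail || size == 0)
		return NULL;
	return malloc(size);
}

/*
 * Same shape as the suggested alteration: attempt the mapping first,
 * bail out with NULL if it failed, and only then fill in the rest.
 */
static struct io_mapping *
sketch_io_mapping_init_wc(struct io_mapping *iomap,
			  unsigned long base, unsigned long size)
{
	iomap->iomem = fake_ioremap_wc(base, size);
	if (!iomap->iomem)
		return NULL;

	iomap->base = base;
	iomap->size = size;
	return iomap;
}

/* Releases the fake mapping created above. */
static void sketch_io_mapping_fini(struct io_mapping *iomap)
{
	free(iomap->iomem);
	iomap->iomem = NULL;
}

/*
 * Hypothetical caller: turns the NULL return into an error code so the
 * probe can exit cleanly instead of touching an unmapped region later.
 */
static int sketch_setup_region(struct io_mapping *iomap,
			       unsigned long base, unsigned long size)
{
	if (!sketch_io_mapping_init_wc(iomap, base, size))
		return -EIO;
	return 0;
}
```

With fake_ioremap_should_fail set, sketch_setup_region() returns -EIO and leaves iomap->iomem NULL; with it cleared, it returns 0 and the mapping should later be released with sketch_io_mapping_fini().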
>-----Original Message-----
>From: Andrew Morton <akpm@linux-foundation.org>
>Sent: Tuesday, July 21, 2020 4:57 PM
>To: Ruhl, Michael J <michael.j.ruhl@intel.com>
>Cc: dri-devel@lists.freedesktop.org; Mike Rapoport <rppt@linux.ibm.com>;
>Andy Shevchenko <andriy.shevchenko@linux.intel.com>; Chris Wilson
><chris@chris-wilson.co.uk>; stable@vger.kernel.org
>Subject: Re: [PATCH v2] io-mapping: Indicate mapping failure
>
>On Tue, 21 Jul 2020 13:19:36 -0400 "Michael J. Ruhl"
><michael.j.ruhl@intel.com> wrote:
>
>> The !ATOMIC_IOMAP version of io_mapping_init_wc will always return
>> success, even when the ioremap fails.
>>
>> Since the ATOMIC_IOMAP version returns NULL when the init fails, and
>> callers check for a NULL return on error, this is unexpected.
>>
>> During a device probe, where the ioremap failed, a crash can look
>> like this:
>>
>>     BUG: unable to handle page fault for address: 0000000000210000
>>     #PF: supervisor write access in kernel mode
>>     #PF: error_code(0x0002) - not-present page
>>     Oops: 0002 [#1] PREEMPT SMP
>>     CPU: 0 PID: 177 Comm:
>>     RIP: 0010:fill_page_dma [i915]
>>     gen8_ppgtt_create [i915]
>>     i915_ppgtt_create [i915]
>>     intel_gt_init [i915]
>>     i915_gem_init [i915]
>>     i915_driver_probe [i915]
>>     pci_device_probe
>>     really_probe
>>     driver_probe_device
>>
>> The remap failure occurred much earlier in the probe.  If it had
>> been propagated, the driver would have exited with an error.
>>
>> Return NULL on ioremap failure.
>>
>> ...
>>
>> --- a/include/linux/io-mapping.h
>> +++ b/include/linux/io-mapping.h
>> @@ -118,7 +118,7 @@ io_mapping_init_wc(struct io_mapping *iomap,
>>  	iomap->prot = pgprot_noncached(PAGE_KERNEL);
>>  #endif
>>
>> -	return iomap;
>> +	return iomap->iomem ? iomap : NULL;
>>  }
>>
>>  static inline void
>
>LGTM.  However I do think it would be stylistically better/typical to
>detect and handle the error early, rather than to blunder on,
>pointlessly initializing things?

Yeah, I pondered that, and then didn't do it...

>--- a/include/linux/io-mapping.h~io-mapping-indicate-mapping-failure-fix
>+++ a/include/linux/io-mapping.h
>@@ -107,9 +107,12 @@ io_mapping_init_wc(struct io_mapping *io
> 		   resource_size_t base,
> 		   unsigned long size)
> {
>+	iomap->iomem = ioremap_wc(base, size);
>+	if (!iomap->iomem)
>+		return NULL;
>+

This does make more sense.

I am confused by the two follow up emails I just got.

Shall I resubmit, or is this path (if !iomap->iomem) return NULL)
now in the tree.

On Tue, 21 Jul 2020 21:02:44 +0000 "Ruhl, Michael J" <michael.j.ruhl@intel.com> wrote:

> >--- a/include/linux/io-mapping.h~io-mapping-indicate-mapping-failure-fix
> >+++ a/include/linux/io-mapping.h
> >@@ -107,9 +107,12 @@ io_mapping_init_wc(struct io_mapping *io
> > 		   resource_size_t base,
> > 		   unsigned long size)
> > {
> >+	iomap->iomem = ioremap_wc(base, size);
> >+	if (!iomap->iomem)
> >+		return NULL;
> >+
>
> This does make more sense.
>
> I am confused by the two follow up emails I just got.

One was your original patch, the other is my suggested alteration.

> Shall I resubmit, or is this path (if !iomap->iomem) return NULL)
> now in the tree.

All is OK.  If my alteration is acceptable (and, preferably, tested!)
then when the time comes, I'll fold it into the base patch, add a
note indicating this change and shall then send it to Linus.

On Tue, Jul 21, 2020 at 11:24 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 21 Jul 2020 21:02:44 +0000 "Ruhl, Michael J" <michael.j.ruhl@intel.com> wrote:
>
> > >--- a/include/linux/io-mapping.h~io-mapping-indicate-mapping-failure-fix
> > >+++ a/include/linux/io-mapping.h
> > >@@ -107,9 +107,12 @@ io_mapping_init_wc(struct io_mapping *io
> > > 		   resource_size_t base,
> > > 		   unsigned long size)
> > > {
> > >+	iomap->iomem = ioremap_wc(base, size);
> > >+	if (!iomap->iomem)
> > >+		return NULL;
> > >+
> >
> > This does make more sense.
> >
> > I am confused by the two follow up emails I just got.
>
> One was your original patch, the other is my suggested alteration.
>
> > Shall I resubmit, or is this path (if !iomap->iomem) return NULL)
> > now in the tree.
>
> All is OK.  If my alteration is acceptable (and, preferably, tested!)
> then when the time comes, I'll fold it into the base patch, add a
> note indicating this change and shall then send it to Linus.

Your alternative also matches the other implementation of
io_mapping_init_wc, I was kinda tempted to do that suggestion too just
because of that. But then didn't send out that email.
-Daniel

>-----Original Message-----
>From: Andrew Morton <akpm@linux-foundation.org>
>Sent: Tuesday, July 21, 2020 5:24 PM
>To: Ruhl, Michael J <michael.j.ruhl@intel.com>
>Cc: dri-devel@lists.freedesktop.org; Mike Rapoport <rppt@linux.ibm.com>;
>Andy Shevchenko <andriy.shevchenko@linux.intel.com>; Chris Wilson
><chris@chris-wilson.co.uk>; stable@vger.kernel.org
>Subject: Re: [PATCH v2] io-mapping: Indicate mapping failure
>
>On Tue, 21 Jul 2020 21:02:44 +0000 "Ruhl, Michael J"
><michael.j.ruhl@intel.com> wrote:
>
>> >--- a/include/linux/io-mapping.h~io-mapping-indicate-mapping-failure-fix
>> >+++ a/include/linux/io-mapping.h
>> >@@ -107,9 +107,12 @@ io_mapping_init_wc(struct io_mapping *io
>> > 		   resource_size_t base,
>> > 		   unsigned long size)
>> > {
>> >+	iomap->iomem = ioremap_wc(base, size);
>> >+	if (!iomap->iomem)
>> >+		return NULL;
>> >+
>>
>> This does make more sense.
>>
>> I am confused by the two follow up emails I just got.
>
>One was your original patch, the other is my suggested alteration.
>
>> Shall I resubmit, or is this path (if !iomap->iomem) return NULL)
>> now in the tree.
>
>All is OK.  If my alteration is acceptable (and, preferably, tested!)
>then when the time comes, I'll fold it into the base patch, add a
>note indicating this change and shall then send it to Linus.

I am good with the change and have tested it.

Instead of the system crashing I get:

i915 0000:01:00.0: [drm] *ERROR* Failed to setup region(-5) type=1
i915 0000:01:00.0: Device initialization failed (-5)
i915: probe of 0000:01:00.0 failed with error -5

Which is the expected error.

If you would like this for the updated patch:

Tested-By: Michael J. Ruhl <michael.j.ruhl@intel.com>

Thanks!

Mike

diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index 0beaa3eba155..5641e06cbcf7 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -118,7 +118,7 @@ io_mapping_init_wc(struct io_mapping *iomap,
 	iomap->prot = pgprot_noncached(PAGE_KERNEL);
 #endif
 
-	return iomap;
+	return iomap->iomem ? iomap : NULL;
 }
 
 static inline void

The !ATOMIC_IOMAP version of io_mapping_init_wc will always return
success, even when the ioremap fails.

Since the ATOMIC_IOMAP version returns NULL when the init fails, and
callers check for a NULL return on error, this is unexpected.

During a device probe, where the ioremap failed, a crash can look
like this:

    BUG: unable to handle page fault for address: 0000000000210000
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    Oops: 0002 [#1] PREEMPT SMP
    CPU: 0 PID: 177 Comm:
    RIP: 0010:fill_page_dma [i915]
    gen8_ppgtt_create [i915]
    i915_ppgtt_create [i915]
    intel_gt_init [i915]
    i915_gem_init [i915]
    i915_driver_probe [i915]
    pci_device_probe
    really_probe
    driver_probe_device

The remap failure occurred much earlier in the probe.  If it had
been propagated, the driver would have exited with an error.

Return NULL on ioremap failure.

Fixes: cafaf14a5d8f ("io-mapping: Always create a struct to hold metadata about the io-mapping")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
---
v2: reflect review comments
---
 include/linux/io-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)