| Message ID | 20201030100815.2269-12-daniel.vetter@ffwll.ch |
|---|---|
| State | Not Applicable, archived |
| Series | None |
On Fri, Oct 30, 2020 at 11:08:11AM +0100, Daniel Vetter wrote:
> There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> files, and the old proc interface. Two check against
> iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> this starts to matter, since we don't want random userspace having
> access to PCI BARs while a driver is loaded and using it.
>
> Fix this by adding the same iomem_is_exclusive() check we already have
> on the sysfs side in pci_mmap_resource().
>
> References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
only used in a few places:

  e1000_probe() calls pci_request_selected_regions_exclusive(),
  ne_pci_probe() calls pci_request_regions_exclusive(),
  vmbus_allocate_mmio() calls request_mem_region_exclusive()

which raises the question of whether it's worth keeping
IORESOURCE_EXCLUSIVE at all. I'm totally fine with removing it
completely. But if you want it,

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> --
> v2: Improve commit message (Bjorn)
> ---
>  drivers/pci/proc.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index d35186b01d98..3a2f90beb4cb 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
>  		else
>  			return -EINVAL;
>  	}
> +
> +	if (dev->resource[i].flags & IORESOURCE_MEM &&
> +	    iomem_is_exclusive(dev->resource[i].start))
> +		return -EINVAL;
> +
>  	ret = pci_mmap_page_range(dev, i, vma,
> 				  fpriv->mmap_state, write_combine);
>  	if (ret < 0)
> --
> 2.28.0
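For context, a minimal sketch of what the exclusive request variants Bjorn lists look like next to a plain claim in a PCI driver's probe path. The "demo" driver name and probe function are hypothetical; the request calls themselves are the kernel APIs named above.

```c
#include <linux/pci.h>

/* Hypothetical "demo" driver: plain vs. exclusive BAR claims. */
static int demo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int ret = pci_enable_device(pdev);

	if (ret)
		return ret;

	/* Plain claim: marks every BAR IORESOURCE_BUSY under "demo". */
	ret = pci_request_regions(pdev, "demo");

	/*
	 * Exclusive claim, as in ne_pci_probe(): additionally sets
	 * IORESOURCE_EXCLUSIVE, so userspace mmap of the BARs is
	 * refused even without CONFIG_IO_STRICT_DEVMEM:
	 *
	 *	ret = pci_request_regions_exclusive(pdev, "demo");
	 */
	return ret;
}
```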
On Tue, Nov 3, 2020 at 1:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> This is OK with me but it looks like IORESOURCE_EXCLUSIVE is currently
> only used in a few places:
>
>   e1000_probe() calls pci_request_selected_regions_exclusive(),
>   ne_pci_probe() calls pci_request_regions_exclusive(),
>   vmbus_allocate_mmio() calls request_mem_region_exclusive()
>
> which raises the question of whether it's worth keeping
> IORESOURCE_EXCLUSIVE at all. I'm totally fine with removing it
> completely.

Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning, so I'd
be in favor of removing it as well.
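A paraphrase of the semantics Dan is pointing at; this is an illustrative predicate, not the actual iomem_is_exclusive() body in kernel/resource.c.

```c
#include <linux/ioport.h>
#include <linux/kernel.h>

/* Sketch: with CONFIG_IO_STRICT_DEVMEM, a merely-busy range is already
 * treated like an exclusive one, so the dedicated flag adds nothing. */
static bool treated_as_exclusive(const struct resource *res)
{
	if (res->flags & IORESOURCE_EXCLUSIVE)
		return true;
	return IS_ENABLED(CONFIG_IO_STRICT_DEVMEM) &&
	       (res->flags & IORESOURCE_BUSY);
}
```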
On Tue, Nov 3, 2020 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Nov 3, 2020 at 1:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > which raises the question of whether it's worth keeping
> > IORESOURCE_EXCLUSIVE at all. I'm totally fine with removing it
> > completely.
>
> Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
> IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning, so I'd
> be in favor of removing it as well.

Still has some value since it enforces exclusive access even if the
config isn't enabled, and iirc e1000 had some fun with userspace tools
clobbering the firmware and bricking the chip.

Another thing I kinda wondered, since the PCI maintainer is here: at
least in drivers/gpu I see very few drivers explicitly requesting
regions (this might be a historical artifact due to the shadow attach
stuff before we had real modesetting drivers). And the PCI core doesn't
do that either, even when a driver is bound. Is this intentional, or
should/could we do better? Since drivers work happily without reserving
regions I don't think "the drivers need to remember to do this" will
ever really work out well.
-Daniel
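To make the observation concrete, a hedged sketch of the pattern Daniel describes: mapping a BAR without ever claiming it. The helper name is invented; the point is that nothing in the core objects to the missing request.

```c
#include <linux/pci.h>
#include <linux/io.h>

/* Invented helper: map BAR 0 with no pci_request_regions() or
 * request_mem_region() beforehand -- the map succeeds regardless. */
static void __iomem *demo_map_bar0(struct pci_dev *pdev)
{
	return ioremap(pci_resource_start(pdev, 0),
		       pci_resource_len(pdev, 0));
}
```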
On Wed, Nov 04, 2020 at 09:44:04AM +0100, Daniel Vetter wrote:
> On Tue, Nov 3, 2020 at 11:09 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > Now that CONFIG_IO_STRICT_DEVMEM upgrades IORESOURCE_BUSY to
> > IORESOURCE_EXCLUSIVE semantics the latter has lost its meaning, so
> > I'd be in favor of removing it as well.
>
> Still has some value since it enforces exclusive access even if the
> config isn't enabled, and iirc e1000 had some fun with userspace tools
> clobbering the firmware and bricking the chip.

There's *some* value; I'm just skeptical since only three drivers use
it.

IORESOURCE_EXCLUSIVE is from e8de1481fd71 ("resource: allow MMIO
exclusivity for device drivers"), and the commit message says this is
only active when CONFIG_STRICT_DEVMEM is set. I didn't check to see
whether that's still true.

That commit adds a bunch of wrappers and "__"-prefixed functions to
pass the IORESOURCE_EXCLUSIVE flag around. That's a fair bit of
uglification for three drivers.

> Another thing I kinda wondered, since the PCI maintainer is here: at
> least in drivers/gpu I see very few drivers explicitly requesting
> regions (this might be a historical artifact due to the shadow attach
> stuff before we had real modesetting drivers). And the PCI core
> doesn't do that either, even when a driver is bound. Is this
> intentional, or should/could we do better? [...]

You're right, many drivers don't call pci_request_regions(). Maybe we
could do better, but I haven't looked into that recently. There is a
related note in Documentation/PCI/pci.rst that's been there for a long
time (it refers to "pci_request_resources()", which has never existed
AFAICT). I'm certainly open to proposals.
On Wed, Nov 4, 2020 at 8:50 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, Nov 04, 2020 at 09:44:04AM +0100, Daniel Vetter wrote:
> > Another thing I kinda wondered, since the PCI maintainer is here: at
> > least in drivers/gpu I see very few drivers explicitly requesting
> > regions [...] Since drivers work happily without reserving regions I
> > don't think "the drivers need to remember to do this" will ever
> > really work out well.
>
> You're right, many drivers don't call pci_request_regions(). Maybe we
> could do better, but I haven't looked into that recently. There is a
> related note in Documentation/PCI/pci.rst that's been there for a long
> time (it refers to "pci_request_resources()", which has never existed
> AFAICT). I'm certainly open to proposals.

It seems a bug that the kernel permits MMIO regions with side effects
to be ioremap()'ed without request_mem_region() on the resource. I
wonder how much log spam would happen if ioremap() reported whenever a
non-IORESOURCE_BUSY range was passed to it? The current state of
affairs to trust *remap users to have claimed their remap target seems
too ingrained to unwind now.
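One way the diagnostic Dan wonders about could be approximated, sketched under the assumption that a trial claim is an acceptable probe for "nobody holds this busy"; this is not existing kernel behavior.

```c
#include <linux/ioport.h>
#include <linux/printk.h>

/* Hypothetical check: a trial request succeeds only if no one holds
 * the range busy, i.e. the ioremap() caller never claimed its target. */
static void warn_if_unclaimed(resource_size_t start, resource_size_t size)
{
	struct resource *trial =
		request_mem_region(start, size, "ioremap-check");

	if (trial) {
		pr_warn("ioremap() of unclaimed range starting at %pa\n",
			&start);
		release_mem_region(start, size);
	}
}
```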
On Wed, Nov 04, 2020 at 12:12:15PM -0800, Dan Williams wrote:
> On Wed, Nov 4, 2020 at 8:50 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > You're right, many drivers don't call pci_request_regions(). Maybe
> > we could do better, but I haven't looked into that recently. There
> > is a related note in Documentation/PCI/pci.rst that's been there for
> > a long time (it refers to "pci_request_resources()", which has never
> > existed AFAICT). I'm certainly open to proposals.
>
> It seems a bug that the kernel permits MMIO regions with side effects
> to be ioremap()'ed without request_mem_region() on the resource. I
> wonder how much log spam would happen if ioremap() reported whenever a
> non-IORESOURCE_BUSY range was passed to it? The current state of
> affairs to trust *remap users to have claimed their remap target seems
> too ingrained to unwind now.

Yeah I think that's hopeless. I think the only feasible approach is if
bus drivers claim resources by default when a driver is bound (it
should nest, so if the driver claims again, I think that should all
keep working), just using the driver name. Probably with some special
casing for legacy io (vgaarb.c should claim these I guess). Still
probably tons of fallout.

Once that's rolled out to all bus drivers we could perhaps add the
ioremap check without drowning in log spam. Still a multi-year project
I think :-/
-Daniel
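A sketch of what Daniel's "bus claims on bind" idea might look like in the PCI core. pci_claim_bars_for_driver() is invented for illustration; real code would also need the nesting behavior and the legacy-IO special cases he mentions.

```c
#include <linux/pci.h>
#include <linux/ioport.h>

/* Invented helper: claim every memory BAR under the bound driver's
 * name before probe() runs. Failures are only logged, to limit the
 * fallout Daniel anticipates. */
static void pci_claim_bars_for_driver(struct pci_dev *pdev,
				      struct pci_driver *drv)
{
	int bar;

	for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
			continue;
		if (!request_mem_region(pci_resource_start(pdev, bar),
					pci_resource_len(pdev, bar),
					drv->name))
			pci_warn(pdev, "BAR %d already claimed\n", bar);
	}
}
```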
```diff
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)
```