Message ID | 20201009075934.3509076-10-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | follow_pfn and other iomap races | expand |
Hi, Em Fri, 9 Oct 2020 09:59:26 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > Way back it was a reasonable assumptions that iomem mappings never > change the pfn range they point at. But this has changed: > > - gpu drivers dynamically manage their memory nowadays, invalidating > ptes with unmap_mapping_range when buffers get moved > > - contiguous dma allocations have moved from dedicated carvetouts to > cma regions. This means if we miss the unmap the pfn might contain > pagecache or anon memory (well anything allocated with GFP_MOVEABLE) > > - even /dev/mem now invalidates mappings when the kernel requests that > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 > ("/dev/mem: Revoke mappings when a driver claims the region") > > Accessing pfns obtained from ptes without holding all the locks is > therefore no longer a good idea. > > Unfortunately there's some users where this is not fixable (like v4l > userptr of iomem mappings) or involves a pile of work (vfio type1 > iommu). For now annotate these as unsafe and splat appropriately. > > This patch adds an unsafe_follow_pfn, which later patches will then > roll out to all appropriate places. NACK, as this breaks an existing userspace API on media. While I agree that using the userptr on media is something that new drivers may not support, as DMABUF is a better way of handling it, changing this for existing ones is a big no, as it may break usersapace. The right approach here would be to be able to fine-tune support for it on a per-driver basis, e. g. disabling such feature only for drivers that would use a movable page. The media subsystem has already a way to disable USERPTR support from VB2. the right approach would be to ensure that newer drivers will only set this if they won't use movable pages. Regards, Mauro > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Kees Cook <keescook@chromium.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Jérôme Glisse <jglisse@redhat.com> > Cc: Jan Kara <jack@suse.cz> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: linux-mm@kvack.org > Cc: linux-arm-kernel@lists.infradead.org > Cc: linux-samsung-soc@vger.kernel.org > Cc: linux-media@vger.kernel.org > Cc: kvm@vger.kernel.org > --- > include/linux/mm.h | 2 ++ > mm/memory.c | 32 +++++++++++++++++++++++++++++++- > mm/nommu.c | 17 +++++++++++++++++ > security/Kconfig | 13 +++++++++++++ > 4 files changed, 63 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 2a16631c1fda..ec8c90928fc9 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address, > pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp); > int follow_pfn(struct vm_area_struct *vma, unsigned long address, > unsigned long *pfn); > +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, > + unsigned long *pfn); > int follow_phys(struct vm_area_struct *vma, unsigned long address, > unsigned int flags, unsigned long *prot, resource_size_t *phys); > int generic_access_phys(struct vm_area_struct *vma, unsigned long addr, > diff --git a/mm/memory.c b/mm/memory.c > index f7cbc4dde0ef..7c7b234ffb24 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd); > * @address: user virtual address > * @pfn: location to store found PFN > * > - * Only IO mappings and raw PFN mappings are allowed. > + * Only IO mappings and raw PFN mappings are allowed. Note that callers must > + * ensure coherency with pte updates by using a &mmu_notifier to follow updates. > + * If this is not feasible, or the access to the @pfn is only very short term, > + * use follow_pte_pmd() instead and hold the pagetable lock for the duration of > + * the access instead. Any caller not following these requirements must use > + * unsafe_follow_pfn() instead. > * > * Return: zero and the pfn at @pfn on success, -ve otherwise. > */ > @@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, > } > EXPORT_SYMBOL(follow_pfn); > > +/** > + * unsafe_follow_pfn - look up PFN at a user virtual address > + * @vma: memory mapping > + * @address: user virtual address > + * @pfn: location to store found PFN > + * > + * Only IO mappings and raw PFN mappings are allowed. > + * > + * Returns zero and the pfn at @pfn on success, -ve otherwise. > + */ > +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, > + unsigned long *pfn) > +{ > +#ifdef CONFIG_STRICT_FOLLOW_PFN > + pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n"); > + return -EINVAL; > +#else > + WARN_ONCE(1, "unsafe follow_pfn usage\n"); > + add_taint(TAINT_USER, LOCKDEP_STILL_OK); > + > + return follow_pfn(vma, address, pfn); > +#endif > +} > +EXPORT_SYMBOL(unsafe_follow_pfn); > + > #ifdef CONFIG_HAVE_IOREMAP_PROT > int follow_phys(struct vm_area_struct *vma, > unsigned long address, unsigned int flags, > diff --git a/mm/nommu.c b/mm/nommu.c > index 75a327149af1..3db2910f0d64 100644 > --- a/mm/nommu.c > +++ b/mm/nommu.c > @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, > } > EXPORT_SYMBOL(follow_pfn); > > +/** > + * unsafe_follow_pfn - look up PFN at a user virtual address > + * @vma: memory mapping > + * @address: user virtual address > + * @pfn: location to store found PFN > + * > + * Only IO mappings and raw PFN mappings are allowed. > + * > + * Returns zero and the pfn at @pfn on success, -ve otherwise. > + */ > +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, > + unsigned long *pfn) > +{ > + return follow_pfn(vma, address, pfn); > +} > +EXPORT_SYMBOL(unsafe_follow_pfn); > + > LIST_HEAD(vmap_area_list); > > void vfree(const void *addr) > diff --git a/security/Kconfig b/security/Kconfig > index 7561f6f99f1d..48945402e103 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH > If you wish for all usermode helper programs to be disabled, > specify an empty string here (i.e. ""). > > +config STRICT_FOLLOW_PFN > + bool "Disable unsafe use of follow_pfn" > + depends on MMU > + help > + Some functionality in the kernel follows userspace mappings to iomem > + ranges in an unsafe matter. Examples include v4l userptr for zero-copy > + buffers sharing. > + > + If this option is switched on, such access is rejected. Only enable > + this option when you must run userspace which requires this. > + > + If in doubt, say Y. > + > source "security/selinux/Kconfig" > source "security/smack/Kconfig" > source "security/tomoyo/Kconfig" Thanks, Mauro
On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote: > Hi, > > Em Fri, 9 Oct 2020 09:59:26 +0200 > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > > > Way back it was a reasonable assumptions that iomem mappings never > > change the pfn range they point at. But this has changed: > > > > - gpu drivers dynamically manage their memory nowadays, invalidating > > ptes with unmap_mapping_range when buffers get moved > > > > - contiguous dma allocations have moved from dedicated carvetouts to > > cma regions. This means if we miss the unmap the pfn might contain > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE) > > > > - even /dev/mem now invalidates mappings when the kernel requests that > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 > > ("/dev/mem: Revoke mappings when a driver claims the region") > > > > Accessing pfns obtained from ptes without holding all the locks is > > therefore no longer a good idea. > > > > Unfortunately there's some users where this is not fixable (like v4l > > userptr of iomem mappings) or involves a pile of work (vfio type1 > > iommu). For now annotate these as unsafe and splat appropriately. > > > > This patch adds an unsafe_follow_pfn, which later patches will then > > roll out to all appropriate places. > > NACK, as this breaks an existing userspace API on media. It doesn't break it. You get a big warning the thing is broken and it keeps working. We can't leave such a huge security hole open - it impacts other subsystems, distros need to be able to run in a secure mode. > While I agree that using the userptr on media is something that > new drivers may not support, as DMABUF is a better way of > handling it, changing this for existing ones is a big no, > as it may break usersapace. media community needs to work to fix this, not pretend it is OK to keep going as-is. Dealing with security issues is the one case where an uABI break might be acceptable. If you want to NAK it then you need to come up with the work to do something here correctly that will support the old drivers without the kernel taint. Unfortunately making things uncomfortable for the subsystem is the big hammer the core kernel needs to use to actually get this security work done by those responsible. Jason
Em Fri, 9 Oct 2020 09:21:11 -0300 Jason Gunthorpe <jgg@ziepe.ca> escreveu: > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote: > > Hi, > > > > Em Fri, 9 Oct 2020 09:59:26 +0200 > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > > > > > Way back it was a reasonable assumptions that iomem mappings never > > > change the pfn range they point at. But this has changed: > > > > > > - gpu drivers dynamically manage their memory nowadays, invalidating > > > ptes with unmap_mapping_range when buffers get moved > > > > > > - contiguous dma allocations have moved from dedicated carvetouts to > > > cma regions. This means if we miss the unmap the pfn might contain > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE) > > > > > > - even /dev/mem now invalidates mappings when the kernel requests that > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 > > > ("/dev/mem: Revoke mappings when a driver claims the region") > > > > > > Accessing pfns obtained from ptes without holding all the locks is > > > therefore no longer a good idea. > > > > > > Unfortunately there's some users where this is not fixable (like v4l > > > userptr of iomem mappings) or involves a pile of work (vfio type1 > > > iommu). For now annotate these as unsafe and splat appropriately. > > > > > > This patch adds an unsafe_follow_pfn, which later patches will then > > > roll out to all appropriate places. > > > > NACK, as this breaks an existing userspace API on media. > > It doesn't break it. You get a big warning the thing is broken and it > keeps working. > > We can't leave such a huge security hole open - it impacts other > subsystems, distros need to be able to run in a secure mode. Well, if distros disable it, then apps will break. > > While I agree that using the userptr on media is something that > > new drivers may not support, as DMABUF is a better way of > > handling it, changing this for existing ones is a big no, > > as it may break usersapace. > > media community needs to work to fix this, not pretend it is OK to > keep going as-is. > Dealing with security issues is the one case where an uABI break might > be acceptable. > > If you want to NAK it then you need to come up with the work to do > something here correctly that will support the old drivers without the > kernel taint. > > Unfortunately making things uncomfortable for the subsystem is the big > hammer the core kernel needs to use to actually get this security work > done by those responsible. I'm not pretending that this is ok. Just pointing that the approach taken is NOT OK. I'm not a mm/ expert, but, from what I understood from Daniel's patch description is that this is unsafe *only if* __GFP_MOVABLE is used. Well, no drivers inside the media subsystem uses such flag, although they may rely on some infrastructure that could be using it behind the bars. If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE flag that it would be denying the core mm code to set __GFP_MOVABLE. Please let address the issue on this way, instead of broken an userspace API that it is there since 1991. Thanks, Mauro
Em Fri, 9 Oct 2020 14:37:23 +0200 Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu: > Em Fri, 9 Oct 2020 09:21:11 -0300 > Jason Gunthorpe <jgg@ziepe.ca> escreveu: > > > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote: > > > Hi, > > > > > > Em Fri, 9 Oct 2020 09:59:26 +0200 > > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > > > > > > > Way back it was a reasonable assumptions that iomem mappings never > > > > change the pfn range they point at. But this has changed: > > > > > > > > - gpu drivers dynamically manage their memory nowadays, invalidating > > > > ptes with unmap_mapping_range when buffers get moved > > > > > > > > - contiguous dma allocations have moved from dedicated carvetouts to > > > > cma regions. This means if we miss the unmap the pfn might contain > > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE) > > > > > > > > - even /dev/mem now invalidates mappings when the kernel requests that > > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 > > > > ("/dev/mem: Revoke mappings when a driver claims the region") > > > > > > > > Accessing pfns obtained from ptes without holding all the locks is > > > > therefore no longer a good idea. > > > > > > > > Unfortunately there's some users where this is not fixable (like v4l > > > > userptr of iomem mappings) or involves a pile of work (vfio type1 > > > > iommu). For now annotate these as unsafe and splat appropriately. > > > > > > > > This patch adds an unsafe_follow_pfn, which later patches will then > > > > roll out to all appropriate places. > > > > > > NACK, as this breaks an existing userspace API on media. > > > > It doesn't break it. You get a big warning the thing is broken and it > > keeps working. > > > > We can't leave such a huge security hole open - it impacts other > > subsystems, distros need to be able to run in a secure mode. > > Well, if distros disable it, then apps will break. > > > > While I agree that using the userptr on media is something that > > > new drivers may not support, as DMABUF is a better way of > > > handling it, changing this for existing ones is a big no, > > > as it may break usersapace. > > > > media community needs to work to fix this, not pretend it is OK to > > keep going as-is. > > > Dealing with security issues is the one case where an uABI break might > > be acceptable. > > > > If you want to NAK it then you need to come up with the work to do > > something here correctly that will support the old drivers without the > > kernel taint. > > > > Unfortunately making things uncomfortable for the subsystem is the big > > hammer the core kernel needs to use to actually get this security work > > done by those responsible. > > > I'm not pretending that this is ok. Just pointing that the approach > taken is NOT OK. > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > Well, no drivers inside the media subsystem uses such flag, although > they may rely on some infrastructure that could be using it behind > the bars. > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > Please let address the issue on this way, instead of broken an > userspace API that it is there since 1991. In time: I meant to say 1998. Thanks, Mauro
On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > I'm not a mm/ expert, but, from what I understood from Daniel's patch > description is that this is unsafe *only if* __GFP_MOVABLE is used. No, it is unconditionally unsafe. The CMA movable mappings are specific VMAs that will have bad issues here, but there are other types too. The only way to do something at a VMA level is to have a list of OK VMAs, eg because they were creatd via a special mmap helper from the media subsystem. > Well, no drivers inside the media subsystem uses such flag, although > they may rely on some infrastructure that could be using it behind > the bars. It doesn't matter, nothing prevents the user from calling media APIs on mmaps it gets from other subsystems. > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > flag that it would be denying the core mm code to set __GFP_MOVABLE. We can't tell from the VMA these kinds of details.. It has to go the other direction, evey mmap that might be used as a userptr here has to be found and the VMA specially created to allow its use. At least that is a kernel only change, but will need people with the HW to do this work. > Please let address the issue on this way, instead of broken an > userspace API that it is there since 1991. It has happened before :( It took 4 years for RDMA to undo the uAPI breakage caused by a security fix for something that was a 15 years old bug. Jason
On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > No, it is unconditionally unsafe. The CMA movable mappings are > specific VMAs that will have bad issues here, but there are other > types too. > > The only way to do something at a VMA level is to have a list of OK > VMAs, eg because they were creatd via a special mmap helper from the > media subsystem. > > > Well, no drivers inside the media subsystem uses such flag, although > > they may rely on some infrastructure that could be using it behind > > the bars. > > It doesn't matter, nothing prevents the user from calling media APIs > on mmaps it gets from other subsystems. I think a good first step would be to disable userptr of non struct page backed storage going forward for any new hw support. Even on existing drivers. dma-buf sharing has been around for long enough now that this shouldn't be a problem. Unfortunately right now this doesn't seem to exist, so the entire problem keeps getting perpetuated. > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > We can't tell from the VMA these kinds of details.. > > It has to go the other direction, evey mmap that might be used as a > userptr here has to be found and the VMA specially created to allow > its use. At least that is a kernel only change, but will need people > with the HW to do this work. I think the only reasonable way to keep this working is: - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); - add dma-buf export support to fbdev and v4l - roll this out everywhere we still need it. Realistically this just isn't going to happen. And anything else just reimplements half of dma-buf, which is kinda pointless (you need minimally refcounting and some way to get at a promise of a permanent sg list for dma. Plus probably the vmap for kernel cpu access. > > Please let address the issue on this way, instead of broken an > > userspace API that it is there since 1991. > > It has happened before :( It took 4 years for RDMA to undo the uAPI > breakage caused by a security fix for something that was a 15 years > old bug. Yeah we have a bunch of these on the drm side too. Some of them are really just "you have to upgrade userspace", and there's no real fix for the security nightmare without that. -Daniel
On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote: > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > We can't tell from the VMA these kinds of details.. > > > > It has to go the other direction, evey mmap that might be used as a > > userptr here has to be found and the VMA specially created to allow > > its use. At least that is a kernel only change, but will need people > > with the HW to do this work. > > I think the only reasonable way to keep this working is: > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); > - add dma-buf export support to fbdev and v4l > - roll this out everywhere we still need it. It seems to me there is a technical way forward to restore user compat, so it is really no different than RDMA/DRM pain we both suffered before. Thus no justification to NAK it. If media wants things to keep working they have to do the technical path like you outline above. > Realistically this just isn't going to happen. If your series goes ahead it will get solved. Someone will take on the huge project to either add DMA buf to the drivers people still care about, or do the work above to transparently handle in kernel. If we allow things to keep working without consequence then nobody will do it. The only reason we did the 4 years of work in RDMA was because Linus went in and broke the uABI for a security fix. It was hundreds of patches to fix it, so I don't have much sympathy for "it is too hard" here. Jason
On Fri, Oct 9, 2020 at 8:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote: > > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > > > We can't tell from the VMA these kinds of details.. > > > > > > It has to go the other direction, evey mmap that might be used as a > > > userptr here has to be found and the VMA specially created to allow > > > its use. At least that is a kernel only change, but will need people > > > with the HW to do this work. > > > > I think the only reasonable way to keep this working is: > > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); > > - add dma-buf export support to fbdev and v4l > > - roll this out everywhere we still need it. > > It seems to me there is a technical way forward to restore user > compat, so it is really no different than RDMA/DRM pain we both > suffered before. > > Thus no justification to NAK it. If media wants things to keep working > they have to do the technical path like you outline above. > > > Realistically this just isn't going to happen. > > If your series goes ahead it will get solved. Someone will take on the > huge project to either add DMA buf to the drivers people still care > about, or do the work above to transparently handle in kernel. > > If we allow things to keep working without consequence then nobody > will do it. > > The only reason we did the 4 years of work in RDMA was because Linus > went in and broke the uABI for a security fix. It was hundreds of > patches to fix it, so I don't have much sympathy for "it is too hard" > here. Oh fully agreeing with you here, I just wanted to lay out that a) there is a solid plan to fix it and b) it's way too much work for me to just type it as a part of a "learn me some core mm semantics" project :-) I was hoping that we could get away with a special marker for problematic vma, and filter those out. But after all the digging I've noticed that on anything remotely modern, there's just nothing left. Device memory management has become massively more dynamic in the past 10 years. -Daniel
Em Fri, 9 Oct 2020 19:52:05 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > > > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > > > No, it is unconditionally unsafe. The CMA movable mappings are > > specific VMAs that will have bad issues here, but there are other > > types too. I didn't check the mm dirty details, but I strongly suspect that the mm code has a way to prevent moving a mmapped page while it is still in usage. If not, then the entire movable pages concept sounds broken to me, and has to be fixed at mm subsystem. > > > > The only way to do something at a VMA level is to have a list of OK > > VMAs, eg because they were creatd via a special mmap helper from the > > media subsystem. I'm not sure if I'm following you on that. The media API can work with different ways of sending buffers to userspace: - read(); - using the overlay mode. This interface is deprecated in practice, being replaced by DMABUF. Only a few older hardware supports it, and it depends on an special X11 helper driver for it to work. - via DMABUF: https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html - via mmap, using a mmap helper: https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html - via mmap, using userspace-provided pointers: https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html The existing open-source programs usually chose one or more of the above modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O mode is not supported, userspace has to select a different method. Most userspace open source programs have fallback support: if one mmap I/O method fails, it selects another one, although this is not a mandatory requirement. I found (and fixed) a few ones that were relying exclusively on userptr support, but I didn't make a comprehensive check. Also there are a number of relevant closed-source apps that we have no idea about what methods they use, like Skype, and other similar videoconferencing programs. Breaking support for those, specially at a time where people are relying on it in order to participate on conferences and doing internal meetings is a **very bad** idea. So, whatever solution is taken, it should not be dumping warning messages at the system and tainting the Kernel, but, instead, checking if the userspace request is valid or not. If it is invalid, return the proper error code via the right V4L2 ioctl, in order to let userspace choose a different method. I the request is valid, refcount the pages for them to not be moved while they're still in usage. - Let me provide some background about how things work at the media subsytem. If you want to know more, the userspace-provided memory mapped pointers work is described here: https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp Basically, userspace calls either one of those ioctls: VIDIOC_CREATE_BUFS: https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs() VIDIOC_REQBUFS https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs() Both internally calls vb2_verify_memory_type(), which is responsible for checking if the provided pointers are OK for the usage and/or do all necessary page allocations, and taking care of any special requirements. This could easily have some additional checks to verify if the requested VMA address has pages that are movable or not, ensuring that ensure that the VMA is OK, and locking them in order to prevent the mm code to move such pages while they are in usage by the media subsystem. Now, as I said before, I don't know the dirty details about how to lock those pages at the mm subsystem in order to avoid it to move the used pages. Yet, when vb2_create_framevec() is called, the check if VMA is OK should already be happened at vb2_verify_memory_type(). - It should be noticed that the dirty hack added by patch 09/17 and 10/17 affects *all* types of memory allocations at V4L2, as this kAPI is called by the 3 different memory models supported at the media subsystem: drivers/media/common/videobuf2/videobuf2-vmalloc.c drivers/media/common/videobuf2/videobuf2-dma-contig.c drivers/media/common/videobuf2/videobuf2-dma-sg.c In other words, with this code: int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, unsigned long *pfn) { #ifdef CONFIG_STRICT_FOLLOW_PFN pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n"); return -EINVAL; #else WARN_ONCE(1, "unsafe follow_pfn usage\n"); add_taint(TAINT_USER, LOCKDEP_STILL_OK); return follow_pfn(vma, address, pfn); #endif you're unconditionally breaking the media userspace API support not only for embedded systems that could be using userptr instead of DMA_BUF, but also for *all* video devices, including USB cameras. This is **NOT** an acceptable solution. So, I stand my NACK to this approach. > > > Well, no drivers inside the media subsystem uses such flag, although > > > they may rely on some infrastructure that could be using it behind > > > the bars. > > > > It doesn't matter, nothing prevents the user from calling media APIs > > on mmaps it gets from other subsystems. > > I think a good first step would be to disable userptr of non struct > page backed storage going forward for any new hw support. Even on > existing drivers. dma-buf sharing has been around for long enough now > that this shouldn't be a problem. Unfortunately right now this doesn't > seem to exist, so the entire problem keeps getting perpetuated. Well, the media uAPI does support DMABUF (both import and export): https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf And I fully agree that newer programs should use DMABUF when sharing buffers with DRM subsystem, but that's not my main concern. What I do want is to not break userspace support nor to taint the Kernel due to a valid uAPI call. A valid uAPI call should check if the parameters passed though it are valid. If they are, it should handle. Otherwise, it should return -EINVAL, without tainting the Kernel or printing warning messages. The approach took by patches 09/17 and 10/17 doesn't do that. Instead, they just unconditionally breaks the media uAPI. What should be done, instead, is to drop patch 10/17, and work on a way for the code inside vb2_create_framevec() to ensure that, if USERPTR is used, the memory pages will be properly locked while the driver is using, returning -EINVAL only if there's no way to proceed, without tainting the Kernel. > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > We can't tell from the VMA these kinds of details.. > > > > It has to go the other direction, evey mmap that might be used as a > > userptr here has to be found and the VMA specially created to allow > > its use. At least that is a kernel only change, but will need people > > with the HW to do this work. > > I think the only reasonable way to keep this working is: > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); Not sure how an userspace buffer could be mapped to be using it, specially since the buffer may not even be imported/exported from the DRM subsystem, but it could simply be allocated via glibc calloc()/malloc(). > - add dma-buf export support to fbdev and v4l V4L has support for it already. > - roll this out everywhere we still need it. That's where things are hard. This is not like DRM, where the APIs are called via some open source libraries that are also managed by DRM upstream developers. In the case of the media subsystem, while we added a libv4l sometime ago, not all userspace apps use it, as a large part of them used to exist before the addition of the libraries. Also, we're currently trying to deprecate libv4l, at least for embedded systems, in favor of libcamera. On media, there are lots of closed source apps that uses the media API directly. Even talking about open source ones, there are lots of different apps, including not only media-specific apps, but also generic ones, like web browsers, which don't use the libraries we wrote. An userspace API breakage would take *huge* efforts and will take lots of time to have it merged everywhere. It will cause lots of troubles everywhere. > Realistically this just isn't going to happen. Why not? Any app developer could already use DMA-BUF if required, as the upstream support was added several years ago. > And anything else just > reimplements half of dma-buf, It is just the opposite: those uAPIs came years before dma-buf. In other words, it was dma-buf that re-implemented it ;-) Now, I agree with you that dma-buf is a way cooler than the past alternatives. - It sounds to me that you're considering on only one use case of USERPTR: to pass a buffer created from DRM. As far as I'm aware, only embedded userspace applications actually do that. Yet, there are a number of other applications that do something like the userptr_capture() function on this code: https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c E. g. using glibc alloc functions like calloc() to allocate memory, passing the user-allocated data to the Kernel via something like this: struct v4l2_requestbuffers req; struct v4l2_buffer buf; int n_buffers = 2; req.count = 2; req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE; req.memory = V4L2_MEMORY_USERPTR; if (ioctl(fd, VIDIOC_REQBUFS, &req)) return errno; for (i = 0; i < req.count; ++i) { buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE; buf.memory = V4L2_MEMORY_USERPTR; buf.index = i; buf.m.userptr = (unsigned long)buffers[i].start; buf.length = buffers[i].length; if (ioctl(fd, VIDIOC_QBUF, &buf)) return errno; } if (ioctl(fd, VIDIOC_STREAMON, &req.type)) return errno; /* some capture loop */ ioctl(fd, VIDIOC_STREAMOFF, &req.type); I can't possibly see *any* security issues with the above code. As I said before, VIDIOC_REQBUFS should be checking if the userspace buffers are OK and ensure that their refcounts will be incremented, in order to avoid mm to move the pages used there, freeing the refconts when VIDIOC_STREAMOFF - or close(fd) - is called. > which is kinda pointless (you need > minimally refcounting and some way to get at a promise of a permanent > sg list for dma. Plus probably the vmap for kernel cpu access. Yeah, refcounting needs to happen. Thanks, Mauro
Hi Mauro, You might want to read the patches more carefully, because what you're demanding is what my patches do. Short summary: - if STRICT_FOLLOW_PFN is set: a) normal memory is handled as-is (i.e. your example works) through the addition of FOLL_LONGTERM. This is the "pin the pages correctly" approach you're demanding b) for non-page memory (zerocopy sharing before dma-buf was upstreamed is the only use-case for this) it is correctly rejected with -EINVAL - if you do have blobby userspace which requires the zero-copy using userptr to work, and doesn't have any of the fallbacks implemented that you describe, this would be a regression. That's why STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue in this usage, Marek also confirmed that the removal of the vma copy code a few years ago essentially broke even the weak assumptions that made the code work 10+ years ago when it was merged. so tdlr; Everything you described will keep working even with the new flag set, and everything you demand must be implemented _is_ implemented in this patch series. Also please keep in mind that we are _not_ talking about the general userptr support that was merge ~20 years ago. This patch series here is _only_ about the zerocpy userptr support merged with 50ac952d2263 ("[media] videobuf2-dma-sg: Support io userptr operations on io memory") in 2013. Why this hack was merged in 2013 when we merged dma-buf almost 2 years before that I have no idea about. Imo that patch simply should never have landed, and instead dma-buf support prioritized. Cheers, Daniel On Sat, Oct 10, 2020 at 11:21 AM Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote: > > Em Fri, 9 Oct 2020 19:52:05 +0200 > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > > > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > > > > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > > > > > No, it is unconditionally unsafe. The CMA movable mappings are > > > specific VMAs that will have bad issues here, but there are other > > > types too. > > I didn't check the mm dirty details, but I strongly suspect that the mm > code has a way to prevent moving a mmapped page while it is still in usage. > > If not, then the entire movable pages concept sounds broken to me, and > has to be fixed at mm subsystem. > > > > > > > The only way to do something at a VMA level is to have a list of OK > > > VMAs, eg because they were creatd via a special mmap helper from the > > > media subsystem. > > I'm not sure if I'm following you on that. The media API can work with > different ways of sending buffers to userspace: > > - read(); > > - using the overlay mode. This interface is deprecated in > practice, being replaced by DMABUF. Only a few older hardware > supports it, and it depends on an special X11 helper driver > for it to work. > > - via DMABUF: > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html > > - via mmap, using a mmap helper: > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html > > - via mmap, using userspace-provided pointers: > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html > > The existing open-source programs usually chose one or more of the above > modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O > mode is not supported, userspace has to select a different method. > > Most userspace open source programs have fallback support: if one > mmap I/O method fails, it selects another one, although this is not > a mandatory requirement. I found (and fixed) a few ones that were > relying exclusively on userptr support, but I didn't make a > comprehensive check. > > Also there are a number of relevant closed-source apps that we have no > idea about what methods they use, like Skype, and other similar > videoconferencing programs. Breaking support for those, specially at > a time where people are relying on it in order to participate on > conferences and doing internal meetings is a **very bad** idea. > > So, whatever solution is taken, it should not be dumping warning > messages at the system and tainting the Kernel, but, instead, checking > if the userspace request is valid or not. If it is invalid, return the > proper error code via the right V4L2 ioctl, in order to let userspace > choose a different method. I the request is valid, refcount the pages > for them to not be moved while they're still in usage. > > - > > Let me provide some background about how things work at the media > subsytem. If you want to know more, the userspace-provided memory > mapped pointers work is described here: > > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp > > Basically, userspace calls either one of those ioctls: > > VIDIOC_CREATE_BUFS: > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html > > Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs() > > VIDIOC_REQBUFS > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs > > Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs() > > Both internally calls vb2_verify_memory_type(), which is responsible > for checking if the provided pointers are OK for the usage and/or do > all necessary page allocations, and taking care of any special > requirements. This could easily have some additional checks to > verify if the requested VMA address has pages that are movable or > not, ensuring that ensure that the VMA is OK, and locking them in > order to prevent the mm code to move such pages while they are in > usage by the media subsystem. > > Now, as I said before, I don't know the dirty details about how > to lock those pages at the mm subsystem in order to avoid it > to move the used pages. Yet, when vb2_create_framevec() > is called, the check if VMA is OK should already be happened > at vb2_verify_memory_type(). > > - > > It should be noticed that the dirty hack added by patch 09/17 > and 10/17 affects *all* types of memory allocations at V4L2, > as this kAPI is called by the 3 different memory models > supported at the media subsystem: > > drivers/media/common/videobuf2/videobuf2-vmalloc.c > drivers/media/common/videobuf2/videobuf2-dma-contig.c > drivers/media/common/videobuf2/videobuf2-dma-sg.c > > In other words, with this code: > > int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, > unsigned long *pfn) > { > #ifdef CONFIG_STRICT_FOLLOW_PFN > pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n"); > return -EINVAL; > #else > WARN_ONCE(1, "unsafe follow_pfn usage\n"); > add_taint(TAINT_USER, LOCKDEP_STILL_OK); > > return follow_pfn(vma, address, pfn); > #endif > > you're unconditionally breaking the media userspace API support not > only for embedded systems that could be using userptr instead of > DMA_BUF, but also for *all* video devices, including USB cameras. > > This is **NOT** an acceptable solution. > > So, I stand my NACK to this approach. > > > > > Well, no drivers inside the media subsystem uses such flag, although > > > > they may rely on some infrastructure that could be using it behind > > > > the bars. > > > > > > It doesn't matter, nothing prevents the user from calling media APIs > > > on mmaps it gets from other subsystems. > > > > I think a good first step would be to disable userptr of non struct > > page backed storage going forward for any new hw support. Even on > > existing drivers. dma-buf sharing has been around for long enough now > > that this shouldn't be a problem. Unfortunately right now this doesn't > > seem to exist, so the entire problem keeps getting perpetuated. > > Well, the media uAPI does support DMABUF (both import and export): > > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html > https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf > > And I fully agree that newer programs should use DMABUF when sharing > buffers with DRM subsystem, but that's not my main concern. > > What I do want is to not break userspace support nor to taint the Kernel > due to a valid uAPI call. > > A valid uAPI call should check if the parameters passed though it are > valid. If they are, it should handle. Otherwise, it should return > -EINVAL, without tainting the Kernel or printing warning messages. > > The approach took by patches 09/17 and 10/17 doesn't do that. > Instead, they just unconditionally breaks the media uAPI. > > What should be done, instead, is to drop patch 10/17, and work on > a way for the code inside vb2_create_framevec() to ensure that, if > USERPTR is used, the memory pages will be properly locked while the > driver is using, returning -EINVAL only if there's no way to proceed, > without tainting the Kernel. > > > > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > > > We can't tell from the VMA these kinds of details.. > > > > > > It has to go the other direction, evey mmap that might be used as a > > > userptr here has to be found and the VMA specially created to allow > > > its use. At least that is a kernel only change, but will need people > > > with the HW to do this work. > > > > I think the only reasonable way to keep this working is: > > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); > > Not sure how an userspace buffer could be mapped to be using it, > specially since the buffer may not even be imported/exported > from the DRM subsystem, but it could simply be allocated > via glibc calloc()/malloc(). > > > - add dma-buf export support to fbdev and v4l > > V4L has support for it already. > > > - roll this out everywhere we still need it. > > That's where things are hard. This is not like DRM, where the APIs > are called via some open source libraries that are also managed > by DRM upstream developers. > > In the case of the media subsystem, while we added a libv4l sometime > ago, not all userspace apps use it, as a large part of them used > to exist before the addition of the libraries. Also, we're currently > trying to deprecate libv4l, at least for embedded systems, in favor > of libcamera. > > On media, there are lots of closed source apps that uses the media API > directly. Even talking about open source ones, there are lots of > different apps, including not only media-specific apps, but also > generic ones, like web browsers, which don't use the libraries we > wrote. > > An userspace API breakage would take *huge* efforts and will take > lots of time to have it merged everywhere. It will cause lots of > troubles everywhere. > > > Realistically this just isn't going to happen. > > Why not? Any app developer could already use DMA-BUF if required, > as the upstream support was added several years ago. > > > And anything else just > > reimplements half of dma-buf, > > It is just the opposite: those uAPIs came years before dma-buf. > In other words, it was dma-buf that re-implemented it ;-) > > Now, I agree with you that dma-buf is a way cooler than the past > alternatives. > > - > > It sounds to me that you're considering on only one use case of > USERPTR: to pass a buffer created from DRM. As far as I'm aware, > only embedded userspace applications actually do that. > > Yet, there are a number of other applications that do something like > the userptr_capture() function on this code: > > https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c > > E. g. using glibc alloc functions like calloc() to allocate memory, > passing the user-allocated data to the Kernel via something like this: > > struct v4l2_requestbuffers req; > struct v4l2_buffer buf; > int n_buffers = 2; > > req.count = 2; > req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE; > req.memory = V4L2_MEMORY_USERPTR; > if (ioctl(fd, VIDIOC_REQBUFS, &req)) > return errno; > > for (i = 0; i < req.count; ++i) { > buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE; > buf.memory = V4L2_MEMORY_USERPTR; > buf.index = i; > buf.m.userptr = (unsigned long)buffers[i].start; > buf.length = buffers[i].length; > if (ioctl(fd, VIDIOC_QBUF, &buf)) > return errno; > } > if (ioctl(fd, VIDIOC_STREAMON, &req.type)) > return errno; > > /* some capture loop */ > > ioctl(fd, VIDIOC_STREAMOFF, &req.type); > > I can't possibly see *any* security issues with the above code. > > As I said before, VIDIOC_REQBUFS should be checking if the userspace > buffers are OK and ensure that their refcounts will be incremented, > in order to avoid mm to move the pages used there, freeing the > refconts when VIDIOC_STREAMOFF - or close(fd) - is called. > > > which is kinda pointless (you need > > minimally refcounting and some way to get at a promise of a permanent > > sg list for dma. Plus probably the vmap for kernel cpu access. > > Yeah, refcounting needs to happen. > > Thanks, > Mauro -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Em Sat, 10 Oct 2020 12:53:49 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > Hi Mauro, > > You might want to read the patches more carefully, because what you're > demanding is what my patches do. Short summary: > > - if STRICT_FOLLOW_PFN is set: > a) normal memory is handled as-is (i.e. your example works) through > the addition of FOLL_LONGTERM. This is the "pin the pages correctly" > approach you're demanding > b) for non-page memory (zerocopy sharing before dma-buf was upstreamed > is the only use-case for this) it is correctly rejected with -EINVAL > > - if you do have blobby userspace which requires the zero-copy using > userptr to work, and doesn't have any of the fallbacks implemented > that you describe, this would be a regression. That's why > STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue > in this usage, Marek also confirmed that the removal of the vma copy > code a few years ago essentially broke even the weak assumptions that > made the code work 10+ years ago when it was merged. > > so tdlr; Everything you described will keep working even with the new > flag set, and everything you demand must be implemented _is_ > implemented in this patch series. > > Also please keep in mind that we are _not_ talking about the general > userptr support that was merge ~20 years ago. This patch series here > is _only_ about the zerocpy userptr support merged with 50ac952d2263 > ("[media] videobuf2-dma-sg: Support io userptr operations on io > memory") in 2013. Ok, now it is making more sense. Please update the comments for patch 10/17 to describe the above. We need some time to test this though, in order to check if no regressions were added (except the ones due to changeset 50ac952d2263). > > Why this hack was merged in 2013 when we merged dma-buf almost 2 years > before that I have no idea about. Imo that patch simply should never > have landed, and instead dma-buf support prioritized. If I recall correctly, we didn't have any DMABUF support at the media subsystem, back on 2013. It took some time for the DMA-BUF to arrive at media, as this was not a top priority. Also, there aren't many developers that understand the memory model well enough to implement DMA-BUF support and touch the VB2 code, which is quite complex, as it supports lots of different ways for I/O, plus works with vmalloc, DMA contig and DMA scatter/gather. Changes there should carefully be tested against different drivers, in order to avoid regressions on it. > Cheers, Daniel Thanks, Mauro
On Sat, Oct 10, 2020 at 1:39 PM Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote: > > Em Sat, 10 Oct 2020 12:53:49 +0200 > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > > > Hi Mauro, > > > > You might want to read the patches more carefully, because what you're > > demanding is what my patches do. Short summary: > > > > - if STRICT_FOLLOW_PFN is set: > > a) normal memory is handled as-is (i.e. your example works) through > > the addition of FOLL_LONGTERM. This is the "pin the pages correctly" > > approach you're demanding > > b) for non-page memory (zerocopy sharing before dma-buf was upstreamed > > is the only use-case for this) it is correctly rejected with -EINVAL > > > > - if you do have blobby userspace which requires the zero-copy using > > userptr to work, and doesn't have any of the fallbacks implemented > > that you describe, this would be a regression. That's why > > STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue > > in this usage, Marek also confirmed that the removal of the vma copy > > code a few years ago essentially broke even the weak assumptions that > > made the code work 10+ years ago when it was merged. > > > > so tdlr; Everything you described will keep working even with the new > > flag set, and everything you demand must be implemented _is_ > > implemented in this patch series. > > > > Also please keep in mind that we are _not_ talking about the general > > userptr support that was merge ~20 years ago. This patch series here > > is _only_ about the zerocpy userptr support merged with 50ac952d2263 > > ("[media] videobuf2-dma-sg: Support io userptr operations on io > > memory") in 2013. > > Ok, now it is making more sense. Please update the comments for > patch 10/17 to describe the above. Will do. > We need some time to test this though, in order to check if no > regressions were added (except the ones due to changeset 50ac952d2263). Yeah testing of the previous patches to switch to FOLL_LONGTERM would be really good. I also need that for habanalabs and ideally exynos too. All the userptr for normal memory should keep working, and with FOLL_LONGTERM it should actually work better, since with that it should now correctly interact with pagecache and fs code, not just with anon memory from malloc. Thanks, Daniel > > Why this hack was merged in 2013 when we merged dma-buf almost 2 years > > before that I have no idea about. Imo that patch simply should never > > have landed, and instead dma-buf support prioritized. > > If I recall correctly, we didn't have any DMABUF support > at the media subsystem, back on 2013. > > It took some time for the DMA-BUF to arrive at media, as this > was not a top priority. Also, there aren't many developers that > understand the memory model well enough to implement DMA-BUF support > and touch the VB2 code, which is quite complex, as it supports > lots of different ways for I/O, plus works with vmalloc, DMA > contig and DMA scatter/gather. > > Changes there should carefully be tested against different > drivers, in order to avoid regressions on it. > > > Cheers, Daniel > > Thanks, > Mauro
Hi Daniel, On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > > > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > > > No, it is unconditionally unsafe. The CMA movable mappings are > > specific VMAs that will have bad issues here, but there are other > > types too. > > > > The only way to do something at a VMA level is to have a list of OK > > VMAs, eg because they were creatd via a special mmap helper from the > > media subsystem. > > > > > Well, no drivers inside the media subsystem uses such flag, although > > > they may rely on some infrastructure that could be using it behind > > > the bars. > > > > It doesn't matter, nothing prevents the user from calling media APIs > > on mmaps it gets from other subsystems. > > I think a good first step would be to disable userptr of non struct > page backed storage going forward for any new hw support. Even on > existing drivers. dma-buf sharing has been around for long enough now > that this shouldn't be a problem. Unfortunately right now this doesn't > seem to exist, so the entire problem keeps getting perpetuated. > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > We can't tell from the VMA these kinds of details.. > > > > It has to go the other direction, evey mmap that might be used as a > > userptr here has to be found and the VMA specially created to allow > > its use. At least that is a kernel only change, but will need people > > with the HW to do this work. > > I think the only reasonable way to keep this working is: > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); > - add dma-buf export support to fbdev and v4l I assume you mean V4L2 and not the obsolete V4L that is emulated in the userspace by libv4l. If so, every video device that uses videobuf2 gets DMA-buf export for free and there is nothing needed to enable it. We probably still have a few legacy drivers using videobuf (non-2), but IMHO those should be safe to put behind some disabled-by-default Kconfig symbol or even completely drop, as the legacy framework has been deprecated for many years already. > - roll this out everywhere we still need it. > > Realistically this just isn't going to happen. And anything else just > reimplements half of dma-buf, which is kinda pointless (you need > minimally refcounting and some way to get at a promise of a permanent > sg list for dma. Plus probably the vmap for kernel cpu access. > > > > Please let address the issue on this way, instead of broken an > > > userspace API that it is there since 1991. > > > > It has happened before :( It took 4 years for RDMA to undo the uAPI > > breakage caused by a security fix for something that was a 15 years > > old bug. > > Yeah we have a bunch of these on the drm side too. Some of them are > really just "you have to upgrade userspace", and there's no real fix > for the security nightmare without that. I think we need to phase out such userspace indeed. The Kconfig symbol allows enabling the unsafe functionality for anyone who still needs it, so I think it's not entirely a breakage. Best regards, Tomasz
Hi Mauro, On Fri, Oct 9, 2020 at 2:37 PM Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote: > > Em Fri, 9 Oct 2020 09:21:11 -0300 > Jason Gunthorpe <jgg@ziepe.ca> escreveu: > > > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote: > > > Hi, > > > > > > Em Fri, 9 Oct 2020 09:59:26 +0200 > > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > > > > > > > Way back it was a reasonable assumptions that iomem mappings never > > > > change the pfn range they point at. But this has changed: > > > > > > > > - gpu drivers dynamically manage their memory nowadays, invalidating > > > > ptes with unmap_mapping_range when buffers get moved > > > > > > > > - contiguous dma allocations have moved from dedicated carvetouts to > > > > cma regions. This means if we miss the unmap the pfn might contain > > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE) > > > > > > > > - even /dev/mem now invalidates mappings when the kernel requests that > > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 > > > > ("/dev/mem: Revoke mappings when a driver claims the region") > > > > > > > > Accessing pfns obtained from ptes without holding all the locks is > > > > therefore no longer a good idea. > > > > > > > > Unfortunately there's some users where this is not fixable (like v4l > > > > userptr of iomem mappings) or involves a pile of work (vfio type1 > > > > iommu). For now annotate these as unsafe and splat appropriately. > > > > > > > > This patch adds an unsafe_follow_pfn, which later patches will then > > > > roll out to all appropriate places. > > > > > > NACK, as this breaks an existing userspace API on media. > > > > It doesn't break it. You get a big warning the thing is broken and it > > keeps working. > > > > We can't leave such a huge security hole open - it impacts other > > subsystems, distros need to be able to run in a secure mode. > > Well, if distros disable it, then apps will break. > Do we have any information on userspace that actually needs this functionality? Note that we're _not_ talking here about the complete USERPTR functionality, but rather just the very corner case of carveout memory not backed by struct pages. Given that the current in-tree ways of reserving carveout memory, such as shared-dma-pool, actually give memory backed by struct pages, do we even have a source of such legacy memory in the kernel today? I think that given that this is a very niche functionality, we could have it disabled by default for security reasons and if someone _really_ (i.e. there is no replacement) needs it, they probably need to use a custom kernel build anyway for their exotic hardware setup (with PFN-backed carveout memory), so they can enable it. > > > While I agree that using the userptr on media is something that > > > new drivers may not support, as DMABUF is a better way of > > > handling it, changing this for existing ones is a big no, > > > as it may break usersapace. > > > > media community needs to work to fix this, not pretend it is OK to > > keep going as-is. > > > Dealing with security issues is the one case where an uABI break might > > be acceptable. > > > > If you want to NAK it then you need to come up with the work to do > > something here correctly that will support the old drivers without the > > kernel taint. > > > > Unfortunately making things uncomfortable for the subsystem is the big > > hammer the core kernel needs to use to actually get this security work > > done by those responsible. > > > I'm not pretending that this is ok. Just pointing that the approach > taken is NOT OK. > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > Well, no drivers inside the media subsystem uses such flag, although > they may rely on some infrastructure that could be using it behind > the bars. > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > Please let address the issue on this way, instead of broken an > userspace API that it is there since 1991. Note that USERPTR as a whole generally has been considered deprecated in V4L2 for many years and people have been actively discouraged to use it. And, still, we're just talking here about the very rare corner case, not the whole USERPTR API. Best regards, Tomasz
Hi Daniel, On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote: > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > > > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > > > No, it is unconditionally unsafe. The CMA movable mappings are > > specific VMAs that will have bad issues here, but there are other > > types too. > > > > The only way to do something at a VMA level is to have a list of OK > > VMAs, eg because they were creatd via a special mmap helper from the > > media subsystem. > > > > > Well, no drivers inside the media subsystem uses such flag, although > > > they may rely on some infrastructure that could be using it behind > > > the bars. > > > > It doesn't matter, nothing prevents the user from calling media APIs > > on mmaps it gets from other subsystems. > > I think a good first step would be to disable userptr of non struct > page backed storage going forward for any new hw support. Even on > existing drivers. dma-buf sharing has been around for long enough now > that this shouldn't be a problem. Unfortunately right now this doesn't > seem to exist, so the entire problem keeps getting perpetuated. On the V4L2 side, I think we should disable USERPTR for any new driver, period. That's what I've been recommended when reviewing patches for several years already. It's a deprecated API, it should be phased out, which starts by not allowing any new use case. > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > We can't tell from the VMA these kinds of details.. > > > > It has to go the other direction, evey mmap that might be used as a > > userptr here has to be found and the VMA specially created to allow > > its use. At least that is a kernel only change, but will need people > > with the HW to do this work. > > I think the only reasonable way to keep this working is: > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); > - add dma-buf export support to fbdev and v4l > - roll this out everywhere we still need it. > > Realistically this just isn't going to happen. And anything else just > reimplements half of dma-buf, which is kinda pointless (you need > minimally refcounting and some way to get at a promise of a permanent > sg list for dma. Plus probably the vmap for kernel cpu access. > > > > Please let address the issue on this way, instead of broken an > > > userspace API that it is there since 1991. > > > > It has happened before :( It took 4 years for RDMA to undo the uAPI > > breakage caused by a security fix for something that was a 15 years > > old bug. > > Yeah we have a bunch of these on the drm side too. Some of them are > really just "you have to upgrade userspace", and there's no real fix > for the security nightmare without that.
Hi Tomasz, On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote: > On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > > > > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > > > > > No, it is unconditionally unsafe. The CMA movable mappings are > > > specific VMAs that will have bad issues here, but there are other > > > types too. > > > > > > The only way to do something at a VMA level is to have a list of OK > > > VMAs, eg because they were creatd via a special mmap helper from the > > > media subsystem. > > > > > > > Well, no drivers inside the media subsystem uses such flag, although > > > > they may rely on some infrastructure that could be using it behind > > > > the bars. > > > > > > It doesn't matter, nothing prevents the user from calling media APIs > > > on mmaps it gets from other subsystems. > > > > I think a good first step would be to disable userptr of non struct > > page backed storage going forward for any new hw support. Even on > > existing drivers. dma-buf sharing has been around for long enough now > > that this shouldn't be a problem. Unfortunately right now this doesn't > > seem to exist, so the entire problem keeps getting perpetuated. > > > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > > > We can't tell from the VMA these kinds of details.. > > > > > > It has to go the other direction, evey mmap that might be used as a > > > userptr here has to be found and the VMA specially created to allow > > > its use. At least that is a kernel only change, but will need people > > > with the HW to do this work. > > > > I think the only reasonable way to keep this working is: > > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); > > - add dma-buf export support to fbdev and v4l > > I assume you mean V4L2 and not the obsolete V4L that is emulated in > the userspace by libv4l. If so, every video device that uses videobuf2 > gets DMA-buf export for free and there is nothing needed to enable it. > > We probably still have a few legacy drivers using videobuf (non-2), > but IMHO those should be safe to put behind some disabled-by-default > Kconfig symbol or even completely drop, as the legacy framework has > been deprecated for many years already. There's 8 drivers left, and they support a very large number of devices. I expect unhappy users distros stop shipping them. On the other hand, videobuf has been deprecated for a loooooooong time, so there has been plenty of time to convert the remaining drivers to videobuf2. If nobody can do it, then we'll have to drop support for these devices given the security issues. We have moved media drivers to staging in the past when there wasn't enough maintenance effort, we could do so here too. > > - roll this out everywhere we still need it. > > > > Realistically this just isn't going to happen. And anything else just > > reimplements half of dma-buf, which is kinda pointless (you need > > minimally refcounting and some way to get at a promise of a permanent > > sg list for dma. Plus probably the vmap for kernel cpu access. > > > > > > Please let address the issue on this way, instead of broken an > > > > userspace API that it is there since 1991. > > > > > > It has happened before :( It took 4 years for RDMA to undo the uAPI > > > breakage caused by a security fix for something that was a 15 years > > > old bug. > > > > Yeah we have a bunch of these on the drm side too. Some of them are > > really just "you have to upgrade userspace", and there's no real fix > > for the security nightmare without that. > > I think we need to phase out such userspace indeed. The Kconfig symbol > allows enabling the unsafe functionality for anyone who still needs > it, so I think it's not entirely a breakage.
On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Hi Tomasz, > > On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote: > > On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > > > > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch > > > > > description is that this is unsafe *only if* __GFP_MOVABLE is used. > > > > > > > > No, it is unconditionally unsafe. The CMA movable mappings are > > > > specific VMAs that will have bad issues here, but there are other > > > > types too. > > > > > > > > The only way to do something at a VMA level is to have a list of OK > > > > VMAs, eg because they were creatd via a special mmap helper from the > > > > media subsystem. > > > > > > > > > Well, no drivers inside the media subsystem uses such flag, although > > > > > they may rely on some infrastructure that could be using it behind > > > > > the bars. > > > > > > > > It doesn't matter, nothing prevents the user from calling media APIs > > > > on mmaps it gets from other subsystems. > > > > > > I think a good first step would be to disable userptr of non struct > > > page backed storage going forward for any new hw support. Even on > > > existing drivers. dma-buf sharing has been around for long enough now > > > that this shouldn't be a problem. Unfortunately right now this doesn't > > > seem to exist, so the entire problem keeps getting perpetuated. > > > > > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE > > > > > flag that it would be denying the core mm code to set __GFP_MOVABLE. > > > > > > > > We can't tell from the VMA these kinds of details.. > > > > > > > > It has to go the other direction, evey mmap that might be used as a > > > > userptr here has to be found and the VMA specially created to allow > > > > its use. At least that is a kernel only change, but will need people > > > > with the HW to do this work. > > > > > > I think the only reasonable way to keep this working is: > > > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma); > > > - add dma-buf export support to fbdev and v4l > > > > I assume you mean V4L2 and not the obsolete V4L that is emulated in > > the userspace by libv4l. If so, every video device that uses videobuf2 > > gets DMA-buf export for free and there is nothing needed to enable it. Yeah. And I missed that v4l2 added dma-buf export too. > > We probably still have a few legacy drivers using videobuf (non-2), > > but IMHO those should be safe to put behind some disabled-by-default > > Kconfig symbol or even completely drop, as the legacy framework has > > been deprecated for many years already. > > There's 8 drivers left, and they support a very large number of devices. > I expect unhappy users distros stop shipping them. On the other hand, > videobuf has been deprecated for a loooooooong time, so there has been > plenty of time to convert the remaining drivers to videobuf2. If nobody > can do it, then we'll have to drop support for these devices given the > security issues. Again, the issue here is _only_ with follow_pfn. For videobuf1 this means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2 it means it's broken for all usage (not just zero-copy userptr), because videbuf-dma-contig.c lacks the pin_user_pages path. But that would be easy to add if this poses a problem I think - we just need to carry over the pin_user_pages_fast logic from videbuf2, no driver changes required. But of course I don't think we should do that before someone reports the regression, since videobuf1 userptr is doubly deprecated :-) Everything else keeps working with videobuf1 with my patch series. So depending upon which videobuf1 implementations these 8 drivers use, you might not even have any real breakage there. > We have moved media drivers to staging in the past when there wasn't > enough maintenance effort, we could do so here too. I'm not breaking the world with this, it's really very minimal use-case. At least as far as I'm understanding the entire media subsystem here. -Daniel > > > - roll this out everywhere we still need it. > > > > > > Realistically this just isn't going to happen. And anything else just > > > reimplements half of dma-buf, which is kinda pointless (you need > > > minimally refcounting and some way to get at a promise of a permanent > > > sg list for dma. Plus probably the vmap for kernel cpu access. > > > > > > > > Please let address the issue on this way, instead of broken an > > > > > userspace API that it is there since 1991. > > > > > > > > It has happened before :( It took 4 years for RDMA to undo the uAPI > > > > breakage caused by a security fix for something that was a 15 years > > > > old bug. > > > > > > Yeah we have a bunch of these on the drm side too. Some of them are > > > really just "you have to upgrade userspace", and there's no real fix > > > for the security nightmare without that. > > > > I think we need to phase out such userspace indeed. The Kconfig symbol > > allows enabling the unsafe functionality for anyone who still needs > > it, so I think it's not entirely a breakage. > > -- > Regards, > > Laurent Pinchart
Em Sat, 10 Oct 2020 23:50:27 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart > <laurent.pinchart@ideasonboard.com> wrote: > > > > > We probably still have a few legacy drivers using videobuf (non-2), > > > but IMHO those should be safe to put behind some disabled-by-default > > > Kconfig symbol or even completely drop, as the legacy framework has > > > been deprecated for many years already. > > > > There's 8 drivers left, and they support a very large number of devices. > > I expect unhappy users distros stop shipping them. On the other hand, > > videobuf has been deprecated for a loooooooong time, so there has been > > plenty of time to convert the remaining drivers to videobuf2. If nobody > > can do it, then we'll have to drop support for these devices given the > > security issues. > > Again, the issue here is _only_ with follow_pfn. For videobuf1 this > means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2 > it means it's broken for all usage (not just zero-copy userptr), > because videbuf-dma-contig.c lacks the pin_user_pages path. Well, follow_pfn() is used only by videbuf-dma-contig.c. If this is the only part of VB1 that will have userptr broken, then there's just one driver that might be affected: davinci. Yet, taking a deeper look: $ git grep include drivers/media/platform/davinci/|grep -i videobuf drivers/media/platform/davinci/vpif_capture.h:#include <media/videobuf2-dma-contig.h> drivers/media/platform/davinci/vpif_display.h:#include <media/videobuf2-dma-contig.h> It sounds to me that it was already converted to VB2, but some VB1 symbols were not converted at its Kconfig. It sounds to me that there are other drivers with some VB1 left overs at Kconfig, as those are the only ones using VB1 those days: $ for i in $(git grep media/videobuf drivers |grep -v videobuf2 |grep -v v4l2-core|cut -d: -f1); do dirname $i; done|sort|uniq drivers/media/pci/bt8xx drivers/media/pci/cx18 drivers/media/platform drivers/media/usb/tm6000 drivers/media/usb/zr364xx drivers/staging/media/atomisp/pci > But that > would be easy to add if this poses a problem I think - we just need > to carry over the pin_user_pages_fast logic from videbuf2, no driver > changes required. But of course I don't think we should do that before > someone reports the regression, since videobuf1 userptr is doubly > deprecated :-) I think otherwise. Keeping a broken component at the Kernel is a bad idea. Yet, from my quick search above, it sounds to me that it is time for us to retire the VB1 DMA contig support as a hole, as there's no client for it anymore. I'll work on some patches cleaning up the VB1 left overs at Kconfig and removing videbuf-dma-contig.c for good, if there's no hidden dependency on it. Thanks, Mauro
Em Sun, 11 Oct 2020 08:27:41 +0200 Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu: > Em Sat, 10 Oct 2020 23:50:27 +0200 > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu: > > > On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart > > <laurent.pinchart@ideasonboard.com> wrote: > > > > > > > > We probably still have a few legacy drivers using videobuf (non-2), > > > > but IMHO those should be safe to put behind some disabled-by-default > > > > Kconfig symbol or even completely drop, as the legacy framework has > > > > been deprecated for many years already. > > > > > > There's 8 drivers left, and they support a very large number of devices. > > > I expect unhappy users distros stop shipping them. On the other hand, > > > videobuf has been deprecated for a loooooooong time, so there has been > > > plenty of time to convert the remaining drivers to videobuf2. If nobody > > > can do it, then we'll have to drop support for these devices given the > > > security issues. > > > > Again, the issue here is _only_ with follow_pfn. For videobuf1 this > > means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2 > > it means it's broken for all usage (not just zero-copy userptr), > > because videbuf-dma-contig.c lacks the pin_user_pages path. > > Well, follow_pfn() is used only by videbuf-dma-contig.c. If this is > the only part of VB1 that will have userptr broken, then there's > just one driver that might be affected: davinci. > > Yet, taking a deeper look: > > $ git grep include drivers/media/platform/davinci/|grep -i videobuf > drivers/media/platform/davinci/vpif_capture.h:#include <media/videobuf2-dma-contig.h> > drivers/media/platform/davinci/vpif_display.h:#include <media/videobuf2-dma-contig.h> > > It sounds to me that it was already converted to VB2, but some VB1 > symbols were not converted at its Kconfig. > > It sounds to me that there are other drivers with some VB1 left overs > at Kconfig, as those are the only ones using VB1 those days: > > $ for i in $(git grep media/videobuf drivers |grep -v videobuf2 |grep -v v4l2-core|cut -d: -f1); do dirname $i; done|sort|uniq > drivers/media/pci/bt8xx > drivers/media/pci/cx18 > drivers/media/platform > drivers/media/usb/tm6000 > drivers/media/usb/zr364xx > drivers/staging/media/atomisp/pci This is incomplete. There are two drivers that include videobuf indirectly: include/media/davinci/vpfe_capture.h include/media/drv-intf/saa7146_vv.h I double-checked that DaVinci still uses VB1. There are actually two clients for videbuf-dma-contig.c: davinci and fsl-viu. Those two will be affected, if we don't add pin_user_pages_fast() support to VB1 or convert them to VB2. > > > But that > > would be easy to add if this poses a problem I think - we just need > > to carry over the pin_user_pages_fast logic from videbuf2, no driver > > changes required. But of course I don't think we should do that before > > someone reports the regression, since videobuf1 userptr is doubly > > deprecated :-) > > I think otherwise. Keeping a broken component at the Kernel is > a bad idea. > > Yet, from my quick search above, it sounds to me that it is time for > us to retire the VB1 DMA contig support as a hole, as there's no client > for it anymore. > > I'll work on some patches cleaning up the VB1 left overs at > Kconfig and removing videbuf-dma-contig.c for good, if there's > no hidden dependency on it. > > > Thanks, > Mauro Thanks, Mauro
Hi Jason, On 09.10.2020 14:48, Jason Gunthorpe wrote: > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > >> I'm not a mm/ expert, but, from what I understood from Daniel's patch >> description is that this is unsafe *only if* __GFP_MOVABLE is used. > No, it is unconditionally unsafe. The CMA movable mappings are > specific VMAs that will have bad issues here, but there are other > types too. I'm trying to follow this thread, but I really wonder what do you mean by CMA movable mappings? If a buffer has been allocated from CMA and used for DMA, it won't be moved in the memory. It will stay at the same physical memory address all the time until freed by the owner. It just a matter of proper usage count tracking to delay freeing if it is still used somewhere. Best regards
On Mon, Oct 12, 2020 at 12:47 PM Marek Szyprowski <m.szyprowski@samsung.com> wrote: > > Hi Jason, > > On 09.10.2020 14:48, Jason Gunthorpe wrote: > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote: > > > >> I'm not a mm/ expert, but, from what I understood from Daniel's patch > >> description is that this is unsafe *only if* __GFP_MOVABLE is used. > > No, it is unconditionally unsafe. The CMA movable mappings are > > specific VMAs that will have bad issues here, but there are other > > types too. > > I'm trying to follow this thread, but I really wonder what do you mean > by CMA movable mappings? If a buffer has been allocated from CMA and > used for DMA, it won't be moved in the memory. It will stay at the same > physical memory address all the time until freed by the owner. It just a > matter of proper usage count tracking to delay freeing if it is still > used somewhere. Yup. The problem is that this usage count tracking doesn't exist. And drivers could at least in theory treat CMA like vram and swap buffers in&out of it, so just refcounting the userspace vma isn't enough. In practice, right now, it might be enough for CMA drivers though (but there's more that's possible here). -Daniel
diff --git a/include/linux/mm.h b/include/linux/mm.h index 2a16631c1fda..ec8c90928fc9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address, pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp); int follow_pfn(struct vm_area_struct *vma, unsigned long address, unsigned long *pfn); +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, + unsigned long *pfn); int follow_phys(struct vm_area_struct *vma, unsigned long address, unsigned int flags, unsigned long *prot, resource_size_t *phys); int generic_access_phys(struct vm_area_struct *vma, unsigned long addr, diff --git a/mm/memory.c b/mm/memory.c index f7cbc4dde0ef..7c7b234ffb24 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd); * @address: user virtual address * @pfn: location to store found PFN * - * Only IO mappings and raw PFN mappings are allowed. + * Only IO mappings and raw PFN mappings are allowed. Note that callers must + * ensure coherency with pte updates by using a &mmu_notifier to follow updates. + * If this is not feasible, or the access to the @pfn is only very short term, + * use follow_pte_pmd() instead and hold the pagetable lock for the duration of + * the access instead. Any caller not following these requirements must use + * unsafe_follow_pfn() instead. * * Return: zero and the pfn at @pfn on success, -ve otherwise. */ @@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, } EXPORT_SYMBOL(follow_pfn); +/** + * unsafe_follow_pfn - look up PFN at a user virtual address + * @vma: memory mapping + * @address: user virtual address + * @pfn: location to store found PFN + * + * Only IO mappings and raw PFN mappings are allowed. + * + * Returns zero and the pfn at @pfn on success, -ve otherwise. + */ +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, + unsigned long *pfn) +{ +#ifdef CONFIG_STRICT_FOLLOW_PFN + pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n"); + return -EINVAL; +#else + WARN_ONCE(1, "unsafe follow_pfn usage\n"); + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + + return follow_pfn(vma, address, pfn); +#endif +} +EXPORT_SYMBOL(unsafe_follow_pfn); + #ifdef CONFIG_HAVE_IOREMAP_PROT int follow_phys(struct vm_area_struct *vma, unsigned long address, unsigned int flags, diff --git a/mm/nommu.c b/mm/nommu.c index 75a327149af1..3db2910f0d64 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, } EXPORT_SYMBOL(follow_pfn); +/** + * unsafe_follow_pfn - look up PFN at a user virtual address + * @vma: memory mapping + * @address: user virtual address + * @pfn: location to store found PFN + * + * Only IO mappings and raw PFN mappings are allowed. + * + * Returns zero and the pfn at @pfn on success, -ve otherwise. + */ +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address, + unsigned long *pfn) +{ + return follow_pfn(vma, address, pfn); +} +EXPORT_SYMBOL(unsafe_follow_pfn); + LIST_HEAD(vmap_area_list); void vfree(const void *addr) diff --git a/security/Kconfig b/security/Kconfig index 7561f6f99f1d..48945402e103 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH If you wish for all usermode helper programs to be disabled, specify an empty string here (i.e. ""). +config STRICT_FOLLOW_PFN + bool "Disable unsafe use of follow_pfn" + depends on MMU + help + Some functionality in the kernel follows userspace mappings to iomem + ranges in an unsafe matter. Examples include v4l userptr for zero-copy + buffers sharing. + + If this option is switched on, such access is rejected. Only enable + this option when you must run userspace which requires this. + + If in doubt, say Y. + source "security/selinux/Kconfig" source "security/smack/Kconfig" source "security/tomoyo/Kconfig"
Way back it was a reasonable assumptions that iomem mappings never change the pfn range they point at. But this has changed: - gpu drivers dynamically manage their memory nowadays, invalidating ptes with unmap_mapping_range when buffers get moved - contiguous dma allocations have moved from dedicated carvetouts to cma regions. This means if we miss the unmap the pfn might contain pagecache or anon memory (well anything allocated with GFP_MOVEABLE) - even /dev/mem now invalidates mappings when the kernel requests that iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") Accessing pfns obtained from ptes without holding all the locks is therefore no longer a good idea. Unfortunately there's some users where this is not fixable (like v4l userptr of iomem mappings) or involves a pile of work (vfio type1 iommu). For now annotate these as unsafe and splat appropriately. This patch adds an unsafe_follow_pfn, which later patches will then roll out to all appropriate places. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: kvm@vger.kernel.org --- include/linux/mm.h | 2 ++ mm/memory.c | 32 +++++++++++++++++++++++++++++++- mm/nommu.c | 17 +++++++++++++++++ security/Kconfig | 13 +++++++++++++ 4 files changed, 63 insertions(+), 1 deletion(-)