Message ID | 20240717222429.2011540-1-axelrasmussen@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Backport VFIO refactor to fix fork ordering bug | expand |
On Wed, 17 Jul 2024 15:24:26 -0700 Axel Rasmussen <axelrasmussen@google.com> wrote: > 35e351780fa9 ("fork: defer linking file vma until vma is fully initialized") > switched the ordering of vm_ops->open() and copy_page_range() on fork. This is a > bug for VFIO, because it causes two problems: > > 1. Because open() is called before copy_page_range(), the range can conceivably > have unmapped 'holes' in it. This causes the code underneath untrack_pfn() to > WARN. > > 2. More seriously, open() is trying to guarantee that the entire range is > zapped, so any future accesses in the child will result in the VFIO fault > handler being called. Because we copy_page_range() *after* open() (and > therefore after zapping), this guarantee is violated. > > We can't revert 35e351780fa9, because it fixes a real bug for hugetlbfs. The fix > is also not as simple as just reodering open() and copy_page_range(), as Miaohe > points out in [1]. So, although these patches are kind of large for stable, just > backport this refactoring which completely sidesteps the issue. > > Note that patch 2 is the key one here which fixes the issue. Patch 1 is a > prerequisite required for patch 2 to build / work. This would almost be enough, > but we might see significantly regressed performance. Patch 3 fixes that up, > putting performance back on par with what it was before. > > Note [1] also has a more full discussion justifying taking these backports. > > I proposed the same backport for 6.9 [2], and now for 6.6. 6.6 is the oldest > kernel which needs the change: 35e351780fa9 was reverted for unrelated reasons > in 6.1, and was never backported to 5.15 or earlier. AFAICT 35e351780fa9 was reverted in linux-6.6.y as well, so why isn't this one a 4-part series concluding with a new backport of that commit? I think without that, we don't need these in 6.6 either. Thanks, Alex > > [1]: https://lore.kernel.org/all/20240702042948.2629267-1-leah.rumancik@gmail.com/T/ > [2]: https://lore.kernel.org/r/20240717213339.1921530-1-axelrasmussen@google.com > > Alex Williamson (3): > vfio: Create vfio_fs_type with inode per device > vfio/pci: Use unmap_mapping_range() > vfio/pci: Insert full vma on mmap'd MMIO fault > > drivers/vfio/device_cdev.c | 7 + > drivers/vfio/group.c | 7 + > drivers/vfio/pci/vfio_pci_core.c | 271 ++++++++----------------------- > drivers/vfio/vfio_main.c | 44 +++++ > include/linux/vfio.h | 1 + > include/linux/vfio_pci_core.h | 2 - > 6 files changed, 125 insertions(+), 207 deletions(-) > > -- > 2.45.2.993.g49e7a77208-goog >
On Wed, Jul 17, 2024 at 3:31 PM Alex Williamson <alex.williamson@redhat.com> wrote: > > On Wed, 17 Jul 2024 15:24:26 -0700 > Axel Rasmussen <axelrasmussen@google.com> wrote: > > > 35e351780fa9 ("fork: defer linking file vma until vma is fully initialized") > > switched the ordering of vm_ops->open() and copy_page_range() on fork. This is a > > bug for VFIO, because it causes two problems: > > > > 1. Because open() is called before copy_page_range(), the range can conceivably > > have unmapped 'holes' in it. This causes the code underneath untrack_pfn() to > > WARN. > > > > 2. More seriously, open() is trying to guarantee that the entire range is > > zapped, so any future accesses in the child will result in the VFIO fault > > handler being called. Because we copy_page_range() *after* open() (and > > therefore after zapping), this guarantee is violated. > > > > We can't revert 35e351780fa9, because it fixes a real bug for hugetlbfs. The fix > > is also not as simple as just reodering open() and copy_page_range(), as Miaohe > > points out in [1]. So, although these patches are kind of large for stable, just > > backport this refactoring which completely sidesteps the issue. > > > > Note that patch 2 is the key one here which fixes the issue. Patch 1 is a > > prerequisite required for patch 2 to build / work. This would almost be enough, > > but we might see significantly regressed performance. Patch 3 fixes that up, > > putting performance back on par with what it was before. > > > > Note [1] also has a more full discussion justifying taking these backports. > > > > I proposed the same backport for 6.9 [2], and now for 6.6. 6.6 is the oldest > > kernel which needs the change: 35e351780fa9 was reverted for unrelated reasons > > in 6.1, and was never backported to 5.15 or earlier. > > AFAICT 35e351780fa9 was reverted in linux-6.6.y as well, so why isn't > this one a 4-part series concluding with a new backport of that commit? > I think without that, we don't need these in 6.6 either. Thanks, Ah! You are correct, I had failed to notice: dd782da47 ("Revert "fork: defer linking file vma until vma is fully initialized"") in linux-6.6.y. So, please ignore this series for 6.6 then. :) > > Alex > > > > > [1]: https://lore.kernel.org/all/20240702042948.2629267-1-leah.rumancik@gmail.com/T/ > > [2]: https://lore.kernel.org/r/20240717213339.1921530-1-axelrasmussen@google.com > > > > Alex Williamson (3): > > vfio: Create vfio_fs_type with inode per device > > vfio/pci: Use unmap_mapping_range() > > vfio/pci: Insert full vma on mmap'd MMIO fault > > > > drivers/vfio/device_cdev.c | 7 + > > drivers/vfio/group.c | 7 + > > drivers/vfio/pci/vfio_pci_core.c | 271 ++++++++----------------------- > > drivers/vfio/vfio_main.c | 44 +++++ > > include/linux/vfio.h | 1 + > > include/linux/vfio_pci_core.h | 2 - > > 6 files changed, 125 insertions(+), 207 deletions(-) > > > > -- > > 2.45.2.993.g49e7a77208-goog > > >