Message ID | 20240812181225.1360970-1-peterx@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | mm/mprotect: Fix dax puds | expand |
Peter Xu <peterx@redhat.com> writes: > [Based on mm-unstable, commit 98808d08fc0f, Aug 7th. NOTE: it is > intentional to not have rebased to latest mm-unstable, as this is to > replace the queued v4] > > v5 Changelog: > - Rename patch subject "mm/x86: arch_check_zapped_pud()", add "Implement" [tglx] > - Mostly rewrote commit messages for the x86 patches, follow -tip rules [tglx] > - Line wrap fixes (to mostly avoid newlines when unnecessary) [tglx] > - English fixes [tglx] > - Fix a build issue only happens with i386 pae + clang > https://lore.kernel.org/r/202408111850.Y7rbVXOo-lkp@intel.com > > v1: https://lore.kernel.org/r/20240621142504.1940209-1-peterx@redhat.com > v2: https://lore.kernel.org/r/20240703212918.2417843-1-peterx@redhat.com > v3: https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com > v4: https://lore.kernel.org/r/20240807194812.819412-1-peterx@redhat.com > > Dax supports pud pages for a while, but mprotect on puds was missing since > the start. This series tries to fix that by providing pud handling in > mprotect(). The goal is to add more types of pud mappings like hugetlb or > pfnmaps. This series paves way for it by fixing known pud entries. > > Considering nobody reported this until when I looked at those other types > of pud mappings, I am thinking maybe it doesn't need to be a fix for stable > and this may not need to be backported. I would guess whoever cares about > mprotect() won't care 1G dax puds yet, vice versa. I hope fixing that in > new kernels would be fine, but I'm open to suggestions. > > There're a few small things changed to teach mprotect work on PUDs. E.g. it > will need to start with dropping NUMA_HUGE_PTE_UPDATES which may stop > making sense when there can be more than one type of huge pte. OTOH, we'll > also need to push the mmu notifiers from pmd to pud layers, which might > need some attention but so far I think it's safe. For such details, please > refer to each patch's commit message. > > The mprotect() pud process should be straightforward, as I kept it as > simple as possible. There's no NUMA handled as dax simply doesn't support > that. There's also no userfault involvements as file memory (even if work > with userfault-wp async mode) will need to split a pud, so pud entry > doesn't need to yet know userfault's existance (but hugetlb entries will; > that's also for later). > > Tests > ===== > > What I did test: > > - cross-build tests that I normally cover [1] > > - smoke tested on x86_64 the simplest program [2] on dev_dax 1G PUD > mprotect() using QEMU's nvdimm emulations [3] and ndctl to create > namespaces with proper alignments, which used to throw "bad pud" but now > it'll run through all fine. I checked sigbus happens if with illegal > access on protected puds. > > - vmtests. > > What I didn't test: > > - fsdax: I wanted to also give it a shot, but only until then I noticed it > doesn't seem to be supported (according to dax_iomap_fault(), which will > always fallback on PUD_ORDER). I did remember it was supported before, I > could miss something important there.. please shoot if so. > > - userfault wp-async: I also wanted to test userfault-wp async be able to > split huge puds (here it's simply a clear_pud.. though), but it won't > work for devdax anyway due to not allowed to do smaller than 1G faults in > this case. So skip too. > > - Power, as no hardware on hand. Does it need some specific configuration, or just any Power machine will do? cheers
On Tue, Aug 13, 2024 at 10:50:04PM +1000, Michael Ellerman wrote: > > - Power, as no hardware on hand. > > Does it need some specific configuration, or just any Power machine will do? I am not familiar with both dax and power to be sure on the hardware implications here, sorry. I think as long as one can create some /dev/dax with 1g mapping alignment (I suppose normally with ndctl) then it should be able to hit the mprotect() path. One can verify first without this series that it could trigger a bad pud, then it'll be away after this series applied. Meanwhile the mprotect() should apply on the 1g dax range properly. Thanks,
On Mon, Aug 12, 2024 at 8:12 PM Peter Xu <peterx@redhat.com> wrote: > Dax supports pud pages for a while, but mprotect on puds was missing since > the start. This series tries to fix that by providing pud handling in > mprotect(). The goal is to add more types of pud mappings like hugetlb or > pfnmaps. This series paves way for it by fixing known pud entries. Do people actually use hardware where they can use PUD THP mappings for DAX? I thought that was just some esoteric feature that isn't actually usable on almost any system. Was I wrong about that? I think another example that probably doesn't play entirely nice with PUD THP mappings is mremap()'s move_page_tables(). If dax_get_unmapped_area() allows creating a VMA at an unaligned start address (which I think it does?), move_page_tables() can probably end up copying from an aligned address mapped with a huge PUD entry to an unaligned address that needs to be mapped at the PTE level, and I think that will probably cause it to call into get_old_pmd() while a huge PUD entry is still present, which will probably get us a pud_bad() error or such?
On Mon, Nov 11, 2024 at 10:20:59PM +0100, Jann Horn wrote: > On Mon, Aug 12, 2024 at 8:12 PM Peter Xu <peterx@redhat.com> wrote: > > Dax supports pud pages for a while, but mprotect on puds was missing since > > the start. This series tries to fix that by providing pud handling in > > mprotect(). The goal is to add more types of pud mappings like hugetlb or > > pfnmaps. This series paves way for it by fixing known pud entries. > > Do people actually use hardware where they can use PUD THP mappings > for DAX? I thought that was just some esoteric feature that isn't > actually usable on almost any system. > Was I wrong about that? I did run it with a qemu emulated nvdimm device. Though in reality I've no idea on how many people are using it.. > > I think another example that probably doesn't play entirely nice with > PUD THP mappings is mremap()'s move_page_tables(). If > dax_get_unmapped_area() allows creating a VMA at an unaligned start > address (which I think it does?), move_page_tables() can probably end > up copying from an aligned address mapped with a huge PUD entry to an > unaligned address that needs to be mapped at the PTE level, and I > think that will probably cause it to call into get_old_pmd() while a > huge PUD entry is still present, which will probably get us a > pud_bad() error or such? I think you're probably right, that we have other places that may not work well with pud mappings. I also wonder whether dax_get_unmapped_area() needs to properly handle MAP_FIXED, even for PMD mappings. It looks like it always fallbacks to the default mm_get_unmapped_area() with FIXED, which have no idea on dax->alignment so it'll always allow it.. The issue is I'm not sure dax pmd can be split at all, while I think split-able is needed when mremap from a pmd-aligned address to a !pmd-aligned address. Thanks,
On 13.11.24 17:39, Peter Xu wrote: > On Mon, Nov 11, 2024 at 10:20:59PM +0100, Jann Horn wrote: >> On Mon, Aug 12, 2024 at 8:12 PM Peter Xu <peterx@redhat.com> wrote: >>> Dax supports pud pages for a while, but mprotect on puds was missing since >>> the start. This series tries to fix that by providing pud handling in >>> mprotect(). The goal is to add more types of pud mappings like hugetlb or >>> pfnmaps. This series paves way for it by fixing known pud entries. >> >> Do people actually use hardware where they can use PUD THP mappings >> for DAX? I thought that was just some esoteric feature that isn't >> actually usable on almost any system. >> Was I wrong about that? > > I did run it with a qemu emulated nvdimm device. Though in reality I've no > idea on how many people are using it.. I wonder if we still have to support it ... or could disable it+rip it out.
On Wed, Nov 13, 2024 at 05:42:15PM +0100, David Hildenbrand wrote: > On 13.11.24 17:39, Peter Xu wrote: > > On Mon, Nov 11, 2024 at 10:20:59PM +0100, Jann Horn wrote: > > > On Mon, Aug 12, 2024 at 8:12 PM Peter Xu <peterx@redhat.com> wrote: > > > > Dax supports pud pages for a while, but mprotect on puds was missing since > > > > the start. This series tries to fix that by providing pud handling in > > > > mprotect(). The goal is to add more types of pud mappings like hugetlb or > > > > pfnmaps. This series paves way for it by fixing known pud entries. > > > > > > Do people actually use hardware where they can use PUD THP mappings > > > for DAX? I thought that was just some esoteric feature that isn't > > > actually usable on almost any system. > > > Was I wrong about that? > > > > I did run it with a qemu emulated nvdimm device. Though in reality I've no > > idea on how many people are using it.. > > I wonder if we still have to support it ... or could disable it+rip it out. Note that in my previous email, I also mentioned mremap() for PMD on dax too. If that's a real problem, it won't be fixed even if dropping dax PUD support. And we definitely want to understand whether there're still users on pud dax to consider dropping anything.. it could still be that both mprotect() and mremap() are not yet used in the current use cases. Thanks,
On 13.11.24 18:56, Peter Xu wrote: > On Wed, Nov 13, 2024 at 05:42:15PM +0100, David Hildenbrand wrote: >> On 13.11.24 17:39, Peter Xu wrote: >>> On Mon, Nov 11, 2024 at 10:20:59PM +0100, Jann Horn wrote: >>>> On Mon, Aug 12, 2024 at 8:12 PM Peter Xu <peterx@redhat.com> wrote: >>>>> Dax supports pud pages for a while, but mprotect on puds was missing since >>>>> the start. This series tries to fix that by providing pud handling in >>>>> mprotect(). The goal is to add more types of pud mappings like hugetlb or >>>>> pfnmaps. This series paves way for it by fixing known pud entries. >>>> >>>> Do people actually use hardware where they can use PUD THP mappings >>>> for DAX? I thought that was just some esoteric feature that isn't >>>> actually usable on almost any system. >>>> Was I wrong about that? >>> >>> I did run it with a qemu emulated nvdimm device. Though in reality I've no >>> idea on how many people are using it.. >> >> I wonder if we still have to support it ... or could disable it+rip it out. > > Note that in my previous email, I also mentioned mremap() for PMD on dax > too. If that's a real problem, it won't be fixed even if dropping dax PUD > support. > Very true. > And we definitely want to understand whether there're still users on pud > dax to consider dropping anything.. it could still be that both mprotect() > and mremap() are not yet used in the current use cases. Right, but at least NVDIMMs are getting less important. Just a thought if this is really still worth having.