Message ID | 20210204183433.1431202-10-axelrasmussen@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | userfaultfd: add minor fault handling |
Hi Axel-

one typo found:

On 2/4/21 10:34 AM, Axel Rasmussen wrote:
> Reword / reorganize things a little bit into "lists", so new features /
> modes / ioctls can sort of just be appended.

Good plan.

>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  Documentation/admin-guide/mm/userfaultfd.rst | 107 ++++++++++++-------
>  1 file changed, 66 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> index 65eefa66c0ba..cfd3daf59d0e 100644
> --- a/Documentation/admin-guide/mm/userfaultfd.rst
> +++ b/Documentation/admin-guide/mm/userfaultfd.rst

[snip]

> -
> -Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
> -be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
> -register a memory range in the ``userfaultfd`` by setting the
> +events, except page fault notifications, may be generated:
> +
> +- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
> +  other than page faults are supported. These events are described in more
> +  detail below in the `Non-cooperative userfaultfd`_ section.
> +
> +- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
> +  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
> +  registrations for hugetlbfs and shared memory (covering all shmem APIs,
> +  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
> +  etc) virtual memory areas, respectively.
> +
> +- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
> +  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
> +  areas.
> +
> +The userland application should set the feature flags it intends to use

(ah, userspace has moved to userland temporarily. :)

> +when envoking the ``UFFDIO_API`` ioctl, to request that those features be

invoking

> +enabled if supported.


thanks.
On Thu, Feb 4, 2021 at 11:57 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> Hi Axel-
>
> one typo found:
>
> On 2/4/21 10:34 AM, Axel Rasmussen wrote:
> > Reword / reorganize things a little bit into "lists", so new features /
> > modes / ioctls can sort of just be appended.
>
> Good plan.
>
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> >  Documentation/admin-guide/mm/userfaultfd.rst | 107 ++++++++++++-------
> >  1 file changed, 66 insertions(+), 41 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> > index 65eefa66c0ba..cfd3daf59d0e 100644
> > --- a/Documentation/admin-guide/mm/userfaultfd.rst
> > +++ b/Documentation/admin-guide/mm/userfaultfd.rst
>
> [snip]
>
> > -
> > -Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
> > -be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
> > -register a memory range in the ``userfaultfd`` by setting the
> > +events, except page fault notifications, may be generated:
> > +
> > +- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
> > +  other than page faults are supported. These events are described in more
> > +  detail below in the `Non-cooperative userfaultfd`_ section.
> > +
> > +- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
> > +  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
> > +  registrations for hugetlbfs and shared memory (covering all shmem APIs,
> > +  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
> > +  etc) virtual memory areas, respectively.
> > +
> > +- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
> > +  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
> > +  areas.
> > +
> > +The userland application should set the feature flags it intends to use
>
> (ah, userspace has moved to userland temporarily. :)

For better or worse, other parts of the document I'm not touching also
use this wording. Maybe we should s/userland/userspace/g, but perhaps
better done as a separate commit to keep this diff focused?
Anecdotally, the use of "userland" doesn't seem to be completely
unprecedented (e.g. grep -r "userland" | wc -l yields 566 matches in
the kernel tree).

I don't have strong feelings, and I was amused by picturing some
Shire-esque countryside with a friendly sign that reads: ~userland
welcomes you~. :)

>
> > +when envoking the ``UFFDIO_API`` ioctl, to request that those features be
>
> invoking

Whoops! Will send a new version with this fix. Thanks!

>
> > +enabled if supported.
>
>
> thanks.
> --
> ~Randy
>
On 2/4/21 1:04 PM, Axel Rasmussen wrote:
> On Thu, Feb 4, 2021 at 11:57 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>>
>> Hi Axel-
>>
>> one typo found:
>>
>> On 2/4/21 10:34 AM, Axel Rasmussen wrote:
>>> Reword / reorganize things a little bit into "lists", so new features /
>>> modes / ioctls can sort of just be appended.
>>
>> Good plan.
>>
>>>
>>> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
>>> ---
>>>  Documentation/admin-guide/mm/userfaultfd.rst | 107 ++++++++++++-------
>>>  1 file changed, 66 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
>>> index 65eefa66c0ba..cfd3daf59d0e 100644
>>> --- a/Documentation/admin-guide/mm/userfaultfd.rst
>>> +++ b/Documentation/admin-guide/mm/userfaultfd.rst
>>
>> [snip]
>>
>>> -
>>> -Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
>>> -be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
>>> -register a memory range in the ``userfaultfd`` by setting the
>>> +events, except page fault notifications, may be generated:
>>> +
>>> +- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
>>> +  other than page faults are supported. These events are described in more
>>> +  detail below in the `Non-cooperative userfaultfd`_ section.
>>> +
>>> +- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
>>> +  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
>>> +  registrations for hugetlbfs and shared memory (covering all shmem APIs,
>>> +  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
>>> +  etc) virtual memory areas, respectively.
>>> +
>>> +- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
>>> +  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
>>> +  areas.
>>> +
>>> +The userland application should set the feature flags it intends to use
>>
>> (ah, userspace has moved to userland temporarily. :)
>
> For better or worse, other parts of the document I'm not touching also
> use this wording. Maybe we should s/userland/userspace/g, but perhaps
> better done as a separate commit to keep this diff focused?
> Anecdotally, the use of "userland" doesn't seem to be completely
> unprecedented (e.g. grep -r "userland" | wc -l yields 566 matches in
> the kernel tree).
>
> I don't have strong feelings, and I was amused by picturing some
> Shire-esque countryside with a friendly sign that reads: ~userland
> welcomes you~. :)

I'm OK with not changing it. Up to you.
diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 65eefa66c0ba..cfd3daf59d0e 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -63,36 +63,36 @@ the generic ioctl available.
 
 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+  areas.
+
+The userland application should set the feature flags it intends to use
+when envoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which ioctl to choose depends on the kind of page fault, and what we'd
+like to do to resolve it:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
+  resolved by either providing a new page (``UFFDIO_COPY``), or mapping
+  the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
+  the zero page for a missing fault. With userfaultfd, userspace can
+  decide what content to provide before the faulting thread continues.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
+  the page cache). Userspace has the option of modifying the page's
+  contents before resolving the fault. Once the contents are correct
+  (modified or not), userspace asks the kernel to map the page and let the
+  faulting thread continue with ``UFFDIO_CONTINUE``.
 
 Notes:
 
-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-  you must provide some kind of page in your thread after reading from
-  the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-  The normal behavior of the OS automatically providing a zero page on
-  an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+  ``pagefault.flags`` within the ``uffd_msg``, checking for the
+  ``UFFD_PAGEFAULT_FLAG_*`` flags.
 
 - None of the page-delivering ioctls default to the range that you
   registered with. You must fill in all fields for the appropriate
@@ -122,9 +147,9 @@ Notes:
 
 - You get the address of the access that triggered the missing page
   event out of a struct uffd_msg that you read in the thread from the
-  uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
-  ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
-  the first of any of those IOCTLs wakes up the faulting thread.
+  uffd. You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.
 
 - Be sure to test for all errors including
   (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to
intercept and resolve minor faults. Make it clear that COPY and ZEROPAGE
are used for MISSING faults, whereas CONTINUE is used for MINOR faults.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 Documentation/admin-guide/mm/userfaultfd.rst | 107 ++++++++++++-------
 1 file changed, 66 insertions(+), 41 deletions(-)
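
For readers following along, the flow the reworded documentation describes (request features through UFFDIO_API, then choose a resolving ioctl from the fault flags) might be sketched in C roughly as below. This is only an illustration and not part of the patch. The helper names uffd_enable_api() and uffd_pick_resolution(), and the enum, are hypothetical. The #ifndef fallbacks assume the flag values used by kernels that carry minor fault support. Write-protect faults are not discussed in this patch, so the sketch only reports them as not covered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Assumption: older uapi headers may lack these flags. The fallback
 * values are believed to match kernels with minor fault support.
 */
#ifndef UFFD_PAGEFAULT_FLAG_WP
#define UFFD_PAGEFAULT_FLAG_WP (1 << 1)
#endif
#ifndef UFFD_PAGEFAULT_FLAG_MINOR
#define UFFD_PAGEFAULT_FLAG_MINOR (1 << 2)
#endif

/*
 * Hypothetical helper: pass the features the application intends to use
 * to UFFDIO_API on an already-open userfaultfd. On success, *reported
 * receives the features bitmask the kernel wrote back.
 */
static int uffd_enable_api(int uffd, uint64_t wanted, uint64_t *reported)
{
	struct uffdio_api api;

	assert(reported != NULL);
	memset(&api, 0, sizeof(api));
	api.api = UFFD_API;
	api.features = wanted;
	if (ioctl(uffd, UFFDIO_API, &api) == -1)
		return -1;	/* errno is set by ioctl() */
	*reported = api.features;
	return 0;
}

/* Hypothetical names for the choices described in the patch text. */
enum uffd_resolution {
	RESOLVE_WITH_COPY_OR_ZEROPAGE,	/* missing fault */
	RESOLVE_WITH_CONTINUE,		/* minor fault */
	RESOLVE_NOT_COVERED,		/* e.g. write-protect fault */
};

/*
 * Map pagefault.flags from a struct uffd_msg to the kind of ioctl that
 * would resolve the fault, following the MISSING vs. MINOR split.
 */
static enum uffd_resolution uffd_pick_resolution(uint64_t pagefault_flags)
{
	if (pagefault_flags & UFFD_PAGEFAULT_FLAG_MINOR)
		return RESOLVE_WITH_CONTINUE;
	if (pagefault_flags & UFFD_PAGEFAULT_FLAG_WP)
		return RESOLVE_NOT_COVERED;
	return RESOLVE_WITH_COPY_OR_ZEROPAGE;
}
```

A fault-handling thread would typically read a struct uffd_msg from the uffd, pass msg.arg.pagefault.flags to uffd_pick_resolution(), and then fill in every field of the matching ioctl struct, including the range, as the Notes in the patch point out.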