Message ID | 20230218002819.1486479-1-jthoughton@google.com (mailing list archive) |
---|---|
Series | hugetlb: introduce HugeTLB high-granularity mapping |
On 02/18/23 00:27, James Houghton wrote:
> This series introduces the concept of HugeTLB high-granularity mapping
> (HGM). This series teaches HugeTLB how to map HugeTLB pages at
> high-granularity, similar to how THPs can be PTE-mapped.
>
> Support for HGM in this series is for MAP_SHARED VMAs on x86_64 only. Other
> architectures and (some) support for MAP_PRIVATE will come later.
>
> This series is based on latest mm-unstable (ccd6a73daba9).
>
> Notable changes with this series
> ================================
>
> - hugetlb_add_file_rmap / hugetlb_remove_rmap are added to handle
>   mapcounting for non-anon hugetlb.
> - The mapcounting scheme uses subpages' mapcounts for high-granularity
>   mappings, but it does not use subpages_mapcount(). This scheme
>   prevents the HugeTLB VMEMMAP optimization from being used, so it
>   will be improved in a later series.
> - page_add_file_rmap and page_remove_rmap are updated so they can be
>   used by hugetlb_add_file_rmap / hugetlb_remove_rmap.
> - MADV_SPLIT has been added to enable the userspace API changes that
>   HGM allows for: high-granularity UFFDIO_CONTINUE (and maybe other
>   changes in the future). MADV_SPLIT does NOT force all the mappings to
>   be PAGE_SIZE.
> - MADV_COLLAPSE is expanded to include HugeTLB mappings.
>
> Old versions:
> v1: https://lore.kernel.org/linux-mm/20230105101844.1893104-1-jthoughton@google.com/
> RFC v2: https://lore.kernel.org/linux-mm/20221021163703.3218176-1-jthoughton@google.com/
> RFC v1: https://lore.kernel.org/linux-mm/20220624173656.2033256-1-jthoughton@google.com/
>
> Changelog:
> v1 -> v2 (thanks Peter for all your suggestions!):
> - Changed mapcount to be more THP-like, and make HGM incompatible with
>   HVO.
> - HGM is now disabled by default to leave HVO enabled by default.

I understand the reasoning behind the move to THP-like mapcounting, and the
incompatibility with HVO. However, I just got to patch 5 and realized either
HGM or HVO will need to be chosen at kernel build time. That may not be an
issue for cloud providers or others building their own kernels for internal
use. However, distro kernels will need to pick one option or the other.
Right now, my Fedora desktop has HVO enabled so it would likely not have
HGM enabled. That is not a big deal for a desktop.

Just curious, do we have distro kernel users that want to use HGM?

I see it mentioned that this incompatibility will be addressed in a future
series. This certainly will be required before HGM can be expanded for use
cases such as memory errors and page poisoning.

Curious about others' thoughts: does the first version of HGM need to be
compatible with HVO?
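Why a THP-like mapcount rules out HVO can be seen in a toy userspace model (not kernel code; every name below is hypothetical). It assumes 4 KiB base pages, a 64-byte struct page, and an HVO layout in which only the first vmemmap page of a 2 MiB HugeTLB page keeps its own memory while the remaining vmemmap pages are remapped read-only onto it. Under a THP-like scheme, a high-granularity mapping bumps the mapped subpage's own counter, and that counter has nowhere to live once the subpage's struct page is only an alias.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed x86_64-like constants; real values depend on the kernel config. */
#define BASE_PAGE_SIZE         4096u
#define STRUCT_PAGE_SIZE       64u
#define HPAGE_SIZE             (2u * 1024 * 1024)               /* 2 MiB */
#define NR_SUBPAGES            (HPAGE_SIZE / BASE_PAGE_SIZE)     /* 512 */
#define PAGES_PER_VMEMMAP_PAGE (BASE_PAGE_SIZE / STRUCT_PAGE_SIZE) /* 64 */

/* Toy metadata for one hugepage: a counter for full-size mappings plus
 * one counter per subpage for high-granularity (PAGE_SIZE) mappings. */
struct toy_hugepage {
    bool hvo;                          /* vmemmap optimization applied? */
    int compound_mapcount;             /* full-size mappings */
    int subpage_mapcount[NR_SUBPAGES]; /* per-subpage PTE mappings */
};

/* Under the assumed HVO layout, only struct pages in the first vmemmap
 * page have private storage; later ones alias it read-only. */
static bool subpage_meta_writable(const struct toy_hugepage *hp, size_t idx)
{
    return !hp->hvo || idx < PAGES_PER_VMEMMAP_PAGE;
}

/* A full-size mapping only touches the compound counter. */
static void map_full(struct toy_hugepage *hp)
{
    hp->compound_mapcount++;
}

/* A PAGE_SIZE mapping of one subpage, counted THP-style in that subpage.
 * Returns false when the counter has no private storage to live in. */
static bool map_subpage(struct toy_hugepage *hp, size_t idx)
{
    assert(idx < NR_SUBPAGES);
    if (!subpage_meta_writable(hp, idx))
        return false;
    hp->subpage_mapcount[idx]++;
    return true;
}

/* THP-like total for one subpage: full-size mappings + its own PTE mappings. */
static int subpage_total_mapcount(const struct toy_hugepage *hp, size_t idx)
{
    return hp->compound_mapcount + hp->subpage_mapcount[idx];
}
```

With `hvo` set, `map_subpage()` still succeeds for subpage 10 but fails for subpage 100, which is the situation that forces a choice between the two features in this version of the series.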
On 21.02.23 22:46, Mike Kravetz wrote:
> On 02/18/23 00:27, James Houghton wrote:
>> This series introduces the concept of HugeTLB high-granularity mapping
>> (HGM). [...]
>
> I understand the reasoning behind the move to THP-like mapcounting, and the
> incompatibility with HVO. However, I just got to patch 5 and realized either
> HGM or HVO will need to be chosen at kernel build time. That may not be an
> issue for cloud providers or others building their own kernels for internal
> use. However, distro kernels will need to pick one option or the other.
> Right now, my Fedora desktop has HVO enabled so it would likely not have
> HGM enabled. That is not a big deal for a desktop.
>
> Just curious, do we have distro kernel users that want to use HGM?

Most certainly I would say :)
On Wed, Feb 22, 2023 at 7:49 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 21.02.23 22:46, Mike Kravetz wrote:
> > On 02/18/23 00:27, James Houghton wrote:
> > [...]
> > Just curious, do we have distro kernel users that want to use HGM?
>
> Most certainly I would say :)

Is it a blocker to merge in an initial implementation though? Do
distro kernel users have a pressing need for HVO + HGM used in tandem?
On 22.02.23 21:57, Mina Almasry wrote:
> On Wed, Feb 22, 2023 at 7:49 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 21.02.23 22:46, Mike Kravetz wrote:
>>> [...]
>>> Just curious, do we have distro kernel users that want to use HGM?
>>
>> Most certainly I would say :)
>
> Is it a blocker to merge in an initial implementation though? Do
> distro kernel users have a pressing need for HVO + HGM used in tandem?

At least RHEL9 seems to include HVO. It's not enabled as default
(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON not set), but compiled
in so it can be runtime-enabled. Disabling HVO is not an option IMHO.

Maybe, one could make both features compile-time compatible but
runtime-mutually exclusive. Or work on a way to make them fully
compatible right from the start.
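One possible reading of the "compile-time compatible but runtime-mutually exclusive" idea is sketched below as a toy model, assuming a single system-wide owner: both features are built in, and whichever is claimed first blocks the other until its last user goes away. Every identifier here is hypothetical and does not correspond to any interface in the series or in mainline.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical runtime policy; none of these names exist in the kernel. */
enum hugetlb_feature { FEAT_NONE = 0, FEAT_HVO, FEAT_HGM };

struct hugetlb_policy {
    enum hugetlb_feature active; /* feature that currently owns HugeTLB */
    unsigned long users;         /* live users of the active feature */
};

/* Claim a feature. Both are compiled in; only one may be active at a time. */
static int feature_get(struct hugetlb_policy *p, enum hugetlb_feature f)
{
    if (f == FEAT_NONE)
        return -EINVAL;
    if (p->active != FEAT_NONE && p->active != f)
        return -EBUSY;           /* the other feature is in use */
    p->active = f;
    p->users++;
    return 0;
}

/* Drop a claim; after the last user, either feature may be chosen again. */
static void feature_put(struct hugetlb_policy *p, enum hugetlb_feature f)
{
    assert(p->active == f && p->users > 0);
    if (--p->users == 0)
        p->active = FEAT_NONE;
}
```

A per-hugepage variant (refusing high-granularity mappings only on pages whose vmemmap was actually optimized) would be finer-grained; the global version is shown only because it is the simplest to reason about.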
On Thu, Feb 23, 2023 at 1:07 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.02.23 21:57, Mina Almasry wrote:
> > On Wed, Feb 22, 2023 at 7:49 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 21.02.23 22:46, Mike Kravetz wrote:
> >>> [...]
> >>> Just curious, do we have distro kernel users that want to use HGM?
> >>
> >> Most certainly I would say :)

I'm not sure. Maybe distros want the hwpoison benefits HGM provides?
But that's not implemented in this series.

> >
> > Is it a blocker to merge in an initial implementation though? Do
> > distro kernel users have a pressing need for HVO + HGM used in tandem?

+1. I don't see why this should be a blocker.

>
> At least RHEL9 seems to include HVO. It's not enabled as default
> (CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON not set), but compiled
> in so it can be runtime-enabled. Disabling HVO is not an option IMHO.

I agree!

CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y is still the default here; I
made sure not to change that. :)

>
> Maybe, one could make both features compile-time compatible but
> runtime-mutually exclusive. Or work on a way to make them fully
> compatible right from the start.

For the sake of simplifying this series as much as possible, going
with the THP-like mapcount scheme that we know works properly seems
like the right decision to me, even though it is incompatible with
HVO.

Making HGM and HVO play nice at runtime is a little bit complicated,
and it becomes worthless as soon as we optimize the mapcount strategy.
So let's just optimize the mapcount strategy, but in a later series.

As soon as this series has been fully reviewed, patches will be sent up to:
1. Change the mapcount scheme to make HGM and HVO compatible again
   (and make MADV_COLLAPSE faster)
2. Add arm64 support
3. Add hwpoison support

If we try to integrate #1 with this series now, I fear that that will
just slow things down more than if #1 is sent up by itself later.

(FWIW, #2 is basically fully implemented and #3 is basically done for
MAP_SHARED. Each of these series are MUCH smaller than this main one.)

- James
On 23.02.23 16:53, James Houghton wrote:
> On Thu, Feb 23, 2023 at 1:07 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 22.02.23 21:57, Mina Almasry wrote:
>>> On Wed, Feb 22, 2023 at 7:49 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 21.02.23 22:46, Mike Kravetz wrote:
>>>>> [...]
>>>>> Just curious, do we have distro kernel users that want to use HGM?
>>>>
>>>> Most certainly I would say :)
>
> I'm not sure. Maybe distros want the hwpoison benefits HGM provides?
> But that's not implemented in this series.

From what I can tell, HGM helps to improve live migration of VMs with
gigantic pages. That sounds like a good reason why distros (that support
virtualization) might want it independent of hwpoison changes.
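The live-migration benefit comes down to granularity on the destination side. The toy model below assumes the contents of a 2 MiB HugeTLB page arrive in 4 KiB pieces and that, with HGM, each piece can be mapped as soon as it lands (the kind of fault resolution high-granularity UFFDIO_CONTINUE is meant to enable); without HGM, nothing in the hugepage can be mapped until every piece is present. The names are illustrative and are not QEMU's or the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 4096u              /* assumed base page size */
#define HPAGE_SIZE (2u * 1024 * 1024) /* assumed 2 MiB HugeTLB page */
#define NR_CHUNKS  (HPAGE_SIZE / CHUNK_SIZE)

/* Which 4 KiB pieces of one hugepage have reached the destination. */
struct incoming_hpage {
    bool received[NR_CHUNKS];
};

static void receive_chunk(struct incoming_hpage *hp, size_t idx)
{
    assert(idx < NR_CHUNKS);
    hp->received[idx] = true;
}

static bool all_received(const struct incoming_hpage *hp)
{
    for (size_t i = 0; i < NR_CHUNKS; i++)
        if (!hp->received[i])
            return false;
    return true;
}

/* Can a guest fault at byte offset 'off' be resolved right now?
 * Without HGM the whole hugepage must be present before it is mapped;
 * with HGM only the piece containing 'off' must be. */
static bool fault_resolvable(const struct incoming_hpage *hp, size_t off,
                             bool hgm)
{
    assert(off < HPAGE_SIZE);
    return hgm ? hp->received[off / CHUNK_SIZE] : all_received(hp);
}
```

After only piece 3 has arrived, a fault inside that piece is resolvable with HGM but not without it; the non-HGM case unblocks only once all 512 pieces are in.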
On 02/23/23 07:53, James Houghton wrote:
> On Thu, Feb 23, 2023 at 1:07 AM David Hildenbrand <david@redhat.com> wrote:
> > [...]
> > Maybe, one could make both features compile-time compatible but
> > runtime-mutually exclusive. Or work on a way to make them fully
> > compatible right from the start.
>
> For the sake of simplifying this series as much as possible, going
> with the THP-like mapcount scheme that we know works properly seems
> like the right decision to me, even though it is incompatible with
> HVO.
>
> Making HGM and HVO play nice at runtime is a little bit complicated,
> and it becomes worthless as soon as we optimize the mapcount strategy.
> So let's just optimize the mapcount strategy, but in a later series.
>
> As soon as this series has been fully reviewed, patches will be sent up to:
> 1. Change the mapcount scheme to make HGM and HVO compatible again
>    (and make MADV_COLLAPSE faster)
> 2. Add arm64 support
> 3. Add hwpoison support
>
> If we try to integrate #1 with this series now, I fear that that will
> just slow things down more than if #1 is sent up by itself later.
>
> (FWIW, #2 is basically fully implemented and #3 is basically done for
> MAP_SHARED. Each of these series are MUCH smaller than this main one.)

By asking this question, my intention was NOT to force HGM and HVO
compatibility now. Rather, I just wanted to ask if there were any distro
kernels or environments that enable HVO now and want HGM ASAP. Was hoping
someone from Red Hat would chime in: thanks David!

FYI - Oracle is keen on HVO to use every possible bit of memory. See:
https://lore.kernel.org/linux-mm/20221110121214.6297-1-joao.m.martins@oracle.com/

In addition, the Oracle kernel has CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
so it cannot immediately take advantage of HGM. That is OK 'for now'.

I will try to ignore the mapcount issue right now and focus on the rest of
the series. Thanks for all your efforts James!
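To put a rough number on "every possible bit of memory", the per-page arithmetic is sketched below. It assumes 4 KiB base pages, a 64-byte struct page, and HVO keeping one vmemmap page per HugeTLB page while freeing the rest; these are typical x86_64 values, not guarantees, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (typical x86_64, not guaranteed): 4 KiB base pages,
 * 64-byte struct page, HVO retaining one vmemmap page per HugeTLB page. */
#define BASE_PAGE_SIZE   4096ull
#define STRUCT_PAGE_SIZE 64ull
#define HVO_KEPT_PAGES   1ull

/* vmemmap pages needed to describe one HugeTLB page of 'hpage_size' bytes. */
static uint64_t vmemmap_pages(uint64_t hpage_size)
{
    uint64_t nr_struct_pages = hpage_size / BASE_PAGE_SIZE;
    return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/* vmemmap pages HVO can hand back for one HugeTLB page. */
static uint64_t hvo_freed_pages(uint64_t hpage_size)
{
    return vmemmap_pages(hpage_size) - HVO_KEPT_PAGES;
}

/* Bytes of struct page memory saved across a pool of 'nr' HugeTLB pages. */
static uint64_t hvo_saved_bytes(uint64_t hpage_size, uint64_t nr)
{
    return hvo_freed_pages(hpage_size) * BASE_PAGE_SIZE * nr;
}
```

Under these assumptions a 2 MiB page needs 8 vmemmap pages (7 freed) and a 1 GiB page needs 4096 (4095 freed), so a pool of 256 x 1 GiB pages gives back 4,293,918,720 bytes, just under 4 GiB. That is the saving a kernel would forgo by choosing HGM over HVO in this version of the series.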
* David Hildenbrand (david@redhat.com) wrote:
> On 23.02.23 16:53, James Houghton wrote:
> > [...]
> > I'm not sure. Maybe distros want the hwpoison benefits HGM provides?
> > But that's not implemented in this series.
>
> From what I can tell, HGM helps to improve live migration of VMs with
> gigantic pages. That sounds like a good reason why distros (that support
> virtualization) might want it independent of hwpoison changes.

Yes, in particular for postcopy migration of those VMs, where we can't
afford the latency of waiting for the entire gigantic page to bubble
along the network.

Dave
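A back-of-the-envelope calculation shows the scale of that latency. It assumes an otherwise idle 10 Gbit/s link and counts only serialization time (no round trips, no protocol overhead), so the figures are lower bounds rather than measurements; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Idealized wire time, in microseconds, to move 'bytes' over a link of
 * 'gbit_per_s' gigabits per second. Ignores RTT, protocol overhead and
 * contention, so real stalls would be longer. */
static double transfer_us(uint64_t bytes, double gbit_per_s)
{
    assert(gbit_per_s > 0.0);
    return (double)bytes * 8.0 / (gbit_per_s * 1e9) * 1e6;
}

/* Idealized wait for a vCPU touching a not-yet-received hugepage:
 * the whole page without HGM, a single base page with HGM. */
static double fault_stall_us(uint64_t hpage_size, uint64_t base_page_size,
                             double gbit_per_s, int hgm)
{
    return transfer_us(hgm ? base_page_size : hpage_size, gbit_per_s);
}
```

Under those assumptions, waiting for a whole 1 GiB page costs roughly 0.86 s of wire time, while a single 4 KiB piece takes about 3.3 us, a factor of 262,144 (the number of 4 KiB pieces in 1 GiB).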