[v1,00/10] mm: Don't mark hotplugged pages PG_reserved (including ZONE_DEVICE)

Message ID 20191024120938.11237-1-david@redhat.com (mailing list archive)

Message

David Hildenbrand Oct. 24, 2019, 12:09 p.m. UTC
This is the result of a recent discussion with Michal ([1], [2]). Right
now we set all pages PG_reserved when initializing hotplugged memmaps. This
includes ZONE_DEVICE memory. For system memory, PG_reserved is cleared
again when onlining the memory; for ZONE_DEVICE memory, it is never
cleared.

In ancient times, we needed PG_reserved, because there was no way to tell
whether the memmap was already properly initialized. We now have
SECTION_IS_ONLINE for that in the case of !ZONE_DEVICE memory. The memmap
of ZONE_DEVICE memory is already initialized in a deferred fashion, and
there shouldn't be a visible change in that regard.

One of the biggest fears was side effects. I went ahead and audited all
users of PageReserved(). The details can be found in "mm/memory_hotplug:
Don't mark pages PG_reserved when initializing the memmap".

This patch set adapts all relevant users of PageReserved() to keep the
existing behavior with respect to ZONE_DEVICE pages. The biggest part
that needs changes is KVM, to keep the existing behavior (that's all I
care about in this series).

Note that this series is able to rely completely on pfn_to_online_page().
No new is_zone_device_page() calls are introduced (as requested by Dan).
We are currently discussing a way to also mark ZONE_DEVICE memmaps as
active/initialized - pfn_active() - and lightweight locking to make sure
memmaps remain active (e.g., using RCU). We might later be able to convert
some users of pfn_to_online_page() to pfn_active(). Details can be found
in [3]; however, this represents yet another cleanup/fix we'll perform
on top of this cleanup.
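
To illustrate the pattern, here is a minimal userspace sketch of why a
check based only on the reserved flag would change behavior for
ZONE_DEVICE pages, while a check built on an "online page" lookup would
not. This is a simplified model, not code from the series: every model_*
name is hypothetical, one page per section is assumed for brevity, and
the only kernel concepts mirrored are PG_reserved, SECTION_IS_ONLINE and
the idea behind pfn_to_online_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. All names below are hypothetical stand-ins and
 * none of them are kernel types or kernel functions.
 */
enum model_kind {
	MODEL_NO_MEMMAP,	/* e.g. a PFN without any memmap */
	MODEL_SYSTEM_RAM,	/* hotplugged system memory (DIMM) */
	MODEL_ZONE_DEVICE,	/* device memory, never onlined */
};

struct model_page {
	bool reserved;		/* stands in for PG_reserved */
};

struct model_section {
	enum model_kind kind;
	bool online;		/* stands in for SECTION_IS_ONLINE */
	struct model_page page;	/* one page per section, for brevity */
};

/* Models pfn_to_online_page(): returns a page only for online sections. */
static struct model_page *model_pfn_to_online_page(struct model_section *s)
{
	if (s->kind == MODEL_NO_MEMMAP || !s->online)
		return NULL;
	return &s->page;
}

/* Check that depends on PG_reserved being set on hotplugged memmaps. */
static bool model_is_reserved_flag_only(struct model_section *s)
{
	if (s->kind == MODEL_NO_MEMMAP)
		return true;
	return s->page.reserved;
}

/* Check that treats everything that is not online as reserved. */
static bool model_is_reserved_online_aware(struct model_section *s)
{
	struct model_page *page = model_pfn_to_online_page(s);

	if (page)
		return page->reserved;
	return true;
}

/* Usage: a ZONE_DEVICE page once PG_reserved is no longer set at init. */
bool model_demo(void)
{
	struct model_section dev = { MODEL_ZONE_DEVICE, false, { false } };
	struct model_section ram = { MODEL_SYSTEM_RAM, true, { false } };

	assert(!model_is_reserved_flag_only(&dev));	/* behavior would change */
	assert(model_is_reserved_online_aware(&dev));	/* behavior is kept */
	assert(!model_is_reserved_online_aware(&ram));	/* online RAM unaffected */
	return true;
}
```

In this model, a section that is not online (ZONE_DEVICE, offline system
memory, or no memmap at all) is always reported as reserved by the
online-aware check, regardless of the flag stored in the page.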

I only gave it a quick test with DIMMs on x86-64, but didn't test the
ZONE_DEVICE part at all (any tips for a nice QEMU setup?). Also, I didn't
test the KVM parts (especially with ZONE_DEVICE pages or no memmap at all).
Compile-tested on x86-64 and PPC.

Based on next/master. The current version (kept updated) can be found at:
    https://github.com/davidhildenbrand/linux.git online_reserved_cleanup

RFC -> v1:
- Dropped "staging/gasket: Prepare gasket_release_page() for PG_reserved
  changes"
- Dropped "staging: kpc2000: Prepare transfer_complete_cb() for PG_reserved
  changes"
- Converted "mm/usercopy.c: Prepare check_page_span() for PG_reserved
  changes" to "mm/usercopy.c: Update comment in check_page_span()
  regarding ZONE_DEVICE"
- No new users of is_zone_device_page() are introduced.
- Rephrased comments and patch descriptions.

[1] https://lkml.org/lkml/2019/10/21/736
[2] https://lkml.org/lkml/2019/10/21/1034
[3] https://www.spinics.net/lists/linux-mm/msg194112.html

Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: xen-devel@lists.xenproject.org
Cc: x86@kernel.org
Cc: Alexander Duyck <alexander.duyck@gmail.com>

David Hildenbrand (10):
  mm/memory_hotplug: Don't allow to online/offline memory blocks with
    holes
  KVM: x86/mmu: Prepare kvm_is_mmio_pfn() for PG_reserved changes
  KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
  vfio/type1: Prepare is_invalid_reserved_pfn() for PG_reserved changes
  powerpc/book3s: Prepare kvmppc_book3s_instantiate_page() for
    PG_reserved changes
  powerpc/64s: Prepare hash_page_do_lazy_icache() for PG_reserved
    changes
  powerpc/mm: Prepare maybe_pte_to_page() for PG_reserved changes
  x86/mm: Prepare __ioremap_check_ram() for PG_reserved changes
  mm/memory_hotplug: Don't mark pages PG_reserved when initializing the
    memmap
  mm/usercopy.c: Update comment in check_page_span() regarding
    ZONE_DEVICE

 arch/powerpc/kvm/book3s_64_mmu_radix.c | 14 +++++----
 arch/powerpc/mm/book3s64/hash_utils.c  | 10 +++---
 arch/powerpc/mm/pgtable.c              | 10 +++---
 arch/x86/kvm/mmu.c                     | 29 ++++++++++-------
 arch/x86/mm/ioremap.c                  | 13 ++++++--
 drivers/hv/hv_balloon.c                |  6 ++++
 drivers/vfio/vfio_iommu_type1.c        | 10 ++++--
 drivers/xen/balloon.c                  |  7 +++++
 include/linux/page-flags.h             |  8 +----
 mm/memory_hotplug.c                    | 43 +++++++++++++++++++-------
 mm/page_alloc.c                        | 11 -------
 mm/usercopy.c                          |  6 ++--
 virt/kvm/kvm_main.c                    | 10 ++++--
 13 files changed, 111 insertions(+), 66 deletions(-)

Comments

David Hildenbrand Nov. 1, 2019, 7:24 p.m. UTC | #1
On 24.10.19 14:09, David Hildenbrand wrote:
> This is the result of a recent discussion with Michal ([1], [2]). Right
> now we set all pages PG_reserved when initializing hotplugged memmaps. This
> includes ZONE_DEVICE memory. In case of system memory, PG_reserved is
> cleared again when onlining the memory, in case of ZONE_DEVICE memory
> never.
> 
> In ancient times, we needed PG_reserved, because there was no way to tell
> whether the memmap was already properly initialized. We now have
> SECTION_IS_ONLINE for that in the case of !ZONE_DEVICE memory. ZONE_DEVICE
> memory is already initialized deferred, and there shouldn't be a visible
> change in that regard.
> 
> One of the biggest fears was side effects. I went ahead and audited all
> users of PageReserved(). The details can be found in "mm/memory_hotplug:
> Don't mark pages PG_reserved when initializing the memmap".
> 
> This patch set adapts all relevant users of PageReserved() to keep the
> existing behavior with respect to ZONE_DEVICE pages. The biggest part
> that needs changes is KVM, to keep the existing behavior (that's all I
> care about in this series).
> 
> Note that this series is able to rely completely on pfn_to_online_page().
> No new is_zone_device_page() calls are introduced (as requested by Dan).
> We are currently discussing a way to also mark ZONE_DEVICE memmaps as
> active/initialized - pfn_active() - and lightweight locking to make sure
> memmaps remain active (e.g., using RCU). We might later be able to convert
> some users of pfn_to_online_page() to pfn_active(). Details can be found
> in [3]; however, this represents yet another cleanup/fix we'll perform
> on top of this cleanup.
> 
> I only gave it a quick test with DIMMs on x86-64, but didn't test the
> ZONE_DEVICE part at all (any tips for a nice QEMU setup?). Also, I didn't
> test the KVM parts (especially with ZONE_DEVICE pages or no memmap at all).
> Compile-tested on x86-64 and PPC.
> 

Jeff Moyer ran some NVDIMM test cases for me (thanks!!!), including 
xfstests, pmdk, and ndctl. No regressions found.

I will run some KVM tests, especially NVDIMM passthrough, but will have
to set up a test environment first.

I would appreciate some review in the meantime. :)