[v4,00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

Message ID 20220127030949.19396-1-alex.sierra@amd.com (mailing list archive)

Message

Sierra Guiza, Alejandro (Alex) Jan. 27, 2022, 3:09 a.m. UTC
This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
owned by a device that can be mapped into CPU page tables like
MEMORY_DEVICE_GENERIC and can also be migrated like
MEMORY_DEVICE_PRIVATE.
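
As a rough illustration of the comparison above, the following
user-space C sketch tabulates the two properties being contrasted. It
is not kernel code: the enum, struct, table and helper names are all
hypothetical, and the entries for the private and generic types only
restate what the description implies (private is migratable but not CPU
mappable, generic is CPU mappable but not migrated this way).

```c
#include <stdbool.h>

/* Hypothetical model; names echo the kernel's memory types but the
 * values and this table are assumptions made for illustration. */
enum memory_type_model {
	MODEL_DEVICE_PRIVATE,
	MODEL_DEVICE_GENERIC,
	MODEL_DEVICE_COHERENT,
	MODEL_TYPE_COUNT
};

struct type_caps {
	bool cpu_mappable;	/* may be mapped into CPU page tables */
	bool migratable;	/* may be migrated to/from system memory */
};

static const struct type_caps caps[MODEL_TYPE_COUNT] = {
	[MODEL_DEVICE_PRIVATE]  = { .cpu_mappable = false, .migratable = true  },
	[MODEL_DEVICE_GENERIC]  = { .cpu_mappable = true,  .migratable = false },
	[MODEL_DEVICE_COHERENT] = { .cpu_mappable = true,  .migratable = true  },
};

/* Return whether the modelled type can be mapped by the CPU. */
bool model_cpu_mappable(enum memory_type_model t)
{
	return caps[t].cpu_mappable;
}

/* Return whether the modelled type can be migrated. */
bool model_migratable(enum memory_type_model t)
{
	return caps[t].migratable;
}
```

In this model only MODEL_DEVICE_COHERENT reports both properties as
true, which is the combination the series is adding.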

Christoph, the suggestion to incorporate Ralph Campbell’s refcount
cleanup patch into our hardware page migration patchset originally came
from you, but it proved impractical to do things in that order because
the refcount cleanup introduced a bug with wide-ranging structural
implications. Instead, we amended Ralph’s patch so that it could be
applied after merging the migration work. As we saw from the recent
discussion, merging the refcount work is going to take some time and
cooperation between multiple development groups, while the migration
work is ready now and is needed now. So we propose to merge this
patchset first and continue to work with Ralph and others to merge the
refcount cleanup separately, when it is ready.

This patch series is mostly self-contained except for a few places where
it needs to update other subsystems to handle the new memory type.

System stability and performance are not affected according to our
ongoing testing, including xfstests.

How it works: The system BIOS advertises the GPU device memory
(aka VRAM) as SPM (special purpose memory) in the UEFI system address
map.

The amdgpu driver registers the memory with devmap as
MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
this hardware page migration capability is the Frontier supercomputer
project. This functionality is not AMD-specific. We expect other GPU
vendors to find this functionality useful, and possibly other hardware
types in the future.

Our test nodes in the lab are similar to the Frontier configuration,
with 0.5 TB of system memory plus 256 GB of device memory split across
4 GPUs, all in a single coherent address space. Page migration is
expected to improve application efficiency significantly. We will
report empirical results as they become available.

We extended hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This
patch set builds on HMM and our SVM memory manager already merged in
5.15.

v2:
- test_hmm can now create private and coherent device mirror instances
in the same driver probe. This makes the hmm test easier to use, since
the kernel module no longer has to be removed for each device type
test (private/coherent). This is done by passing the module parameters
spm_addr_dev0 & spm_addr_dev1. In this case, it will create four
instances of device_mirror. The first two correspond to the private
device type, the last two to the coherent type. They can then be
accessed from user space through /dev/hmm_mirror<num_device>. Usually
num_device 0 and 1 are for private, and 2 and 3 for coherent types.

- Coherent device type pages at gup are now migrated back to system
memory if they have been long term pinned (FOLL_LONGTERM). The reason
is that these pages could eventually interfere with their own device
memory manager. A new hmm_gup_test has been added to hmm-test to test
this functionality. It makes use of the gup_test module to long term
pin user pages that have first been migrated to device memory.

- Other patch corrections made by Felix, Alistair and Christoph.
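
The device numbering described in the first v2 item can be sketched as
a small user-space C helper. This only encodes the "usually 0 and 1 are
private, 2 and 3 are coherent" convention stated above, with the device
node name taken as written in this letter; the enum and function names
are hypothetical and the real test code may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical classification of a device_mirror instance. */
enum mirror_kind { MIRROR_PRIVATE, MIRROR_COHERENT, MIRROR_UNKNOWN };

/* Map num_device to its usual type when spm_addr_dev0 and
 * spm_addr_dev1 are both passed (assumed convention from the text). */
enum mirror_kind mirror_kind_for(int num_device)
{
	if (num_device == 0 || num_device == 1)
		return MIRROR_PRIVATE;
	if (num_device == 2 || num_device == 3)
		return MIRROR_COHERENT;
	return MIRROR_UNKNOWN;
}

/* Build the device node path; returns 0 on success, -1 if buf is
 * too small or formatting fails. */
int mirror_path(int num_device, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/dev/hmm_mirror%d", num_device);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A test could call mirror_path() for each index and open the resulting
node, using mirror_kind_for() to decide which checks to run.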

v3:
- Based on the last v2 feedback we got from Alistair, we've decided to
remove the migration logic for FOLL_LONGTERM coherent device type pages
at gup for now. Ideally, this should be done through the kernel mm,
instead of calling the device driver to do it. Currently, there's no
support for migrating device pages based on pfn, mainly because
migrate_pages() relies on pages being LRU pages. Alistair mentioned
that he has started to work on adding this device page migration
logic. For now, get_user_pages calls with FOLL_LONGTERM fail for
DEVICE_COHERENT pages.

- Also, hmm_gup_test has been removed from hmm-test. We plan to include
it again after this migration work is ready.

- Addressed Liam Howlett's feedback changes.
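
The interim v3 behaviour for long term pins can be modelled in a few
lines of user-space C. This is only a sketch of the rule stated above,
not the code in mm/gup.c: the flag value, the page struct, the function
name and the choice of -EFAULT as the error are all assumptions made for
illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit for the model; not the kernel's FOLL_LONGTERM value. */
#define MODEL_FOLL_LONGTERM 0x1u

/* Hypothetical stand-in for a page and its zone device type. */
struct model_page {
	bool is_device_coherent;
};

/* Refuse a long term pin of a device coherent page; allow anything
 * else. The error code is an assumption, chosen for illustration. */
int model_gup_check(const struct model_page *page, unsigned int gup_flags)
{
	if ((gup_flags & MODEL_FOLL_LONGTERM) && page->is_device_coherent)
		return -EFAULT;
	return 0;
}
```

Short term pins of coherent pages and long term pins of ordinary pages
both pass in this model; only the combination is rejected.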

v4:
- Addressed Alistair Popple's last v3 feedback.

- Use the same system entry path for coherent device pages at
migrate_vma_insert_page.

- Add coherent device type support for try_to_migrate /
try_to_migrate_one.

- Include number of coherent device pages successfully migrated back to
system at test_hmm. Made the proper changes to hmm-test to read/check
this number.

Alex Sierra (10):
  mm: add zone device coherent type memory support
  mm: add device coherent vma selection for memory migration
  mm/gup: fail get_user_pages for LONGTERM dev coherent type
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: coherent type as sys mem on migration to ram
  lib: test_hmm add ioctl to get zone device type
  lib: test_hmm add module param for zone device type
  lib: add support for device coherent type in test_hmm
  tools: update hmm-test to support device coherent type
  tools: update test_hmm script to support SP config

 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  34 ++-
 include/linux/memremap.h                 |   8 +
 include/linux/migrate.h                  |   1 +
 include/linux/mm.h                       |  16 +
 lib/test_hmm.c                           | 356 +++++++++++++++++------
 lib/test_hmm_uapi.h                      |  22 +-
 mm/gup.c                                 |   7 +
 mm/memcontrol.c                          |   6 +-
 mm/memory-failure.c                      |   8 +-
 mm/memremap.c                            |  14 +-
 mm/migrate.c                             |  56 ++--
 mm/rmap.c                                |  20 +-
 tools/testing/selftests/vm/hmm-tests.c   | 123 ++++++--
 tools/testing/selftests/vm/test_hmm.sh   |  24 +-
 14 files changed, 529 insertions(+), 166 deletions(-)

Comments

Andrew Morton Jan. 27, 2022, 10:32 p.m. UTC | #1
On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra <alex.sierra@amd.com> wrote:

> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> owned by a device that can be mapped into CPU page tables like
> MEMORY_DEVICE_GENERIC and can also be migrated like
> MEMORY_DEVICE_PRIVATE.

Some more reviewer input appears to be desirable here.

I was going to tentatively add it to -mm and -next, but problems. 
5.17-rc1's mm/migrate.c:migrate_vma_check_page() is rather different
from the tree you patched.  Please redo, refresh and resend?

Sierra Guiza, Alejandro (Alex) Jan. 27, 2022, 11:20 p.m. UTC | #2
Andrew,
We're somewhat new to this procedure. Do you mean we should rebase this
patch series onto
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
<5.17-rc1 tag>?

Regards,
Alex Sierra

Alex Deucher,
Just a quick heads up. This patch series contains changes to the amdgpu
driver which we're
planning to merge through Andrew's tree, if that's ok with you.

Regards,
Alex Sierra

On 1/27/2022 4:32 PM, Andrew Morton wrote:
> On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra <alex.sierra@amd.com> wrote:
>
>> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
>> owned by a device that can be mapped into CPU page tables like
>> MEMORY_DEVICE_GENERIC and can also be migrated like
>> MEMORY_DEVICE_PRIVATE.
> Some more reviewer input appears to be desirable here.
>
> I was going to tentatively add it to -mm and -next, but problems.
> 5.17-rc1's mm/migrate.c:migrate_vma_check_page() is rather different
> from the tree you patched.  Please redo, refresh and resend?
>

Andrew Morton Jan. 28, 2022, 7:08 a.m. UTC | #3
On Thu, 27 Jan 2022 17:20:40 -0600 "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> wrote:

> Andrew,
> We're somehow new on this procedure. Are you referring to rebase this 
> patch series to
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> <5.17-rc1 tag>?

No, against current Linus mainline, please.

Alex Deucher Jan. 28, 2022, 3 p.m. UTC | #4
[Public]

> -----Original Message-----
> From: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>
> Sent: Thursday, January 27, 2022 6:21 PM
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; linux-mm@kvack.org;
> rcampbell@nvidia.com; linux-ext4@vger.kernel.org; linux-
> xfs@vger.kernel.org; amd-gfx@lists.freedesktop.org; dri-
> devel@lists.freedesktop.org; hch@lst.de; jgg@nvidia.com;
> jglisse@redhat.com; apopple@nvidia.com; willy@infradead.org; Deucher,
> Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH v4 00/10] Add MEMORY_DEVICE_COHERENT for
> coherent device memory mapping
> 
> Andrew,
> We're somehow new on this procedure. Are you referring to rebase this
> patch series to git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-
> next.git
> <5.17-rc1 tag>?
> 
> Regards,
> Alex Sierra
> 
> Alex Deucher,
> Just a quick heads up. This patch series contains changes to the amdgpu
> driver which we're planning to merge through Andrew's tree, If that's ok with
> you.

No problem.

Thanks!

Alex

> 
> Regards,
> Alex Sierra
> 
> On 1/27/2022 4:32 PM, Andrew Morton wrote:
> > On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra <alex.sierra@amd.com>
> wrote:
> >
> >> This patch series introduces MEMORY_DEVICE_COHERENT, a type of
> memory
> >> owned by a device that can be mapped into CPU page tables like
> >> MEMORY_DEVICE_GENERIC and can also be migrated like
> >> MEMORY_DEVICE_PRIVATE.
> > Some more reviewer input appears to be desirable here.
> >
> > I was going to tentatively add it to -mm and -next, but problems.
> > 5.17-rc1's mm/migrate.c:migrate_vma_check_page() is rather different
> > from the tree you patched.  Please redo, refresh and resend?
> >

Felix Kuehling Jan. 28, 2022, 5:09 p.m. UTC | #5
Thank you, Alex, for your persistence with this patch series. Feel free to
add my Acked-by to all the patches that don't already have my R-b. I
have done pretty thorough reviews of previous versions of those patches,
but obviously missed a lot of issues pointed out by real MM experts.

Thank you Alistair for your reviews, feedback and collaboration!

Regards,
   Felix


On 2022-01-27 at 18:20, Sierra Guiza, Alejandro (Alex) wrote:
> Andrew,
> We're somehow new on this procedure. Are you referring to rebase this 
> patch series to
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> <5.17-rc1 tag>?
>
> Regards,
> Alex Sierra
>
> Alex Deucher,
> Just a quick heads up. This patch series contains changes to the 
> amdgpu driver which we're
> planning to merge through Andrew's tree, If that's ok with you.
>
> Regards,
> Alex Sierra
>
> On 1/27/2022 4:32 PM, Andrew Morton wrote:
>> On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra <alex.sierra@amd.com> 
>> wrote:
>>
>>> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
>>> owned by a device that can be mapped into CPU page tables like
>>> MEMORY_DEVICE_GENERIC and can also be migrated like
>>> MEMORY_DEVICE_PRIVATE.
>> Some more reviewer input appears to be desirable here.
>>
>> I was going to tentatively add it to -mm and -next, but problems.
>> 5.17-rc1's mm/migrate.c:migrate_vma_check_page() is rather different
>> from the tree you patched.  Please redo, refresh and resend?
>>

Christoph Hellwig Feb. 2, 2022, 2:57 p.m. UTC | #6
On Thu, Jan 27, 2022 at 02:32:58PM -0800, Andrew Morton wrote:
> On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra <alex.sierra@amd.com> wrote:
> 
> > This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> > owned by a device that can be mapped into CPU page tables like
> > MEMORY_DEVICE_GENERIC and can also be migrated like
> > MEMORY_DEVICE_PRIVATE.
> 
> Some more reviewer input appears to be desirable here.
> 
> I was going to tentatively add it to -mm and -next, but problems. 
> 5.17-rc1's mm/migrate.c:migrate_vma_check_page() is rather different
> from the tree you patched.  Please redo, refresh and resend?

I really hate adding more types with the weird one off page refcount.
We need to clean that mess up first.

Jason Gunthorpe Feb. 2, 2022, 3:42 p.m. UTC | #7
On Wed, Feb 02, 2022 at 03:57:50PM +0100, Christoph Hellwig wrote:
> On Thu, Jan 27, 2022 at 02:32:58PM -0800, Andrew Morton wrote:
> > On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra <alex.sierra@amd.com> wrote:
> > 
> > > This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> > > owned by a device that can be mapped into CPU page tables like
> > > MEMORY_DEVICE_GENERIC and can also be migrated like
> > > MEMORY_DEVICE_PRIVATE.
> > 
> > Some more reviewer input appears to be desirable here.
> > 
> > I was going to tentatively add it to -mm and -next, but problems. 
> > 5.17-rc1's mm/migrate.c:migrate_vma_check_page() is rather different
> > from the tree you patched.  Please redo, refresh and resend?
> 
> I really hate adding more types with the weird one off page refcount.
> We need to clean that mess up first.

Is there anyone who could give an outline of what is needed to make
fsdax use compound pages/folios for its PMD stuff?

I already suggested removing that as a way forward, and was shot down,
but nobody is standing up to maintain this code and fix it :(

We got devdax and the DRM stuff fixed now, so FSDAX is the next
blocker on this work.

The people who want this to advance have no idea about FSs or what to
do, unfortunately.

Jason