Message ID | 20210813063150.2938-6-alex.sierra@amd.com (mailing list archive) |
---|---|
State | New |
Series | Support DEVICE_GENERIC memory in migrate_vma_* |
On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex Sierra wrote:
>  	migrate.vma = vma;
>  	migrate.start = start;
>  	migrate.end = end;
> -	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>  	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
>
> +	if (adev->gmc.xgmi.connected_to_cpu)
> +		migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
> +	else
> +		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;

It's been a while since I touched this migrate code, but doesn't this
mean that if the range already contains system memory the migration
now won't do anything? for the connected_to_cpu case?
On 8/15/2021 10:38 AM, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex Sierra wrote:
>>  	migrate.vma = vma;
>>  	migrate.start = start;
>>  	migrate.end = end;
>> -	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>>  	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
>>
>> +	if (adev->gmc.xgmi.connected_to_cpu)
>> +		migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
>> +	else
>> +		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
> It's been a while since I touched this migrate code, but doesn't this
> mean that if the range already contains system memory the migration
> now won't do anything? for the connected_to_cpu case?

For the connected_to_cpu case above, we’re explicitly migrating from
device memory to system memory with the device generic type. In this type,
device PTEs are present in the CPU page table.

During the migrate_vma_collect_pmd walk op in the migrate_vma_setup call,
there’s a condition for present PTEs that requires migrate->flags to be set
to MIGRATE_VMA_SELECT_SYSTEM. Otherwise, the migration for this entry will
be ignored.

Regards,
Alex S.
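[Editorial sketch, not part of the thread.] The condition Alex describes can be modelled in a few lines of userspace C. This is only a simplified illustration under stated assumptions, not the kernel's migrate_vma_collect_pmd: struct sketch_pte, sketch_collect() and the SKETCH_* flag names and values are hypothetical stand-ins. It shows why a present PTE (the device generic case) is collected only when the SYSTEM selection flag is set, while a non-present device-private entry needs the DEVICE_PRIVATE flag and a matching pgmap owner.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, assumed to play the role of the
 * MIGRATE_VMA_SELECT_* flags discussed in the thread. */
enum sketch_select {
	SKETCH_SELECT_SYSTEM         = 1 << 0,
	SKETCH_SELECT_DEVICE_PRIVATE = 1 << 1,
};

/* Hypothetical, heavily simplified stand-in for a CPU page table entry. */
struct sketch_pte {
	bool present;         /* entry is present in the CPU page table */
	bool device_private;  /* non-present entry refers to a device-private page */
	const void *owner;    /* pgmap owner of that device-private page */
};

/* Returns true if the walk would collect this entry for migration,
 * false if the entry would be skipped. */
static bool sketch_collect(const struct sketch_pte *pte, unsigned int flags,
			   const void *pgmap_owner)
{
	if (!pte->present) {
		/* Non-present entries only qualify as device-private pages
		 * that belong to the caller. */
		if (!pte->device_private)
			return false;
		return (flags & SKETCH_SELECT_DEVICE_PRIVATE) &&
		       pte->owner == pgmap_owner;
	}

	/* Present entries, which in this model include device generic
	 * pages, are collected only when SYSTEM selection is requested. */
	return (flags & SKETCH_SELECT_SYSTEM) != 0;
}
```

In this model, calling sketch_collect() on a present entry with only SKETCH_SELECT_DEVICE_PRIVATE returns false, which corresponds to the "migration for this entry will be ignored" case in the reply above.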
Regards,
Oak

On 2021-08-16, 3:53 PM, "amd-gfx on behalf of Sierra Guiza, Alejandro (Alex)" <amd-gfx-bounces@lists.freedesktop.org on behalf of alex.sierra@amd.com> wrote:

    On 8/15/2021 10:38 AM, Christoph Hellwig wrote:
    > On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex Sierra wrote:
    >>  	migrate.vma = vma;
    >>  	migrate.start = start;
    >>  	migrate.end = end;
    >> -	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
    >>  	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
    >>
    >> +	if (adev->gmc.xgmi.connected_to_cpu)
    >> +		migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
    >> +	else
    >> +		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
    > It's been a while since I touched this migrate code, but doesn't this
    > mean that if the range already contains system memory the migration
    > now won't do anything? for the connected_to_cpu case?

    For above’s condition equal to connected_to_cpu , we’re explicitly
    migrating from device memory to system memory with device generic type.

For MEMORY_DEVICE_GENERIC memory type, why do we need to explicitly migrate it from device memory to normal system memory? I thought the design was, for this type of memory, CPU can access it in place without migration(just like CPU access normal system memory), so there is no need to migrate such type of memory to normal system memory...

With this patch, the migration behavior will be: when memory is accessed by CPU, it will be migrated to normal system memory; when memory is accessed by GPU, it will be migrated to device vram. This is basically the same behavior as when vram is treated as DEVICE_PRIVATE.

I thought the whole goal of introducing DEVICE_GENERIC is to avoid such back and forth migration b/t device memory and normal system memory. But maybe I am missing something here....

Regards,
Oak

    In this type, device PTEs are present in CPU page table.

    During migrate_vma_collect_pmd walk op at migrate_vma_setup call,
    there’s a condition for present pte that require migrate->flags be set
    for MIGRATE_VMA_SELECT_SYSTEM.
    Otherwise, the migration for this entry will be ignored.

    Regards,
    Alex S.
On 2021-08-16 at 6:06 p.m., Zeng, Oak wrote:
> Regards,
> Oak
>
> On 2021-08-16, 3:53 PM, "amd-gfx on behalf of Sierra Guiza, Alejandro (Alex)" <amd-gfx-bounces@lists.freedesktop.org on behalf of alex.sierra@amd.com> wrote:
>
>     On 8/15/2021 10:38 AM, Christoph Hellwig wrote:
>     > On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex Sierra wrote:
>     >>  	migrate.vma = vma;
>     >>  	migrate.start = start;
>     >>  	migrate.end = end;
>     >> -	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>     >>  	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
>     >>
>     >> +	if (adev->gmc.xgmi.connected_to_cpu)
>     >> +		migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
>     >> +	else
>     >> +		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>     > It's been a while since I touched this migrate code, but doesn't this
>     > mean that if the range already contains system memory the migration
>     > now won't do anything? for the connected_to_cpu case?
>
>     For above’s condition equal to connected_to_cpu , we’re explicitly
>     migrating from
>     device memory to system memory with device generic type.
>
> For MEMORY_DEVICE_GENERIC memory type, why do we need to explicitly migrate it from device memory to normal system memory? I thought the design was, for this type of memory, CPU can access it in place without migration(just like CPU access normal system memory), so there is no need to migrate such type of memory to normal system memory...
>
> With this patch, the migration behavior will be: when memory is accessed by CPU, it will be migrated to normal system memory; when memory is accessed by GPU, it will be migrated to device vram. This is basically the same behavior as when vram is treated as DEVICE_PRIVATE.
>
> I thought the whole goal of introducing DEVICE_GENERIC is to avoid such back and forth migration b/t device memory and normal system memory. But maybe I am missing something here....

Hi Oak,

By using MEMORY_DEVICE_GENERIC we can avoid CPU page faults triggering
migration back to system memory on every CPU access on the Frontier
system architecture, because such pages can be mapped in the CPU page
table. You're right that this is the reason for the whole patch series.

But we still need the ability to migrate from MEMORY_DEVICE_GENERIC to
system memory for reasons other than CPU page faults. Applications can
request migrations explicitly (hipMemPrefetchAsync). Or we can be forced
to migrate data due to memory pressure from other allocations (evictions
in the TTM memory allocator).

Regards,
  Felix

>
> Regards,
> Oak
>
>     In this type,
>     device PTEs are
>     present in CPU page table.
>
>     During migrate_vma_collect_pmd walk op at migrate_vma_setup call,
>     there’s a condition
>     for present pte that require migrate->flags be set for
>     MIGRATE_VMA_SELECT_SYSTEM.
>     Otherwise, the migration for this entry will be ignored.
>
>     Regards,
>     Alex S.
On Mon, Aug 16, 2021 at 02:53:18PM -0500, Sierra Guiza, Alejandro (Alex) wrote:
> For above’s condition equal to connected_to_cpu , we’re explicitly
> migrating from
> device memory to system memory with device generic type. In this type,
> device PTEs are
> present in CPU page table.
>
> During migrate_vma_collect_pmd walk op at migrate_vma_setup call, there’s
> a condition
> for present pte that require migrate->flags be set for
> MIGRATE_VMA_SELECT_SYSTEM.
> Otherwise, the migration for this entry will be ignored.

I think we might need a new SELECT flag here for IOMEM.
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 24a8b6d4f947..e5b10de83a5f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -616,9 +616,12 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 	migrate.vma = vma;
 	migrate.start = start;
 	migrate.end = end;
-	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
 
+	if (adev->gmc.xgmi.connected_to_cpu)
+		migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
+	else
+		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 	size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
 	size *= npages;
 	buf = kvmalloc(size, GFP_KERNEL | __GFP_ZERO);