Message ID | 20230817064934.3424431-1-vivek.kasireddy@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | udmabuf: Add support for page migration out of movable zone or CMA | expand |
On Wed, Aug 16, 2023 at 11:49:31PM -0700, Vivek Kasireddy wrote: > This patch series adds support for migrating pages associated with > a udmabuf out of the movable zone or CMA to avoid breaking features > such as memory hotunplug. > > The first patch exports check_and_migrate_movable_pages() function > out of GUP so that the udmabuf driver can leverage it for page > migration that is done as part of the second patch. The last patch > adds two new udmabuf selftests to verify data coherency after > page migration. Please don't do this. If you want to do what GUP does then call GUP. udmabuf is not so special that it needs to open code its own weird version of it. Jason
Hi Jason, > > This patch series adds support for migrating pages associated with > > a udmabuf out of the movable zone or CMA to avoid breaking features > > such as memory hotunplug. > > > > The first patch exports check_and_migrate_movable_pages() function > > out of GUP so that the udmabuf driver can leverage it for page > > migration that is done as part of the second patch. The last patch > > adds two new udmabuf selftests to verify data coherency after > > page migration. > > Please don't do this. If you want to do what GUP does then call > GUP. udmabuf is not so special that it needs to open code its own > weird version of it. We can't call GUP directly as explained in the first patch of this series: "For drivers that would like to migrate pages out of the movable zone (or CMA) in order to pin them (longterm) for DMA, using check_and_migrate_movable_pages() directly provides a convenient option instead of duplicating similar checks (e.g, checking the folios for zone, hugetlb, etc) and calling migrate_pages() directly. Ideally, a driver is expected to call pin_user_pages(FOLL_LONGTERM) to migrate and pin the pages for longterm DMA but there are situations where the GUP APIs cannot be used directly for various reasons (e.g, when the VMA or start addr cannot be easily determined but the relevant pages are available)." Given the current (model and) UAPI (udmabuf_create), the userspace only shares (memfd, offset, size) values that we use to find the relevant pages and pin them (by doing get_page()). Since the goal is to also migrate these pages, I think we have the following options: - Leverage check_and_migrate_movable_pages(); but this function needs to be exported from GUP. - Iterate over all the pages (in udmabuf) to check for folio_is_longterm_pinnable() and call migrate_pages() eventually. This requires changes only to the udmabuf driver but we'd be duplicating much of the functionality provided by check_and_migrate_movable_pages(). - Call pin_user_pages_fast(FOLL_LONGTERM) from udmabuf driver. In order to do this, we have to first unpin all pages and iterate over all the VMAs of the VMM to identify the Guest RAM VMA and then use page_address_in_vma() to find the start addr of the ranges and then call GUP. Although this approach is feasible, it feels a bit convoluted. - Add a new udmabuf UAPI to have userspace share (start addr, len) values so that the udmabuf driver can directly call GUP APIs. But this means all the current users of udmabuf such as Qemu, CrosVM, etc, need to be updated to use the new UAPI. - Add a new API to the backing store/allocator to longterm-pin the page. For example, something along the lines of shmem_pin_mapping_page_longterm() for shmem as suggested by Daniel. A similar one needs to be added for hugetlbfs as well. Among these options, the first one seems very reasonable to me. Thanks, Vivek > > Jason
On Tue, Aug 22, 2023 at 05:36:56AM +0000, Kasireddy, Vivek wrote: > Hi Jason, > > > > This patch series adds support for migrating pages associated with > > > a udmabuf out of the movable zone or CMA to avoid breaking features > > > such as memory hotunplug. > > > > > > The first patch exports check_and_migrate_movable_pages() function > > > out of GUP so that the udmabuf driver can leverage it for page > > > migration that is done as part of the second patch. The last patch > > > adds two new udmabuf selftests to verify data coherency after > > > page migration. > > > > Please don't do this. If you want to do what GUP does then call > > GUP. udmabuf is not so special that it needs to open code its own > > weird version of it. > We can't call GUP directly as explained in the first patch of this series: > "For drivers that would like to migrate pages out of the movable > zone (or CMA) in order to pin them (longterm) for DMA, using > check_and_migrate_movable_pages() directly provides a convenient > option instead of duplicating similar checks (e.g, checking > the folios for zone, hugetlb, etc) and calling migrate_pages() > directly. > > Ideally, a driver is expected to call pin_user_pages(FOLL_LONGTERM) > to migrate and pin the pages for longterm DMA but there are > situations where the GUP APIs cannot be used directly for > various reasons (e.g, when the VMA or start addr cannot be > easily determined but the relevant pages are available)." > > Given the current (model and) UAPI (udmabuf_create), the userspace > only shares (memfd, offset, size) values that we use to find the > relevant pages and pin them (by doing get_page()). Since the goal > is to also migrate these pages, I think we have the following options: This seems like a bad choice of uAPI - we don't have any kernel support for pinning from a memfd. If you want this then you have to build this as generic code, not open code it into udmabuf. > - Leverage check_and_migrate_movable_pages(); but this function > needs to be exported from GUP. GUP has many behaviors, we keep adding more, these functions should not leak out of the mm core code into drivers. > - Iterate over all the pages (in udmabuf) to check for folio_is_longterm_pinnable() > and call migrate_pages() eventually. This requires changes only to > the udmabuf driver but we'd be duplicating much of the functionality > provided by check_and_migrate_movable_pages(). Definately not > - Call pin_user_pages_fast(FOLL_LONGTERM) from udmabuf driver. In > order to do this, we have to first unpin all pages and iterate over all > the VMAs of the VMM to identify the Guest RAM VMA and then use > page_address_in_vma() to find the start addr of the ranges and then > call GUP. Although this approach is feasible, it feels a bit convoluted. Userspace should have told the kernel where the memfd is mapped. > - Add a new udmabuf UAPI to have userspace share (start addr, len) values > so that the udmabuf driver can directly call GUP APIs. But this means all > the current users of udmabuf such as Qemu, CrosVM, etc, need to be > updated to use the new UAPI. There you go > - Add a new API to the backing store/allocator to longterm-pin the page. > For example, something along the lines of shmem_pin_mapping_page_longterm() > for shmem as suggested by Daniel. A similar one needs to be added for > hugetlbfs as well. This may also be reasonable. Jason
>> - Add a new API to the backing store/allocator to longterm-pin the page. >> For example, something along the lines of shmem_pin_mapping_page_longterm() >> for shmem as suggested by Daniel. A similar one needs to be added for >> hugetlbfs as well. > > This may also be reasonable. Sounds reasonable to keep the old API (that we unfortunately have) working.
Hi David, > > >> - Add a new API to the backing store/allocator to longterm-pin the page. > >> For example, something along the lines of > shmem_pin_mapping_page_longterm() > >> for shmem as suggested by Daniel. A similar one needs to be added for > >> hugetlbfs as well. > > > > This may also be reasonable. > > Sounds reasonable to keep the old API (that we unfortunately have) working. I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this, and considering the options I have mentioned earlier, what would be your recommendation for how page migration needs to be done in udmabuf driver? Thanks, Vivek > > -- > Cheers, > > David / dhildenb
On 24.08.23 08:31, Kasireddy, Vivek wrote: > Hi David, > >> >>>> - Add a new API to the backing store/allocator to longterm-pin the page. >>>> For example, something along the lines of >> shmem_pin_mapping_page_longterm() >>>> for shmem as suggested by Daniel. A similar one needs to be added for >>>> hugetlbfs as well. >>> >>> This may also be reasonable. >> >> Sounds reasonable to keep the old API (that we unfortunately have) working. > I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this, > and considering the options I have mentioned earlier, what would be your > recommendation for how page migration needs to be done in udmabuf driver? I guess using proper APIs for shmem and hugetlb. So, turning roughly what you have in patch#1 for now into common code, and only calling into that from udmabug.
On Thu, Aug 24, 2023 at 08:30:17PM +0200, David Hildenbrand wrote: > On 24.08.23 08:31, Kasireddy, Vivek wrote: > > Hi David, > > > > > > > > > > - Add a new API to the backing store/allocator to longterm-pin the page. > > > > > For example, something along the lines of > > > shmem_pin_mapping_page_longterm() > > > > > for shmem as suggested by Daniel. A similar one needs to be added for > > > > > hugetlbfs as well. > > > > > > > > This may also be reasonable. > > > > > > Sounds reasonable to keep the old API (that we unfortunately have) working. > > I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this, > > and considering the options I have mentioned earlier, what would be your > > recommendation for how page migration needs to be done in udmabuf driver? > > I guess using proper APIs for shmem and hugetlb. So, turning roughly what > you have in patch#1 for now into common code, and only calling into that > from udmabug. This is a lot of work for an obscure uapi :\ Jason
On 24.08.23 20:30, Jason Gunthorpe wrote: > On Thu, Aug 24, 2023 at 08:30:17PM +0200, David Hildenbrand wrote: >> On 24.08.23 08:31, Kasireddy, Vivek wrote: >>> Hi David, >>> >>>> >>>>>> - Add a new API to the backing store/allocator to longterm-pin the page. >>>>>> For example, something along the lines of >>>> shmem_pin_mapping_page_longterm() >>>>>> for shmem as suggested by Daniel. A similar one needs to be added for >>>>>> hugetlbfs as well. >>>>> >>>>> This may also be reasonable. >>>> >>>> Sounds reasonable to keep the old API (that we unfortunately have) working. >>> I agree; I'd like to avoid adding new APIs unless absolutely necessary. Given this, >>> and considering the options I have mentioned earlier, what would be your >>> recommendation for how page migration needs to be done in udmabuf driver? >> >> I guess using proper APIs for shmem and hugetlb. So, turning roughly what >> you have in patch#1 for now into common code, and only calling into that >> from udmabug. > > This is a lot of work for an obscure uapi :\ Well, what can we otherwise to to *not* break existing users? I'm not happy about this either. Of course, we can come up with a new uapi, but we have to handle the old uapi somehow. Sure, we can simply always fail when we detect ZONE_MOVABLE or MIGRATE_CMA. Maybe that keeps at least some use cases working.
On Thu, Aug 24, 2023 at 08:33:09PM +0200, David Hildenbrand wrote: > Sure, we can simply always fail when we detect ZONE_MOVABLE or MIGRATE_CMA. > Maybe that keeps at least some use cases working. That seems fairly reasonable Jason
Hi Jason, David, > > > Sure, we can simply always fail when we detect ZONE_MOVABLE or > MIGRATE_CMA. > > Maybe that keeps at least some use cases working. > > That seems fairly reasonable AFAICS, failing udmabuf_create() if we detect one or more pages are in ZONE_MOVABLE or MIGRATE_CMA would not be a recoverable failure -- as it would result in the failure of Guest GUI (or compositor). I think it makes sense to have a generic version of And, since check_and_migrate_movable_pages() is GUP-specific, would it be ok to create a generic version of that (in mm/migrate.c) which can be used by udmabuf and/or other drivers in the future? Thanks, Vivek > > Jason
Hi Jason, David, > > > Sure, we can simply always fail when we detect ZONE_MOVABLE or > > MIGRATE_CMA. > > > Maybe that keeps at least some use cases working. > > > > That seems fairly reasonable > AFAICS, failing udmabuf_create() if we detect one or more pages are in > ZONE_MOVABLE or MIGRATE_CMA would not be a recoverable failure -- > as it would result in the failure of Guest GUI (or compositor). > > I think it makes sense to have a generic version of > And, since check_and_migrate_movable_pages() is GUP-specific, would > it be ok to create a generic version of that (in mm/migrate.c) which can be > used by udmabuf and/or other drivers in the future? Sorry, I accidentally sent this earlier email before finishing it. What I meant to say is since the same situation (inadvertently pinning pages in movable) may probably arise in the future with another driver, I think it makes sense to have a generic (non-GUP) version of check_and_migrate_movable_pages() available in migration.h that drivers can use to ensure that they don't break memory hotunplug accidentally. Thanks, Vivek > > Thanks, > Vivek > > > > > Jason >
On Sun, Aug 27, 2023 at 07:05:59PM +0000, Kasireddy, Vivek wrote: > Hi Jason, David, > > > > > Sure, we can simply always fail when we detect ZONE_MOVABLE or > > > MIGRATE_CMA. > > > > Maybe that keeps at least some use cases working. > > > > > > That seems fairly reasonable > > AFAICS, failing udmabuf_create() if we detect one or more pages are in > > ZONE_MOVABLE or MIGRATE_CMA would not be a recoverable failure -- > > as it would result in the failure of Guest GUI (or compositor). Yes, you can't use whatever this driver is while enabling MOVABLE or CMA in your kernel boot. > > I think it makes sense to have a generic version of > > And, since check_and_migrate_movable_pages() is GUP-specific, would > > it be ok to create a generic version of that (in mm/migrate.c) which can be > > used by udmabuf and/or other drivers in the future? > Sorry, I accidentally sent this earlier email before finishing it. > What I meant to say is since the same situation (inadvertently pinning pages > in movable) may probably arise in the future with another driver, Why? It was a big mistake to design a uAPI around taking in a FD and extracting pages from it, we don't have kernel infrastructure for that, and code liek that does not belong outside the MM at all. > I think it makes sense to have a generic (non-GUP) version of > check_and_migrate_movable_pages() available in migration.h that > drivers can use to ensure that they don't break memory hotunplug > accidentally. Definately not. Either use the VMA and pin_user_pages(), or implement pin_user_pages_fd() in core code. Do not open code something wonky in drivers. Jason
>> I think it makes sense to have a generic (non-GUP) version of >> check_and_migrate_movable_pages() available in migration.h that >> drivers can use to ensure that they don't break memory hotunplug >> accidentally. > > Definately not. > > Either use the VMA and pin_user_pages(), or implement > pin_user_pages_fd() in core code. > > Do not open code something wonky in drivers. Agreed. pin_user_pages_fd() might become relevant in the context of vfio/mdev + KVM gmem -- don't mmap guest memory but instead provide it via a special memfd to the kernel. So there might be value in having such a core infrastructure.
Hi David, > >> I think it makes sense to have a generic (non-GUP) version of > >> check_and_migrate_movable_pages() available in migration.h that > >> drivers can use to ensure that they don't break memory hotunplug > >> accidentally. > > > > Definately not. > > > > Either use the VMA and pin_user_pages(), or implement > > pin_user_pages_fd() in core code. > > > > Do not open code something wonky in drivers. > > Agreed. pin_user_pages_fd() might become relevant in the context of > vfio/mdev + KVM gmem -- don't mmap guest memory but instead provide it > via a special memfd to the kernel. > > So there might be value in having such a core infrastructure. Ok, I'll work on adding pin_user_pages_fd() soon. Thanks, Vivek > > -- > Cheers, > > David / dhildenb