Message ID: cover.24b48fced909fe1414e83b58aa468d4393dd06de.1742099301.git-series.apopple@nvidia.com
Series: Allow file-backed or shared device private pages
On Sun, Mar 16, 2025 at 03:29:23PM +1100, Alistair Popple wrote:
> This series lifts that restriction by allowing ZONE_DEVICE private pages to
> exist in the pagecache.

You'd better provide a really good argument for why we'd even want
to do that. So far this cover letter fails to do that.
On Sun, Mar 16, 2025 at 11:04:07PM -0700, Christoph Hellwig wrote:
> On Sun, Mar 16, 2025 at 03:29:23PM +1100, Alistair Popple wrote:
> > This series lifts that restriction by allowing ZONE_DEVICE private pages to
> > exist in the pagecache.
>
> You'd better provide a really good argument for why we'd even want
> to do that. So far this cover letter fails to do that.

Alistair and I discussed this during his session at LSFMM today.
Here's what I think we agreed to.

The use case is a file containing a potentially very large data set.
Some phases of processing that data set are best done on the GPU, other
phases on the CPU. We agreed that shared writable mmap was not actually
needed (it might need to be supported for correctness, but it's not a
performance requirement).

So, there's no need to put DEVICE_PRIVATE pages in the page cache.
Instead the GPU will take a copy of the page(s). We agreed that there
will have to be some indication (probably a folio flag?) that the GPU has
or may have a copy of (some of) the folio so that it can be invalidated
if the page is removed due to truncation / eviction.

Alistair, let me know if that's not what you think we agreed to ;-)
On Wed, Mar 26, 2025 at 02:14:59AM +0000, Matthew Wilcox wrote:
> On Sun, Mar 16, 2025 at 11:04:07PM -0700, Christoph Hellwig wrote:
> > On Sun, Mar 16, 2025 at 03:29:23PM +1100, Alistair Popple wrote:
> > > This series lifts that restriction by allowing ZONE_DEVICE private pages to
> > > exist in the pagecache.
> >
> > You'd better provide a really good argument for why we'd even want
> > to do that. So far this cover letter fails to do that.
>
> Alistair and I discussed this during his session at LSFMM today.
> Here's what I think we agreed to.

Thanks for writing up this summary.

> The use case is a file containing a potentially very large data set.
> Some phases of processing that data set are best done on the GPU, other
> phases on the CPU. We agreed that shared writable mmap was not actually
> needed (it might need to be supported for correctness, but it's not a
> performance requirement).

Right. I agree we don't currently have a good use case for writeback, so
the next revision will definitely only support read-only access.

> So, there's no need to put DEVICE_PRIVATE pages in the page cache.
> Instead the GPU will take a copy of the page(s). We agreed that there
> will have to be some indication (probably a folio flag?) that the GPU has
> or may have a copy of (some of) the folio so that it can be invalidated
> if the page is removed due to truncation / eviction.
>
> Alistair, let me know if that's not what you think we agreed to ;-)

That all looks about right. I think the flag/indication is a good idea and
is probably the best solution, but I will need to write the code to truly
convince myself of that :-)
On Thu, Mar 27, 2025 at 07:49:47AM -0700, Alistair Popple wrote:
> On Wed, Mar 26, 2025 at 02:14:59AM +0000, Matthew Wilcox wrote:
> > So, there's no need to put DEVICE_PRIVATE pages in the page cache.
> > Instead the GPU will take a copy of the page(s). We agreed that there
> > will have to be some indication (probably a folio flag?) that the GPU has
> > or may have a copy of (some of) the folio so that it can be invalidated
> > if the page is removed due to truncation / eviction.
> >
> > Alistair, let me know if that's not what you think we agreed to ;-)
>
> That all looks about right. I think the flag/indication is a good idea and is
> probably the best solution, but I will need to write the code to truely convince
> myself of that :-)

It might end up making more sense to make it a per-VMA flag or a per-inode
flag, but that's probably something you're in a better position to
determine than I am.
To simplify the initial implementation, device private pages were
restricted to only being used for private anonymous memory. This avoided
having to deal with issues related to shared and/or file-backed pages
early on.

This series lifts that restriction by allowing ZONE_DEVICE private pages
to exist in the pagecache. As the CPU cannot directly access these pages,
special care needs to be taken when looking them up in the pagecache.
This series solves the problem by always migrating such pages back from
device memory when looking them up in the pagecache. This is similar to
how device private pages work for anonymous memory, where a CPU fault on
the device memory will always trigger a migration back to CPU system
memory.

Initially this series only allows for read-only migration - this is
because the call to migrate pages back will always reload the data from
backing storage. It then introduces a callback that drivers may implement
to actually copy any modified data back as required. Drivers are expected
to call set_page_dirty() when copying data back to ensure it hits the
backing store.

This series is an early draft implementation - in particular error
handling is not dealt with and I'm not sure that the management of PTE
write bits is entirely correct. Much more testing of all the various
filesystem corner cases is also required. The aim of this series is to
get early feedback on the overall concept of putting device private pages
in the pagecache before fleshing out the implementation further.
Signed-off-by: Alistair Popple <apopple@nvidia.com>

Alistair Popple (6):
  mm/migrate_device.c: Don't read dirty bit of non-present PTEs
  mm/migrate: Support file-backed pages with migrate_vma
  mm: Allow device private pages to exist in page cache
  mm: Implement writeback for share device private pages
  selftests/hmm: Add file-backed migration tests
  nouveau: Add SVM support for migrating file-backed pages to the GPU

 drivers/gpu/drm/nouveau/nouveau_dmem.c |  24 ++-
 include/linux/memremap.h               |   2 +-
 include/linux/migrate.h                |   6 +-
 lib/test_hmm.c                         |  27 ++-
 mm/filemap.c                           |  41 ++++-
 mm/memory.c                            |   9 +-
 mm/memremap.c                          |   1 +-
 mm/migrate.c                           |  42 ++--
 mm/migrate_device.c                    | 114 +++++++++++-
 mm/rmap.c                              |   2 +-
 tools/testing/selftests/mm/hmm-tests.c | 252 +++++++++++++++++++++++++-
 11 files changed, 489 insertions(+), 31 deletions(-)

base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319