Message ID | a443974e64917824e078485d4e755ef04c89d73f.1712796818.git-series.apopple@nvidia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fs/dax: Fix FS DAX page reference counts | expand |
On Thu, Apr 11, 2024 at 10:57:24AM +1000, Alistair Popple wrote: > The reference counts for ZONE_DEVICE private pages should be > initialised by the driver when the page is actually allocated by the > driver allocator, not when they are first created. This is currently > the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages > but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up. > > Signed-off-by: Alistair Popple <apopple@nvidia.com> > --- > drivers/pci/p2pdma.c | 2 ++ > mm/memremap.c | 8 ++++---- > mm/mm_init.c | 4 +++- > 3 files changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index fa7370f..ab7ef18 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj, > goto out; > } > > + get_page(virt_to_page(kaddr)); > + Should this be set_page_count(page, 1) If the refcount is already known to be 0 ? > @@ -508,15 +508,15 @@ void free_zone_device_page(struct page *page) > page->mapping = NULL; > page->pgmap->ops->page_free(page); > > - if (page->pgmap->type != MEMORY_DEVICE_PRIVATE && > - page->pgmap->type != MEMORY_DEVICE_COHERENT) > + if (page->pgmap->type == MEMORY_DEVICE_PRIVATE || > + page->pgmap->type == MEMORY_DEVICE_COHERENT) > + put_dev_pagemap(page->pgmap); Not related, but we should really be getting rid of this devmap refcount traffic too, IMHO.. If an implementation wants this then it should hook the page free/alloc callbacks and do this, not put it in the core code. Jason
Jason Gunthorpe <jgg@nvidia.com> writes: > On Thu, Apr 11, 2024 at 10:57:24AM +1000, Alistair Popple wrote: >> The reference counts for ZONE_DEVICE private pages should be >> initialised by the driver when the page is actually allocated by the >> driver allocator, not when they are first created. This is currently >> the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages >> but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up. >> >> Signed-off-by: Alistair Popple <apopple@nvidia.com> >> --- >> drivers/pci/p2pdma.c | 2 ++ >> mm/memremap.c | 8 ++++---- >> mm/mm_init.c | 4 +++- >> 3 files changed, 9 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c >> index fa7370f..ab7ef18 100644 >> --- a/drivers/pci/p2pdma.c >> +++ b/drivers/pci/p2pdma.c >> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj, >> goto out; >> } >> >> + get_page(virt_to_page(kaddr)); >> + > > Should this be > > set_page_count(page, 1) > > If the refcount is already known to be 0 ? Yeah, that would avoid the obvious warning that calling get_page there will generate. My test setup for p2pdma is pretty clunky, so haven't run it a while. Not sure if there are any good qemu based tests for this. >> @@ -508,15 +508,15 @@ void free_zone_device_page(struct page *page) >> page->mapping = NULL; >> page->pgmap->ops->page_free(page); >> >> - if (page->pgmap->type != MEMORY_DEVICE_PRIVATE && >> - page->pgmap->type != MEMORY_DEVICE_COHERENT) >> + if (page->pgmap->type == MEMORY_DEVICE_PRIVATE || >> + page->pgmap->type == MEMORY_DEVICE_COHERENT) >> + put_dev_pagemap(page->pgmap); > > Not related, but we should really be getting rid of this devmap > refcount traffic too, IMHO.. Absolutely. I think there's a bunch of clean ups for this in mm/gup.c that could be done as well. I plan on doing that as a follow up to this series. We pretty much don't use that for device private/coherent pages anyway. > If an implementation wants this then it should hook the page > free/alloc callbacks and do this, not put it in the core code. > > Jason
Alistair Popple wrote: > The reference counts for ZONE_DEVICE private pages should be > initialised by the driver when the page is actually allocated by the > driver allocator, not when they are first created. This is currently > the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages > but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up. I know you said to hold off on looking at this series until you fixed up the kernel assertions, but I would not expect to remove logic before the replacement is available. So this seems to be in the wrong place in the series, or am I missing something?
Hi Alistair, I was working on testing your patch set, however I'm dealing with some hardware issues at the moment so I haven't fully tested everything yet. I managed to find one issue though: On 2024-04-10 18:57, Alistair Popple wrote: > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index fa7370f..ab7ef18 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj, > goto out; > } > > + get_page(virt_to_page(kaddr)); > + kaddr may represent more than one page, so this will fail to map anything if the mapping size is greater than 4KB. There is a loop just below this that calls vm_insert_page(). Moving a set_page_count() call just before vm_insert_page() fixes the issue. Thanks! Logan
Logan Gunthorpe <logang@deltatee.com> writes: > Hi Alistair, > > I was working on testing your patch set, however I'm dealing with some > hardware issues at the moment so I haven't fully tested everything yet. Oh great. I haven't extensively tested the p2pdma setup yet because I rely on a pretty janky qemu setup to do so. > I managed to find one issue though: > > On 2024-04-10 18:57, Alistair Popple wrote: >> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c >> index fa7370f..ab7ef18 100644 >> --- a/drivers/pci/p2pdma.c >> +++ b/drivers/pci/p2pdma.c >> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj, >> goto out; >> } >> >> + get_page(virt_to_page(kaddr)); >> + > > kaddr may represent more than one page, so this will fail to map > anything if the mapping size is greater than 4KB. There is a loop just > below this that calls vm_insert_page(). Moving a set_page_count() call > just before vm_insert_page() fixes the issue. Good point. I'm in the middle of respinning this (I was hoping to get it done before LSF/MM but I suspect my travel will preempt it) so will add this fix. Note that there are problems with both fs-dax and device-dax in this version which the next one fixes, but any p2p dma testing would be appreciated. Thanks! > Thanks! > > Logan
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index fa7370f..ab7ef18 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj, goto out; } + get_page(virt_to_page(kaddr)); + /* * vm_insert_page() can sleep, so a reference is taken to mapping * such that rcu_read_unlock() can be done before inserting the diff --git a/mm/memremap.c b/mm/memremap.c index bee8556..99d26ff 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -508,15 +508,15 @@ void free_zone_device_page(struct page *page) page->mapping = NULL; page->pgmap->ops->page_free(page); - if (page->pgmap->type != MEMORY_DEVICE_PRIVATE && - page->pgmap->type != MEMORY_DEVICE_COHERENT) + if (page->pgmap->type == MEMORY_DEVICE_PRIVATE || + page->pgmap->type == MEMORY_DEVICE_COHERENT) + put_dev_pagemap(page->pgmap); + else if (page->pgmap->type != MEMORY_DEVICE_PCI_P2PDMA) /* * Reset the page count to 1 to prepare for handing out the page * again. */ set_page_count(page, 1); - else - put_dev_pagemap(page->pgmap); } void zone_device_page_init(struct page *page) diff --git a/mm/mm_init.c b/mm/mm_init.c index 50f2f34..da45abd 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -6,6 +6,7 @@ * Author Mel Gorman <mel@csn.ul.ie> * */ +#include "linux/memremap.h" #include <linux/kernel.h> #include <linux/init.h> #include <linux/kobject.h> @@ -1006,7 +1007,8 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, * which will set the page count to 1 when allocating the page. */ if (pgmap->type == MEMORY_DEVICE_PRIVATE || - pgmap->type == MEMORY_DEVICE_COHERENT) + pgmap->type == MEMORY_DEVICE_COHERENT || + pgmap->type == MEMORY_DEVICE_PCI_P2PDMA) set_page_count(page, 0); }
The reference counts for ZONE_DEVICE private pages should be initialised by the driver when the page is actually allocated by the driver allocator, not when they are first created. This is currently the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up. Signed-off-by: Alistair Popple <apopple@nvidia.com> --- drivers/pci/p2pdma.c | 2 ++ mm/memremap.c | 8 ++++---- mm/mm_init.c | 4 +++- 3 files changed, 9 insertions(+), 5 deletions(-)