diff mbox series

[RFC,03/10] pci/p2pdma: Don't initialise page refcount to one

Message ID a443974e64917824e078485d4e755ef04c89d73f.1712796818.git-series.apopple@nvidia.com (mailing list archive)
State New, archived
Headers show
Series fs/dax: Fix FS DAX page reference counts | expand

Commit Message

Alistair Popple April 11, 2024, 12:57 a.m. UTC
The reference counts for ZONE_DEVICE private pages should be
initialised by the driver when the page is actually allocated by the
driver allocator, not when they are first created. This is currently
the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages
but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
 drivers/pci/p2pdma.c | 2 ++
 mm/memremap.c        | 8 ++++----
 mm/mm_init.c         | 4 +++-
 3 files changed, 9 insertions(+), 5 deletions(-)

Comments

Jason Gunthorpe April 11, 2024, 12:29 p.m. UTC | #1
On Thu, Apr 11, 2024 at 10:57:24AM +1000, Alistair Popple wrote:
> The reference counts for ZONE_DEVICE private pages should be
> initialised by the driver when the page is actually allocated by the
> driver allocator, not when they are first created. This is currently
> the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages
> but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up.
> 
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> ---
>  drivers/pci/p2pdma.c | 2 ++
>  mm/memremap.c        | 8 ++++----
>  mm/mm_init.c         | 4 +++-
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index fa7370f..ab7ef18 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>  		goto out;
>  	}
>  
> +	get_page(virt_to_page(kaddr));
> +

Should this be 

 set_page_count(page, 1)

If the refcount is already known to be 0 ?

> @@ -508,15 +508,15 @@ void free_zone_device_page(struct page *page)
>  	page->mapping = NULL;
>  	page->pgmap->ops->page_free(page);
>  
> -	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
> -	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
> +	if (page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +	    page->pgmap->type == MEMORY_DEVICE_COHERENT)
> +		put_dev_pagemap(page->pgmap);

Not related, but we should really be getting rid of this devmap
refcount traffic too, IMHO..

If an implementation wants this then it should hook the page
free/alloc callbacks and do this, not put it in the core code.

Jason
Alistair Popple April 12, 2024, 5:40 a.m. UTC | #2
Jason Gunthorpe <jgg@nvidia.com> writes:

> On Thu, Apr 11, 2024 at 10:57:24AM +1000, Alistair Popple wrote:
>> The reference counts for ZONE_DEVICE private pages should be
>> initialised by the driver when the page is actually allocated by the
>> driver allocator, not when they are first created. This is currently
>> the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages
>> but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up.
>> 
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> ---
>>  drivers/pci/p2pdma.c | 2 ++
>>  mm/memremap.c        | 8 ++++----
>>  mm/mm_init.c         | 4 +++-
>>  3 files changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index fa7370f..ab7ef18 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>>  		goto out;
>>  	}
>>  
>> +	get_page(virt_to_page(kaddr));
>> +
>
> Should this be 
>
>  set_page_count(page, 1)
>
> If the refcount is already known to be 0 ?

Yeah, that would avoid the obvious warning that calling get_page there
will generate. My test setup for p2pdma is pretty clunky, so haven't run
it a while. Not sure if there are any good qemu based tests for this.

>> @@ -508,15 +508,15 @@ void free_zone_device_page(struct page *page)
>>  	page->mapping = NULL;
>>  	page->pgmap->ops->page_free(page);
>>  
>> -	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
>> -	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
>> +	if (page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
>> +	    page->pgmap->type == MEMORY_DEVICE_COHERENT)
>> +		put_dev_pagemap(page->pgmap);
>
> Not related, but we should really be getting rid of this devmap
> refcount traffic too, IMHO..

Absolutely. I think there's a bunch of clean ups for this in mm/gup.c
that could be done as well. I plan on doing that as a follow up to this
series. We pretty much don't use that for device private/coherent pages
anyway.

> If an implementation wants this then it should hook the page
> free/alloc callbacks and do this, not put it in the core code.
>
> Jason
Dan Williams April 12, 2024, 5:20 p.m. UTC | #3
Alistair Popple wrote:
> The reference counts for ZONE_DEVICE private pages should be
> initialised by the driver when the page is actually allocated by the
> driver allocator, not when they are first created. This is currently
> the case for MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT pages
> but not MEMORY_DEVICE_PCI_P2PDMA pages so fix that up.

I know you said to hold off on looking at this series until you fixed up
the kernel assertions, but I would not expect to remove logic before the
replacement is available. So this seems to be in the wrong place in the
series, or am I missing something?
Logan Gunthorpe May 9, 2024, 9:59 p.m. UTC | #4
Hi Alistair,

I was working on testing your patch set, however I'm dealing with some
hardware issues at the moment so I haven't fully tested everything yet.

I managed to find one issue though:

On 2024-04-10 18:57, Alistair Popple wrote:
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index fa7370f..ab7ef18 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>  		goto out;
>  	}
>  
> +	get_page(virt_to_page(kaddr));
> +

kaddr may represent more than one page, so this will fail to map
anything if the mapping size is greater than 4KB. There is a loop just
below this that calls vm_insert_page(). Moving a set_page_count() call
just before vm_insert_page() fixes the issue.

Thanks!

Logan
Alistair Popple May 9, 2024, 11:14 p.m. UTC | #5
Logan Gunthorpe <logang@deltatee.com> writes:

> Hi Alistair,
>
> I was working on testing your patch set, however I'm dealing with some
> hardware issues at the moment so I haven't fully tested everything yet.

Oh great. I haven't extensively tested the p2pdma setup yet because I
rely on a pretty janky qemu setup to do so.

> I managed to find one issue though:
>
> On 2024-04-10 18:57, Alistair Popple wrote:
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index fa7370f..ab7ef18 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -128,6 +128,8 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
>>  		goto out;
>>  	}
>>  
>> +	get_page(virt_to_page(kaddr));
>> +
>
> kaddr may represent more than one page, so this will fail to map
> anything if the mapping size is greater than 4KB. There is a loop just
> below this that calls vm_insert_page(). Moving a set_page_count() call
> just before vm_insert_page() fixes the issue.

Good point. I'm in the middle of respinning this (I was hoping to get it
done before LSF/MM but I suspect my travel will preempt it) so will add
this fix.

Note that there are problems with both fs-dax and device-dax in this
version which the next one fixes, but any p2p dma testing would be
appreciated. Thanks!

> Thanks!
>
> Logan
diff mbox series

Patch

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index fa7370f..ab7ef18 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -128,6 +128,8 @@  static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
 		goto out;
 	}
 
+	get_page(virt_to_page(kaddr));
+
 	/*
 	 * vm_insert_page() can sleep, so a reference is taken to mapping
 	 * such that rcu_read_unlock() can be done before inserting the
diff --git a/mm/memremap.c b/mm/memremap.c
index bee8556..99d26ff 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -508,15 +508,15 @@  void free_zone_device_page(struct page *page)
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);
 
-	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
+	if (page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
+	    page->pgmap->type == MEMORY_DEVICE_COHERENT)
+		put_dev_pagemap(page->pgmap);
+	else if (page->pgmap->type != MEMORY_DEVICE_PCI_P2PDMA)
 		/*
 		 * Reset the page count to 1 to prepare for handing out the page
 		 * again.
 		 */
 		set_page_count(page, 1);
-	else
-		put_dev_pagemap(page->pgmap);
 }
 
 void zone_device_page_init(struct page *page)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 50f2f34..da45abd 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -6,6 +6,7 @@ 
  * Author Mel Gorman <mel@csn.ul.ie>
  *
  */
+#include "linux/memremap.h"
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/kobject.h>
@@ -1006,7 +1007,8 @@  static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * which will set the page count to 1 when allocating the page.
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
+	    pgmap->type == MEMORY_DEVICE_COHERENT ||
+	    pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
 		set_page_count(page, 0);
 }