[RFC,01/16] mm: add pagechain container for storing multiple pages.

Message ID: 20200902180628.4052244-2-zi.yan@sent.com
State: New, archived
Series: 1GB THP support on x86_64

Commit Message

Zi Yan Sept. 2, 2020, 6:06 p.m. UTC
From: Zi Yan <ziy@nvidia.com>

When depositing page table pages for 1GB THPs, we need 512 PTE pages +
1 PMD page. Instead of counting and depositing 513 pages, we can use
the PMD page as a leader page and chain the remaining 512 PTE pages
through ->lru. This, however, prevents us from depositing PMD pages
with ->lru, which is currently used when depositing PTE pages for 2MB
THPs. So add a new pagechain container for PMD pages.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/pagechain.h | 73 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 include/linux/pagechain.h
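
The chaining that the commit message describes can be pictured with a
short sketch. This is illustrative only: the helper below is not part
of the patch, and only the ->lru linkage mirrors the description above.

#include <linux/list.h>
#include <linux/mm_types.h>

/* Illustrative sketch of the leader-page scheme described above;
 * this helper is not part of the patch. */
static void chain_pte_tables(struct page *pmd_page,
			     struct page *pte_pages[], int nr)
{
	int i;

	/* The PMD table page leads the chain... */
	INIT_LIST_HEAD(&pmd_page->lru);

	/* ...and the 512 PTE table pages hang off it via their ->lru. */
	for (i = 0; i < nr; i++)
		list_add_tail(&pte_pages[i]->lru, &pmd_page->lru);

	/*
	 * The cost: pmd_page->lru now heads this chain, so the PMD page
	 * itself cannot be deposited through ->lru; hence the pagechain
	 * container added by this patch.
	 */
}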

Comments

Randy Dunlap Sept. 2, 2020, 8:29 p.m. UTC | #1
On 9/2/20 11:06 AM, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
> 1 PMD page. Instead of counting and depositing 513 pages, we can use
> the PMD page as a leader page and chain the remaining 512 PTE pages
> through ->lru. This, however, prevents us from depositing PMD pages
> with ->lru, which is currently used when depositing PTE pages for 2MB
> THPs. So add a new pagechain container for PMD pages.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  include/linux/pagechain.h | 73 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>  create mode 100644 include/linux/pagechain.h
> 
> diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
> new file mode 100644
> index 000000000000..be536142b413
> --- /dev/null
> +++ b/include/linux/pagechain.h
> @@ -0,0 +1,73 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * include/linux/pagechain.h
> + *
> + * In many places it is efficient to batch an operation up against multiple
> + * pages. A pagechain is a multipage container which is used for that.
> + */
> +
> +#ifndef _LINUX_PAGECHAIN_H
> +#define _LINUX_PAGECHAIN_H
> +
> +#include <linux/slab.h>
> +
> +/* 14 pointers + two long's align the pagechain structure to a power of two */
> +#define PAGECHAIN_SIZE	13

OK, I'll bite.  I see neither 14 pointers nor 2 longs below.
Is the comment out of date or am I just confuzed?

Update: struct list_head is 2 pointers, so I see 15 pointers & one unsigned int.
Where are the 2 longs?

> +
> +struct page;
> +
> +struct pagechain {
> +	struct list_head list;
> +	unsigned int nr;
> +	struct page *pages[PAGECHAIN_SIZE];
> +};

thanks.
Zi Yan Sept. 2, 2020, 8:48 p.m. UTC | #2
On 2 Sep 2020, at 16:29, Randy Dunlap wrote:

> On 9/2/20 11:06 AM, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
>> 1 PMD page. Instead of counting and depositing 513 pages, we can use
>> the PMD page as a leader page and chain the remaining 512 PTE pages
>> through ->lru. This, however, prevents us from depositing PMD pages
>> with ->lru, which is currently used when depositing PTE pages for 2MB
>> THPs. So add a new pagechain container for PMD pages.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  include/linux/pagechain.h | 73 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 73 insertions(+)
>>  create mode 100644 include/linux/pagechain.h
>>
>> diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
>> new file mode 100644
>> index 000000000000..be536142b413
>> --- /dev/null
>> +++ b/include/linux/pagechain.h
>> @@ -0,0 +1,73 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * include/linux/pagechain.h
>> + *
>> + * In many places it is efficient to batch an operation up against multiple
>> + * pages. A pagechain is a multipage container which is used for that.
>> + */
>> +
>> +#ifndef _LINUX_PAGECHAIN_H
>> +#define _LINUX_PAGECHAIN_H
>> +
>> +#include <linux/slab.h>
>> +
>> +/* 14 pointers + two long's align the pagechain structure to a power of two */
>> +#define PAGECHAIN_SIZE	13
>
> OK, I'll bite.  I see neither 14 pointers nor 2 longs below.
> Is the comment out of date or am I just confuzed?
>
> Update: struct list_head is 2 pointers, so I see 15 pointers & one unsigned int.
> Where are the 2 longs?

My bad. Will change this to:

/* 15 pointers + one long align the pagechain structure to a power of two */
#define PAGECHAIN_SIZE  13

struct page;

struct pagechain {
    struct list_head list;
    unsigned long nr;
    struct page *pages[PAGECHAIN_SIZE];
};
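
On 64-bit, assuming 8-byte pointers, that layout does come to a power
of two:

    sizeof(struct list_head)     =  16   /* two pointers */
    sizeof(unsigned long)        =   8   /* nr */
    13 * sizeof(struct page *)   = 104   /* pages[] */
                                   ---
                                   128 bytes

(and 8 + 4 + 52 = 64 bytes with 4-byte pointers on 32-bit, also a power
of two).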


Thanks for checking.

—
Best Regards,
Yan Zi
Matthew Wilcox (Oracle) Sept. 3, 2020, 3:15 a.m. UTC | #3
On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
> 1 PMD page. Instead of counting and depositing 513 pages, we can use
> the PMD page as a leader page and chain the remaining 512 PTE pages
> through ->lru. This, however, prevents us from depositing PMD pages
> with ->lru, which is currently used when depositing PTE pages for 2MB
> THPs. So add a new pagechain container for PMD pages.

But you've allocated a page for the PMD table.  Why can't you use that
4kB to store pointers to the 512 PTE tables?

You could also use an existing data structure like the XArray (although
not a pagevec).
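
For scale: a 4KB page holds exactly PAGE_SIZE / sizeof(void *) = 512
pointers on x86_64, so the suggestion fits exactly. A hedged sketch of
it (the function name is illustrative, not from the series):

/* Hedged sketch of the suggestion above; names are illustrative. */
static void stash_pte_tables_in_pmd_page(struct page *pmd_page,
					 pgtable_t pte_tables[])
{
	/* A deposited PMD table is not live, so its 4KB of memory can
	 * itself hold the 512 pointers to the deposited PTE tables. */
	struct page **slots = page_address(pmd_page);
	int i;

	for (i = 0; i < PTRS_PER_PMD; i++)
		slots[i] = pte_tables[i];
}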
Kirill A. Shutemov Sept. 7, 2020, 12:22 p.m. UTC | #4
On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
> 1 PMD page. Instead of counting and depositing 513 pages, we can use
> the PMD page as a leader page and chain the remaining 512 PTE pages
> through ->lru. This, however, prevents us from depositing PMD pages
> with ->lru, which is currently used when depositing PTE pages for 2MB
> THPs. So add a new pagechain container for PMD pages.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Just deposit it to a linked list in the mm_struct as we do for PMD if
split ptl disabled.
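
For context, the existing no-split-ptl PTE-table deposit looks roughly
like this (simplified from pgtable_trans_huge_deposit() in
mm/pgtable-generic.c); the suggestion is to add an analogous list head
in mm_struct for PMD tables:

/* Simplified from mm/pgtable-generic.c: PTE-table deposit when the
 * split PMD ptlock is disabled.  FIFO via mm->pmd_huge_pte. */
static void deposit_pte_table(struct mm_struct *mm, pgtable_t pgtable)
{
	if (!mm->pmd_huge_pte)
		INIT_LIST_HEAD(&pgtable->lru);
	else
		list_add(&pgtable->lru, &mm->pmd_huge_pte->lru);
	mm->pmd_huge_pte = pgtable;
}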
Zi Yan Sept. 7, 2020, 3:11 p.m. UTC | #5
On 7 Sep 2020, at 8:22, Kirill A. Shutemov wrote:

> On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
>> 1 PMD page. Instead of counting and depositing 513 pages, we can use
>> the PMD page as a leader page and chain the remaining 512 PTE pages
>> through ->lru. This, however, prevents us from depositing PMD pages
>> with ->lru, which is currently used when depositing PTE pages for 2MB
>> THPs. So add a new pagechain container for PMD pages.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Just deposit it to a linked list in the mm_struct as we do for PMD if
> split ptl disabled.
>

Thank you for checking the patches. Since we don’t have PUD split lock
yet, I store the PMD page table pages in a newly added linked list head
in mm_struct like you suggested above.

I was too vague about my pagechain design for depositing page table
pages for PUD THPs. Sorry about the confusion. Let me clarify why I am
using a pagechain here. I am sure there are other possible designs and
I am happy to change my code.

In my design, I did not store all page table pages in a single list.
I first deposit the 512 PTE pages into one PMD page table page's
pmd_huge_pte using pgtable_trans_huge_deposit(), then deposit the PMD
page to a newly added linked list in mm_struct. Since pmd_huge_pte
shares space with half of lru in struct page, we cannot use lru to link
all PMD pages together. As a result, I added pagechain. This way, we
also avoid two things:

1. When we withdraw the PMD page during a PUD THP split, we do not need
to withdraw 513 pages, set up one PMD page, and then deposit 512 PTE
pages in that PMD page.

2. We do not mix PMD page table pages and PTE page table pages in a
single list, since they are initialized in different ways. Otherwise,
we would need to maintain a subtle rule in the single page table page
list: in every group of 513 pages, the first one is a PMD page table
page and the rest are PTE page table pages.
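
In rough code, the two-level deposit would look something like this (a
hedged sketch: pgtable_trans_huge_deposit() is the existing PTE-level
deposit helper, while the function itself and the idea that pchain is
already linked into a new list head in mm_struct are illustrative):

/* Hedged sketch of the two-level deposit; only
 * pgtable_trans_huge_deposit() exists today. */
static void deposit_pud_page_table(struct mm_struct *mm, pmd_t *pmdp,
				   struct page *pmd_page,
				   pgtable_t pte_tables[],
				   struct pagechain *pchain)
{
	int i;

	/* Level 1: deposit the 512 PTE tables into the PMD table
	 * page's pmd_huge_pte list (called with the ptl held). */
	for (i = 0; i < PTRS_PER_PMD; i++)
		pgtable_trans_huge_deposit(mm, pmdp, pte_tables[i]);

	/* Level 2: pmd_huge_pte overlaps half of pmd_page->lru, so the
	 * PMD page is recorded in an external pagechain (assumed to be
	 * linked into a new list head in mm_struct) instead of being
	 * chained through its own ->lru. */
	pagechain_deposit(pchain, pmd_page);
}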

As I am typing, I also realize that my current design does not work
when the PMD split lock is disabled, so I will fix it by storing PMD
pages and PTE pages in two separate lists in mm_struct.


Any comments?


—
Best Regards,
Yan Zi
Kirill A. Shutemov Sept. 9, 2020, 1:46 p.m. UTC | #6
On Mon, Sep 07, 2020 at 11:11:05AM -0400, Zi Yan wrote:
> On 7 Sep 2020, at 8:22, Kirill A. Shutemov wrote:
> 
> > On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
> >> From: Zi Yan <ziy@nvidia.com>
> >>
> >> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
> >> 1 PMD page. Instead of counting and depositing 513 pages, we can use
> >> the PMD page as a leader page and chain the remaining 512 PTE pages
> >> through ->lru. This, however, prevents us from depositing PMD pages
> >> with ->lru, which is currently used when depositing PTE pages for 2MB
> >> THPs. So add a new pagechain container for PMD pages.
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >
> > Just deposit it to a linked list in the mm_struct as we do for PMD if
> > split ptl disabled.
> >
> 
> Thank you for checking the patches. Since we don’t have PUD split lock
> yet, I store the PMD page table pages in a newly added linked list head
> in mm_struct like you suggested above.
> 
> I was too vague about my pagechain design for depositing page table
> pages for PUD THPs. Sorry about the confusion. Let me clarify why I am
> using a pagechain here. I am sure there are other possible designs and
> I am happy to change my code.
>
> In my design, I did not store all page table pages in a single list.
> I first deposit the 512 PTE pages into one PMD page table page's
> pmd_huge_pte using pgtable_trans_huge_deposit(), then deposit the PMD
> page to a newly added linked list in mm_struct. Since pmd_huge_pte
> shares space with half of lru in struct page, we cannot use lru to link
> all PMD pages together. As a result, I added pagechain. This way, we
> also avoid two things:
>
> 1. When we withdraw the PMD page during a PUD THP split, we do not need
> to withdraw 513 pages, set up one PMD page, and then deposit 512 PTE
> pages in that PMD page.
>
> 2. We do not mix PMD page table pages and PTE page table pages in a
> single list, since they are initialized in different ways. Otherwise,
> we would need to maintain a subtle rule in the single page table page
> list: in every group of 513 pages, the first one is a PMD page table
> page and the rest are PTE page table pages.
>
> As I am typing, I also realize that my current design does not work
> when the PMD split lock is disabled, so I will fix it by storing PMD
> pages and PTE pages in two separate lists in mm_struct.
> 
> 
> Any comments?

Okay, fair enough.

Although, I think you can get away without a new data structure. We
don't need a double-linked list to deposit page tables. You can rework
the PTE table deposit code to use a single-linked list through one
pointer of ->lru (with a proper name) and make the PMD table deposit
use the other one. This way you can avoid the conflict over ->lru.

Does it make sense?
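
A hedged sketch of that split (the casts and names are purely
illustrative; PTE tables would use ->lru.next the same way):

/* Hedged sketch: ->lru is two words, so each deposit list can be
 * single-linked through its own word.  Purely illustrative. */
static void deposit_pmd_table(struct page **head, struct page *page)
{
	page->lru.prev = (struct list_head *)*head;
	*head = page;
}

static struct page *withdraw_pmd_table(struct page **head)
{
	struct page *page = *head;

	if (page)
		*head = (struct page *)page->lru.prev;
	return page;
}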
Zi Yan Sept. 9, 2020, 2:15 p.m. UTC | #7
On 9 Sep 2020, at 9:46, Kirill A. Shutemov wrote:

> On Mon, Sep 07, 2020 at 11:11:05AM -0400, Zi Yan wrote:
>> On 7 Sep 2020, at 8:22, Kirill A. Shutemov wrote:
>>
>>> On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
>>>> From: Zi Yan <ziy@nvidia.com>
>>>>
>>>> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
>>>> 1 PMD page. Instead of counting and depositing 513 pages, we can use
>>>> the PMD page as a leader page and chain the remaining 512 PTE pages
>>>> through ->lru. This, however, prevents us from depositing PMD pages
>>>> with ->lru, which is currently used when depositing PTE pages for 2MB
>>>> THPs. So add a new pagechain container for PMD pages.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>
>>> Just deposit it to a linked list in the mm_struct as we do for PMD if
>>> split ptl disabled.
>>>
>>
>> Thank you for checking the patches. Since we don’t have PUD split lock
>> yet, I store the PMD page table pages in a newly added linked list head
>> in mm_struct like you suggested above.
>>
>> I was too vague about my pagechain design for depositing page table
>> pages for PUD THPs. Sorry about the confusion. Let me clarify why I am
>> using a pagechain here. I am sure there are other possible designs and
>> I am happy to change my code.
>>
>> In my design, I did not store all page table pages in a single list.
>> I first deposit the 512 PTE pages into one PMD page table page's
>> pmd_huge_pte using pgtable_trans_huge_deposit(), then deposit the PMD
>> page to a newly added linked list in mm_struct. Since pmd_huge_pte
>> shares space with half of lru in struct page, we cannot use lru to link
>> all PMD pages together. As a result, I added pagechain. This way, we
>> also avoid two things:
>>
>> 1. When we withdraw the PMD page during a PUD THP split, we do not need
>> to withdraw 513 pages, set up one PMD page, and then deposit 512 PTE
>> pages in that PMD page.
>>
>> 2. We do not mix PMD page table pages and PTE page table pages in a
>> single list, since they are initialized in different ways. Otherwise,
>> we would need to maintain a subtle rule in the single page table page
>> list: in every group of 513 pages, the first one is a PMD page table
>> page and the rest are PTE page table pages.
>>
>> As I am typing, I also realize that my current design does not work
>> when the PMD split lock is disabled, so I will fix it by storing PMD
>> pages and PTE pages in two separate lists in mm_struct.
>>
>>
>> Any comments?
>
> Okay, fair enough.
>
> Although, I think you can get away without a new data structure. We
> don't need a double-linked list to deposit page tables. You can rework
> the PTE table deposit code to use a single-linked list through one
> pointer of ->lru (with a proper name) and make the PMD table deposit
> use the other one. This way you can avoid the conflict over ->lru.
>
> Does it make sense?

Yes. Thanks. Will do this in the next version. I think the
singly-linked list from llist.h can be used.
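
Something like this, perhaps (a rough sketch only: struct llist_node is
a single pointer, so one word of page->lru can hold it; names are
illustrative):

#include <linux/llist.h>

/* Rough sketch: stash an llist_node in the first word of page->lru. */
static void llist_deposit_pte_table(struct llist_head *head,
				    struct page *page)
{
	llist_add((struct llist_node *)&page->lru.next, head);
}

static struct page *llist_withdraw_pte_table(struct llist_head *head)
{
	struct llist_node *node = llist_del_first(head);

	/* lru.next is the first word of page->lru. */
	return node ? container_of((struct list_head *)node,
				   struct page, lru) : NULL;
}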

—
Best Regards,
Yan Zi

Patch

diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
new file mode 100644
index 000000000000..be536142b413
--- /dev/null
+++ b/include/linux/pagechain.h
@@ -0,0 +1,73 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/linux/pagechain.h
+ *
+ * In many places it is efficient to batch an operation up against multiple
+ * pages. A pagechain is a multipage container which is used for that.
+ */
+
+#ifndef _LINUX_PAGECHAIN_H
+#define _LINUX_PAGECHAIN_H
+
+#include <linux/slab.h>
+
+/* 14 pointers + two long's align the pagechain structure to a power of two */
+#define PAGECHAIN_SIZE	13
+
+struct page;
+
+struct pagechain {
+	struct list_head list;
+	unsigned int nr;
+	struct page *pages[PAGECHAIN_SIZE];
+};
+
+static inline void pagechain_init(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+	INIT_LIST_HEAD(&pchain->list);
+}
+
+static inline void pagechain_reinit(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+}
+
+static inline unsigned int pagechain_count(struct pagechain *pchain)
+{
+	return pchain->nr;
+}
+
+static inline unsigned int pagechain_space(struct pagechain *pchain)
+{
+	return PAGECHAIN_SIZE - pchain->nr;
+}
+
+static inline bool pagechain_empty(struct pagechain *pchain)
+{
+	return pchain->nr == 0;
+}
+
+/*
+ * Add a page to a pagechain.  Returns the number of slots still available.
+ */
+static inline unsigned int pagechain_deposit(struct pagechain *pchain, struct page *page)
+{
+	VM_BUG_ON(!pagechain_space(pchain));
+	pchain->pages[pchain->nr++] = page;
+	return pagechain_space(pchain);
+}
+
+static inline struct page *pagechain_withdraw(struct pagechain *pchain)
+{
+	if (!pagechain_count(pchain))
+		return NULL;
+	return pchain->pages[--pchain->nr];
+}
+
+void __init pagechain_cache_init(void);
+struct pagechain *pagechain_alloc(void);
+void pagechain_free(struct pagechain *pchain);
+
+#endif /* _LINUX_PAGECHAIN_H */
+
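
For illustration, typical use of the API above might look like this (a
hedged usage sketch; the loop and names are not taken from the series):

#include <linux/pagechain.h>

/* Hedged usage sketch of the pagechain API above. */
static void pagechain_example(struct page *pmd_pages[], int nr)
{
	struct pagechain chain;
	struct page *page;
	int i;

	pagechain_init(&chain);

	/* Deposit until the chain is full (13 slots per pagechain). */
	for (i = 0; i < nr && pagechain_space(&chain); i++)
		pagechain_deposit(&chain, pmd_pages[i]);

	/* Withdraw in LIFO order. */
	while ((page = pagechain_withdraw(&chain)))
		__free_page(page);	/* or hand back to the deposit code */
}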