[0/2] Add a new scheme to support demotion on tiered memory system

Message ID

cover.1640077468.git.baolin.wang@linux.alibaba.com (mailing list archive)

Headers

From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: sj@kernel.org,
	akpm@linux-foundation.org
Cc: ying.huang@intel.com,
	dave.hansen@linux.intel.com,
	ziy@nvidia.com,
	shy828301@gmail.com,
	zhongjiang-ali@linux.alibaba.com,
	xlpang@linux.alibaba.com,
	baolin.wang@linux.alibaba.com,
	linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 0/2] Add a new scheme to support demotion on tiered memory
 system
Date: Tue, 21 Dec 2021 17:18:02 +0800
Message-Id: <cover.1640077468.git.baolin.wang@linux.alibaba.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

Add a new scheme to support demotion on tiered memory system

Message

Baolin Wang Dec. 21, 2021, 9:18 a.m. UTC

Hi,

On tiered memory systems with different memory types, the reclaim path in
shrink_page_list() already supports demoting pages to a slow memory node instead
of discarding them. However, by that time the fast memory node is already under
watermark pressure, so page demotion increases memory allocation latency. A
method for demoting cold pages proactively from user space would therefore be
more helpful.

We can rely on DAMON in user space to monitor cold memory on the fast memory
node and to demote cold pages to the slow memory node proactively, keeping the
fast memory node in a healthy state.

This patch set introduces a new scheme named DAMOS_DEMOTE to support this feature,
and it works well in my testing. Any comments are welcome. Thanks.


Baolin Wang (2):
  mm: Export the alloc_demote_page() function
  mm/damon: Add a new scheme to support demotion on tiered memory system

 include/linux/damon.h |   3 +
 mm/damon/dbgfs.c      |   1 +
 mm/damon/vaddr.c      | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/internal.h         |   1 +
 mm/vmscan.c           |   2 +-
 5 files changed, 162 insertions(+), 1 deletion(-)

Comments

SeongJae Park Dec. 21, 2021, 1:26 p.m. UTC | #1

Hi Baolin,

On Tue, 21 Dec 2021 17:18:02 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:

[snip]

I like the idea, thank you for these patches!  If possible, could you share
some details about your tests?


Thanks,
SJ


Baolin Wang Dec. 21, 2021, 2:32 p.m. UTC | #2

On 12/21/2021 9:26 PM, SeongJae Park wrote:
> I like the idea, thank you for these patches!  If possible, could you share
> some details about your tests?

Sure, sorry for not adding more information about my tests.

My machine contains 64G DRAM + 256G AEP (persistent memory). Demotion must
first be enabled with:
echo "true" > /sys/kernel/mm/numa/demotion_enabled

Then I wrote a simple test case, shown below, that mmaps some anonymous
memory and then reads and writes only half of the buffer, so that the
other half becomes cold enough to demote.

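#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 50 * 1024 * 1024;
        size_t scan_len = len / 2;
        size_t i;
        /* volatile so the compiler does not optimize the reads away */
        volatile unsigned long j = 0;
        unsigned long *p;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
                printf("failed to get memory\n");
                return -1;
        }

        /* Touch the whole buffer once so every page is populated */
        for (i = 0; i < len / sizeof(*p); i++)
                p[i] = 0x55aa;

        /* Keep the first half hot and let the other half go cold */
        do {
                for (i = 0; i < scan_len / sizeof(*p); i++)
                        p[i] = 0x55aa;

                sleep(2);

                for (i = 0; i < scan_len / sizeof(*p); i++)
                        j += p[i] >> 2;
        } while (1);

        /* Not reached: the loop above runs until the process is killed */
        munmap(p, len);
        return 0;
}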

After setting attrs, schemes and target_ids, start monitoring:
echo 100000 1000000 1000000 10 1000 > /sys/kernel/debug/damon/attrs
echo 4096 8192000 0 5 10 2000 5 1000 2097152 5000 0 0 0 0 0 3 2 1 > /sys/kernel/debug/damon/schemes
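
The two commands above pack many tunables into positional fields and do not
show how the target process is selected or how monitoring is switched on. The
sketch below is one way to read and complete them. The field names are my
reading of the v5.16-era DAMON debugfs layout and should be checked against
the usage document of the kernel in use; label_fields, damon_monitor_pid and
the DAMON_DBGFS override are hypothetical helpers, not part of this series.

```shell
#!/bin/sh
# Assumed field order of the debugfs "attrs" and "schemes" files
# (v5.16-era layout; verify against your kernel's DAMON usage document).
ATTR_FIELDS="sample_interval_us aggr_interval_us regions_update_interval_us min_nr_regions max_nr_regions"
SCHEME_FIELDS="min_sz max_sz min_nr_accesses max_nr_accesses min_age max_age action quota_ms quota_sz quota_reset_interval weight_sz weight_nr_accesses weight_age wmarks_metric wmarks_interval wmarks_high wmarks_mid wmarks_low"

# label_fields "<names>" "<values>": print one "name=value" pair per line
# and fail if the number of values does not match the number of names.
label_fields() {
    names=$1
    set -- $2    # positional parameters now hold the individual values
    for name in $names; do
        [ $# -gt 0 ] || { echo "too few values" >&2; return 1; }
        echo "$name=$1"
        shift
    done
    [ $# -eq 0 ] || { echo "too many values" >&2; return 1; }
}

# damon_monitor_pid <pid>: select one process as the monitoring target and
# turn monitoring on. target_ids and monitor_on are the debugfs files of
# that era; DAMON_DBGFS is a hypothetical override for trying this elsewhere.
damon_monitor_pid() {
    dir=${DAMON_DBGFS:-/sys/kernel/debug/damon}
    echo "$1" > "$dir/target_ids" || return 1
    echo on > "$dir/monitor_on"
}
```

For example, label_fields "$SCHEME_FIELDS" "4096 8192000 0 5 10 2000 5 1000
2097152 5000 0 0 0 0 0 3 2 1" prints min_sz=4096 through wmarks_low=1. Read
this way, the scheme would cover regions from 4 KiB to about 8 MB that saw 0
to 5 accesses, with action code 5 (presumably the new demote action once this
series is applied) and a quota of 1000 ms or 2 MiB per 5000 ms window. Calling
damon_monitor_pid with the PID of the test program would then start monitoring.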

After a while, you can check the demotion statistics with the command below;
you should see that the demote scheme has been applied and that some cold
pages were demoted to the slow memory (AEP) node.

cat /proc/vmstat | grep "demote"
pgdemote_direct 6881
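
Because the vmstat counters are cumulative, a single reading does not show how
many pages the scheme demoted during the test. A minimal sketch for taking a
before/after difference follows; demote_counters and demote_total are
hypothetical helper names, and the optional file argument exists only so the
functions can be tried on a saved copy of /proc/vmstat.

```shell
#!/bin/sh
# demote_counters [file]: print every pgdemote_* line of a vmstat-style file.
demote_counters() {
    grep '^pgdemote_' "${1:-/proc/vmstat}"
}

# demote_total [file]: print the sum of all pgdemote_* counters.
demote_total() {
    total=0
    while read -r name value; do
        case $name in
            pgdemote_*) total=$((total + value)) ;;
        esac
    done < "${1:-/proc/vmstat}"
    echo "$total"
}
```

One possible use is to store before=$(demote_total) ahead of starting the
scheme, let the test run for a while, and then print
$(( $(demote_total) - before )) to get the number of pages demoted in between.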

SeongJae Park Dec. 22, 2021, 8:54 a.m. UTC | #3

On Tue, 21 Dec 2021 22:32:24 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:

[snip]

> Sure, sorry for not adding more information about my tests.

No problem!

[snip]

Thank you for sharing these great details!

I was just wondering if you have tested and measured the effects of the memory
allocation latency increase during the page demotion, which is invoked by
shrink_page_list(), and also if you have measured how much improvement can be
achieved with DAMON-based demotion in the scenario.  Seems that's not the case,
and I personally think that information is not essential for this patch, so I
see no problem here.  But, if you have tested or have a plan to do that, and if
you could, I think sharing the results on this cover letter would make this
even greater.


Thanks,
SJ

Baolin Wang Dec. 22, 2021, 9:57 a.m. UTC | #4

On 12/22/2021 4:54 PM, SeongJae Park wrote:
[snip]

> Thank you for sharing these great details!
> 
> I was just wondering if you have tested and measured the effects of the memory
> allocation latency increase during the page demotion, which is invoked by
> shrink_page_list(), and also if you have measured how much improvement can be
> achieved with DAMON-based demotion in the scenario.  Seems that's not the case,

I have not yet tested a real workload with the DAMON demote scheme, and I
think DAMON lacks some functions needed to tune performance on tiered
memory systems. At the least, I think we also need to add a new promotion
scheme for DAMON to promote hot memory from the slow memory node to the
fast memory node, which is on my TODO list.

> and I personally think that information is not essential for this patch, so I
> see no problem here.  But, if you have tested or have a plan to do that, and if
> you could, I think sharing the results on this cover letter would make this
> even greater.

Sure, will do if we find some interesting results with DAMON on tiered memory
systems in the future. Thanks.