Message ID | 20220608144031.829-2-linmiaohe@huawei.com (mailing list archive) |
---|---|
State | New |
Series | A few cleanup and fixup patches for swap |
On 08.06.22 16:40, Miaohe Lin wrote:
> security_vm_enough_memory_mm() checks whether a process has enough memory
> to allocate a new virtual mapping. And total_swap_pages is considered as
> available memory while swapoff tries to make sure there's enough memory
> that can hold the swapped out memory. But total_swap_pages contains the
> swap space that is being swapoff. So security_vm_enough_memory_mm() will
> success even if there's no memory to hold the swapped out memory because

s/success/succeed/

> total_swap_pages always greater than or equal to p->pages.
>
> In order to fix it, p->pages should be retracted from total_swap_pages

s/retracted/subtracted/

> first and then check whether there's enough memory for inuse swap pages.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/swapfile.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index ec4c1b276691..d2bead7b8b70 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  	struct filename *pathname;
>  	int err, found = 0;
>  	unsigned int old_block_size;
> +	unsigned int inuse_pages;
>
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  		spin_unlock(&swap_lock);
>  		goto out_dput;
>  	}
> -	if (!security_vm_enough_memory_mm(current->mm, p->pages))
> -		vm_unacct_memory(p->pages);
> +
> +	total_swap_pages -= p->pages;
> +	inuse_pages = READ_ONCE(p->inuse_pages);
> +	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
> +		vm_unacct_memory(inuse_pages);
>  	else {
> +		total_swap_pages += p->pages;

That implies that whenever we fail in security_vm_enough_memory_mm(),
that other concurrent users might see a wrong total_swap_pages.

Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.

Temporarily, we'd have

CommitLimit    4 GiB
Committed_AS  10 GiB

Not sure if relevant, but I wonder if it could be avoided somehow?

Apart from that, LGTM.
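The transient state described in the review above can be illustrated with a small model. This is only a sketch under stated assumptions, not kernel code: `struct vm_totals` and `patched_check()` are hypothetical names, all quantities are in GiB, the check is a simplified strict-style comparison (Committed_AS plus the request against RAM + swap), and the 6 GiB of in-use swap in the usage note is an assumed value that the thread does not give.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the patched ordering; not kernel code. Units are GiB. */
struct vm_totals {
	uint64_t ram_gib;
	uint64_t swap_gib;      /* stands in for total_swap_pages */
	uint64_t committed_gib; /* stands in for Committed_AS */
};

/*
 * Mirrors the order of operations in the patch: subtract the device from the
 * swap total, run a strict-style check for the in-use part, and add the
 * device back if the check fails. *window_limit records the RAM + swap sum
 * that a concurrent reader could observe while the check runs.
 */
static bool patched_check(struct vm_totals *t, uint64_t dev_gib,
			  uint64_t inuse_gib, uint64_t *window_limit)
{
	assert(t && window_limit && dev_gib <= t->swap_gib);

	t->swap_gib -= dev_gib;
	*window_limit = t->ram_gib + t->swap_gib;

	if (t->committed_gib + inuse_gib <= *window_limit)
		return true;      /* swapoff may proceed */

	t->swap_gib += dev_gib;   /* failure path: restore the total */
	return false;
}
```

With 4 GiB RAM, 8 GiB swap and 10 GiB committed, calling `patched_check()` for the 8 GiB device fails and restores the swap total to 8, but `*window_limit` is 4 while the check runs, which is the temporary CommitLimit from the example above.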
On 2022/6/17 15:33, David Hildenbrand wrote:
> On 08.06.22 16:40, Miaohe Lin wrote:
>> security_vm_enough_memory_mm() checks whether a process has enough memory
>> to allocate a new virtual mapping. And total_swap_pages is considered as
>> available memory while swapoff tries to make sure there's enough memory
>> that can hold the swapped out memory. But total_swap_pages contains the
>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>> success even if there's no memory to hold the swapped out memory because
>
> s/success/succeed/

OK. Thanks.

>
>> total_swap_pages always greater than or equal to p->pages.
>>
>> In order to fix it, p->pages should be retracted from total_swap_pages
>
> s/retracted/subtracted/

OK. Thanks.

>
>> first and then check whether there's enough memory for inuse swap pages.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/swapfile.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index ec4c1b276691..d2bead7b8b70 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>  	struct filename *pathname;
>>  	int err, found = 0;
>>  	unsigned int old_block_size;
>> +	unsigned int inuse_pages;
>>
>>  	if (!capable(CAP_SYS_ADMIN))
>>  		return -EPERM;
>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>  		spin_unlock(&swap_lock);
>>  		goto out_dput;
>>  	}
>> -	if (!security_vm_enough_memory_mm(current->mm, p->pages))
>> -		vm_unacct_memory(p->pages);
>> +
>> +	total_swap_pages -= p->pages;
>> +	inuse_pages = READ_ONCE(p->inuse_pages);
>> +	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
>> +		vm_unacct_memory(inuse_pages);
>>  	else {
>> +		total_swap_pages += p->pages;
>
> That implies that whenever we fail in security_vm_enough_memory_mm(),
> that other concurrent users might see a wrong total_swap_pages.
>
> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.
>
> Temporarily, we'd have
>
> CommitLimit 4 GiB
> Committed_AS 10 GiB

IIUC, even if without this change, the other concurrent users if come after vm_acct_memory()
is done in __vm_enough_memory(), they might see

CommitLimit 12 GiB (4 GiB memory + 8GiB total swap)
Committed_AS 18 GiB (10 GiB in use + 8GiB swap space to swapoff)

Or am I miss something?

>
> Not sure if relevant, but I wonder if it could be avoided somehow?

It seems this race exists already and is benign. The worst case is concurrent users might fail
to allocate the memory. But that window should be really small and swapoff is a rare ops.
Or should I try to fix this race?

>
>
> Apart from that, LGTM.

Many thanks for comment! :)

>
On 18.06.22 04:43, Miaohe Lin wrote: > On 2022/6/17 15:33, David Hildenbrand wrote: >> On 08.06.22 16:40, Miaohe Lin wrote: >>> security_vm_enough_memory_mm() checks whether a process has enough memory >>> to allocate a new virtual mapping. And total_swap_pages is considered as >>> available memory while swapoff tries to make sure there's enough memory >>> that can hold the swapped out memory. But total_swap_pages contains the >>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>> success even if there's no memory to hold the swapped out memory because >> >> s/success/succeed/ > > OK. Thanks. > >> >>> total_swap_pages always greater than or equal to p->pages. >>> >>> In order to fix it, p->pages should be retracted from total_swap_pages >> >> s/retracted/subtracted/ > > OK. Thanks. > >> >>> first and then check whether there's enough memory for inuse swap pages. >>> >>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>> --- >>> mm/swapfile.c | 10 +++++++--- >>> 1 file changed, 7 insertions(+), 3 deletions(-) >>> >>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>> index ec4c1b276691..d2bead7b8b70 100644 >>> --- a/mm/swapfile.c >>> +++ b/mm/swapfile.c >>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>> struct filename *pathname; >>> int err, found = 0; >>> unsigned int old_block_size; >>> + unsigned int inuse_pages; >>> >>> if (!capable(CAP_SYS_ADMIN)) >>> return -EPERM; >>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>> spin_unlock(&swap_lock); >>> goto out_dput; >>> } >>> - if (!security_vm_enough_memory_mm(current->mm, p->pages)) >>> - vm_unacct_memory(p->pages); >>> + >>> + total_swap_pages -= p->pages; >>> + inuse_pages = READ_ONCE(p->inuse_pages); >>> + if (!security_vm_enough_memory_mm(current->mm, inuse_pages)) >>> + vm_unacct_memory(inuse_pages); >>> else { >>> + total_swap_pages += p->pages; >> >> That implies that whenever we fail in 
security_vm_enough_memory_mm(), >> that other concurrent users might see a wrong total_swap_pages. >> >> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use. >> >> Temporarily, we'd have >> >> CommitLimit 4 GiB >> Committed_AS 10 GiB > > IIUC, even if without this change, the other concurrent users if come after vm_acct_memory() > is done in __vm_enough_memory(), they might see > > CommitLimit 12 GiB (4 GiB memory + 8GiB total swap) > Committed_AS 18 GiB (10 GiB in use + 8GiB swap space to swapoff) > > Or am I miss something? > I think you are right! Reviewed-by: David Hildenbrand <david@redhat.com>
On 2022/6/18 15:10, David Hildenbrand wrote: > On 18.06.22 04:43, Miaohe Lin wrote: >> On 2022/6/17 15:33, David Hildenbrand wrote: >>> On 08.06.22 16:40, Miaohe Lin wrote: >>>> security_vm_enough_memory_mm() checks whether a process has enough memory >>>> to allocate a new virtual mapping. And total_swap_pages is considered as >>>> available memory while swapoff tries to make sure there's enough memory >>>> that can hold the swapped out memory. But total_swap_pages contains the >>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>>> success even if there's no memory to hold the swapped out memory because >>> >>> s/success/succeed/ >> >> OK. Thanks. >> >>> >>>> total_swap_pages always greater than or equal to p->pages. >>>> >>>> In order to fix it, p->pages should be retracted from total_swap_pages >>> >>> s/retracted/subtracted/ >> >> OK. Thanks. >> >>> >>>> first and then check whether there's enough memory for inuse swap pages. >>>> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> --- >>>> mm/swapfile.c | 10 +++++++--- >>>> 1 file changed, 7 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/mm/swapfile.c b/mm/swapfile.c >>>> index ec4c1b276691..d2bead7b8b70 100644 >>>> --- a/mm/swapfile.c >>>> +++ b/mm/swapfile.c >>>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>>> struct filename *pathname; >>>> int err, found = 0; >>>> unsigned int old_block_size; >>>> + unsigned int inuse_pages; >>>> >>>> if (!capable(CAP_SYS_ADMIN)) >>>> return -EPERM; >>>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) >>>> spin_unlock(&swap_lock); >>>> goto out_dput; >>>> } >>>> - if (!security_vm_enough_memory_mm(current->mm, p->pages)) >>>> - vm_unacct_memory(p->pages); >>>> + >>>> + total_swap_pages -= p->pages; >>>> + inuse_pages = READ_ONCE(p->inuse_pages); >>>> + if (!security_vm_enough_memory_mm(current->mm, inuse_pages)) >>>> + vm_unacct_memory(inuse_pages); >>>> else { 
>>>> + total_swap_pages += p->pages; >>> >>> That implies that whenever we fail in security_vm_enough_memory_mm(), >>> that other concurrent users might see a wrong total_swap_pages. >>> >>> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use. >>> >>> Temporarily, we'd have >>> >>> CommitLimit 4 GiB >>> Committed_AS 10 GiB >> >> IIUC, even if without this change, the other concurrent users if come after vm_acct_memory() >> is done in __vm_enough_memory(), they might see >> >> CommitLimit 12 GiB (4 GiB memory + 8GiB total swap) >> Committed_AS 18 GiB (10 GiB in use + 8GiB swap space to swapoff) >> >> Or am I miss something? >> > > I think you are right! > > Reviewed-by: David Hildenbrand <david@redhat.com> Thanks a lot! > >
Miaohe Lin <linmiaohe@huawei.com> writes:

> security_vm_enough_memory_mm() checks whether a process has enough memory
> to allocate a new virtual mapping. And total_swap_pages is considered as
> available memory while swapoff tries to make sure there's enough memory
> that can hold the swapped out memory. But total_swap_pages contains the
> swap space that is being swapoff. So security_vm_enough_memory_mm() will
> success even if there's no memory to hold the swapped out memory because
> total_swap_pages always greater than or equal to p->pages.

Per my understanding, swapoff will not allocate virtual mapping by
itself. But after swapoff, the overcommit limit could be exceeded.
security_vm_enough_memory_mm() is used to check that. For example, in a
system with 4GB memory and 8GB swap, and 10GB is in use,

CommitLimit: 4+8 = 12GB
Committed_AS: 10GB

security_vm_enough_memory_mm() in swapoff() will fail because
10+8 = 18 > 12. This is expected because after swapoff, the overcommit
limit will be exceeded.

If 3GB is in use,

CommitLimit: 4+8 = 12GB
Committed_AS: 3GB

security_vm_enough_memory_mm() in swapoff() will succeed because
3+8 = 11 < 12. This is expected because after swapoff, the overcommit
limit will not be exceeded.

So, what's the real problem of the original implementation? Can you
show it with an example as above?

Best Regards,
Huang, Ying

> In order to fix it, p->pages should be retracted from total_swap_pages
> first and then check whether there's enough memory for inuse swap pages.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

[snip]
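The arithmetic in the two examples above can be written out as a minimal sketch. It is a hypothetical model rather than kernel code: `struct commit_state`, `commit_limit()` and `strict_check_passes()` are illustrative names, all quantities are in GiB, and CommitLimit is taken as RAM + swap exactly as the email simplifies it (the kernel's own computation involves further terms that are left out here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not kernel code. All quantities are in GiB. */
struct commit_state {
	uint64_t ram_gib;       /* total RAM */
	uint64_t swap_gib;      /* total swap, including the device being removed */
	uint64_t committed_gib; /* Committed_AS */
};

/* CommitLimit as simplified in the example above: RAM + swap. */
static uint64_t commit_limit(const struct commit_state *s)
{
	assert(s);
	return s->ram_gib + s->swap_gib;
}

/* Strict-style check: does charging request_gib stay within CommitLimit? */
static bool strict_check_passes(const struct commit_state *s, uint64_t request_gib)
{
	return s->committed_gib + request_gib <= commit_limit(s);
}
```

With 4 GiB RAM and 8 GiB swap, `strict_check_passes()` for an 8 GiB request returns false when 10 GiB is committed (10 + 8 > 12) and true when 3 GiB is committed (3 + 8 <= 12), matching the two cases in the email.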
On 2022/6/20 15:31, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
>
>> security_vm_enough_memory_mm() checks whether a process has enough memory
>> to allocate a new virtual mapping. And total_swap_pages is considered as
>> available memory while swapoff tries to make sure there's enough memory
>> that can hold the swapped out memory. But total_swap_pages contains the
>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>> success even if there's no memory to hold the swapped out memory because
>> total_swap_pages always greater than or equal to p->pages.
>
> Per my understanding, swapoff will not allocate virtual mapping by
> itself. But after swapoff, the overcommit limit could be exceeded.
> security_vm_enough_memory_mm() is used to check that. For example, in a
> system with 4GB memory and 8GB swap, and 10GB is in use,
>
> CommitLimit: 4+8 = 12GB
> Committed_AS: 10GB
>
> security_vm_enough_memory_mm() in swapoff() will fail because
> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
> limit will be exceeded.
>
> If 3GB is in use,
>
> CommitLimit: 4+8 = 12GB
> Committed_AS: 3GB
>
> security_vm_enough_memory_mm() in swapoff() will succeed because
> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
> limit will not be exceeded.

In OVERCOMMIT_NEVER scene, I think you're right.

>
> So, what's the real problem of the original implementation? Can you
> show it with an example as above?

In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use,
pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the
below case.

	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
		if (pages > totalram_pages() + total_swap_pages)
			goto error;
		return 0;
	}

Or am I miss something?

>
> Best Regards,
> Huang, Ying

Thanks!

>
>> In order to fix it, p->pages should be retracted from total_swap_pages
>> first and then check whether there's enough memory for inuse swap pages.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>
> [snip]
>
> .
>
Miaohe Lin <linmiaohe@huawei.com> writes:

> On 2022/6/20 15:31, Huang, Ying wrote:
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>
>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>> available memory while swapoff tries to make sure there's enough memory
>>> that can hold the swapped out memory. But total_swap_pages contains the
>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>> success even if there's no memory to hold the swapped out memory because
>>> total_swap_pages always greater than or equal to p->pages.
>>
>> Per my understanding, swapoff will not allocate virtual mapping by
>> itself. But after swapoff, the overcommit limit could be exceeded.
>> security_vm_enough_memory_mm() is used to check that. For example, in a
>> system with 4GB memory and 8GB swap, and 10GB is in use,
>>
>> CommitLimit: 4+8 = 12GB
>> Committed_AS: 10GB
>>
>> security_vm_enough_memory_mm() in swapoff() will fail because
>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
>> limit will be exceeded.
>>
>> If 3GB is in use,
>>
>> CommitLimit: 4+8 = 12GB
>> Committed_AS: 3GB
>>
>> security_vm_enough_memory_mm() in swapoff() will succeed because
>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
>> limit will not be exceeded.
>
> In OVERCOMMIT_NEVER scene, I think you're right.
>
>>
>> So, what's the real problem of the original implementation? Can you
>> show it with an example as above?
>
> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use,
> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the
> below case.
>
> 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
> 		if (pages > totalram_pages() + total_swap_pages)
> 			goto error;
> 		return 0;
> 	}
>
> Or am I miss something?

Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
isn't checked at all. The only restriction is that the size of the
virtual mapping created should be less than total RAM + total swap
pages. Because swapoff() will not create virtual mapping, so it's
expected that security_vm_enough_memory_mm() in swapoff() always
succeeds.

Best Regards,
Huang, Ying

>
> Thanks!
>
>>
>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>> first and then check whether there's enough memory for inuse swap pages.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>
>> [snip]
>>
>> .
>>
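The point about OVERCOMMIT_GUESS can be made concrete with a minimal sketch of the branch quoted in this thread. It is a hypothetical model, not kernel code: `guess_check_passes()` and `guess_mode_examples()` are illustrative names, and quantities are in GiB rather than pages. The 13 GiB request in the second example is an assumed value chosen only to exceed RAM + swap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the OVERCOMMIT_GUESS branch quoted above; units are GiB. */
static bool guess_check_passes(uint64_t request_gib, uint64_t ram_gib,
			       uint64_t swap_gib)
{
	/* Only this one request is compared; Committed_AS is not consulted. */
	return request_gib <= ram_gib + swap_gib;
}

static void guess_mode_examples(void)
{
	/* Original swapoff() call: the request is the whole 8 GiB device. */
	assert(guess_check_passes(8, 4, 8));
	/* A single request larger than RAM + swap is the only thing rejected. */
	assert(!guess_check_passes(13, 4, 8));
}
```

Because the amount already in use never enters the comparison, the 8 GiB swapoff request passes with 4 GiB RAM and 8 GiB swap regardless of whether 3 GiB or 10 GiB is in use.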
On 2022/6/21 9:35, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
>
>> On 2022/6/20 15:31, Huang, Ying wrote:
>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>
>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>>> available memory while swapoff tries to make sure there's enough memory
>>>> that can hold the swapped out memory. But total_swap_pages contains the
>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>>> success even if there's no memory to hold the swapped out memory because
>>>> total_swap_pages always greater than or equal to p->pages.
>>>
>>> Per my understanding, swapoff will not allocate virtual mapping by
>>> itself. But after swapoff, the overcommit limit could be exceeded.
>>> security_vm_enough_memory_mm() is used to check that. For example, in a
>>> system with 4GB memory and 8GB swap, and 10GB is in use,
>>>
>>> CommitLimit: 4+8 = 12GB
>>> Committed_AS: 10GB
>>>
>>> security_vm_enough_memory_mm() in swapoff() will fail because
>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
>>> limit will be exceeded.
>>>
>>> If 3GB is in use,
>>>
>>> CommitLimit: 4+8 = 12GB
>>> Committed_AS: 3GB
>>>
>>> security_vm_enough_memory_mm() in swapoff() will succeed because
>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
>>> limit will not be exceeded.
>>
>> In OVERCOMMIT_NEVER scene, I think you're right.
>>
>>>
>>> So, what's the real problem of the original implementation? Can you
>>> show it with an example as above?
>>
>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use,
>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
>> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the
>> below case.
>>
>> 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>> 		if (pages > totalram_pages() + total_swap_pages)
>> 			goto error;
>> 		return 0;
>> 	}
>>
>> Or am I miss something?
>
> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
> isn't checked at all. The only restriction is that the size of the
> virtual mapping created should be less than total RAM + total swap

Do you mean the only restriction is that the size of the virtual mapping
*created every time* should be less than total RAM + total swap pages but
*total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so,
the current behavior should be sane and I will drop this patch.

Thanks!

> pages. Because swapoff() will not create virtual mapping, so it's
> expected that security_vm_enough_memory_mm() in swapoff() always
> succeeds.
>
> Best Regards,
> Huang, Ying
>
>>
>> Thanks!
>>
>>>
>>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>>> first and then check whether there's enough memory for inuse swap pages.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>
>>> [snip]
>>>
>>> .
>>>
>
> .
>
Miaohe Lin <linmiaohe@huawei.com> writes: > On 2022/6/21 9:35, Huang, Ying wrote: >> Miaohe Lin <linmiaohe@huawei.com> writes: >> >>> On 2022/6/20 15:31, Huang, Ying wrote: >>>> Miaohe Lin <linmiaohe@huawei.com> writes: >>>> >>>>> security_vm_enough_memory_mm() checks whether a process has enough memory >>>>> to allocate a new virtual mapping. And total_swap_pages is considered as >>>>> available memory while swapoff tries to make sure there's enough memory >>>>> that can hold the swapped out memory. But total_swap_pages contains the >>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>>>> success even if there's no memory to hold the swapped out memory because >>>>> total_swap_pages always greater than or equal to p->pages. >>>> >>>> Per my understanding, swapoff will not allocate virtual mapping by >>>> itself. But after swapoff, the overcommit limit could be exceeded. >>>> security_vm_enough_memory_mm() is used to check that. For example, in a >>>> system with 4GB memory and 8GB swap, and 10GB is in use, >>>> >>>> CommitLimit: 4+8 = 12GB >>>> Committed_AS: 10GB >>>> >>>> security_vm_enough_memory_mm() in swapoff() will fail because >>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit >>>> limit will be exceeded. >>>> >>>> If 3GB is in use, >>>> >>>> CommitLimit: 4+8 = 12GB >>>> Committed_AS: 3GB >>>> >>>> security_vm_enough_memory_mm() in swapoff() will succeed because >>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit >>>> limit will not be exceeded. >>> >>> In OVERCOMMIT_NEVER scene, I think you're right. >>> >>>> >>>> So, what's the real problem of the original implementation? Can you >>>> show it with an example as above? >>> >>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use, >>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed >>> instead of expected failure because 8 < 12. 
The overcommit limit is always *ignored* in the >>> below case. >>> >>> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { >>> if (pages > totalram_pages() + total_swap_pages) >>> goto error; >>> return 0; >>> } >>> >>> Or am I miss something? >> >> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages >> isn't checked at all. The only restriction is that the size of the >> virtual mapping created should be less than total RAM + total swap > > Do you mean the only restriction is that the size of the virtual mapping > *created every time* should be less than total RAM + total swap pages but > *total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so, > the current behavior should be sane and I will drop this patch. Yes. This is my understanding. Best Regards, Huang, Ying > Thanks! > >> pages. Because swapoff() will not create virtual mapping, so it's >> expected that security_vm_enough_memory_mm() in swapoff() always >> succeeds. >> >> Best Regards, >> Huang, Ying >> >>> >>> Thanks! >>> >>>> >>>>> In order to fix it, p->pages should be retracted from total_swap_pages >>>>> first and then check whether there's enough memory for inuse swap pages. >>>>> >>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> >>>> [snip] >>>> >>>> . >>>> >> >> . >>
On 2022/6/21 15:42, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> On 2022/6/21 9:35, Huang, Ying wrote: >>> Miaohe Lin <linmiaohe@huawei.com> writes: >>> >>>> On 2022/6/20 15:31, Huang, Ying wrote: >>>>> Miaohe Lin <linmiaohe@huawei.com> writes: >>>>> >>>>>> security_vm_enough_memory_mm() checks whether a process has enough memory >>>>>> to allocate a new virtual mapping. And total_swap_pages is considered as >>>>>> available memory while swapoff tries to make sure there's enough memory >>>>>> that can hold the swapped out memory. But total_swap_pages contains the >>>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will >>>>>> success even if there's no memory to hold the swapped out memory because >>>>>> total_swap_pages always greater than or equal to p->pages. >>>>> >>>>> Per my understanding, swapoff will not allocate virtual mapping by >>>>> itself. But after swapoff, the overcommit limit could be exceeded. >>>>> security_vm_enough_memory_mm() is used to check that. For example, in a >>>>> system with 4GB memory and 8GB swap, and 10GB is in use, >>>>> >>>>> CommitLimit: 4+8 = 12GB >>>>> Committed_AS: 10GB >>>>> >>>>> security_vm_enough_memory_mm() in swapoff() will fail because >>>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit >>>>> limit will be exceeded. >>>>> >>>>> If 3GB is in use, >>>>> >>>>> CommitLimit: 4+8 = 12GB >>>>> Committed_AS: 3GB >>>>> >>>>> security_vm_enough_memory_mm() in swapoff() will succeed because >>>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit >>>>> limit will not be exceeded. >>>> >>>> In OVERCOMMIT_NEVER scene, I think you're right. >>>> >>>>> >>>>> So, what's the real problem of the original implementation? Can you >>>>> show it with an example as above? 
>>>> >>>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use, >>>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed >>>> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the >>>> below case. >>>> >>>> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) { >>>> if (pages > totalram_pages() + total_swap_pages) >>>> goto error; >>>> return 0; >>>> } >>>> >>>> Or am I miss something? >>> >>> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages >>> isn't checked at all. The only restriction is that the size of the >>> virtual mapping created should be less than total RAM + total swap >> >> Do you mean the only restriction is that the size of the virtual mapping >> *created every time* should be less than total RAM + total swap pages but >> *total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so, >> the current behavior should be sane and I will drop this patch. > > Yes. This is my understanding. I see. Thank you. > > Best Regards, > Huang, Ying > >> Thanks! >> >>> pages. Because swapoff() will not create virtual mapping, so it's >>> expected that security_vm_enough_memory_mm() in swapoff() always >>> succeeds. >>> >>> Best Regards, >>> Huang, Ying >>> >>>> >>>> Thanks! >>>> >>>>> >>>>>> In order to fix it, p->pages should be retracted from total_swap_pages >>>>>> first and then check whether there's enough memory for inuse swap pages. >>>>>> >>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>>> >>>>> [snip] >>>>> >>>>> . >>>>> >>> >>> . >>> > > . >
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ec4c1b276691..d2bead7b8b70 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct filename *pathname;
 	int err, found = 0;
 	unsigned int old_block_size;
+	unsigned int inuse_pages;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		spin_unlock(&swap_lock);
 		goto out_dput;
 	}
-	if (!security_vm_enough_memory_mm(current->mm, p->pages))
-		vm_unacct_memory(p->pages);
+
+	total_swap_pages -= p->pages;
+	inuse_pages = READ_ONCE(p->inuse_pages);
+	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
+		vm_unacct_memory(inuse_pages);
 	else {
+		total_swap_pages += p->pages;
 		err = -ENOMEM;
 		spin_unlock(&swap_lock);
 		goto out_dput;
@@ -2453,7 +2458,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	}
 	plist_del(&p->list, &swap_active_head);
 	atomic_long_sub(p->pages, &nr_swap_pages);
-	total_swap_pages -= p->pages;
 	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
security_vm_enough_memory_mm() checks whether a process has enough memory
to allocate a new virtual mapping. And total_swap_pages is considered as
available memory while swapoff tries to make sure there's enough memory
that can hold the swapped out memory. But total_swap_pages contains the
swap space that is being swapoff. So security_vm_enough_memory_mm() will
success even if there's no memory to hold the swapped out memory because
total_swap_pages always greater than or equal to p->pages.

In order to fix it, p->pages should be retracted from total_swap_pages
first and then check whether there's enough memory for inuse swap pages.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/swapfile.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)