[v2,1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

Message ID 20220608144031.829-2-linmiaohe@huawei.com (mailing list archive)
State New
Series A few cleanup and fixup patches for swap

Commit Message

Miaohe Lin June 8, 2022, 2:40 p.m. UTC
security_vm_enough_memory_mm() checks whether a process has enough memory
to allocate a new virtual mapping. And total_swap_pages is considered as
available memory while swapoff tries to make sure there's enough memory
that can hold the swapped out memory. But total_swap_pages contains the
swap space that is being swapoff. So security_vm_enough_memory_mm() will
success even if there's no memory to hold the swapped out memory because
total_swap_pages always greater than or equal to p->pages.

In order to fix it, p->pages should be retracted from total_swap_pages
first and then check whether there's enough memory for inuse swap pages.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/swapfile.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

David Hildenbrand June 17, 2022, 7:33 a.m. UTC | #1
On 08.06.22 16:40, Miaohe Lin wrote:
> security_vm_enough_memory_mm() checks whether a process has enough memory
> to allocate a new virtual mapping. And total_swap_pages is considered as
> available memory while swapoff tries to make sure there's enough memory
> that can hold the swapped out memory. But total_swap_pages contains the
> swap space that is being swapoff. So security_vm_enough_memory_mm() will
> success even if there's no memory to hold the swapped out memory because

s/success/succeed/

> total_swap_pages always greater than or equal to p->pages.
> 
> In order to fix it, p->pages should be retracted from total_swap_pages

s/retracted/subtracted/

> first and then check whether there's enough memory for inuse swap pages.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/swapfile.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index ec4c1b276691..d2bead7b8b70 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  	struct filename *pathname;
>  	int err, found = 0;
>  	unsigned int old_block_size;
> +	unsigned int inuse_pages;
>  
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  		spin_unlock(&swap_lock);
>  		goto out_dput;
>  	}
> -	if (!security_vm_enough_memory_mm(current->mm, p->pages))
> -		vm_unacct_memory(p->pages);
> +
> +	total_swap_pages -= p->pages;
> +	inuse_pages = READ_ONCE(p->inuse_pages);
> +	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
> +		vm_unacct_memory(inuse_pages);
>  	else {
> +		total_swap_pages += p->pages;

That implies that whenever we fail in security_vm_enough_memory_mm(),
other concurrent users might see a wrong total_swap_pages.

Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.

Temporarily, we'd have

CommitLimit    4 GiB
Committed_AS  10 GiB

Not sure if relevant, but I wonder if it could be avoided somehow?


Apart from that, LGTM.
Miaohe Lin June 18, 2022, 2:43 a.m. UTC | #2
On 2022/6/17 15:33, David Hildenbrand wrote:
> On 08.06.22 16:40, Miaohe Lin wrote:
>> security_vm_enough_memory_mm() checks whether a process has enough memory
>> to allocate a new virtual mapping. And total_swap_pages is considered as
>> available memory while swapoff tries to make sure there's enough memory
>> that can hold the swapped out memory. But total_swap_pages contains the
>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>> success even if there's no memory to hold the swapped out memory because
> 
> s/success/succeed/

OK. Thanks.

> 
>> total_swap_pages always greater than or equal to p->pages.
>>
>> In order to fix it, p->pages should be retracted from total_swap_pages
> 
> s/retracted/subtracted/

OK. Thanks.

> 
>> first and then check whether there's enough memory for inuse swap pages.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/swapfile.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index ec4c1b276691..d2bead7b8b70 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>  	struct filename *pathname;
>>  	int err, found = 0;
>>  	unsigned int old_block_size;
>> +	unsigned int inuse_pages;
>>  
>>  	if (!capable(CAP_SYS_ADMIN))
>>  		return -EPERM;
>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>  		spin_unlock(&swap_lock);
>>  		goto out_dput;
>>  	}
>> -	if (!security_vm_enough_memory_mm(current->mm, p->pages))
>> -		vm_unacct_memory(p->pages);
>> +
>> +	total_swap_pages -= p->pages;
>> +	inuse_pages = READ_ONCE(p->inuse_pages);
>> +	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
>> +		vm_unacct_memory(inuse_pages);
>>  	else {
>> +		total_swap_pages += p->pages;
> 
> That implies that whenever we fail in security_vm_enough_memory_mm(),
> that other concurrent users might see a wrong total_swap_pages.
> 
> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.
> 
> Temporarily, we'd have
> 
> CommitLimit    4 GiB
> Committed_AS  10 GiB

IIUC, even without this change, other concurrent users that come in after vm_acct_memory()
is done in __vm_enough_memory() might see

CommitLimit   12 GiB (4 GiB memory + 8 GiB total swap)
Committed_AS  18 GiB (10 GiB in use + 8 GiB swap space to swapoff)

Or am I missing something?
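
The two transient states discussed here can be put side by side in a small
sketch. This is only a model of the arithmetic, not kernel code: the struct
and function names are hypothetical, all values are in GiB, and CommitLimit
is simplified to RAM + swap as in the figures above. It also assumes that
the patched path charges inuse_pages to Committed_AS while the check runs,
which the 4 GiB / 10 GiB figures quoted above leave out.

```c
/* Hypothetical snapshot of what a concurrent reader could observe. */
struct commit_view {
	unsigned long commit_limit;	/* simplified: RAM + total swap */
	unsigned long committed_as;
};

/*
 * Without the patch: total_swap_pages is untouched during the check,
 * but p->pages is temporarily charged by vm_acct_memory().
 */
struct commit_view transient_without_patch(unsigned long ram,
					   unsigned long total_swap,
					   unsigned long committed,
					   unsigned long p_pages)
{
	struct commit_view v;

	v.commit_limit = ram + total_swap;
	v.committed_as = committed + p_pages;
	return v;
}

/*
 * With the patch: p->pages is subtracted from total_swap_pages first,
 * then only inuse_pages is temporarily charged.
 * Assumes p_pages <= total_swap.
 */
struct commit_view transient_with_patch(unsigned long ram,
					unsigned long total_swap,
					unsigned long committed,
					unsigned long p_pages,
					unsigned long inuse_pages)
{
	struct commit_view v;

	v.commit_limit = ram + total_swap - p_pages;
	v.committed_as = committed + inuse_pages;
	return v;
}
```

With 4 GiB RAM, 8 GiB swap and 10 GiB committed, the unpatched path gives the
12 / 18 view above. The patched path gives a limit of 4, with Committed_AS at
10 plus whatever is in use on the device being swapped off.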

> 
> Not sure if relevant, but I wonder if it could be avoided somehow?

It seems this race already exists and is benign. The worst case is that concurrent users
might fail to allocate memory. But that window should be really small and swapoff is a rare
operation. Or should I try to fix this race?

> 
> 
> Apart from that, LGTM.

Many thanks for your comments! :)

>
David Hildenbrand June 18, 2022, 7:10 a.m. UTC | #3
On 18.06.22 04:43, Miaohe Lin wrote:
> On 2022/6/17 15:33, David Hildenbrand wrote:
>> On 08.06.22 16:40, Miaohe Lin wrote:
>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>> available memory while swapoff tries to make sure there's enough memory
>>> that can hold the swapped out memory. But total_swap_pages contains the
>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>> success even if there's no memory to hold the swapped out memory because
>>
>> s/success/succeed/
> 
> OK. Thanks.
> 
>>
>>> total_swap_pages always greater than or equal to p->pages.
>>>
>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>
>> s/retracted/subtracted/
> 
> OK. Thanks.
> 
>>
>>> first and then check whether there's enough memory for inuse swap pages.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/swapfile.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>> index ec4c1b276691..d2bead7b8b70 100644
>>> --- a/mm/swapfile.c
>>> +++ b/mm/swapfile.c
>>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>>  	struct filename *pathname;
>>>  	int err, found = 0;
>>>  	unsigned int old_block_size;
>>> +	unsigned int inuse_pages;
>>>  
>>>  	if (!capable(CAP_SYS_ADMIN))
>>>  		return -EPERM;
>>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>>  		spin_unlock(&swap_lock);
>>>  		goto out_dput;
>>>  	}
>>> -	if (!security_vm_enough_memory_mm(current->mm, p->pages))
>>> -		vm_unacct_memory(p->pages);
>>> +
>>> +	total_swap_pages -= p->pages;
>>> +	inuse_pages = READ_ONCE(p->inuse_pages);
>>> +	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
>>> +		vm_unacct_memory(inuse_pages);
>>>  	else {
>>> +		total_swap_pages += p->pages;
>>
>> That implies that whenever we fail in security_vm_enough_memory_mm(),
>> that other concurrent users might see a wrong total_swap_pages.
>>
>> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.
>>
>> Temporarily, we'd have
>>
>> CommitLimit    4 GiB
>> Committed_AS  10 GiB
> 
> IIUC, even if without this change, the other concurrent users if come after vm_acct_memory()
> is done in __vm_enough_memory(), they might see
> 
> CommitLimit   12 GiB (4 GiB memory + 8GiB total swap)
> Committed_AS  18 GiB (10 GiB in use + 8GiB swap space to swapoff)
> 
> Or am I miss something?
> 

I think you are right!

Reviewed-by: David Hildenbrand <david@redhat.com>
Miaohe Lin June 18, 2022, 7:31 a.m. UTC | #4
On 2022/6/18 15:10, David Hildenbrand wrote:
> On 18.06.22 04:43, Miaohe Lin wrote:
>> On 2022/6/17 15:33, David Hildenbrand wrote:
>>> On 08.06.22 16:40, Miaohe Lin wrote:
>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>>> available memory while swapoff tries to make sure there's enough memory
>>>> that can hold the swapped out memory. But total_swap_pages contains the
>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>>> success even if there's no memory to hold the swapped out memory because
>>>
>>> s/success/succeed/
>>
>> OK. Thanks.
>>
>>>
>>>> total_swap_pages always greater than or equal to p->pages.
>>>>
>>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>>
>>> s/retracted/subtracted/
>>
>> OK. Thanks.
>>
>>>
>>>> first and then check whether there's enough memory for inuse swap pages.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/swapfile.c | 10 +++++++---
>>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>>> index ec4c1b276691..d2bead7b8b70 100644
>>>> --- a/mm/swapfile.c
>>>> +++ b/mm/swapfile.c
>>>> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>>>  	struct filename *pathname;
>>>>  	int err, found = 0;
>>>>  	unsigned int old_block_size;
>>>> +	unsigned int inuse_pages;
>>>>  
>>>>  	if (!capable(CAP_SYS_ADMIN))
>>>>  		return -EPERM;
>>>> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>>>>  		spin_unlock(&swap_lock);
>>>>  		goto out_dput;
>>>>  	}
>>>> -	if (!security_vm_enough_memory_mm(current->mm, p->pages))
>>>> -		vm_unacct_memory(p->pages);
>>>> +
>>>> +	total_swap_pages -= p->pages;
>>>> +	inuse_pages = READ_ONCE(p->inuse_pages);
>>>> +	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
>>>> +		vm_unacct_memory(inuse_pages);
>>>>  	else {
>>>> +		total_swap_pages += p->pages;
>>>
>>> That implies that whenever we fail in security_vm_enough_memory_mm(),
>>> that other concurrent users might see a wrong total_swap_pages.
>>>
>>> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.
>>>
>>> Temporarily, we'd have
>>>
>>> CommitLimit    4 GiB
>>> Committed_AS  10 GiB
>>
>> IIUC, even if without this change, the other concurrent users if come after vm_acct_memory()
>> is done in __vm_enough_memory(), they might see
>>
>> CommitLimit   12 GiB (4 GiB memory + 8GiB total swap)
>> Committed_AS  18 GiB (10 GiB in use + 8GiB swap space to swapoff)
>>
>> Or am I miss something?
>>
> 
> I think you are right!
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks a lot!

> 
>
Huang, Ying June 20, 2022, 7:31 a.m. UTC | #5
Miaohe Lin <linmiaohe@huawei.com> writes:

> security_vm_enough_memory_mm() checks whether a process has enough memory
> to allocate a new virtual mapping. And total_swap_pages is considered as
> available memory while swapoff tries to make sure there's enough memory
> that can hold the swapped out memory. But total_swap_pages contains the
> swap space that is being swapoff. So security_vm_enough_memory_mm() will
> success even if there's no memory to hold the swapped out memory because
> total_swap_pages always greater than or equal to p->pages.

Per my understanding, swapoff will not allocate a virtual mapping by
itself.  But after swapoff, the overcommit limit could be exceeded.
security_vm_enough_memory_mm() is used to check that.  For example, in a
system with 4GB memory and 8GB swap, and 10GB is in use,

CommitLimit:    4+8 = 12GB
Committed_AS:   10GB

security_vm_enough_memory_mm() in swapoff() will fail because
10+8 = 18 > 12.  This is expected because after swapoff, the overcommit
limit will be exceeded.

If 3GB is in use,

CommitLimit:    4+8 = 12GB
Committed_AS:   3GB

security_vm_enough_memory_mm() in swapoff() will succeed because
3+8 = 11 < 12.  This is expected because after swapoff, the overcommit
limit will not be exceeded.

So, what's the real problem of the original implementation?  Can you
show it with an example as above?
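
The two cases above can be restated as a minimal sketch, assuming the
simplified model used in this mail: CommitLimit is RAM + swap, the request
is charged to Committed_AS first, and the result must stay strictly below
the limit. The function name is hypothetical, values are in GiB, and the
overcommit ratio and reserve adjustments of the real check are ignored.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the strict-accounting check as used in the
 * examples: charge the size of the swap device being removed, then
 * compare against RAM + total swap (which still includes that device).
 */
bool never_mode_swapoff_allowed(unsigned long ram,
				unsigned long total_swap,
				unsigned long committed,
				unsigned long swapoff_size)
{
	unsigned long commit_limit = ram + total_swap;

	/* Succeeds only if the charged total stays below the limit. */
	return committed + swapoff_size < commit_limit;
}
```

With (4, 8, 10, 8) the sum is 18 against a limit of 12, so the check fails;
with (4, 8, 3, 8) the sum is 11 and the check succeeds.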

Best Regards,
Huang, Ying

> In order to fix it, p->pages should be retracted from total_swap_pages
> first and then check whether there's enough memory for inuse swap pages.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

[snip]
Miaohe Lin June 20, 2022, 12:12 p.m. UTC | #6
On 2022/6/20 15:31, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> security_vm_enough_memory_mm() checks whether a process has enough memory
>> to allocate a new virtual mapping. And total_swap_pages is considered as
>> available memory while swapoff tries to make sure there's enough memory
>> that can hold the swapped out memory. But total_swap_pages contains the
>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>> success even if there's no memory to hold the swapped out memory because
>> total_swap_pages always greater than or equal to p->pages.
> 
> Per my understanding, swapoff will not allocate virtual mapping by
> itself.  But after swapoff, the overcommit limit could be exceeded.
> security_vm_enough_memory_mm() is used to check that.  For example, in a
> system with 4GB memory and 8GB swap, and 10GB is in use,
> 
> CommitLimit:    4+8 = 12GB
> Committed_AS:   10GB
> 
> security_vm_enough_memory_mm() in swapoff() will fail because
> 10+8 = 18 > 12.  This is expected because after swapoff, the overcommit
> limit will be exceeded.
> 
> If 3GB is in use,
> 
> CommitLimit:    4+8 = 12GB
> Committed_AS:   3GB
> 
> security_vm_enough_memory_mm() in swapoff() will succeed because
> 3+8 = 11 < 12.  This is expected because after swapoff, the overcommit
> limit will not be exceeded.

In the OVERCOMMIT_NEVER case, I think you're right.

> 
> So, what's the real problem of the original implementation?  Can you
> show it with an example as above?

In the OVERCOMMIT_GUESS case, on a system with 4GB memory and 8GB swap where 10GB is in use,
pages below is 8GB and totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
instead of failing as expected, because 8 < 12. The overcommit limit is always *ignored* in
the case below.

	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
		if (pages > totalram_pages() + total_swap_pages)
			goto error;
		return 0;
	}

Or am I missing something?
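
The comparison can be modelled outside the kernel as a rough sketch. The
function names are hypothetical and values are in GiB. The "with patch"
variant mirrors what the diff in this series does (subtract p->pages from
total_swap_pages, then check inuse_pages); the 6 GiB in-use figure in the
note below is an assumed value (10 GiB in use minus 4 GiB of RAM), not
something measured.

```c
#include <stdbool.h>

/* OVERCOMMIT_GUESS: only the size of this one request is compared. */
bool guess_mode_enough(unsigned long request,
		       unsigned long ram,
		       unsigned long total_swap)
{
	return request <= ram + total_swap;
}

/* Current code: request is p->pages; total_swap still includes it. */
bool swapoff_check_current(unsigned long ram,
			   unsigned long total_swap,
			   unsigned long p_pages)
{
	return guess_mode_enough(p_pages, ram, total_swap);
}

/*
 * Patched code: p->pages is removed from total swap first and the
 * request is inuse_pages. Assumes p_pages <= total_swap.
 */
bool swapoff_check_patched(unsigned long ram,
			   unsigned long total_swap,
			   unsigned long p_pages,
			   unsigned long inuse_pages)
{
	return guess_mode_enough(inuse_pages, ram, total_swap - p_pages);
}
```

For 4 GiB RAM and a single 8 GiB swap device, swapoff_check_current(4, 8, 8)
passes because 8 <= 12, while swapoff_check_patched(4, 8, 8, 6) fails because
6 > 4.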

> 
> Best Regards,
> Huang, Ying

Thanks!

> 
>> In order to fix it, p->pages should be retracted from total_swap_pages
>> first and then check whether there's enough memory for inuse swap pages.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> [snip]
> 
> .
>
Huang, Ying June 21, 2022, 1:35 a.m. UTC | #7
Miaohe Lin <linmiaohe@huawei.com> writes:

> On 2022/6/20 15:31, Huang, Ying wrote:
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>> 
>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>> available memory while swapoff tries to make sure there's enough memory
>>> that can hold the swapped out memory. But total_swap_pages contains the
>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>> success even if there's no memory to hold the swapped out memory because
>>> total_swap_pages always greater than or equal to p->pages.
>> 
>> Per my understanding, swapoff will not allocate virtual mapping by
>> itself.  But after swapoff, the overcommit limit could be exceeded.
>> security_vm_enough_memory_mm() is used to check that.  For example, in a
>> system with 4GB memory and 8GB swap, and 10GB is in use,
>> 
>> CommitLimit:    4+8 = 12GB
>> Committed_AS:   10GB
>> 
>> security_vm_enough_memory_mm() in swapoff() will fail because
>> 10+8 = 18 > 12.  This is expected because after swapoff, the overcommit
>> limit will be exceeded.
>> 
>> If 3GB is in use,
>> 
>> CommitLimit:    4+8 = 12GB
>> Committed_AS:   3GB
>> 
>> security_vm_enough_memory_mm() in swapoff() will succeed because
>> 3+8 = 11 < 12.  This is expected because after swapoff, the overcommit
>> limit will not be exceeded.
>
> In OVERCOMMIT_NEVER scene, I think you're right.
>
>> 
>> So, what's the real problem of the original implementation?  Can you
>> show it with an example as above?
>
> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use,
> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the
> below case.
>
> 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
> 		if (pages > totalram_pages() + total_swap_pages)
> 			goto error;
> 		return 0;
> 	}
>
> Or am I miss something?

Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
isn't checked at all.  The only restriction is that the size of the
virtual mapping created should be less than total RAM + total swap
pages.  Because swapoff() will not create a virtual mapping, it's
expected that security_vm_enough_memory_mm() in swapoff() always
succeeds.

Best Regards,
Huang, Ying

>
> Thanks!
>
>> 
>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>> first and then check whether there's enough memory for inuse swap pages.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> 
>> [snip]
>> 
>> .
>>
Miaohe Lin June 21, 2022, 7:37 a.m. UTC | #8
On 2022/6/21 9:35, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> On 2022/6/20 15:31, Huang, Ying wrote:
>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>
>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>>> available memory while swapoff tries to make sure there's enough memory
>>>> that can hold the swapped out memory. But total_swap_pages contains the
>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>>> success even if there's no memory to hold the swapped out memory because
>>>> total_swap_pages always greater than or equal to p->pages.
>>>
>>> Per my understanding, swapoff will not allocate virtual mapping by
>>> itself.  But after swapoff, the overcommit limit could be exceeded.
>>> security_vm_enough_memory_mm() is used to check that.  For example, in a
>>> system with 4GB memory and 8GB swap, and 10GB is in use,
>>>
>>> CommitLimit:    4+8 = 12GB
>>> Committed_AS:   10GB
>>>
>>> security_vm_enough_memory_mm() in swapoff() will fail because
>>> 10+8 = 18 > 12.  This is expected because after swapoff, the overcommit
>>> limit will be exceeded.
>>>
>>> If 3GB is in use,
>>>
>>> CommitLimit:    4+8 = 12GB
>>> Committed_AS:   3GB
>>>
>>> security_vm_enough_memory_mm() in swapoff() will succeed because
>>> 3+8 = 11 < 12.  This is expected because after swapoff, the overcommit
>>> limit will not be exceeded.
>>
>> In OVERCOMMIT_NEVER scene, I think you're right.
>>
>>>
>>> So, what's the real problem of the original implementation?  Can you
>>> show it with an example as above?
>>
>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use,
>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
>> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the
>> below case.
>>
>> 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>> 		if (pages > totalram_pages() + total_swap_pages)
>> 			goto error;
>> 		return 0;
>> 	}
>>
>> Or am I miss something?
> 
> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
> isn't checked at all.  The only restriction is that the size of the
> virtual mapping created should be less than total RAM + total swap

Do you mean the only restriction is that the size of the virtual mapping
*created each time* should be less than total RAM + total swap pages, but
the *total virtual mapping* is not limited in the OVERCOMMIT_GUESS case? If so,
the current behavior should be sane and I will drop this patch.

Thanks!

> pages.  Because swapoff() will not create virtual mapping, so it's
> expected that security_vm_enough_memory_mm() in swapoff() always
> succeeds.
> 
> Best Regards,
> Huang, Ying
> 
>>
>> Thanks!
>>
>>>
>>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>>> first and then check whether there's enough memory for inuse swap pages.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>
>>> [snip]
>>>
>>> .
>>>
> 
> .
>
Huang, Ying June 21, 2022, 7:42 a.m. UTC | #9
Miaohe Lin <linmiaohe@huawei.com> writes:

> On 2022/6/21 9:35, Huang, Ying wrote:
>> Miaohe Lin <linmiaohe@huawei.com> writes:
>> 
>>> On 2022/6/20 15:31, Huang, Ying wrote:
>>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>>
>>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>>>> available memory while swapoff tries to make sure there's enough memory
>>>>> that can hold the swapped out memory. But total_swap_pages contains the
>>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>>>> success even if there's no memory to hold the swapped out memory because
>>>>> total_swap_pages always greater than or equal to p->pages.
>>>>
>>>> Per my understanding, swapoff will not allocate virtual mapping by
>>>> itself.  But after swapoff, the overcommit limit could be exceeded.
>>>> security_vm_enough_memory_mm() is used to check that.  For example, in a
>>>> system with 4GB memory and 8GB swap, and 10GB is in use,
>>>>
>>>> CommitLimit:    4+8 = 12GB
>>>> Committed_AS:   10GB
>>>>
>>>> security_vm_enough_memory_mm() in swapoff() will fail because
>>>> 10+8 = 18 > 12.  This is expected because after swapoff, the overcommit
>>>> limit will be exceeded.
>>>>
>>>> If 3GB is in use,
>>>>
>>>> CommitLimit:    4+8 = 12GB
>>>> Committed_AS:   3GB
>>>>
>>>> security_vm_enough_memory_mm() in swapoff() will succeed because
>>>> 3+8 = 11 < 12.  This is expected because after swapoff, the overcommit
>>>> limit will not be exceeded.
>>>
>>> In OVERCOMMIT_NEVER scene, I think you're right.
>>>
>>>>
>>>> So, what's the real problem of the original implementation?  Can you
>>>> show it with an example as above?
>>>
>>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use,
>>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
>>> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the
>>> below case.
>>>
>>> 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>>> 		if (pages > totalram_pages() + total_swap_pages)
>>> 			goto error;
>>> 		return 0;
>>> 	}
>>>
>>> Or am I miss something?
>> 
>> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
>> isn't checked at all.  The only restriction is that the size of the
>> virtual mapping created should be less than total RAM + total swap
>
> Do you mean the only restriction is that the size of the virtual mapping
> *created every time* should be less than total RAM + total swap pages but
> *total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so,
> the current behavior should be sane and I will drop this patch.

Yes.  This is my understanding.

Best Regards,
Huang, Ying

> Thanks!
>
>> pages.  Because swapoff() will not create virtual mapping, so it's
>> expected that security_vm_enough_memory_mm() in swapoff() always
>> succeeds.
>> 
>> Best Regards,
>> Huang, Ying
>> 
>>>
>>> Thanks!
>>>
>>>>
>>>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>>>> first and then check whether there's enough memory for inuse swap pages.
>>>>>
>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>
>>>> [snip]
>>>>
>>>> .
>>>>
>> 
>> .
>>
Miaohe Lin June 21, 2022, 8:20 a.m. UTC | #10
On 2022/6/21 15:42, Huang, Ying wrote:
> Miaohe Lin <linmiaohe@huawei.com> writes:
> 
>> On 2022/6/21 9:35, Huang, Ying wrote:
>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>
>>>> On 2022/6/20 15:31, Huang, Ying wrote:
>>>>> Miaohe Lin <linmiaohe@huawei.com> writes:
>>>>>
>>>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>>>>> available memory while swapoff tries to make sure there's enough memory
>>>>>> that can hold the swapped out memory. But total_swap_pages contains the
>>>>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>>>>> success even if there's no memory to hold the swapped out memory because
>>>>>> total_swap_pages always greater than or equal to p->pages.
>>>>>
>>>>> Per my understanding, swapoff will not allocate virtual mapping by
>>>>> itself.  But after swapoff, the overcommit limit could be exceeded.
>>>>> security_vm_enough_memory_mm() is used to check that.  For example, in a
>>>>> system with 4GB memory and 8GB swap, and 10GB is in use,
>>>>>
>>>>> CommitLimit:    4+8 = 12GB
>>>>> Committed_AS:   10GB
>>>>>
>>>>> security_vm_enough_memory_mm() in swapoff() will fail because
>>>>> 10+8 = 18 > 12.  This is expected because after swapoff, the overcommit
>>>>> limit will be exceeded.
>>>>>
>>>>> If 3GB is in use,
>>>>>
>>>>> CommitLimit:    4+8 = 12GB
>>>>> Committed_AS:   3GB
>>>>>
>>>>> security_vm_enough_memory_mm() in swapoff() will succeed because
>>>>> 3+8 = 11 < 12.  This is expected because after swapoff, the overcommit
>>>>> limit will not be exceeded.
>>>>
>>>> In OVERCOMMIT_NEVER scene, I think you're right.
>>>>
>>>>>
>>>>> So, what's the real problem of the original implementation?  Can you
>>>>> show it with an example as above?
>>>>
>>>> In OVERCOMMIT_GUESS scene, in a system with 4GB memory and 8GB swap, and 10GB is in use,
>>>> pages below is 8GB, totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
>>>> instead of expected failure because 8 < 12. The overcommit limit is always *ignored* in the
>>>> below case.
>>>>
>>>> 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>>>> 		if (pages > totalram_pages() + total_swap_pages)
>>>> 			goto error;
>>>> 		return 0;
>>>> 	}
>>>>
>>>> Or am I miss something?
>>>
>>> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
>>> isn't checked at all.  The only restriction is that the size of the
>>> virtual mapping created should be less than total RAM + total swap
>>
>> Do you mean the only restriction is that the size of the virtual mapping
>> *created every time* should be less than total RAM + total swap pages but
>> *total virtual mapping* is not limited in OVERCOMMIT_GUESS scene? If so,
>> the current behavior should be sane and I will drop this patch.
> 
> Yes.  This is my understanding.

I see. Thank you.

> 
> Best Regards,
> Huang, Ying
> 
>> Thanks!
>>
>>> pages.  Because swapoff() will not create virtual mapping, so it's
>>> expected that security_vm_enough_memory_mm() in swapoff() always
>>> succeeds.
>>>
>>> Best Regards,
>>> Huang, Ying
>>>
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>>>>> first and then check whether there's enough memory for inuse swap pages.
>>>>>>
>>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>>
>>>>> [snip]
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
>

Patch

diff --git a/mm/swapfile.c b/mm/swapfile.c
index ec4c1b276691..d2bead7b8b70 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2398,6 +2398,7 @@  SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct filename *pathname;
 	int err, found = 0;
 	unsigned int old_block_size;
+	unsigned int inuse_pages;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -2428,9 +2429,13 @@  SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		spin_unlock(&swap_lock);
 		goto out_dput;
 	}
-	if (!security_vm_enough_memory_mm(current->mm, p->pages))
-		vm_unacct_memory(p->pages);
+
+	total_swap_pages -= p->pages;
+	inuse_pages = READ_ONCE(p->inuse_pages);
+	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
+		vm_unacct_memory(inuse_pages);
 	else {
+		total_swap_pages += p->pages;
 		err = -ENOMEM;
 		spin_unlock(&swap_lock);
 		goto out_dput;
@@ -2453,7 +2458,6 @@  SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	}
 	plist_del(&p->list, &swap_active_head);
 	atomic_long_sub(p->pages, &nr_swap_pages);
-	total_swap_pages -= p->pages;
 	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);