diff mbox series

[v2] mm: shrinker: avoid memleak in alloc_shrinker_info

Message ID 20241014032336.482088-1-chenridong@huaweicloud.com (mailing list archive)
State New
Headers show
Series [v2] mm: shrinker: avoid memleak in alloc_shrinker_info | expand

Commit Message

Chen Ridong Oct. 14, 2024, 3:23 a.m. UTC
From: Chen Ridong <chenridong@huawei.com>

A memleak was found as bellow:

unreferenced object 0xffff8881010d2a80 (size 32):
  comm "mkdir", pid 1559, jiffies 4294932666
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
  backtrace (crc 2e7ef6fa):
    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
    [<ffffffff81198dd9>] online_css+0x29/0xa0
    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

In the alloc_shrinker_info function, when shrinker_unit_alloc return
err, the info won't be freed. Just fix it.

Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 mm/shrinker.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Muchun Song Oct. 14, 2024, 6:25 a.m. UTC | #1
> On Oct 14, 2024, at 11:23, Chen Ridong <chenridong@huaweicloud.com> wrote:
> 
> From: Chen Ridong <chenridong@huawei.com>
> 
> A memleak was found as bellow:
> 
> unreferenced object 0xffff8881010d2a80 (size 32):
>  comm "mkdir", pid 1559, jiffies 4294932666
>  hex dump (first 32 bytes):
>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>  backtrace (crc 2e7ef6fa):
>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> In the alloc_shrinker_info function, when shrinker_unit_alloc return
> err, the info won't be freed. Just fix it.
> 
> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/shrinker.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index dc5d2a6fcfc4..92270413190d 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
> 
> err:
> 	mutex_unlock(&shrinker_mutex);
> + 	kvfree(info);

Please transpose kvfree() to the place above mutex_unlock().
We should release the resources in the order which is inverse
to the that of acquiring resources.

Thanks.

> 	free_shrinker_info(memcg);
> 	return -ENOMEM;
> }
> -- 
> 2.34.1
>
Anshuman Khandual Oct. 14, 2024, 8:13 a.m. UTC | #2
On 10/14/24 08:53, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> A memleak was found as bellow:
> 
> unreferenced object 0xffff8881010d2a80 (size 32):
>   comm "mkdir", pid 1559, jiffies 4294932666
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>   backtrace (crc 2e7ef6fa):
>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>     [<ffffffff81198dd9>] online_css+0x29/0xa0
>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> In the alloc_shrinker_info function, when shrinker_unit_alloc return
> err, the info won't be freed. Just fix it.
> 
> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/shrinker.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index dc5d2a6fcfc4..92270413190d 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>  
>  err:
>  	mutex_unlock(&shrinker_mutex);
> +	kvfree(info);
>  	free_shrinker_info(memcg);
>  	return -ENOMEM;
>  }

There are two scenarios when "goto err:" gets called

- When shrinker_info allocations fails, no kvfree() is required
	- but after this change kvfree() would be called even
	  when the allocation had failed originally, which does
	  not sound right

- shrinker_unit_alloc() fails, kvfree() is actually required

I guess kvfree() should be called just after shrinker_unit_alloc()
fails but before calling into "goto err".

But curious, should not both kvzalloc_node()/kvfree() be avoided
while inside mutex lock to avoid possible lockdep issues ?
Muchun Song Oct. 14, 2024, 8:43 a.m. UTC | #3
> On Oct 14, 2024, at 16:13, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
> 
> 
> On 10/14/24 08:53, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>> 
>> A memleak was found as bellow:
>> 
>> unreferenced object 0xffff8881010d2a80 (size 32):
>>  comm "mkdir", pid 1559, jiffies 4294932666
>>  hex dump (first 32 bytes):
>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>  backtrace (crc 2e7ef6fa):
>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> 
>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>> err, the info won't be freed. Just fix it.
>> 
>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> mm/shrinker.c | 1 +
>> 1 file changed, 1 insertion(+)
>> 
>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>> index dc5d2a6fcfc4..92270413190d 100644
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>> 
>> err:
>> 	mutex_unlock(&shrinker_mutex);
>> + 	kvfree(info);
>> 	free_shrinker_info(memcg);
>> 	return -ENOMEM;
>> }
> 
> There are two scenarios when "goto err:" gets called
> 
> - When shrinker_info allocations fails, no kvfree() is required
> 	- but after this change kvfree() would be called even
> 	  when the allocation had failed originally, which does
>  	  not sound right

Yes. In this case, @info is NULL and kvfree could handle NULL.
It seems strange but the final behaviour correct.

> 
> - shrinker_unit_alloc() fails, kvfree() is actually required
> 
> I guess kvfree() should be called just after shrinker_unit_alloc()
> fails but before calling into "goto err".

We could do it like this, which avoids ambiguity (if someone ignores
that kvfree could handle NULL). Something like:

--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
                        goto err;
                info->map_nr_max = shrinker_nr_max;
                if (shrinker_unit_alloc(info, NULL, nid))
-                       goto err;
+                       goto free;
                rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
        }
        mutex_unlock(&shrinker_mutex);

        return ret;
-
+free:
+       kvfree(info);
 err:
        mutex_unlock(&shrinker_mutex);
        free_shrinker_info(memcg);

Thanks.

> 
> But curious, should not both kvzalloc_node()/kvfree() be avoided
> while inside mutex lock to avoid possible lockdep issues ?
chenridong Oct. 14, 2024, 9:04 a.m. UTC | #4
On 2024/10/14 16:43, Muchun Song wrote:
> 
> 
>> On Oct 14, 2024, at 16:13, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>
>>
>>
>> On 10/14/24 08:53, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> A memleak was found as bellow:
>>>
>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>   comm "mkdir", pid 1559, jiffies 4294932666
>>>   hex dump (first 32 bytes):
>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>   backtrace (crc 2e7ef6fa):
>>>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>     [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>> err, the info won't be freed. Just fix it.
>>>
>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>> ---
>>> mm/shrinker.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>> index dc5d2a6fcfc4..92270413190d 100644
>>> --- a/mm/shrinker.c
>>> +++ b/mm/shrinker.c
>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>
>>> err:
>>> 	mutex_unlock(&shrinker_mutex);
>>> + 	kvfree(info);
>>> 	free_shrinker_info(memcg);
>>> 	return -ENOMEM;
>>> }
>>
>> There are two scenarios when "goto err:" gets called
>>
>> - When shrinker_info allocations fails, no kvfree() is required
>> 	- but after this change kvfree() would be called even
>> 	  when the allocation had failed originally, which does
>>   	  not sound right
> 
> Yes. In this case, @info is NULL and kvfree could handle NULL.
> It seems strange but the final behaviour correct.
> 
>>
>> - shrinker_unit_alloc() fails, kvfree() is actually required
>>
>> I guess kvfree() should be called just after shrinker_unit_alloc()
>> fails but before calling into "goto err".
> 
> We could do it like this, which avoids ambiguity (if someone ignores
> that kvfree could handle NULL). Something like:
> 
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>                          goto err;
>                  info->map_nr_max = shrinker_nr_max;
>                  if (shrinker_unit_alloc(info, NULL, nid))
> -                       goto err;
> +                       goto free;
>                  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>          }
>          mutex_unlock(&shrinker_mutex);
> 
>          return ret;
> -
> +free:
> +       kvfree(info);
>   err:
>          mutex_unlock(&shrinker_mutex);
>          free_shrinker_info(memcg);
> 
> Thanks.
> 
>>
>> But curious, should not both kvzalloc_node()/kvfree() be avoided
>> while inside mutex lock to avoid possible lockdep issues ?
> 
How about:

diff --git a/mm/shrinker.c b/mm/shrinker.c
index dc5d2a6fcfc4..7baee7f00497 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -87,9 +87,9 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
                  if (!info)
                          goto err;
                  info->map_nr_max = shrinker_nr_max;
+               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, 
info);
                  if (shrinker_unit_alloc(info, NULL, nid))
                          goto err;
-               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, 
info);
          }
          mutex_unlock(&shrinker_mutex);

I think this is concise.

Best regards,
Ridong
Muchun Song Oct. 14, 2024, 9:20 a.m. UTC | #5
> On Oct 14, 2024, at 17:04, chenridong <chenridong@huawei.com> wrote:
> 
> 
> 
> On 2024/10/14 16:43, Muchun Song wrote:
>>> On Oct 14, 2024, at 16:13, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>> 
>>> 
>>> 
>>> On 10/14/24 08:53, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>> 
>>>> A memleak was found as bellow:
>>>> 
>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>  comm "mkdir", pid 1559, jiffies 4294932666
>>>>  hex dump (first 32 bytes):
>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>  backtrace (crc 2e7ef6fa):
>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>> 
>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>> err, the info won't be freed. Just fix it.
>>>> 
>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>> mm/shrinker.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>> 
>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>> --- a/mm/shrinker.c
>>>> +++ b/mm/shrinker.c
>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>> 
>>>> err:
>>>> mutex_unlock(&shrinker_mutex);
>>>> + kvfree(info);
>>>> free_shrinker_info(memcg);
>>>> return -ENOMEM;
>>>> }
>>> 
>>> There are two scenarios when "goto err:" gets called
>>> 
>>> - When shrinker_info allocations fails, no kvfree() is required
>>> - but after this change kvfree() would be called even
>>>  when the allocation had failed originally, which does
>>>    not sound right
>> Yes. In this case, @info is NULL and kvfree could handle NULL.
>> It seems strange but the final behaviour correct.
>>> 
>>> - shrinker_unit_alloc() fails, kvfree() is actually required
>>> 
>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>> fails but before calling into "goto err".
>> We could do it like this, which avoids ambiguity (if someone ignores
>> that kvfree could handle NULL). Something like:
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>                         goto err;
>>                 info->map_nr_max = shrinker_nr_max;
>>                 if (shrinker_unit_alloc(info, NULL, nid))
>> -                       goto err;
>> +                       goto free;
>>                 rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>         }
>>         mutex_unlock(&shrinker_mutex);
>>         return ret;
>> -
>> +free:
>> +       kvfree(info);
>>  err:
>>         mutex_unlock(&shrinker_mutex);
>>         free_shrinker_info(memcg);
>> Thanks.
>>> 
>>> But curious, should not both kvzalloc_node()/kvfree() be avoided
>>> while inside mutex lock to avoid possible lockdep issues ?
> How about:
> 
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index dc5d2a6fcfc4..7baee7f00497 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -87,9 +87,9 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>                 if (!info)
>                         goto err;
>                 info->map_nr_max = shrinker_nr_max;
> +               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>                 if (shrinker_unit_alloc(info, NULL, nid))
>                         goto err;
> -               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>         }
>         mutex_unlock(&shrinker_mutex);

No. We should make sure the @info is fully initialized before others
could see it. That's why rcu_assign_pointer is used here.

> 
> I think this is concise.
> 
> Best regards,
> Ridong
chenridong Oct. 14, 2024, 9:38 a.m. UTC | #6
On 2024/10/14 17:20, Muchun Song wrote:
> 
> 
>> On Oct 14, 2024, at 17:04, chenridong <chenridong@huawei.com> wrote:
>>
>>
>>
>> On 2024/10/14 16:43, Muchun Song wrote:
>>>> On Oct 14, 2024, at 16:13, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10/14/24 08:53, Chen Ridong wrote:
>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>
>>>>> A memleak was found as bellow:
>>>>>
>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>   comm "mkdir", pid 1559, jiffies 4294932666
>>>>>   hex dump (first 32 bytes):
>>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>   backtrace (crc 2e7ef6fa):
>>>>>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>     [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>
>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>> err, the info won't be freed. Just fix it.
>>>>>
>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>> ---
>>>>> mm/shrinker.c | 1 +
>>>>> 1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>> --- a/mm/shrinker.c
>>>>> +++ b/mm/shrinker.c
>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>
>>>>> err:
>>>>> mutex_unlock(&shrinker_mutex);
>>>>> + kvfree(info);
>>>>> free_shrinker_info(memcg);
>>>>> return -ENOMEM;
>>>>> }
>>>>
>>>> There are two scenarios when "goto err:" gets called
>>>>
>>>> - When shrinker_info allocations fails, no kvfree() is required
>>>> - but after this change kvfree() would be called even
>>>>   when the allocation had failed originally, which does
>>>>     not sound right
>>> Yes. In this case, @info is NULL and kvfree could handle NULL.
>>> It seems strange but the final behaviour correct.
>>>>
>>>> - shrinker_unit_alloc() fails, kvfree() is actually required
>>>>
>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>> fails but before calling into "goto err".
>>> We could do it like this, which avoids ambiguity (if someone ignores
>>> that kvfree could handle NULL). Something like:
>>> --- a/mm/shrinker.c
>>> +++ b/mm/shrinker.c
>>> @@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>                          goto err;
>>>                  info->map_nr_max = shrinker_nr_max;
>>>                  if (shrinker_unit_alloc(info, NULL, nid))
>>> -                       goto err;
>>> +                       goto free;
>>>                  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>          }
>>>          mutex_unlock(&shrinker_mutex);
>>>          return ret;
>>> -
>>> +free:
>>> +       kvfree(info);
>>>   err:
>>>          mutex_unlock(&shrinker_mutex);
>>>          free_shrinker_info(memcg);
>>> Thanks.
>>>>
>>>> But curious, should not both kvzalloc_node()/kvfree() be avoided
>>>> while inside mutex lock to avoid possible lockdep issues ?
>> How about:
>>
>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>> index dc5d2a6fcfc4..7baee7f00497 100644
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -87,9 +87,9 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>                  if (!info)
>>                          goto err;
>>                  info->map_nr_max = shrinker_nr_max;
>> +               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>                  if (shrinker_unit_alloc(info, NULL, nid))
>>                          goto err;
>> -               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>          }
>>          mutex_unlock(&shrinker_mutex);
> 
> No. We should make sure the @info is fully initialized before others
> could see it. That's why rcu_assign_pointer is used here.
> 

Thank you, it seems that 'goto free' is a better choice.
Will update.

Thanks,
Ridong
>>
>> I think this is concise.
>>
>> Best regards,
>> Ridong
>
Kirill A. Shutemov Oct. 14, 2024, 11:29 a.m. UTC | #7
On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> A memleak was found as bellow:
> 
> unreferenced object 0xffff8881010d2a80 (size 32):
>   comm "mkdir", pid 1559, jiffies 4294932666
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>   backtrace (crc 2e7ef6fa):
>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>     [<ffffffff81198dd9>] online_css+0x29/0xa0
>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> In the alloc_shrinker_info function, when shrinker_unit_alloc return
> err, the info won't be freed. Just fix it.
> 
> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/shrinker.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index dc5d2a6fcfc4..92270413190d 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>  
>  err:
>  	mutex_unlock(&shrinker_mutex);
> +	kvfree(info);
>  	free_shrinker_info(memcg);
>  	return -ENOMEM;
>  }

NAK. If in the future there going to one more error case after
rcu_assign_pointer() we will end up with double free.

This should be safer:

diff --git a/mm/shrinker.c b/mm/shrinker.c
index dc5d2a6fcfc4..763fd556bc7d 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
 		if (!info)
 			goto err;
 		info->map_nr_max = shrinker_nr_max;
-		if (shrinker_unit_alloc(info, NULL, nid))
+		if (shrinker_unit_alloc(info, NULL, nid)) {
+			kvfree(info);
 			goto err;
+		}
 		rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
 	}
 	mutex_unlock(&shrinker_mutex);
chenridong Oct. 15, 2024, 1:13 a.m. UTC | #8
On 2024/10/14 19:29, Kirill A. Shutemov wrote:
> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> A memleak was found as bellow:
>>
>> unreferenced object 0xffff8881010d2a80 (size 32):
>>    comm "mkdir", pid 1559, jiffies 4294932666
>>    hex dump (first 32 bytes):
>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>      40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>    backtrace (crc 2e7ef6fa):
>>      [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>      [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>      [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>      [<ffffffff81198dd9>] online_css+0x29/0xa0
>>      [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>      [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>      [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>      [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>      [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>      [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>      [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>      [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>> err, the info won't be freed. Just fix it.
>>
>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>   mm/shrinker.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>> index dc5d2a6fcfc4..92270413190d 100644
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>   
>>   err:
>>   	mutex_unlock(&shrinker_mutex);
>> +	kvfree(info);
>>   	free_shrinker_info(memcg);
>>   	return -ENOMEM;
>>   }
> 
> NAK. If in the future there going to one more error case after
> rcu_assign_pointer() we will end up with double free.
> 
> This should be safer:
> 
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index dc5d2a6fcfc4..763fd556bc7d 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>   		if (!info)
>   			goto err;
>   		info->map_nr_max = shrinker_nr_max;
> -		if (shrinker_unit_alloc(info, NULL, nid))
> +		if (shrinker_unit_alloc(info, NULL, nid)) {
> +			kvfree(info);
>   			goto err;
> +		}
>   		rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>   	}
>   	mutex_unlock(&shrinker_mutex);

This is what I sent in v1.
Muchun thinks it's better to do the similar thing in the same way.

Thanks,
Ridong
Anshuman Khandual Oct. 15, 2024, 6:55 a.m. UTC | #9
On 10/14/24 16:59, Kirill A. Shutemov wrote:
> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> A memleak was found as bellow:
>>
>> unreferenced object 0xffff8881010d2a80 (size 32):
>>   comm "mkdir", pid 1559, jiffies 4294932666
>>   hex dump (first 32 bytes):
>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>   backtrace (crc 2e7ef6fa):
>>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>     [<ffffffff81198dd9>] online_css+0x29/0xa0
>>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>> err, the info won't be freed. Just fix it.
>>
>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  mm/shrinker.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>> index dc5d2a6fcfc4..92270413190d 100644
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>  
>>  err:
>>  	mutex_unlock(&shrinker_mutex);
>> +	kvfree(info);
>>  	free_shrinker_info(memcg);
>>  	return -ENOMEM;
>>  }
> 
> NAK. If in the future there going to one more error case after
> rcu_assign_pointer() we will end up with double free.
> 
> This should be safer:
> 
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index dc5d2a6fcfc4..763fd556bc7d 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>  		if (!info)
>  			goto err;
>  		info->map_nr_max = shrinker_nr_max;
> -		if (shrinker_unit_alloc(info, NULL, nid))
> +		if (shrinker_unit_alloc(info, NULL, nid)) {
> +			kvfree(info);
>  			goto err;
> +		}
>  		rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>  	}
>  	mutex_unlock(&shrinker_mutex);

Agreed, this is what I mentioned earlier as well.

------------------------------------------------------------------
I guess kvfree() should be called just after shrinker_unit_alloc()
fails but before calling into "goto err"
------------------------------------------------------------------
chenridong Oct. 16, 2024, 1:25 a.m. UTC | #10
On 2024/10/15 14:55, Anshuman Khandual wrote:
> 
> 
> On 10/14/24 16:59, Kirill A. Shutemov wrote:
>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> A memleak was found as bellow:
>>>
>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>    comm "mkdir", pid 1559, jiffies 4294932666
>>>    hex dump (first 32 bytes):
>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>      40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>    backtrace (crc 2e7ef6fa):
>>>      [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>      [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>      [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>      [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>      [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>      [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>      [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>      [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>      [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>      [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>      [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>      [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>> err, the info won't be freed. Just fix it.
>>>
>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>> ---
>>>   mm/shrinker.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>> index dc5d2a6fcfc4..92270413190d 100644
>>> --- a/mm/shrinker.c
>>> +++ b/mm/shrinker.c
>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>   
>>>   err:
>>>   	mutex_unlock(&shrinker_mutex);
>>> +	kvfree(info);
>>>   	free_shrinker_info(memcg);
>>>   	return -ENOMEM;
>>>   }
>>
>> NAK. If in the future there going to one more error case after
>> rcu_assign_pointer() we will end up with double free.
>>
>> This should be safer:
>>
>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>> index dc5d2a6fcfc4..763fd556bc7d 100644
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>   		if (!info)
>>   			goto err;
>>   		info->map_nr_max = shrinker_nr_max;
>> -		if (shrinker_unit_alloc(info, NULL, nid))
>> +		if (shrinker_unit_alloc(info, NULL, nid)) {
>> +			kvfree(info);
>>   			goto err;
>> +		}
>>   		rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>   	}
>>   	mutex_unlock(&shrinker_mutex);
> 
> Agreed, this is what I mentioned earlier as well.
> 
> ------------------------------------------------------------------
> I guess kvfree() should be called just after shrinker_unit_alloc()
> fails but before calling into "goto err"
> ------------------------------------------------------------------
> 

After discussion, it seems that v1 is acceptable.
Hi, Muchun, do you have any other opinions?

Best regards,
Ridong
Muchun Song Oct. 16, 2024, 2:21 a.m. UTC | #11
> On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
> 
> 
> 
> On 2024/10/15 14:55, Anshuman Khandual wrote:
>> On 10/14/24 16:59, Kirill A. Shutemov wrote:
>>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>> 
>>>> A memleak was found as bellow:
>>>> 
>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>   comm "mkdir", pid 1559, jiffies 4294932666
>>>>   hex dump (first 32 bytes):
>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>   backtrace (crc 2e7ef6fa):
>>>>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>     [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>> 
>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>> err, the info won't be freed. Just fix it.
>>>> 
>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>>  mm/shrinker.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>> 
>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>> --- a/mm/shrinker.c
>>>> +++ b/mm/shrinker.c
>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>    err:
>>>>   mutex_unlock(&shrinker_mutex);
>>>> + kvfree(info);
>>>>   free_shrinker_info(memcg);
>>>>   return -ENOMEM;
>>>>  }
>>> 
>>> NAK. If in the future there going to one more error case after
>>> rcu_assign_pointer() we will end up with double free.
>>> 
>>> This should be safer:
>>> 
>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>> index dc5d2a6fcfc4..763fd556bc7d 100644
>>> --- a/mm/shrinker.c
>>> +++ b/mm/shrinker.c
>>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>   if (!info)
>>>   goto err;
>>>   info->map_nr_max = shrinker_nr_max;
>>> - if (shrinker_unit_alloc(info, NULL, nid))
>>> + if (shrinker_unit_alloc(info, NULL, nid)) {
>>> + kvfree(info);
>>>   goto err;
>>> + }
>>>   rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>   }
>>>   mutex_unlock(&shrinker_mutex);
>> Agreed, this is what I mentioned earlier as well.
>> ------------------------------------------------------------------
>> I guess kvfree() should be called just after shrinker_unit_alloc()
>> fails but before calling into "goto err"
>> ------------------------------------------------------------------
> 
> After discussion, it seems that v1 is acceptable.
> Hi, Muchun, do you have any other opinions?

I insist on my opinion, not mixing two different approaches
to do release resources.

Thanks.

> 
> Best regards,
> Ridong
Kirill A. Shutemov Oct. 16, 2024, 10:16 a.m. UTC | #12
On Wed, Oct 16, 2024 at 10:21:30AM +0800, Muchun Song wrote:
> 
> 
> > On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
> > 
> > 
> > 
> > On 2024/10/15 14:55, Anshuman Khandual wrote:
> >> On 10/14/24 16:59, Kirill A. Shutemov wrote:
> >>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
> >>>> From: Chen Ridong <chenridong@huawei.com>
> >>>> 
> >>>> A memleak was found as bellow:
> >>>> 
> >>>> unreferenced object 0xffff8881010d2a80 (size 32):
> >>>>   comm "mkdir", pid 1559, jiffies 4294932666
> >>>>   hex dump (first 32 bytes):
> >>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
> >>>>   backtrace (crc 2e7ef6fa):
> >>>>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
> >>>>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
> >>>>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
> >>>>     [<ffffffff81198dd9>] online_css+0x29/0xa0
> >>>>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
> >>>>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
> >>>>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
> >>>>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
> >>>>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
> >>>>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
> >>>>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
> >>>>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >>>> 
> >>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
> >>>> err, the info won't be freed. Just fix it.
> >>>> 
> >>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
> >>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >>>> ---
> >>>>  mm/shrinker.c | 1 +
> >>>>  1 file changed, 1 insertion(+)
> >>>> 
> >>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
> >>>> index dc5d2a6fcfc4..92270413190d 100644
> >>>> --- a/mm/shrinker.c
> >>>> +++ b/mm/shrinker.c
> >>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
> >>>>    err:
> >>>>   mutex_unlock(&shrinker_mutex);
> >>>> + kvfree(info);
> >>>>   free_shrinker_info(memcg);
> >>>>   return -ENOMEM;
> >>>>  }
> >>> 
> >>> NAK. If in the future there going to one more error case after
> >>> rcu_assign_pointer() we will end up with double free.
> >>> 
> >>> This should be safer:
> >>> 
> >>> diff --git a/mm/shrinker.c b/mm/shrinker.c
> >>> index dc5d2a6fcfc4..763fd556bc7d 100644
> >>> --- a/mm/shrinker.c
> >>> +++ b/mm/shrinker.c
> >>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
> >>>   if (!info)
> >>>   goto err;
> >>>   info->map_nr_max = shrinker_nr_max;
> >>> - if (shrinker_unit_alloc(info, NULL, nid))
> >>> + if (shrinker_unit_alloc(info, NULL, nid)) {
> >>> + kvfree(info);
> >>>   goto err;
> >>> + }
> >>>   rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
> >>>   }
> >>>   mutex_unlock(&shrinker_mutex);
> >> Agreed, this is what I mentioned earlier as well.
> >> ------------------------------------------------------------------
> >> I guess kvfree() should be called just after shrinker_unit_alloc()
> >> fails but before calling into "goto err"
> >> ------------------------------------------------------------------
> > 
> > After discussion, it seems that v1 is acceptable.
> > Hi, Muchun, do you have any other opinions?
> 
> I insist on my opinion, not mixing two different approaches
> to do release resources.

It makes no sense.

This kvfree() is specifically to handle the case when 'info' is allocated,
but not yet assigned to ->shrinker_info. And 'err:' block handles all
other error cases. Putting kvfree() in 'err:' section is double-free
timebomb.
Vlastimil Babka Oct. 16, 2024, 11:43 a.m. UTC | #13
On 10/16/24 04:21, Muchun Song wrote:
> 
> 
>> On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
>> 
>> 
>> 
>> On 2024/10/15 14:55, Anshuman Khandual wrote:
>>> On 10/14/24 16:59, Kirill A. Shutemov wrote:
>>>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>> 
>>>>> A memleak was found as bellow:
>>>>> 
>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>   comm "mkdir", pid 1559, jiffies 4294932666
>>>>>   hex dump (first 32 bytes):
>>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>     40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>   backtrace (crc 2e7ef6fa):
>>>>>     [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>     [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>     [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>     [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>     [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>     [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>     [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>     [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>     [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>     [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>     [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>     [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>> 
>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>> err, the info won't be freed. Just fix it.
>>>>> 
>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>> ---
>>>>>  mm/shrinker.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>> 
>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>> --- a/mm/shrinker.c
>>>>> +++ b/mm/shrinker.c
>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>    err:
>>>>>   mutex_unlock(&shrinker_mutex);
>>>>> + kvfree(info);
>>>>>   free_shrinker_info(memcg);
>>>>>   return -ENOMEM;
>>>>>  }
>>>> 
>>>> NAK. If in the future there going to one more error case after
>>>> rcu_assign_pointer() we will end up with double free.
>>>> 
>>>> This should be safer:
>>>> 
>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>> index dc5d2a6fcfc4..763fd556bc7d 100644
>>>> --- a/mm/shrinker.c
>>>> +++ b/mm/shrinker.c
>>>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>   if (!info)
>>>>   goto err;
>>>>   info->map_nr_max = shrinker_nr_max;
>>>> - if (shrinker_unit_alloc(info, NULL, nid))
>>>> + if (shrinker_unit_alloc(info, NULL, nid)) {
>>>> + kvfree(info);
>>>>   goto err;
>>>> + }
>>>>   rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>   }
>>>>   mutex_unlock(&shrinker_mutex);
>>> Agreed, this is what I mentioned earlier as well.
>>> ------------------------------------------------------------------
>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>> fails but before calling into "goto err"
>>> ------------------------------------------------------------------
>> 
>> After discussion, it seems that v1 is acceptable.
>> Hi, Muchun, do you have any other opinions?
> 
> I insist on my opinion, not mixing two different approaches
> to do release resources.

So instead we mix the cleanup of the whole function with the cleanup of what
is effectively a per-iteration temporary variable?

The fact there was already a confusion in this thread about whether it's
safe and relies on kvfree(NULL) to be a no-op, should be a hint.

So no, I a gree with Kirill and others. Ideally the fix would also move the
declaration of info into the for loop to make its scope more obvious.

> Thanks.
> 
>> 
>> Best regards,
>> Ridong
> 
>
Vlastimil Babka Oct. 16, 2024, 12:13 p.m. UTC | #14
On 10/14/24 11:20, Muchun Song wrote:
> 
> 
>> On Oct 14, 2024, at 17:04, chenridong <chenridong@huawei.com> wrote:
>> 
>> 
>> 
>> On 2024/10/14 16:43, Muchun Song wrote:
>>>> On Oct 14, 2024, at 16:13, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> On 10/14/24 08:53, Chen Ridong wrote:
>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>> 
>>>>> A memleak was found as bellow:
>>>>> 
>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>  comm "mkdir", pid 1559, jiffies 4294932666
>>>>>  hex dump (first 32 bytes):
>>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>  backtrace (crc 2e7ef6fa):
>>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>> 
>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>> err, the info won't be freed. Just fix it.
>>>>> 
>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>> ---
>>>>> mm/shrinker.c | 1 +
>>>>> 1 file changed, 1 insertion(+)
>>>>> 
>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>> --- a/mm/shrinker.c
>>>>> +++ b/mm/shrinker.c
>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>> 
>>>>> err:
>>>>> mutex_unlock(&shrinker_mutex);
>>>>> + kvfree(info);
>>>>> free_shrinker_info(memcg);
>>>>> return -ENOMEM;
>>>>> }
>>>> 
>>>> There are two scenarios when "goto err:" gets called
>>>> 
>>>> - When shrinker_info allocations fails, no kvfree() is required
>>>> - but after this change kvfree() would be called even
>>>>  when the allocation had failed originally, which does
>>>>    not sound right
>>> Yes. In this case, @info is NULL and kvfree could handle NULL.
>>> It seems strange but the final behaviour correct.
>>>> 
>>>> - shrinker_unit_alloc() fails, kvfree() is actually required
>>>> 
>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>> fails but before calling into "goto err".
>>> We could do it like this, which avoids ambiguity (if someone ignores
>>> that kvfree could handle NULL). Something like:
>>> --- a/mm/shrinker.c
>>> +++ b/mm/shrinker.c
>>> @@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>                         goto err;
>>>                 info->map_nr_max = shrinker_nr_max;
>>>                 if (shrinker_unit_alloc(info, NULL, nid))
>>> -                       goto err;
>>> +                       goto free;
>>>                 rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>         }
>>>         mutex_unlock(&shrinker_mutex);
>>>         return ret;
>>> -
>>> +free:
>>> +       kvfree(info);
>>>  err:
>>>         mutex_unlock(&shrinker_mutex);
>>>         free_shrinker_info(memcg);
>>> Thanks.
>>>> 
>>>> But curious, should not both kvzalloc_node()/kvfree() be avoided
>>>> while inside mutex lock to avoid possible lockdep issues ?
>> How about:
>> 
>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>> index dc5d2a6fcfc4..7baee7f00497 100644
>> --- a/mm/shrinker.c
>> +++ b/mm/shrinker.c
>> @@ -87,9 +87,9 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>                 if (!info)
>>                         goto err;
>>                 info->map_nr_max = shrinker_nr_max;
>> +               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>                 if (shrinker_unit_alloc(info, NULL, nid))
>>                         goto err;
>> -               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>         }
>>         mutex_unlock(&shrinker_mutex);
> 
> No. We should make sure the @info is fully initialized before others
> could see it. That's why rcu_assign_pointer is used here.

If the info is immediately visible, is the failure cleanup
free_shrinker_info() safe? It uses kvfree(info) and not kvfree_rcu(), and
shrinker_unit_free() is also doing kfree().

>> 
>> I think this is concise.
>> 
>> Best regards,
>> Ridong
> 
>
Muchun Song Oct. 16, 2024, 1:37 p.m. UTC | #15
> On Oct 16, 2024, at 18:17, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> On Wed, Oct 16, 2024 at 10:21:30AM +0800, Muchun Song wrote:
>> 
>> 
>>>> On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
>>> 
>>> 
>>> 
>>> On 2024/10/15 14:55, Anshuman Khandual wrote:
>>>> On 10/14/24 16:59, Kirill A. Shutemov wrote:
>>>>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>> 
>>>>>> A memleak was found as bellow:
>>>>>> 
>>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>>  comm "mkdir", pid 1559, jiffies 4294932666
>>>>>>  hex dump (first 32 bytes):
>>>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>>  backtrace (crc 2e7ef6fa):
>>>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>> 
>>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>>> err, the info won't be freed. Just fix it.
>>>>>> 
>>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>> ---
>>>>>> mm/shrinker.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>> 
>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>>> --- a/mm/shrinker.c
>>>>>> +++ b/mm/shrinker.c
>>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>>   err:
>>>>>>  mutex_unlock(&shrinker_mutex);
>>>>>> + kvfree(info);
>>>>>>  free_shrinker_info(memcg);
>>>>>>  return -ENOMEM;
>>>>>> }
>>>>> 
>>>>> NAK. If in the future there going to one more error case after
>>>>> rcu_assign_pointer() we will end up with double free.
>>>>> 
>>>>> This should be safer:
>>>>> 
>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>> index dc5d2a6fcfc4..763fd556bc7d 100644
>>>>> --- a/mm/shrinker.c
>>>>> +++ b/mm/shrinker.c
>>>>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>  if (!info)
>>>>>  goto err;
>>>>>  info->map_nr_max = shrinker_nr_max;
>>>>> - if (shrinker_unit_alloc(info, NULL, nid))
>>>>> + if (shrinker_unit_alloc(info, NULL, nid)) {
>>>>> + kvfree(info);
>>>>>  goto err;
>>>>> + }
>>>>>  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>>  }
>>>>>  mutex_unlock(&shrinker_mutex);
>>>> Agreed, this is what I mentioned earlier as well.
>>>> ------------------------------------------------------------------
>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>> fails but before calling into "goto err"
>>>> ------------------------------------------------------------------
>>> 
>>> After discussion, it seems that v1 is acceptable.
>>> Hi, Muchun, do you have any other opinions?
>> 
>> I insist on my opinion, not mixing two different approaches
>> to do release resources.
> 
> It makes no sense.
> 
> This kvfree() is specifically to handle the case when 'info' is allocated,
> but not yet assigned to ->shrinker_info. And 'err:' block handles all
> other error cases. Putting kvfree() in 'err:' section is double-free
> timebomb.

Please carefully read my previous reply. I don’t think there is any double-free. FYI, I pasted my previous suggested fix here. Thanks.

--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
                       goto err;
               info->map_nr_max = shrinker_nr_max;
               if (shrinker_unit_alloc(info, NULL, nid))
-                       goto err;
+                       goto free;
               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
       }
       mutex_unlock(&shrinker_mutex);

       return ret;
-
+free:
+       kvfree(info);
err:
       mutex_unlock(&shrinker_mutex);
       free_shrinker_info(memcg);

> 
> --
>  Kiryl Shutsemau / Kirill A. Shutemov
Muchun Song Oct. 16, 2024, 2:08 p.m. UTC | #16
> On Oct 16, 2024, at 19:43, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 10/16/24 04:21, Muchun Song wrote:
>> 
>> 
>>> On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
>>> On 2024/10/15 14:55, Anshuman Khandual wrote:
>>>> On 10/14/24 16:59, Kirill A. Shutemov wrote:
>>>>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>> A memleak was found as bellow:
>>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>>  comm "mkdir", pid 1559, jiffies 4294932666
>>>>>>  hex dump (first 32 bytes):
>>>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>>  backtrace (crc 2e7ef6fa):
>>>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>>> err, the info won't be freed. Just fix it.
>>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>> ---
>>>>>> mm/shrinker.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>>> --- a/mm/shrinker.c
>>>>>> +++ b/mm/shrinker.c
>>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>>   err:
>>>>>>  mutex_unlock(&shrinker_mutex);
>>>>>> + kvfree(info);
>>>>>>  free_shrinker_info(memcg);
>>>>>>  return -ENOMEM;
>>>>>> }
>>>>> NAK. If in the future there going to one more error case after
>>>>> rcu_assign_pointer() we will end up with double free.
>>>>> This should be safer:
>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>> index dc5d2a6fcfc4..763fd556bc7d 100644
>>>>> --- a/mm/shrinker.c
>>>>> +++ b/mm/shrinker.c
>>>>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>  if (!info)
>>>>>  goto err;
>>>>>  info->map_nr_max = shrinker_nr_max;
>>>>> - if (shrinker_unit_alloc(info, NULL, nid))
>>>>> + if (shrinker_unit_alloc(info, NULL, nid)) {
>>>>> + kvfree(info);
>>>>>  goto err;
>>>>> + }
>>>>>  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>>  }
>>>>>  mutex_unlock(&shrinker_mutex);
>>>> Agreed, this is what I mentioned earlier as well.
>>>> ------------------------------------------------------------------
>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>> fails but before calling into "goto err"
>>>> ------------------------------------------------------------------
>>> After discussion, it seems that v1 is acceptable.
>>> Hi, Muchun, do you have any other opinions?
>> 
>> I insist on my opinion, not mixing two different approaches
>> to do release resources.
> 
> So instead we mix the cleanup of the whole function with the cleanup of what
> is effectively a per-iteration temporary variable?
> 
> The fact there was already a confusion in this thread about whether it's
> safe and relies on kvfree(NULL) to be a no-op, should be a hint.

Yes. I think someone is confused about my opinion.
I don’t care about whether we should apply this hit.
If we think the hint is tricky, we could add another
label to fix it like I suggested previously. Because
we already use goto-based approaches to
cleanup the resources, why not keeping
consistent? It will be easier for us to add a new
"if" statement and handle the failure case in the future.
For example, if we use his v1 proposal, we should do
the cleanups again for info. But for goto-based
version, we just add another label to do the
cleanups and go to the new label for failure case. goto-based fix is what I insisted on. I copied my previous suggested fix here to clarify my opinion.

--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
                      goto err;
              info->map_nr_max = shrinker_nr_max;
              if (shrinker_unit_alloc(info, NULL, nid))
-                       goto err;
+                       goto free;
              rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
      }
      mutex_unlock(&shrinker_mutex);

      return ret;
-
+free:
+       kvfree(info);
err:
      mutex_unlock(&shrinker_mutex);
      free_shrinker_info(memcg);

Muchun,
Thanks.

> 
> So no, I a gree with Kirill and others. Ideally the fix would also move the
> declaration of info into the for loop to make its scope more obvious.
> 
>> Thanks.
>> 
>>> Best regards,
>>> Ridong
Muchun Song Oct. 16, 2024, 2:22 p.m. UTC | #17
> On Oct 16, 2024, at 20:13, Vlastimil Babka <vbabka@suse.cz> wrote:
> 
> On 10/14/24 11:20, Muchun Song wrote:
>> 
>> 
>>>> On Oct 14, 2024, at 17:04, chenridong <chenridong@huawei.com> wrote:
>>> 
>>> 
>>> 
>>> On 2024/10/14 16:43, Muchun Song wrote:
>>>>> On Oct 14, 2024, at 16:13, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 10/14/24 08:53, Chen Ridong wrote:
>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>> 
>>>>>> A memleak was found as bellow:
>>>>>> 
>>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>> comm "mkdir", pid 1559, jiffies 4294932666
>>>>>> hex dump (first 32 bytes):
>>>>>>   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>>   40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>> backtrace (crc 2e7ef6fa):
>>>>>>   [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>>   [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>>   [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>>   [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>>   [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>>   [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>>   [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>>   [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>>   [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>>   [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>>   [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>>   [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>> 
>>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>>> err, the info won't be freed. Just fix it.
>>>>>> 
>>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>> ---
>>>>>> mm/shrinker.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>> 
>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>>> --- a/mm/shrinker.c
>>>>>> +++ b/mm/shrinker.c
>>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>> 
>>>>>> err:
>>>>>> mutex_unlock(&shrinker_mutex);
>>>>>> + kvfree(info);
>>>>>> free_shrinker_info(memcg);
>>>>>> return -ENOMEM;
>>>>>> }
>>>>> 
>>>>> There are two scenarios when "goto err:" gets called
>>>>> 
>>>>> - When shrinker_info allocations fails, no kvfree() is required
>>>>> - but after this change kvfree() would be called even
>>>>> when the allocation had failed originally, which does
>>>>>   not sound right
>>>> Yes. In this case, @info is NULL and kvfree could handle NULL.
>>>> It seems strange but the final behaviour correct.
>>>>> 
>>>>> - shrinker_unit_alloc() fails, kvfree() is actually required
>>>>> 
>>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>>> fails but before calling into "goto err".
>>>> We could do it like this, which avoids ambiguity (if someone ignores
>>>> that kvfree could handle NULL). Something like:
>>>> --- a/mm/shrinker.c
>>>> +++ b/mm/shrinker.c
>>>> @@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>                        goto err;
>>>>                info->map_nr_max = shrinker_nr_max;
>>>>                if (shrinker_unit_alloc(info, NULL, nid))
>>>> -                       goto err;
>>>> +                       goto free;
>>>>                rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>        }
>>>>        mutex_unlock(&shrinker_mutex);
>>>>        return ret;
>>>> -
>>>> +free:
>>>> +       kvfree(info);
>>>> err:
>>>>        mutex_unlock(&shrinker_mutex);
>>>>        free_shrinker_info(memcg);
>>>> Thanks.
>>>>> 
>>>>> But curious, should not both kvzalloc_node()/kvfree() be avoided
>>>>> while inside mutex lock to avoid possible lockdep issues ?
>>> How about:
>>> 
>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>> index dc5d2a6fcfc4..7baee7f00497 100644
>>> --- a/mm/shrinker.c
>>> +++ b/mm/shrinker.c
>>> @@ -87,9 +87,9 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>                if (!info)
>>>                        goto err;
>>>                info->map_nr_max = shrinker_nr_max;
>>> +               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>                if (shrinker_unit_alloc(info, NULL, nid))
>>>                        goto err;
>>> -               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>        }
>>>        mutex_unlock(&shrinker_mutex);
>> 
>> No. We should make sure the @info is fully initialized before others
>> could see it. That's why rcu_assign_pointer is used here.
> 
> If the info is immediately visible, is the failure cleanup
> free_shrinker_info() safe? It uses kvfree(info) and not kvfree_rcu(), and
> shrinker_unit_free() is also doing kfree().

Qi told me that the @info will not visible immediately yesterday.
So non-rcu-based kvfree is safe. Even if this fix could
work properly, it’s a bit strange for me to use
rcu_assign_pointer to assign the @info without full initialization to it.

Muchun,
Thanks.

> 
>>> 
>>> I think this is concise.
>>> 
>>> Best regards,
>>> Ridong
>> 
>> 
>
Vlastimil Babka Oct. 16, 2024, 5:02 p.m. UTC | #18
On 10/16/24 16:08, Muchun Song wrote:
> 
> 
>> On Oct 16, 2024, at 19:43, Vlastimil Babka <vbabka@suse.cz> wrote:
>> On 10/16/24 04:21, Muchun Song wrote:
>>> 
>>> 
>>>> On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
>>>> On 2024/10/15 14:55, Anshuman Khandual wrote:
>>>>> On 10/14/24 16:59, Kirill A. Shutemov wrote:
>>>>>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>>> A memleak was found as bellow:
>>>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>>>  comm "mkdir", pid 1559, jiffies 4294932666
>>>>>>>  hex dump (first 32 bytes):
>>>>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>>>  backtrace (crc 2e7ef6fa):
>>>>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>>>> err, the info won't be freed. Just fix it.
>>>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>>> ---
>>>>>>> mm/shrinker.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>>>> --- a/mm/shrinker.c
>>>>>>> +++ b/mm/shrinker.c
>>>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>>>   err:
>>>>>>>  mutex_unlock(&shrinker_mutex);
>>>>>>> + kvfree(info);
>>>>>>>  free_shrinker_info(memcg);
>>>>>>>  return -ENOMEM;
>>>>>>> }
>>>>>> NAK. If in the future there going to one more error case after
>>>>>> rcu_assign_pointer() we will end up with double free.
>>>>>> This should be safer:
>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>> index dc5d2a6fcfc4..763fd556bc7d 100644
>>>>>> --- a/mm/shrinker.c
>>>>>> +++ b/mm/shrinker.c
>>>>>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>>  if (!info)
>>>>>>  goto err;
>>>>>>  info->map_nr_max = shrinker_nr_max;
>>>>>> - if (shrinker_unit_alloc(info, NULL, nid))
>>>>>> + if (shrinker_unit_alloc(info, NULL, nid)) {
>>>>>> + kvfree(info);
>>>>>>  goto err;
>>>>>> + }
>>>>>>  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>>>  }
>>>>>>  mutex_unlock(&shrinker_mutex);
>>>>> Agreed, this is what I mentioned earlier as well.
>>>>> ------------------------------------------------------------------
>>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>>> fails but before calling into "goto err"
>>>>> ------------------------------------------------------------------
>>>> After discussion, it seems that v1 is acceptable.
>>>> Hi, Muchun, do you have any other opinions?
>>> 
>>> I insist on my opinion, not mixing two different approaches
>>> to do release resources.
>> 
>> So instead we mix the cleanup of the whole function with the cleanup of what
>> is effectively a per-iteration temporary variable?
>> 
>> The fact there was already a confusion in this thread about whether it's
>> safe and relies on kvfree(NULL) to be a no-op, should be a hint.
> 
> Yes. I think someone is confused about my opinion.
> I don’t care about whether we should apply this hit.
> If we think the hint is tricky, we could add another
> label to fix it like I suggested previously. Because
> we already use goto-based approaches to
> cleanup the resources, why not keeping
> consistent?

I think we're rather pragmatic than striving to be consistent for the sake
of consistency. goto is not the nicest thing in the world, but we (unlike
other projects) use it where it makes sense to avoid if/else nesting
explosion. Here for the info it's not the most pragmatic option.

> It will be easier for us to add a new
> "if" statement and handle the failure case in the future.

Let's not overengineer things for hypothetical future.

> For example, if we use his v1 proposal, we should do
> the cleanups again for info. But for goto-based
> version, we just add another label to do the
> cleanups and go to the new label for failure case. goto-based fix is what I insisted on. I copied my previous suggested fix here to clarify my opinion.

Again, info is a loop-iteration-local variable, v1 fix making it truly local
is the way to go. If there are further cleanups added in the loop itself in
the future, they could hopefully keep being local to the loop as well.
Cleanup of info outside the loop iteration is breaking its real scope.

> 
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>                       goto err;
>               info->map_nr_max = shrinker_nr_max;
>               if (shrinker_unit_alloc(info, NULL, nid))
> -                       goto err;
> +                       goto free;
>               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>       }
>       mutex_unlock(&shrinker_mutex);
> 
>       return ret;
> -
> +free:
> +       kvfree(info);
> err:
>       mutex_unlock(&shrinker_mutex);
>       free_shrinker_info(memcg);
> 
> Muchun,
> Thanks.
> 
>> 
>> So no, I a gree with Kirill and others. Ideally the fix would also move the
>> declaration of info into the for loop to make its scope more obvious.
>> 
>>> Thanks.
>>> 
>>>> Best regards,
>>>> Ridong
Roman Gushchin Oct. 16, 2024, 5:31 p.m. UTC | #19
On Wed, Oct 16, 2024 at 07:02:23PM +0200, Vlastimil Babka wrote:
> On 10/16/24 16:08, Muchun Song wrote:
> > 
> > 
> >> On Oct 16, 2024, at 19:43, Vlastimil Babka <vbabka@suse.cz> wrote:
> >> On 10/16/24 04:21, Muchun Song wrote:
> >>> 
> >>> 
> >>>> On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
> >>>> On 2024/10/15 14:55, Anshuman Khandual wrote:
> >>>>> On 10/14/24 16:59, Kirill A. Shutemov wrote:
> >>>>>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
> >>>>>>> From: Chen Ridong <chenridong@huawei.com>
> >>>>>>> A memleak was found as bellow:
> >>>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
> >>>>>>>  comm "mkdir", pid 1559, jiffies 4294932666
> >>>>>>>  hex dump (first 32 bytes):
> >>>>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
> >>>>>>>  backtrace (crc 2e7ef6fa):
> >>>>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
> >>>>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
> >>>>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
> >>>>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
> >>>>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
> >>>>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
> >>>>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
> >>>>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
> >>>>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
> >>>>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
> >>>>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
> >>>>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >>>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
> >>>>>>> err, the info won't be freed. Just fix it.
> >>>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
> >>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >>>>>>> ---
> >>>>>>> mm/shrinker.c | 1 +
> >>>>>>> 1 file changed, 1 insertion(+)
> >>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
> >>>>>>> index dc5d2a6fcfc4..92270413190d 100644
> >>>>>>> --- a/mm/shrinker.c
> >>>>>>> +++ b/mm/shrinker.c
> >>>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
> >>>>>>>   err:
> >>>>>>>  mutex_unlock(&shrinker_mutex);
> >>>>>>> + kvfree(info);
> >>>>>>>  free_shrinker_info(memcg);
> >>>>>>>  return -ENOMEM;
> >>>>>>> }
> >>>>>> NAK. If in the future there going to one more error case after
> >>>>>> rcu_assign_pointer() we will end up with double free.
> >>>>>> This should be safer:
> >>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
> >>>>>> index dc5d2a6fcfc4..763fd556bc7d 100644
> >>>>>> --- a/mm/shrinker.c
> >>>>>> +++ b/mm/shrinker.c
> >>>>>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
> >>>>>>  if (!info)
> >>>>>>  goto err;
> >>>>>>  info->map_nr_max = shrinker_nr_max;
> >>>>>> - if (shrinker_unit_alloc(info, NULL, nid))
> >>>>>> + if (shrinker_unit_alloc(info, NULL, nid)) {
> >>>>>> + kvfree(info);
> >>>>>>  goto err;
> >>>>>> + }
> >>>>>>  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
> >>>>>>  }
> >>>>>>  mutex_unlock(&shrinker_mutex);
> >>>>> Agreed, this is what I mentioned earlier as well.
> >>>>> ------------------------------------------------------------------
> >>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
> >>>>> fails but before calling into "goto err"
> >>>>> ------------------------------------------------------------------
> >>>> After discussion, it seems that v1 is acceptable.
> >>>> Hi, Muchun, do you have any other opinions?
> >>> 
> >>> I insist on my opinion, not mixing two different approaches
> >>> to do release resources.
> >> 
> >> So instead we mix the cleanup of the whole function with the cleanup of what
> >> is effectively a per-iteration temporary variable?
> >> 
> >> The fact there was already a confusion in this thread about whether it's
> >> safe and relies on kvfree(NULL) to be a no-op, should be a hint.
> > 
> > Yes. I think someone is confused about my opinion.
> > I don’t care about whether we should apply this hit.
> > If we think the hint is tricky, we could add another
> > label to fix it like I suggested previously. Because
> > we already use goto-based approaches to
> > cleanup the resources, why not keeping
> > consistent?
> 
> I think we're rather pragmatic than striving to be consistent for the sake
> of consistency. goto is not the nicest thing in the world, but we (unlike
> other projects) use it where it makes sense to avoid if/else nesting
> explosion. Here for the info it's not the most pragmatic option.
> 
> > It will be easier for us to add a new
> > "if" statement and handle the failure case in the future.
> 
> Let's not overengineer things for hypothetical future.
> 
> > For example, if we use his v1 proposal, we should do
> > the cleanups again for info. But for goto-based
> > version, we just add another label to do the
> > cleanups and go to the new label for failure case. goto-based fix is what I insisted on. I copied my previous suggested fix here to clarify my opinion.
> 
> Again, info is a loop-iteration-local variable, v1 fix making it truly local
> is the way to go. If there are further cleanups added in the loop itself in
> the future, they could hopefully keep being local to the loop as well.
> Cleanup of info outside the loop iteration is breaking its real scope.

+1 to that.

I don't think it's such a big deal and both versions are ok, but I strongly
prefer the original version (without introduction of a new label).

Please, feel free to use
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
with the original version.

Thanks!
Qi Zheng Oct. 17, 2024, 2:41 a.m. UTC | #20
On 2024/10/16 22:22, Muchun Song wrote:
> 
> 
>> On Oct 16, 2024, at 20:13, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 10/14/24 11:20, Muchun Song wrote:
>>>
>>>
>>>>> On Oct 14, 2024, at 17:04, chenridong <chenridong@huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/10/14 16:43, Muchun Song wrote:
>>>>>> On Oct 14, 2024, at 16:13, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/14/24 08:53, Chen Ridong wrote:
>>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>>>
>>>>>>> A memleak was found as bellow:
>>>>>>>
>>>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>>> comm "mkdir", pid 1559, jiffies 4294932666
>>>>>>> hex dump (first 32 bytes):
>>>>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>>> backtrace (crc 2e7ef6fa):
>>>>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>>
>>>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>>>> err, the info won't be freed. Just fix it.
>>>>>>>
>>>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>>> ---
>>>>>>> mm/shrinker.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>>>> --- a/mm/shrinker.c
>>>>>>> +++ b/mm/shrinker.c
>>>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>>>
>>>>>>> err:
>>>>>>> mutex_unlock(&shrinker_mutex);
>>>>>>> + kvfree(info);
>>>>>>> free_shrinker_info(memcg);
>>>>>>> return -ENOMEM;
>>>>>>> }
>>>>>>
>>>>>> There are two scenarios when "goto err:" gets called
>>>>>>
>>>>>> - When shrinker_info allocations fails, no kvfree() is required
>>>>>> - but after this change kvfree() would be called even
>>>>>> when the allocation had failed originally, which does
>>>>>>    not sound right
>>>>> Yes. In this case, @info is NULL and kvfree could handle NULL.
>>>>> It seems strange but the final behaviour correct.
>>>>>>
>>>>>> - shrinker_unit_alloc() fails, kvfree() is actually required
>>>>>>
>>>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>>>> fails but before calling into "goto err".
>>>>> We could do it like this, which avoids ambiguity (if someone ignores
>>>>> that kvfree could handle NULL). Something like:
>>>>> --- a/mm/shrinker.c
>>>>> +++ b/mm/shrinker.c
>>>>> @@ -88,13 +88,14 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>                         goto err;
>>>>>                 info->map_nr_max = shrinker_nr_max;
>>>>>                 if (shrinker_unit_alloc(info, NULL, nid))
>>>>> -                       goto err;
>>>>> +                       goto free;
>>>>>                 rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>>         }
>>>>>         mutex_unlock(&shrinker_mutex);
>>>>>         return ret;
>>>>> -
>>>>> +free:
>>>>> +       kvfree(info);
>>>>> err:
>>>>>         mutex_unlock(&shrinker_mutex);
>>>>>         free_shrinker_info(memcg);
>>>>> Thanks.
>>>>>>
>>>>>> But curious, should not both kvzalloc_node()/kvfree() be avoided
>>>>>> while inside mutex lock to avoid possible lockdep issues ?
>>>> How about:
>>>>
>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>> index dc5d2a6fcfc4..7baee7f00497 100644
>>>> --- a/mm/shrinker.c
>>>> +++ b/mm/shrinker.c
>>>> @@ -87,9 +87,9 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>                 if (!info)
>>>>                         goto err;
>>>>                 info->map_nr_max = shrinker_nr_max;
>>>> +               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>                 if (shrinker_unit_alloc(info, NULL, nid))
>>>>                         goto err;
>>>> -               rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>         }
>>>>         mutex_unlock(&shrinker_mutex);
>>>
>>> No. We should make sure the @info is fully initialized before others
>>> could see it. That's why rcu_assign_pointer is used here.
>>
>> If the info is immediately visible, is the failure cleanup
>> free_shrinker_info() safe? It uses kvfree(info) and not kvfree_rcu(), and
>> shrinker_unit_free() is also doing kfree().
> 
> Qi told me that the @info will not visible immediately yesterday.
> So non-rcu-based kvfree is safe. Even if this fix could

Yes, alloc_shrinker_info() is only called by mem_cgroup_css_online(). At
this time, the memcg is not online yet, so it is not visible to 
shrink_slab(). And free_shrinker_info() is also called by 
mem_cgroup_css_free(),
the memcg has already been offline. The shrinker_unit_free() is
also called by expand_one_shrinker_info(), but the shrinker_info 'new'
is also not visible at that time. So non-rcu-based kvfree is safe.

> work properly, it’s a bit strange for me to use
> rcu_assign_pointer to assign the @info without full initialization to it.

Agree.

> 
> Muchun,
> Thanks.
> 
>>
>>>>
>>>> I think this is concise.
>>>>
>>>> Best regards,
>>>> Ridong
>>>
>>>
>>
Chen Ridong Oct. 24, 2024, 1:26 a.m. UTC | #21
On 2024/10/17 1:31, Roman Gushchin wrote:
> On Wed, Oct 16, 2024 at 07:02:23PM +0200, Vlastimil Babka wrote:
>> On 10/16/24 16:08, Muchun Song wrote:
>>>
>>>
>>>> On Oct 16, 2024, at 19:43, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>> On 10/16/24 04:21, Muchun Song wrote:
>>>>>
>>>>>
>>>>>> On Oct 16, 2024, at 09:25, chenridong <chenridong@huawei.com> wrote:
>>>>>> On 2024/10/15 14:55, Anshuman Khandual wrote:
>>>>>>> On 10/14/24 16:59, Kirill A. Shutemov wrote:
>>>>>>>> On Mon, Oct 14, 2024 at 03:23:36AM +0000, Chen Ridong wrote:
>>>>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>>>>> A memleak was found as bellow:
>>>>>>>>> unreferenced object 0xffff8881010d2a80 (size 32):
>>>>>>>>>  comm "mkdir", pid 1559, jiffies 4294932666
>>>>>>>>>  hex dump (first 32 bytes):
>>>>>>>>>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>>>>>    40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  @...............
>>>>>>>>>  backtrace (crc 2e7ef6fa):
>>>>>>>>>    [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
>>>>>>>>>    [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
>>>>>>>>>    [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
>>>>>>>>>    [<ffffffff81198dd9>] online_css+0x29/0xa0
>>>>>>>>>    [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
>>>>>>>>>    [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
>>>>>>>>>    [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
>>>>>>>>>    [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220
>>>>>>>>>    [<ffffffff813e1c97>] do_mkdirat+0x87/0x130
>>>>>>>>>    [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
>>>>>>>>>    [<ffffffff81f8c928>] do_syscall_64+0x68/0x140
>>>>>>>>>    [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>>>> In the alloc_shrinker_info function, when shrinker_unit_alloc return
>>>>>>>>> err, the info won't be freed. Just fix it.
>>>>>>>>> Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
>>>>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>>>>> ---
>>>>>>>>> mm/shrinker.c | 1 +
>>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>>>>> index dc5d2a6fcfc4..92270413190d 100644
>>>>>>>>> --- a/mm/shrinker.c
>>>>>>>>> +++ b/mm/shrinker.c
>>>>>>>>> @@ -97,6 +97,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>>>>>   err:
>>>>>>>>>  mutex_unlock(&shrinker_mutex);
>>>>>>>>> + kvfree(info);
>>>>>>>>>  free_shrinker_info(memcg);
>>>>>>>>>  return -ENOMEM;
>>>>>>>>> }
>>>>>>>> NAK. If in the future there going to one more error case after
>>>>>>>> rcu_assign_pointer() we will end up with double free.
>>>>>>>> This should be safer:
>>>>>>>> diff --git a/mm/shrinker.c b/mm/shrinker.c
>>>>>>>> index dc5d2a6fcfc4..763fd556bc7d 100644
>>>>>>>> --- a/mm/shrinker.c
>>>>>>>> +++ b/mm/shrinker.c
>>>>>>>> @@ -87,8 +87,10 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
>>>>>>>>  if (!info)
>>>>>>>>  goto err;
>>>>>>>>  info->map_nr_max = shrinker_nr_max;
>>>>>>>> - if (shrinker_unit_alloc(info, NULL, nid))
>>>>>>>> + if (shrinker_unit_alloc(info, NULL, nid)) {
>>>>>>>> + kvfree(info);
>>>>>>>>  goto err;
>>>>>>>> + }
>>>>>>>>  rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
>>>>>>>>  }
>>>>>>>>  mutex_unlock(&shrinker_mutex);
>>>>>>> Agreed, this is what I mentioned earlier as well.
>>>>>>> ------------------------------------------------------------------
>>>>>>> I guess kvfree() should be called just after shrinker_unit_alloc()
>>>>>>> fails but before calling into "goto err"
>>>>>>> ------------------------------------------------------------------
>>>>>> After discussion, it seems that v1 is acceptable.
>>>>>> Hi, Muchun, do you have any other opinions?
>>>>>
>>>>> I insist on my opinion, not mixing two different approaches
>>>>> to do release resources.
>>>>
>>>> So instead we mix the cleanup of the whole function with the cleanup of what
>>>> is effectively a per-iteration temporary variable?
>>>>
>>>> The fact there was already a confusion in this thread about whether it's
>>>> safe and relies on kvfree(NULL) to be a no-op, should be a hint.
>>>
>>> Yes. I think someone is confused about my opinion.
>>> I don’t care about whether we should apply this hit.
>>> If we think the hint is tricky, we could add another
>>> label to fix it like I suggested previously. Because
>>> we already use goto-based approaches to
>>> cleanup the resources, why not keeping
>>> consistent?
>>
>> I think we're rather pragmatic than striving to be consistent for the sake
>> of consistency. goto is not the nicest thing in the world, but we (unlike
>> other projects) use it where it makes sense to avoid if/else nesting
>> explosion. Here for the info it's not the most pragmatic option.
>>
>>> It will be easier for us to add a new
>>> "if" statement and handle the failure case in the future.
>>
>> Let's not overengineer things for hypothetical future.
>>
>>> For example, if we use his v1 proposal, we should do
>>> the cleanups again for info. But for goto-based
>>> version, we just add another label to do the
>>> cleanups and go to the new label for failure case. goto-based fix is what I insisted on. I copied my previous suggested fix here to clarify my opinion.
>>
>> Again, info is a loop-iteration-local variable, v1 fix making it truly local
>> is the way to go. If there are further cleanups added in the loop itself in
>> the future, they could hopefully keep being local to the loop as well.
>> Cleanup of info outside the loop iteration is breaking its real scope.
> 
> +1 to that.
> 
> I don't think it's such a big deal and both versions are ok, but I strongly
> prefer the original version (without introduction of a new label).
> 
> Please, feel free to use
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> with the original version.
> 
> Thanks!

I agree with Roman.
Hello, Andrew and Dave, Do you have any opinions?

The original version:
https://lore.kernel.org/linux-kernel/4184c61f-80f7-4adc-8929-c29f959cb8df@huawei.com/

Best regards,
Ridong
Vlastimil Babka Oct. 24, 2024, 9:08 a.m. UTC | #22
On 10/24/24 03:26, Chen Ridong wrote:
>>>> For example, if we use his v1 proposal, we should do
>>>> the cleanups again for info. But for goto-based
>>>> version, we just add another label to do the
>>>> cleanups and go to the new label for failure case. goto-based fix is what I insisted on. I copied my previous suggested fix here to clarify my opinion.
>>>
>>> Again, info is a loop-iteration-local variable, v1 fix making it truly local
>>> is the way to go. If there are further cleanups added in the loop itself in
>>> the future, they could hopefully keep being local to the loop as well.
>>> Cleanup of info outside the loop iteration is breaking its real scope.
>> 
>> +1 to that.
>> 
>> I don't think it's such a big deal and both versions are ok, but I strongly
>> prefer the original version (without introduction of a new label).
>> 
>> Please, feel free to use
>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
>> with the original version.
>> 
>> Thanks!
> 
> I agree with Roman.
> Hello, Andrew and Dave, Do you have any opinions?
> 
> The original version:
> https://lore.kernel.org/linux-kernel/4184c61f-80f7-4adc-8929-c29f959cb8df@huawei.com/

Hi,

Can you resend the v1 as v3, but also move the declaration of "struct
shrinker_info *info;" inside the for_each_node() block? Also you can add all
the acks collected for that version and mine too:

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.

> Best regards,
> Ridong
> 
>
Chen Ridong Oct. 25, 2024, 1:22 a.m. UTC | #23
On 2024/10/24 17:08, Vlastimil Babka wrote:
> On 10/24/24 03:26, Chen Ridong wrote:
>>>>> For example, if we use his v1 proposal, we should do
>>>>> the cleanups again for info. But for goto-based
>>>>> version, we just add another label to do the
>>>>> cleanups and go to the new label for failure case. goto-based fix is what I insisted on. I copied my previous suggested fix here to clarify my opinion.
>>>>
>>>> Again, info is a loop-iteration-local variable, v1 fix making it truly local
>>>> is the way to go. If there are further cleanups added in the loop itself in
>>>> the future, they could hopefully keep being local to the loop as well.
>>>> Cleanup of info outside the loop iteration is breaking its real scope.
>>>
>>> +1 to that.
>>>
>>> I don't think it's such a big deal and both versions are ok, but I strongly
>>> prefer the original version (without introduction of a new label).
>>>
>>> Please, feel free to use
>>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
>>> with the original version.
>>>
>>> Thanks!
>>
>> I agree with Roman.
>> Hello, Andrew and Dave, Do you have any opinions?
>>
>> The original version:
>> https://lore.kernel.org/linux-kernel/4184c61f-80f7-4adc-8929-c29f959cb8df@huawei.com/
> 
> Hi,
> 
> Can you resend the v1 as v3, but also move the declaration of "struct
> shrinker_info *info;" inside the for_each_node() block? Also you can add all
> the acks collected for that version and mine too:
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Thanks.
> 
>> Best regards,
>> Ridong
>>
>>

Will update, Thanks.

Best regards,
Ridong
diff mbox series

Patch

diff --git a/mm/shrinker.c b/mm/shrinker.c
index dc5d2a6fcfc4..92270413190d 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -97,6 +97,7 @@  int alloc_shrinker_info(struct mem_cgroup *memcg)
 
 err:
 	mutex_unlock(&shrinker_mutex);
+	kvfree(info);
 	free_shrinker_info(memcg);
 	return -ENOMEM;
 }