
padata: Fix possible divide-by-0 panic in padata_mt_helper()

Message ID: 20240806174647.1050398-1-longman@redhat.com (mailing list archive)
State: Changes Requested
Delegated to: Herbert Xu
Series: padata: Fix possible divide-by-0 panic in padata_mt_helper()

Commit Message

Waiman Long Aug. 6, 2024, 5:46 p.m. UTC
We have been hit with a divide-by-0 panic in padata.c at bootup time
that is not easily reproducible.

  [   10.017908] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
  [   10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
  [   10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
  [   10.017908] Workqueue: events_unbound padata_mt_helper
  [   10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
    :
  [   10.017963] Call Trace:
  [   10.017968]  <TASK>
  [   10.018004]  ? padata_mt_helper+0x39/0xb0
  [   10.018084]  process_one_work+0x174/0x330
  [   10.018093]  worker_thread+0x266/0x3a0
  [   10.018111]  kthread+0xcf/0x100
  [   10.018124]  ret_from_fork+0x31/0x50
  [   10.018138]  ret_from_fork_asm+0x1a/0x30
  [   10.018147]  </TASK>

Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.

Fix this divide-by-0 panic by making sure that chunk_size will be at
least 1 no matter what the input parameters are.

Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/padata.c | 7 +++++++
 1 file changed, 7 insertions(+)
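
For context, the division that trips here is the roundup() inside
padata_mt_helper(), whose expansion divides by ps->chunk_size. A
paraphrased sketch of that worker loop (the exact code in
kernel/padata.c may differ slightly):

	while (job->size > 0) {
		unsigned long start = job->start;
		/* roundup(x, y) divides by y, so chunk_size == 0 faults here. */
		unsigned long size = roundup(start + 1, ps->chunk_size) - start;

		size = min(size, job->size);
		job->start = start + size;
		job->size -= size;
		/* ... job->thread_fn(start, start + size, job->fn_arg) ... */
	}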

Comments

Herbert Xu Aug. 10, 2024, 4:05 a.m. UTC | #1
Waiman Long <longman@redhat.com> wrote:
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 53f4bc912712..0fa6c2895460 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>        ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>        ps.chunk_size = roundup(ps.chunk_size, job->align);
> 
> +       /*
> +        * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> +        * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
> +        */
> +       if (!ps.chunk_size)
> +               ps.chunk_size = 1U;

Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
instead?

Thanks,
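
For reference, a sketch of what that suggestion would amount to against
the first chunk_size assignment (not a patch that was posted in this
thread):

	-	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
	+	ps.chunk_size = DIV_ROUND_UP(job->size, ps.nworks * load_balance_factor);
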
Kamlesh Gurudasani Aug. 10, 2024, 5:44 p.m. UTC | #2
Waiman Long <longman@redhat.com> writes:

> We have been hit with a divide-by-0 panic in padata.c at bootup time
> that is not easily reproducible.
>
>   [   10.017908] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
>   [   10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
>   [   10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
>   [   10.017908] Workqueue: events_unbound padata_mt_helper
>   [   10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
>     :
>   [   10.017963] Call Trace:
>   [   10.017968]  <TASK>
>   [   10.018004]  ? padata_mt_helper+0x39/0xb0
>   [   10.018084]  process_one_work+0x174/0x330
>   [   10.018093]  worker_thread+0x266/0x3a0
>   [   10.018111]  kthread+0xcf/0x100
>   [   10.018124]  ret_from_fork+0x31/0x50
>   [   10.018138]  ret_from_fork_asm+0x1a/0x30
>   [   10.018147]  </TASK>
>
> Looking at the padata_mt_helper() function, the only way a divide-by-0
> panic can happen is when ps->chunk_size is 0. The way that chunk_size is
> initialized in padata_do_multithreaded(), chunk_size can be 0 when the
> min_chunk in the passed-in padata_mt_job structure is 0.
>
> Fix this divide-by-0 panic by making sure that chunk_size will be at
> least 1 no matter what the input parameters are.
>
> Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/padata.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 53f4bc912712..0fa6c2895460 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>  
> +	/*
> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
> +	 */
Thanks for the patch and detailed comment.
> +	if (!ps.chunk_size)
> +		ps.chunk_size = 1U;
> +
could it be
        ps.chunk_size = max(ps.chunk_size, 1U);
        
or can be merged with earlier max()
  	ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
  	ps.chunk_size = roundup(ps.chunk_size, job->align);

This sits well with how the entire file is written, and the compiler
optimizes them to the same level.

Kamlesh

>  	list_for_each_entry(pw, &works, pw_list)
>  		if (job->numa_aware) {
>  			int old_node = atomic_read(&last_used_nid);
> -- 
> 2.43.5
Waiman Long Aug. 11, 2024, 1:30 a.m. UTC | #3
On 8/10/24 00:05, Herbert Xu wrote:
> Waiman Long <longman@redhat.com> wrote:
>> diff --git a/kernel/padata.c b/kernel/padata.c
>> index 53f4bc912712..0fa6c2895460 100644
>> --- a/kernel/padata.c
>> +++ b/kernel/padata.c
>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>         ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>         ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> +       /*
>> +        * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>> +        * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>> +        */
>> +       if (!ps.chunk_size)
>> +               ps.chunk_size = 1U;
> Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
> instead?

I think DIV_ROUND_UP() will have exactly the same problem: if chunk_size
is 0, you still get a 0 result. round_up() works only if the 2nd argument
is a power of 2, while with DIV_ROUND_UP() the second argument can be any
number except 0.

Cheers,
Longman
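
For reference, the helpers being compared are defined roughly as follows
(a paraphrase of include/linux/math.h; note that all three map 0 to 0):

	#define round_up(x, y)		((((x) - 1) | ((y) - 1)) + 1)	/* y must be a power of 2 */
	#define roundup(x, y)		((((x) + (y) - 1) / (y)) * (y))	/* works for any nonzero y */
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))		/* works for any nonzero d */
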
Herbert Xu Aug. 11, 2024, 1:45 a.m. UTC | #4
On Sat, Aug 10, 2024 at 09:30:25PM -0400, Waiman Long wrote:
> 
> On 8/10/24 00:05, Herbert Xu wrote:
> > Waiman Long <longman@redhat.com> wrote:
> > > diff --git a/kernel/padata.c b/kernel/padata.c
> > > index 53f4bc912712..0fa6c2895460 100644
> > > --- a/kernel/padata.c
> > > +++ b/kernel/padata.c
> > > @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
> > >         ps.chunk_size = max(ps.chunk_size, job->min_chunk);
> > >         ps.chunk_size = roundup(ps.chunk_size, job->align);
> > > 
> > > +       /*
> > > +        * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> > > +        * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
> > > +        */
> > > +       if (!ps.chunk_size)
> > > +               ps.chunk_size = 1U;
> > Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
> > instead?
> 
> I think DIV_ROUND_UP() will have exactly the same problem: if chunk_size
> is 0, you still get a 0 result. round_up() works only if the 2nd argument
> is a power of 2, while with DIV_ROUND_UP() the second argument can be any
> number except 0.

Unless I'm missing something chunk_size cannot be zero before the
division because that's the first thing we check upon entry into
this function.

Cheers,
Waiman Long Aug. 11, 2024, 3:11 a.m. UTC | #5
On 8/10/24 21:45, Herbert Xu wrote:
> On Sat, Aug 10, 2024 at 09:30:25PM -0400, Waiman Long wrote:
>> On 8/10/24 00:05, Herbert Xu wrote:
>>> Waiman Long <longman@redhat.com> wrote:
>>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>>> index 53f4bc912712..0fa6c2895460 100644
>>>> --- a/kernel/padata.c
>>>> +++ b/kernel/padata.c
>>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>>>          ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>>>          ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>>
>>>> +       /*
>>>> +        * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>>> +        * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>>>> +        */
>>>> +       if (!ps.chunk_size)
>>>> +               ps.chunk_size = 1U;
>>> Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
>>> instead?
>> I think DIV_ROUND_UP() will have exactly the same problem: if chunk_size
>> is 0, you still get a 0 result. round_up() works only if the 2nd argument
>> is a power of 2, while with DIV_ROUND_UP() the second argument can be any
>> number except 0.
> Unless I'm missing something chunk_size cannot be zero before the
> division because that's the first thing we check upon entry into
> this function.

chunk_size is initialized as

ps.chunk_size = job->size / (ps.nworks * load_balance_factor);

chunk_size will be 0 if job->size < (ps.nworks * load_balance_factor). 
If min_chunk is 0, chunk_size will remain 0.

After looking at the dump file from when the crash happened in
padata_mt_helper(), I determined that ps->chunk_size was indeed 0,
which caused the divide-by-0 panic. I actually got 2 different bug
reports of this div-by-0 panic, one with a debug kernel and another
with a non-debug kernel.

Cheers,
Longman
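
Putting those lines together with hypothetical inputs, say job->size = 2,
ps.nworks * load_balance_factor = 8, job->min_chunk = 0 and job->align = 1:

	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);	/* 2 / 8 == 0 */
	ps.chunk_size = max(ps.chunk_size, job->min_chunk);		/* max(0, 0) == 0 */
	ps.chunk_size = roundup(ps.chunk_size, job->align);		/* roundup(0, 1) == 0 */

padata_mt_helper() then divides by that zero chunk_size.
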
Herbert Xu Aug. 11, 2024, 3:13 a.m. UTC | #6
On Sat, Aug 10, 2024 at 11:11:07PM -0400, Waiman Long wrote:
>
> > Unless I'm missing something chunk_size cannot be zero before the
> > division because that's the first thing we check upon entry into
> > this function.
> 
> chunk_size is initialized as
> 
> ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
> 
> chunk_size will be 0 if job->size < (ps.nworks * load_balance_factor). If
> min_chunk is 0, chunk_size will remain 0.

That's why I was suggesting that you replace the division by
DIV_ROUND_UP.  That should ensure that ps.chunk_size is not zero.

Cheers,
Waiman Long Aug. 11, 2024, 3:27 a.m. UTC | #7
On 8/10/24 23:13, Herbert Xu wrote:
> On Sat, Aug 10, 2024 at 11:11:07PM -0400, Waiman Long wrote:
>>> Unless I'm missing something chunk_size cannot be zero before the
>>> division because that's the first thing we check upon entry into
>>> this function.
>> chunk_size is initialized as
>>
>> ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
>>
>> chunk_size will be 0 if job->size < (ps.nworks * load_balance_factor). If
>> min_chunk is 0, chunk_size will remain 0.
> That's why I was suggesting that you replace the division by
> DIV_ROUND_UP.  That should ensure that ps.chunk_size is not zero.

Now I see what you mean. Yes, we can probably change that to a 
DIV_ROUND_UP operation to make sure that chunk_size is at least one 
unless job->size is 0. I still think the current patch is a bit more 
fail-safe.

Cheers, Longman
Waiman Long Aug. 11, 2024, 3:33 a.m. UTC | #8
On 8/10/24 13:44, Kamlesh Gurudasani wrote:
> Waiman Long <longman@redhat.com> writes:
>
>> We have been hit with a divide-by-0 panic in padata.c at bootup time
>> that is not easily reproducible.
>>
>>    [   10.017908] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
>>    [   10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
>>    [   10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
>>    [   10.017908] Workqueue: events_unbound padata_mt_helper
>>    [   10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
>>      :
>>    [   10.017963] Call Trace:
>>    [   10.017968]  <TASK>
>>    [   10.018004]  ? padata_mt_helper+0x39/0xb0
>>    [   10.018084]  process_one_work+0x174/0x330
>>    [   10.018093]  worker_thread+0x266/0x3a0
>>    [   10.018111]  kthread+0xcf/0x100
>>    [   10.018124]  ret_from_fork+0x31/0x50
>>    [   10.018138]  ret_from_fork_asm+0x1a/0x30
>>    [   10.018147]  </TASK>
>>
>> Looking at the padata_mt_helper() function, the only way a divide-by-0
>> panic can happen is when ps->chunk_size is 0. The way that chunk_size is
>> initialized in padata_do_multithreaded(), chunk_size can be 0 when the
>> min_chunk in the passed-in padata_mt_job structure is 0.
>>
>> Fix this divide-by-0 panic by making sure that chunk_size will be at
>> least 1 no matter what the input parameters are.
>>
>> Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/padata.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/padata.c b/kernel/padata.c
>> index 53f4bc912712..0fa6c2895460 100644
>> --- a/kernel/padata.c
>> +++ b/kernel/padata.c
>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>   	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>   	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>   
>> +	/*
>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>> +	 */
> Thanks for the patch and detailed comment.
>> +	if (!ps.chunk_size)
>> +		ps.chunk_size = 1U;
>> +
> could it be
>          ps.chunk_size = max(ps.chunk_size, 1U);
>          
> or can be merged with earlier max()
>    	ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
>    	ps.chunk_size = roundup(ps.chunk_size, job->align);
>
> This sits well with how the entire file is written, and the compiler
> optimizes them to the same level.

I had actually thought about doing that as an alternative. I used the 
current patch to avoid putting too many max() calls there. I can go this 
route if you guys prefer this.

Cheers,
Longman
Herbert Xu Aug. 11, 2024, 3:41 a.m. UTC | #9
On Sat, Aug 10, 2024 at 11:27:55PM -0400, Waiman Long wrote:
>
> Now I see what you mean. Yes, we can probably change that to a DIV_ROUND_UP
> operation to make sure that chunk_size is at least one unless job->size is
> 0. I still think the current patch is a bit more fail-safe.

The very first thing the function does is check that job->size is
not zero.  So this should be all that is necessary.

Thanks,
Kamlesh Gurudasani Aug. 11, 2024, 5:44 a.m. UTC | #10
Waiman Long <longman@redhat.com> writes:

> On 8/10/24 13:44, Kamlesh Gurudasani wrote:
>> Waiman Long <longman@redhat.com> writes:
>>
...
>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>> index 53f4bc912712..0fa6c2895460 100644
>>> --- a/kernel/padata.c
>>> +++ b/kernel/padata.c
>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>>   	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>>   	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>   
>>> +	/*
>>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>>> +	 */
>> Thanks for the patch and detailed comment.
>>> +	if (!ps.chunk_size)
>>> +		ps.chunk_size = 1U;
>>> +
>> could it be
>>          ps.chunk_size = max(ps.chunk_size, 1U);
>>          
>> or can be merged with earlier max()
>>    	ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
>>    	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> This sits well with how the entire file is written, and the compiler
>> optimizes them to the same level.
>
> I had actually thought about doing that as an alternative. I used the 
> current patch to avoid putting too many max() calls there. I can go this 
> route if you guys prefer this.
Just curious, what is your reason for avoiding too many max() calls? Both
        if (!ps.chunk_size)
        	ps.chunk_size = 1U;
and
        ps.chunk_size = max(ps.chunk_size, 1U);

compile to the same number of instructions [1].

[1] https://godbolt.org/z/ajrK59c67

We can avoid the nested max(); the following would be easier to understand:

   ps.chunk_size = max(ps.chunk_size, 1U);

Cheers,
Kamlesh

>
> Cheers,
> Longman
Waiman Long Aug. 13, 2024, 6:28 p.m. UTC | #11
On 8/11/24 01:44, Kamlesh Gurudasani wrote:
> Waiman Long <longman@redhat.com> writes:
>
>> On 8/10/24 13:44, Kamlesh Gurudasani wrote:
>>> Waiman Long <longman@redhat.com> writes:
>>>
> ...
>>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>>> index 53f4bc912712..0fa6c2895460 100644
>>>> --- a/kernel/padata.c
>>>> +++ b/kernel/padata.c
>>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>>>    	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>>>    	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>>    
>>>> +	/*
>>>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>>>> +	 */
>>> Thanks for the patch and detailed comment.
>>>> +	if (!ps.chunk_size)
>>>> +		ps.chunk_size = 1U;
>>>> +
>>> could it be
>>>           ps.chunk_size = max(ps.chunk_size, 1U);
>>>           
>>> or can be merged with earlier max()
>>>     	ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
>>>     	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>
>>> This sits well with how the entire file is written, and the compiler
>>> optimizes them to the same level.
>> I had actually thought about doing that as an alternative. I used the
>> current patch to avoid putting too many max() calls there. I can go this
>> route if you guys prefer this.
> Just curious, what is your reason for avoiding too many max() calls? Both
>          if (!ps.chunk_size)
>          	ps.chunk_size = 1U;
> and
>          ps.chunk_size = max(ps.chunk_size, 1U);
>
> compile to the same number of instructions [1].
>
> [1] https://godbolt.org/z/ajrK59c67
>
> We can avoid the nested max(); the following would be easier to understand:
>
>     ps.chunk_size = max(ps.chunk_size, 1U);

That will certainly work. My current patch has been merged into the 
Linus tree. You are welcome to post another patch to clean it up if you 
want.

Cheers,
Longman

>
> Cheers,
> Kamlesh
>
>> Cheers,
>> Longman
Herbert Xu Aug. 17, 2024, 7:12 a.m. UTC | #12
On Mon, Aug 12, 2024 at 10:04:07AM -0400, Waiman Long wrote:
>
> Anyway, using DIV_ROUND_UP() is a slight change in behavior as chunk_size
> will be increased by 1 in most cases. I am a bit hesitant to make this
> change without looking into more detail about the rationale behind the
> current code.

I don't think it matters much.  Just look at the two lines after
the division, they're both rounding the value up.  So clearly this
is expected to handle the case where work gets bunched up into the
first N CPUs, potentially leaving some CPUs unused.

But Daniel wrote the code so he can have the last say of whether
we should round up after the division or after the other two ops.

Daniel?

Cheers,
Daniel Jordan Aug. 19, 2024, 10:29 p.m. UTC | #13
On Sat, Aug 17, 2024 at 03:12:37PM GMT, Herbert Xu wrote:
> On Mon, Aug 12, 2024 at 10:04:07AM -0400, Waiman Long wrote:
> >
> > Anyway, using DIV_ROUND_UP() is a slight change in behavior as chunk_size
> > will be increased by 1 in most cases. I am a bit hesitant to make this
> > change without looking into more detail about the rationale behind the
> > current code.
> 
> I don't think it matters much.  Just look at the two lines after
> the division, they're both rounding the value up.  So clearly this
> is expected to handle the case where work gets bunched up into the
> first N CPUs, potentially leaving some CPUs unused.

Yeah, the caller is supposed to use min_chunk as a hint for what a
reasonable amount of work is per thread and so avoid wasteful amounts of
threads.

> But Daniel wrote the code so he can have the last say of whether
> we should round up after the division or after the other two ops.

I think either way works fine with the three existing users and how they
choose job->min_chunk and job->size.

The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
oddball cases where rounding up is undesirable (say, near-zero values
for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
works than requested; and a single unit of work is very expensive) so
that rounding up makes a bigger difference.  So, the way it now is seems
ok.


By the way, this bug must've happened coming from
hugetlb_pages_alloc_boot(), right, Waiman?  Because the other padata
users have hardcoded min_chunk.  I guess it was a case of

    h->max_huge_pages < num_node_state(N_MEMORY) * 2
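
For reference, a sketch of how that caller sets up its job, paraphrased
from hugetlb_pages_alloc_boot() in mm/hugetlb.c (field values may differ
in detail):

	struct padata_mt_job job = {
		.fn_arg		= h,
		.align		= 1,
		.numa_aware	= true,
	};

	job.thread_fn	= hugetlb_pages_alloc_boot_node;
	job.start	= 0;
	job.size	= h->max_huge_pages;
	/* 0 whenever max_huge_pages < num_node_state(N_MEMORY) * 2: */
	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / 2;
	job.max_threads	= num_node_state(N_MEMORY) * 2;

	padata_do_multithreaded(&job);
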
Waiman Long Aug. 20, 2024, 12:07 a.m. UTC | #14
On 8/19/24 18:29, Daniel Jordan wrote:
> On Sat, Aug 17, 2024 at 03:12:37PM GMT, Herbert Xu wrote:
>> On Mon, Aug 12, 2024 at 10:04:07AM -0400, Waiman Long wrote:
>>> Anyway, using DIV_ROUND_UP() is a slight change in behavior as chunk_size
>>> will be increased by 1 in most cases. I am a bit hesitant to make this
>>> change without looking into more detail about the rationale behind the
>>> current code.
>> I don't think it matters much.  Just look at the two lines after
>> the division, they're both rounding the value up.  So clearly this
>> is expected to handle the case where work gets bunched up into the
>> first N CPUs, potentially leaving some CPUs unused.
> Yeah, the caller is supposed to use min_chunk as a hint for what a
> reasonable amount of work is per thread and so avoid wasteful amounts of
> threads.
>
>> But Daniel wrote the code so he can have the last say of whether
>> we should round up after the division or after the other two ops.
> I think either way works fine with the three existing users and how they
> choose job->min_chunk and job->size.
>
> The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> oddball cases where rounding up is undesirable (say, near-zero values
> for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> works than requested; and a single unit of work is very expensive) so
> that rounding up makes a bigger difference.  So, the way it now is seems
> ok.
>
>
> By the way, this bug must've happened coming from
> hugetlb_pages_alloc_boot(), right, Waiman?  Because the other padata
> users have hardcoded min_chunk.  I guess it was a case of
>
>      h->max_huge_pages < num_node_state(N_MEMORY) * 2
>
Yes, I guess the hugetlbfs caller is the cause of this div-by-0 problem.
This is likely a bug that needs to be fixed. The current patch does
guarantee that padata won't crash like that even with a rogue caller.

Cheers,
Longman
Herbert Xu Aug. 20, 2024, 4:06 a.m. UTC | #15
On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
>
> The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> oddball cases where rounding up is undesirable (say, near-zero values
> for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> works than requested; and a single unit of work is very expensive) so
> that rounding up makes a bigger difference.  So, the way it now is seems
> ok.

In that case let's do the max ahead of the align check:

	ps.chunk_size = max(ps.chunk_size, 1ul);
	ps.chunk_size = roundup(ps.chunk_size, job->align);

If we do it after then it may come out unaligned (e.g., job->align = 8
and ps.chunk_size = 1).

Cheers,
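
A tiny user-space demonstration of why the ordering matters, with the
macro definitions paraphrased from the kernel and hypothetical values
(chunk_size computed as 0, align == 8):

	#include <stdio.h>

	#define roundup(x, y)	((((x) + (y) - 1) / (y)) * (y))
	#define max(a, b)	((a) > (b) ? (a) : (b))

	int main(void)
	{
		unsigned long chunk = 0, align = 8;

		/* max() applied after roundup(): prints 1, not a multiple of 8. */
		printf("%lu\n", max(roundup(chunk, align), 1ul));
		/* max() applied before roundup(): prints 8, still aligned. */
		printf("%lu\n", roundup(max(chunk, 1ul), align));
		return 0;
	}
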
Daniel Jordan Aug. 20, 2024, 9:24 p.m. UTC | #16
On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
> >
> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> > oddball cases where rounding up is undesirable (say, near-zero values
> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> > works than requested; and a single unit of work is very expensive) so
> > that rounding up makes a bigger difference.  So, the way it now is seems
> > ok.
> 
> In that case let's do the max ahead of the align check:
> 
> 	ps.chunk_size = max(ps.chunk_size, 1ul);
> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
> 
> If we do it after then it may come out unaligned (e.g., job->align = 8
> and ps.chunk_size = 1).

Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
would like to make the change.  I'll send a patch otherwise.
Kamlesh Gurudasani Aug. 21, 2024, 8:10 a.m. UTC | #17
Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
>> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
>> >
>> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
>> > oddball cases where rounding up is undesirable (say, near-zero values
>> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
>> > works than requested; and a single unit of work is very expensive) so
>> > that rounding up makes a bigger difference.  So, the way it now is seems
>> > ok.
>> 
>> In that case let's do the max ahead of the align check:
>> 
>> 	ps.chunk_size = max(ps.chunk_size, 1ul);
>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>> 
>> If we do it after then it may come out unaligned (e.g., job->align = 8
>> and ps.chunk_size = 1).
>
> Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
> would like to make the change.  I'll send a patch otherwise.
Thanks for consideration, Daniel. I'll send a patch.

cheers,
Kamlesh
Kamlesh Gurudasani Aug. 21, 2024, 9:10 p.m. UTC | #18
Kamlesh Gurudasani <kamlesh@ti.com> writes:

> Daniel Jordan <daniel.m.jordan@oracle.com> writes:
>
>> On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
>>> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
>>> >
>>> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
>>> > oddball cases where rounding up is undesirable (say, near-zero values
>>> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
>>> > works than requested; and a single unit of work is very expensive) so
>>> > that rounding up makes a bigger difference.  So, the way it now is seems
>>> > ok.
>>> 
>>> In that case let's do the max ahead of the align check:
>>> 
>>> 	ps.chunk_size = max(ps.chunk_size, 1ul);
>>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>> 
>>> If we do it after then it may come out unaligned (e.g., job->align = 8
>>> and ps.chunk_size = 1).
>>
>> Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
>> would like to make the change.  I'll send a patch otherwise.
> Thanks for consideration, Daniel. I'll send a patch.
Sent.

Just curious about one thing on line 495:

nworks = max(job->size / max(job->min_chunk, job->align), 1ul);

what happens if both min_chunk and align are 0?

cheers,
Kamlesh
Daniel Jordan Aug. 23, 2024, 12:45 a.m. UTC | #19
On Thu, Aug 22, 2024 at 02:40:57AM GMT, Kamlesh Gurudasani wrote:
> Kamlesh Gurudasani <kamlesh@ti.com> writes:
> 
> > Daniel Jordan <daniel.m.jordan@oracle.com> writes:
> >
> >> On Tue, Aug 20, 2024 at 12:06:47PM GMT, Herbert Xu wrote:
> >>> On Mon, Aug 19, 2024 at 06:29:52PM -0400, Daniel Jordan wrote:
> >>> >
> >>> > The DIV_ROUND_UP approach reads a bit nicer to me, but I can imagine
> >>> > oddball cases where rounding up is undesirable (say, near-zero values
> >>> > for size, min_chunk, and align; padata_work_alloc_mt returns many fewer
> >>> > works than requested; and a single unit of work is very expensive) so
> >>> > that rounding up makes a bigger difference.  So, the way it now is seems
> >>> > ok.
> >>> 
> >>> In that case let's do the max ahead of the align check:
> >>> 
> >>> 	ps.chunk_size = max(ps.chunk_size, 1ul);
> >>> 	ps.chunk_size = roundup(ps.chunk_size, job->align);
> >>> 
> >>> If we do it after then it may come out unaligned (e.g., job->align = 8
> >>> and ps.chunk_size = 1).
> >>
> >> Sure, I think Kamlesh was the first to suggest max, so maybe Kamlesh
> >> would like to make the change.  I'll send a patch otherwise.
> > Thanks for consideration, Daniel. I'll send a patch.
> Sent.
> 
> Just curious about one thing on line 495:
> 
> nworks = max(job->size / max(job->min_chunk, job->align), 1ul);
> 
> what happens if both min_chunk and align are 0?

That's a fair point.  It's another of those things that's not supposed
to happen, but it's worth making padata robust to it.
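
One possible hardening for that case, as a hypothetical sketch (no such
patch appears in this thread; max3() is the kernel's three-way max() from
include/linux/minmax.h):

	/* Clamp the divisor so min_chunk == align == 0 cannot fault here. */
	nworks = max(job->size / max3(job->min_chunk, job->align, 1ul), 1ul);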

Patch

diff --git a/kernel/padata.c b/kernel/padata.c
index 53f4bc912712..0fa6c2895460 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
 	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
 	ps.chunk_size = roundup(ps.chunk_size, job->align);
 
+	/*
+	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
+	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
+	 */
+	if (!ps.chunk_size)
+		ps.chunk_size = 1U;
+
 	list_for_each_entry(pw, &works, pw_list)
 		if (job->numa_aware) {
 			int old_node = atomic_read(&last_used_nid);