diff mbox series

[v3,1/4] hugetlb: Fix wrong use of nr_online_nodes

Message ID 20220413032915.251254-2-liupeng256@huawei.com (mailing list archive)
State New
Series hugetlb: Fix some incorrect behavior

Commit Message

Peng Liu April 13, 2022, 3:29 a.m. UTC
Certain systems are designed to have sparse/discontiguous nodes. In
this case, nr_online_nodes cannot be used to walk through the NUMA nodes.
Also, a valid node ID may be greater than or equal to nr_online_nodes.

However, in hugetlb, it is assumed that nodes are contiguous. Recheck
all the places that use nr_online_nodes, and repair them one by one.

Suggested-by: David Hildenbrand <david@redhat.com>
Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
 mm/hugetlb.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Andrew Morton April 13, 2022, 4:42 a.m. UTC | #1
On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu <liupeng256@huawei.com> wrote:

> Certain systems are designed to have sparse/discontiguous nodes. In
> this case, nr_online_nodes can not be used to walk through numa node.
> Also, a valid node may be greater than nr_online_nodes.
> 
> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> all the places that use nr_online_nodes, and repair them one by one.
> 

What are the runtime effects of this shortcoming?
Peng Liu April 13, 2022, 6:27 a.m. UTC | #2
On 2022/4/13 12:42, Andrew Morton wrote:
> On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu<liupeng256@huawei.com>  wrote:
>
>> Certain systems are designed to have sparse/discontiguous nodes. In
>> this case, nr_online_nodes can not be used to walk through numa node.
>> Also, a valid node may be greater than nr_online_nodes.
>>
>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>> all the places that use nr_online_nodes, and repair them one by one.
>>
> What are the runtime effects of this shortcoming?
> .

For sparse/discontiguous nodes, the current code may treat a valid node
as invalid, and will fail to allocate hugepages on any valid node for
which "nid >= nr_online_nodes".

As David suggested:
if (tmp >= nr_online_nodes)
	goto invalid;

Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming
that "node < 2" is valid is wrong.
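
The node 0 / node 2 case above can be sketched in userspace C. This is only an illustration under stated assumptions: the sketch_* names, the 8-node limit and the hard-coded online mask are hypothetical stand-ins, not the kernel's nodemask API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's node state; not the real API. */
#define SKETCH_MAX_NUMNODES 8

/* Bit n set means node n is online. Here nodes 0 and 2 are online. */
static const unsigned long sketch_online_mask = (1UL << 0) | (1UL << 2);

/* Number of online nodes, playing the role of nr_online_nodes. */
static int sketch_nr_online_nodes(void)
{
	int nid, count = 0;

	for (nid = 0; nid < SKETCH_MAX_NUMNODES; nid++)
		if (sketch_online_mask & (1UL << nid))
			count++;
	return count;
}

/* Old-style check: treats node IDs as the dense range [0, nr_online). */
static bool sketch_valid_by_count(unsigned long nid)
{
	return nid < (unsigned long)sketch_nr_online_nodes();
}

/* Mask-based check: bounds test first, then the per-node bit. */
static bool sketch_valid_by_mask(unsigned long nid)
{
	return nid < SKETCH_MAX_NUMNODES &&
	       (sketch_online_mask & (1UL << nid)) != 0;
}
```

With this mask the count-based check accepts offline node 1 and rejects online node 2, while the mask-based check gets both right.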
Baolin Wang April 13, 2022, 6:29 a.m. UTC | #3
On 4/13/2022 11:29 AM, Peng Liu wrote:
> Certain systems are designed to have sparse/discontiguous nodes. In
> this case, nr_online_nodes can not be used to walk through numa node.
> Also, a valid node may be greater than nr_online_nodes.
> 
> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> all the places that use nr_online_nodes, and repair them one by one.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
> Signed-off-by: Peng Liu <liupeng256@huawei.com>

LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Andrew Morton April 13, 2022, 10:04 p.m. UTC | #4
On Wed, 13 Apr 2022 14:27:54 +0800 "liupeng (DM)" <liupeng256@huawei.com> wrote:

> 
> On 2022/4/13 12:42, Andrew Morton wrote:
> > On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu<liupeng256@huawei.com>  wrote:
> >
> >> Certain systems are designed to have sparse/discontiguous nodes. In
> >> this case, nr_online_nodes can not be used to walk through numa node.
> >> Also, a valid node may be greater than nr_online_nodes.
> >>
> >> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> >> all the places that use nr_online_nodes, and repair them one by one.
> >>
> > What are the runtime effects of this shortcoming?
> > .
> 
> For sparse/discontiguous nodes, the current code may treat a valid node
> as invalid, and will fail to allocate all hugepages on a valid node that
> "nid >= nr_online_nodes".
> 
> As David suggested:
> if (tmp >= nr_online_nodes)
> 	goto invalid;
> 
> Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming
> that "node < 2" is valid is wrong.

So do you think we should backport this fix into earlier kernel releases?
Peng Liu April 14, 2022, 1:28 a.m. UTC | #5
On 2022/4/14 6:04, Andrew Morton wrote:
> On Wed, 13 Apr 2022 14:27:54 +0800 "liupeng (DM)"<liupeng256@huawei.com>  wrote:
>
>> On 2022/4/13 12:42, Andrew Morton wrote:
>>> On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu<liupeng256@huawei.com>   wrote:
>>>
>>>> Certain systems are designed to have sparse/discontiguous nodes. In
>>>> this case, nr_online_nodes can not be used to walk through numa node.
>>>> Also, a valid node may be greater than nr_online_nodes.
>>>>
>>>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>>>> all the places that use nr_online_nodes, and repair them one by one.
>>>>
>>> What are the runtime effects of this shortcoming?
>>> .
>> For sparse/discontiguous nodes, the current code may treat a valid node
>> as invalid, and will fail to allocate all hugepages on a valid node that
>> "nid >= nr_online_nodes".
>>
>> As David suggested:
>> if (tmp >= nr_online_nodes)
>> 	goto invalid;
>>
>> Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming
>> that "node < 2" is valid is wrong.
> So do you think we should backport this fix into earlier kernel releases?
> .

I think it is not an urgent bug, because:
1) QEMU does not support sparse nodes so far, although there are some open
issues about making QEMU support them.
2) So far I have not found an actual, normal machine that reports sparse
nodes and needs to use hugepages.
Mike Kravetz April 14, 2022, 11:36 p.m. UTC | #6
On 4/12/22 20:29, Peng Liu wrote:
> Certain systems are designed to have sparse/discontiguous nodes. In
> this case, nr_online_nodes can not be used to walk through numa node.
> Also, a valid node may be greater than nr_online_nodes.
> 
> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> all the places that use nr_online_nodes, and repair them one by one.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
> Signed-off-by: Peng Liu <liupeng256@huawei.com>
> ---
>  mm/hugetlb.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)

Thank you!

I am guessing that at one time nodes were contiguous at least at boot time.
When that changed, hugetlb was not updated. :(

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
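
The contiguity assumption described above can be illustrated with a small userspace sketch that compares a dense 0..nr_online-1 walk with a walk over the set bits of an online mask, which is the behaviour the patch relies on for_each_online_node() to provide. The sketch_* names and the 8-node limit are assumptions made for the example, not kernel code.

```c
#include <stddef.h>

#define SKETCH_MAX_NUMNODES 8	/* hypothetical limit for this example */

/* Old-style walk: visits IDs 0 .. nr_online - 1, assuming they are dense. */
static size_t sketch_walk_dense(unsigned long online_mask, int *out)
{
	size_t n = 0;
	int nr_online = 0, nid;

	for (nid = 0; nid < SKETCH_MAX_NUMNODES; nid++)
		if (online_mask & (1UL << nid))
			nr_online++;

	for (nid = 0; nid < nr_online; nid++)
		out[n++] = nid;
	return n;
}

/* Mask walk: visits exactly the node IDs whose bit is set. */
static size_t sketch_walk_online(unsigned long online_mask, int *out)
{
	size_t n = 0;
	int nid;

	for (nid = 0; nid < SKETCH_MAX_NUMNODES; nid++)
		if (online_mask & (1UL << nid))
			out[n++] = nid;
	return n;
}
```

For an online mask of 0x5 (nodes 0 and 2), the dense walk visits nodes 0 and 1, so it touches an offline node and never reaches node 2; the mask walk visits nodes 0 and 2.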
Davidlohr Bueso April 15, 2022, 2:09 a.m. UTC | #7
On Wed, 13 Apr 2022, Peng Liu wrote:

>Certain systems are designed to have sparse/discontiguous nodes. In
>this case, nr_online_nodes can not be used to walk through numa node.
>Also, a valid node may be greater than nr_online_nodes.
>
>However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>all the places that use nr_online_nodes, and repair them one by one.
>
>Suggested-by: David Hildenbrand <david@redhat.com>
>Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
>Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
>Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
>Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
>Signed-off-by: Peng Liu <liupeng256@huawei.com>
>Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

... but

>---
> mm/hugetlb.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index b34f50156f7e..5b5a2a5a742f 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -2979,7 +2979,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>	struct huge_bootmem_page *m = NULL; /* initialize for clang */
>	int nr_nodes, node;
>
>-	if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
>+	if (nid != NUMA_NO_NODE && !node_online(nid))

afaict null_blk could also use this, actually the whole thing wants a
helper - node_valid()?
Kefeng Wang April 15, 2022, 5:41 a.m. UTC | #8
On 2022/4/15 10:09, Davidlohr Bueso wrote:
> On Wed, 13 Apr 2022, Peng Liu wrote:
>
>> Certain systems are designed to have sparse/discontiguous nodes. In
>> this case, nr_online_nodes can not be used to walk through numa node.
>> Also, a valid node may be greater than nr_online_nodes.
>>
>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>> all the places that use nr_online_nodes, and repair them one by one.
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of 
>> gigantic pages can't work")
>> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages 
>> parameter to support node allocation")
>> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages 
>> parameter")
>> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue 
>> warnings")
>> Signed-off-by: Peng Liu <liupeng256@huawei.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>
> ... but
>
>> ---
>> mm/hugetlb.c | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index b34f50156f7e..5b5a2a5a742f 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2979,7 +2979,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, 
>> int nid)
>>     struct huge_bootmem_page *m = NULL; /* initialize for clang */
>>     int nr_nodes, node;
>>
>> -    if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
>> +    if (nid != NUMA_NO_NODE && !node_online(nid))
>
> afaict null_blk could also use this, actually the whole thing wants a
> helper - node_valid()?
>
This one should be unnecessary, and this patch looks like it has a bug:

if a very large nid is passed to node_online(), it may crash. Could you
re-check it?

See my changes below:

1) Add a check of tmp against MAX_NUMNODES before the node_online() check,
and move it to right after tmp is parsed in hugepages_setup(), so that it
covers both per-node alloc and normal alloc.

2) Because for_each_online_node() is now used, we can drop the additional
check of nid in __alloc_bootmem_huge_page().


$ git diff
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fb5a549169ce..5a3ddec181a0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2986,8 +2986,6 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
         struct huge_bootmem_page *m = NULL; /* initialize for clang */
         int nr_nodes, node;

-       if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
-               return 0;
         /* do node specific alloc */
         if (nid != NUMA_NO_NODE) {
                 m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
@@ -3095,7 +3093,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
         }

         /* do node specific alloc */
-       for (i = 0; i < nr_online_nodes; i++) {
+       for_each_online_node(i) {
                 if (h->max_huge_pages_node[i] > 0) {
                         hugetlb_hstate_alloc_pages_onenode(h, i);
                         node_specific_alloc = true;
@@ -4059,7 +4057,7 @@ static int __init hugetlb_init(void)
                         default_hstate.max_huge_pages =
                                 default_hstate_max_huge_pages;

-                       for (i = 0; i < nr_online_nodes; i++)
+                       for_each_online_node(i)
                                 default_hstate.max_huge_pages_node[i] =
                                         default_hugepages_in_node[i];
                 }
@@ -4168,15 +4166,15 @@ static int __init hugepages_setup(char *s)
                 count = 0;
                 if (sscanf(p, "%lu%n", &tmp, &count) != 1)
                         goto invalid;
+               if (tmp > MAX_NUMNODES || !node_online(tmp))
+                       goto invalid;
                 /* Parameter is node format */
                 if (p[count] == ':') {
                         if (!hugetlb_node_alloc_supported()) {
                                 pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
                                 return 0;
                         }
-                       if (tmp >= nr_online_nodes)
-                               goto invalid;
-                       node = array_index_nospec(tmp, nr_online_nodes);
+                       node = array_index_nospec(tmp, MAX_NUMNODES);
                         p += count + 1;
                         /* Parse hugepages */
                         if (sscanf(p, "%lu%n", &tmp, &count) != 1)
@@ -4304,7 +4302,7 @@ static int __init default_hugepagesz_setup(char *s)
          */
         if (default_hstate_max_huge_pages) {
                 default_hstate.max_huge_pages = default_hstate_max_huge_pages;
-               for (i = 0; i < nr_online_nodes; i++)
+               for_each_online_node(i)
                         default_hstate.max_huge_pages_node[i] =
                                 default_hugepages_in_node[i];
                 if (hstate_is_gigantic(&default_hstate))


> .
Peng Liu April 15, 2022, 7:01 a.m. UTC | #9
On 2022/4/15 13:41, Kefeng Wang wrote:
>
> On 2022/4/15 10:09, Davidlohr Bueso wrote:
>> On Wed, 13 Apr 2022, Peng Liu wrote:
>>
>>> Certain systems are designed to have sparse/discontiguous nodes. In
>>> this case, nr_online_nodes can not be used to walk through numa node.
>>> Also, a valid node may be greater than nr_online_nodes.
>>>
>>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>>> all the places that use nr_online_nodes, and repair them one by one.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of 
>>> gigantic pages can't work")
>>> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages 
>>> parameter to support node allocation")
>>> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages 
>>> parameter")
>>> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue 
>>> warnings")
>>> Signed-off-by: Peng Liu <liupeng256@huawei.com>
>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>>
>> ... but
>>
>>> ---
>>> mm/hugetlb.c | 12 ++++++------
>>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index b34f50156f7e..5b5a2a5a742f 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2979,7 +2979,7 @@ int __alloc_bootmem_huge_page(struct hstate 
>>> *h, int nid)
>>>     struct huge_bootmem_page *m = NULL; /* initialize for clang */
>>>     int nr_nodes, node;
>>>
>>> -    if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
>>> +    if (nid != NUMA_NO_NODE && !node_online(nid))
>>
>> afaict null_blk could also use this, actually the whole thing wants a
>> helper - node_valid()?
>>
> This one should be unnecessary, and this patch looks has a bug,
>
> if a very nid passed to node_online(), it may crash,  could you 
> re-check it,
>
> see my changes below,
>
> 1) add tmp check against MAX_NUMNODES before node_online() check,
>
>     and move it after get tmp in hugepages_setup() , this could cover 
> both per-node alloc and normal alloc
>
> 2) due to for_each_online_node() usage, we can drop additional check 
> of nid in __alloc_bootmem_huge_page()
>
>
> $ git diff
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fb5a549169ce..5a3ddec181a0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2986,8 +2986,6 @@ int __alloc_bootmem_huge_page(struct hstate *h, 
> int nid)
>         struct huge_bootmem_page *m = NULL; /* initialize for clang */
>         int nr_nodes, node;
>
> -       if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
> -               return 0;
>         /* do node specific alloc */
>         if (nid != NUMA_NO_NODE) {
>                 m = memblock_alloc_try_nid_raw(huge_page_size(h), 
> huge_page_size(h),
> @@ -3095,7 +3093,7 @@ static void __init 
> hugetlb_hstate_alloc_pages(struct hstate *h)
>         }
>
>         /* do node specific alloc */
> -       for (i = 0; i < nr_online_nodes; i++) {
> +       for_each_online_node(i) {
>                 if (h->max_huge_pages_node[i] > 0) {
>                         hugetlb_hstate_alloc_pages_onenode(h, i);
>                         node_specific_alloc = true;
> @@ -4059,7 +4057,7 @@ static int __init hugetlb_init(void)
>                         default_hstate.max_huge_pages =
>                                 default_hstate_max_huge_pages;
>
> -                       for (i = 0; i < nr_online_nodes; i++)
> +                       for_each_online_node(i)
> default_hstate.max_huge_pages_node[i] =
> default_hugepages_in_node[i];
>                 }
> @@ -4168,15 +4166,15 @@ static int __init hugepages_setup(char *s)
>                 count = 0;
>                 if (sscanf(p, "%lu%n", &tmp, &count) != 1)
>                         goto invalid;
> +               if (tmp > MAX_NUMNODES || !node_online(tmp))
> +                       goto invalid;
>                 /* Parameter is node format */
>                 if (p[count] == ':') {
>                         if (!hugetlb_node_alloc_supported()) {
>                                 pr_warn("HugeTLB: architecture can't 
> support node specific alloc, ignoring!\n");
>                                 return 0;
>                         }
> -                       if (tmp >= nr_online_nodes)
> -                               goto invalid;
> -                       node = array_index_nospec(tmp, nr_online_nodes);
> +                       node = array_index_nospec(tmp, MAX_NUMNODES);
>                         p += count + 1;
>                         /* Parse hugepages */
>                         if (sscanf(p, "%lu%n", &tmp, &count) != 1)
> @@ -4304,7 +4302,7 @@ static int __init default_hugepagesz_setup(char *s)
>          */
>         if (default_hstate_max_huge_pages) {
>                 default_hstate.max_huge_pages = 
> default_hstate_max_huge_pages;
> -               for (i = 0; i < nr_online_nodes; i++)
> +               for_each_online_node(i)
>                         default_hstate.max_huge_pages_node[i] =
>                                 default_hugepages_in_node[i];
>                 if (hstate_is_gigantic(&default_hstate))
>
>
Yes, node_online() is not a safe function: it can cause a panic if it
receives a very large nid. So, this patch needs to be modified.
Thanks.
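
The hazard discussed here can be sketched in userspace as a bit test on a fixed-size bitmap that is only safe when the index is range-checked first. Everything named sketch_* below, the 64-node limit and the hard-coded map are assumptions for illustration; the real node_online() is implemented differently in the kernel headers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_NUMNODES 64	/* hypothetical limit for this example */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MAP_LONGS \
	((SKETCH_MAX_NUMNODES + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Fixed-size bitmap; nodes 0 and 2 are online. */
static const unsigned long sketch_online_map[SKETCH_MAP_LONGS] = { 0x5UL };

/* Raw bit test: would read outside the array for nid >= SKETCH_MAX_NUMNODES. */
static bool sketch_node_online_raw(unsigned long nid)
{
	assert(nid < SKETCH_MAX_NUMNODES);	/* precondition the caller must hold */
	return (sketch_online_map[nid / SKETCH_BITS_PER_LONG] >>
		(nid % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Safe wrapper: the bounds check short-circuits before the bitmap is read. */
static bool sketch_node_usable(unsigned long nid)
{
	return nid < SKETCH_MAX_NUMNODES && sketch_node_online_raw(nid);
}
```

The comparison has to reject nid equal to the limit as well: valid indices run from 0 to SKETCH_MAX_NUMNODES - 1, so a check written with ">" instead of ">=" would still let one out-of-range value through.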
Kefeng Wang April 16, 2022, 1:21 a.m. UTC | #10
On 2022/4/15 13:41, Kefeng Wang wrote:
>
> On 2022/4/15 10:09, Davidlohr Bueso wrote:
>> On Wed, 13 Apr 2022, Peng Liu wrote:
>>
>>> Certain systems are designed to have sparse/discontiguous nodes. In
>>> this case, nr_online_nodes can not be used to walk through numa node.
>>> Also, a valid node may be greater than nr_online_nodes.
>>>
>>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>>> all the places that use nr_online_nodes, and repair them one by one.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of 
>>> gigantic pages can't work")
>>> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages 
>>> parameter to support node allocation")
>>> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages 
>>> parameter")
>>> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue 
>>> warnings")
>>> Signed-off-by: Peng Liu <liupeng256@huawei.com>
>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>>
>> ... but
>>
>>> ---
>>> mm/hugetlb.c | 12 ++++++------
>>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index b34f50156f7e..5b5a2a5a742f 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2979,7 +2979,7 @@ int __alloc_bootmem_huge_page(struct hstate 
>>> *h, int nid)
>>>     struct huge_bootmem_page *m = NULL; /* initialize for clang */
>>>     int nr_nodes, node;
>>>
>>> -    if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
>>> +    if (nid != NUMA_NO_NODE && !node_online(nid))
>>
>> afaict null_blk could also use this, actually the whole thing wants a
>> helper - node_valid()?
>>
> This one should be unnecessary, and this patch looks has a bug,
>
> if a very nid passed to node_online(), it may crash,  could you 
> re-check it,
>
> see my changes below,
>
> 1) add tmp check against MAX_NUMNODES before node_online() check,
>
>     and move it after get tmp in hugepages_setup() , this could cover 
> both per-node alloc and normal alloc

Sorry, for normal alloc, tmp is the number of huge pages, so the check does
not need to be moved; only adding the tmp >= MAX_NUMNODES check is needed.

>
> 2) due to for_each_online_node() usage, we can drop additional check 
> of nid in __alloc_bootmem_huge_page()
>
Andrew Morton April 19, 2022, 4:40 a.m. UTC | #11
On Sat, 16 Apr 2022 09:21:45 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:

> 
> On 2022/4/15 13:41, Kefeng Wang wrote:
> >
> > On 2022/4/15 10:09, Davidlohr Bueso wrote:
> >> On Wed, 13 Apr 2022, Peng Liu wrote:
> >>
> >>> Certain systems are designed to have sparse/discontiguous nodes. In
> >>> this case, nr_online_nodes can not be used to walk through numa node.
> >>> Also, a valid node may be greater than nr_online_nodes.
> >>>
> >>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> >>> all the places that use nr_online_nodes, and repair them one by one.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of 
> >>> gigantic pages can't work")
> >>> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages 
> >>> parameter to support node allocation")
> >>> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages 
> >>> parameter")
> >>> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue 
> >>> warnings")
> >>> Signed-off-by: Peng Liu <liupeng256@huawei.com>
> >>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> >>
> >> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> >>
> >> ... but
> >>
> >>> ---
> >>> mm/hugetlb.c | 12 ++++++------
> >>> 1 file changed, 6 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index b34f50156f7e..5b5a2a5a742f 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -2979,7 +2979,7 @@ int __alloc_bootmem_huge_page(struct hstate 
> >>> *h, int nid)
> >>>     struct huge_bootmem_page *m = NULL; /* initialize for clang */
> >>>     int nr_nodes, node;
> >>>
> >>> -    if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
> >>> +    if (nid != NUMA_NO_NODE && !node_online(nid))
> >>
> >> afaict null_blk could also use this, actually the whole thing wants a
> >> helper - node_valid()?
> >>
> > This one should be unnecessary, and this patch looks has a bug,
> >
> > if a very nid passed to node_online(), it may crash,  could you 
> > re-check it,
> >
> > see my changes below,
> >
> > 1) add tmp check against MAX_NUMNODES before node_online() check,
> >
> >     and move it after get tmp in hugepages_setup() , this could cover 
> > both per-node alloc and normal alloc
> 
> sorry,for normal alloc, tmp is the number of huge pages, we don't  need 
> the movement,   only add tmp >= MAX_NUMNODES is ok
> 

Does the v4 patch address the issues which were raised in this thread?


--- a/mm/hugetlb.c~hugetlb-fix-wrong-use-of-nr_online_nodes-v4
+++ a/mm/hugetlb.c
@@ -2986,8 +2986,6 @@ int __alloc_bootmem_huge_page(struct hst
 	struct huge_bootmem_page *m = NULL; /* initialize for clang */
 	int nr_nodes, node;
 
-	if (nid != NUMA_NO_NODE && !node_online(nid))
-		return 0;
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {
 		m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
@@ -4174,7 +4172,7 @@ static int __init hugepages_setup(char *
 				pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
 				return 0;
 			}
-			if (!node_online(tmp))
+			if (tmp >= MAX_NUMNODES || !node_online(tmp))
 				goto invalid;
 			node = array_index_nospec(tmp, MAX_NUMNODES);
 			p += count + 1;
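
Why the clamp bound changes from nr_online_nodes to MAX_NUMNODES can be illustrated with a userspace sketch of a mask-based index clamp. It is loosely modeled on the generic masking trick and is not the kernel's array_index_nospec(); the sketch_* names are hypothetical, and the code assumes that right-shifting a negative signed long is an arithmetic shift, as gcc and clang implement it.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * All-ones mask when index < size, zero otherwise.
 * Assumes arithmetic right shift of negative signed values.
 */
static unsigned long sketch_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (SKETCH_BITS_PER_LONG - 1));
}

/* Returns index unchanged when it is in bounds, and 0 when it is not. */
static unsigned long sketch_index_nospec(unsigned long index, unsigned long size)
{
	return index & sketch_index_mask(index, size);
}
```

With nodes 0 and 2 online, a bound of 2 (the online count) turns node 2 into node 0, whereas a bound equal to the maximum node count, 8 in this sketch, leaves node 2 unchanged and still clamps out-of-range values to 0.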
Kefeng Wang April 19, 2022, 8:54 a.m. UTC | #12
On 2022/4/19 12:40, Andrew Morton wrote:
> On Sat, 16 Apr 2022 09:21:45 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>> On 2022/4/15 13:41, Kefeng Wang wrote:
>>> On 2022/4/15 10:09, Davidlohr Bueso wrote:
>>>> On Wed, 13 Apr 2022, Peng Liu wrote:
>>>>
>>>>> Certain systems are designed to have sparse/discontiguous nodes. In
>>>>> this case, nr_online_nodes can not be used to walk through numa node.
>>>>> Also, a valid node may be greater than nr_online_nodes.
>>>>>
>>>>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>>>>> all the places that use nr_online_nodes, and repair them one by one.
>>>>>
>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of
>>>>> gigantic pages can't work")
>>>>> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages
>>>>> parameter to support node allocation")
>>>>> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages
>>>>> parameter")
>>>>> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue
>>>>> warnings")
>>>>> Signed-off-by: Peng Liu <liupeng256@huawei.com>
>>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>>>>
>>>> ... but
>>>>
>>>>> ---
>>>>> mm/hugetlb.c | 12 ++++++------
>>>>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index b34f50156f7e..5b5a2a5a742f 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -2979,7 +2979,7 @@ int __alloc_bootmem_huge_page(struct hstate
>>>>> *h, int nid)
>>>>>      struct huge_bootmem_page *m = NULL; /* initialize for clang */
>>>>>      int nr_nodes, node;
>>>>>
>>>>> -    if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
>>>>> +    if (nid != NUMA_NO_NODE && !node_online(nid))
>>>> afaict null_blk could also use this, actually the whole thing wants a
>>>> helper - node_valid()?
>>>>
>>> This one should be unnecessary, and this patch looks has a bug,
>>>
>>> if a very nid passed to node_online(), it may crash,  could you
>>> re-check it,
>>>
>>> see my changes below,
>>>
>>> 1) add tmp check against MAX_NUMNODES before node_online() check,
>>>
>>>      and move it after get tmp in hugepages_setup() , this could cover
>>> both per-node alloc and normal alloc
>> sorry,for normal alloc, tmp is the number of huge pages, we don't  need
>> the movement,   only add tmp >= MAX_NUMNODES is ok
>>
> Does the v4 patch address the issues which were raised in this thread?
Yes, v4 has fixed this.
>
>
> --- a/mm/hugetlb.c~hugetlb-fix-wrong-use-of-nr_online_nodes-v4
> +++ a/mm/hugetlb.c
> @@ -2986,8 +2986,6 @@ int __alloc_bootmem_huge_page(struct hst
>   	struct huge_bootmem_page *m = NULL; /* initialize for clang */
>   	int nr_nodes, node;
>   
> -	if (nid != NUMA_NO_NODE && !node_online(nid))
> -		return 0;
>   	/* do node specific alloc */
>   	if (nid != NUMA_NO_NODE) {
>   		m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
> @@ -4174,7 +4172,7 @@ static int __init hugepages_setup(char *
>   				pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
>   				return 0;
>   			}
> -			if (!node_online(tmp))
> +			if (tmp >= MAX_NUMNODES || !node_online(tmp))
>   				goto invalid;
>   			node = array_index_nospec(tmp, MAX_NUMNODES);
>   			p += count + 1;
> _
>
> .

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b34f50156f7e..5b5a2a5a742f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2979,7 +2979,7 @@  int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	struct huge_bootmem_page *m = NULL; /* initialize for clang */
 	int nr_nodes, node;
 
-	if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
+	if (nid != NUMA_NO_NODE && !node_online(nid))
 		return 0;
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {
@@ -3088,7 +3088,7 @@  static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	}
 
 	/* do node specific alloc */
-	for (i = 0; i < nr_online_nodes; i++) {
+	for_each_online_node(i) {
 		if (h->max_huge_pages_node[i] > 0) {
 			hugetlb_hstate_alloc_pages_onenode(h, i);
 			node_specific_alloc = true;
@@ -4049,7 +4049,7 @@  static int __init hugetlb_init(void)
 			default_hstate.max_huge_pages =
 				default_hstate_max_huge_pages;
 
-			for (i = 0; i < nr_online_nodes; i++)
+			for_each_online_node(i)
 				default_hstate.max_huge_pages_node[i] =
 					default_hugepages_in_node[i];
 		}
@@ -4164,9 +4164,9 @@  static int __init hugepages_setup(char *s)
 				pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
 				return 0;
 			}
-			if (tmp >= nr_online_nodes)
+			if (!node_online(tmp))
 				goto invalid;
-			node = array_index_nospec(tmp, nr_online_nodes);
+			node = array_index_nospec(tmp, MAX_NUMNODES);
 			p += count + 1;
 			/* Parse hugepages */
 			if (sscanf(p, "%lu%n", &tmp, &count) != 1)
@@ -4294,7 +4294,7 @@  static int __init default_hugepagesz_setup(char *s)
 	 */
 	if (default_hstate_max_huge_pages) {
 		default_hstate.max_huge_pages = default_hstate_max_huge_pages;
-		for (i = 0; i < nr_online_nodes; i++)
+		for_each_online_node(i)
 			default_hstate.max_huge_pages_node[i] =
 				default_hugepages_in_node[i];
 		if (hstate_is_gigantic(&default_hstate))