Message ID | 20220416103526.3287348-1-liupeng256@huawei.com (mailing list archive)
---|---
State | New
Series | [v4] hugetlb: Fix wrong use of nr_online_nodes
On 2022/4/16 18:35, Peng Liu wrote:
> Certain systems are designed to have sparse/discontiguous nodes. In
> this case, nr_online_nodes cannot be used to walk through the NUMA
> nodes. Also, a valid node may be greater than nr_online_nodes.
>
> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> all the places that use nr_online_nodes, and repair them one by one.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
> Signed-off-by: Peng Liu <liupeng256@huawei.com>
> ---
> v3->v4:
>  Make sure nid is valid before using node_online(), and
>  __alloc_bootmem_huge_page() no longer needs to check node_online(),
>  as suggested by Kefeng.

Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

>  mm/hugetlb.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index b34f50156f7e..a386c5f94932 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2979,8 +2979,6 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>          struct huge_bootmem_page *m = NULL; /* initialize for clang */
>          int nr_nodes, node;
>
> -        if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
> -                return 0;
>          /* do node specific alloc */
>          if (nid != NUMA_NO_NODE) {
>                  m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
> @@ -3088,7 +3086,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>          }
>
>          /* do node specific alloc */
> -        for (i = 0; i < nr_online_nodes; i++) {
> +        for_each_online_node(i) {
>                  if (h->max_huge_pages_node[i] > 0) {
>                          hugetlb_hstate_alloc_pages_onenode(h, i);
>                          node_specific_alloc = true;
> @@ -4049,7 +4047,7 @@ static int __init hugetlb_init(void)
>                          default_hstate.max_huge_pages =
>                                  default_hstate_max_huge_pages;
>
> -                        for (i = 0; i < nr_online_nodes; i++)
> +                        for_each_online_node(i)
>                                  default_hstate.max_huge_pages_node[i] =
>                                          default_hugepages_in_node[i];
>                  }
> @@ -4164,9 +4162,9 @@ static int __init hugepages_setup(char *s)
>                          pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
>                          return 0;
>                  }
> -                if (tmp >= nr_online_nodes)
> +                if (tmp >= MAX_NUMNODES || !node_online(tmp))
>                          goto invalid;
> -                node = array_index_nospec(tmp, nr_online_nodes);
> +                node = array_index_nospec(tmp, MAX_NUMNODES);
>                  p += count + 1;
>                  /* Parse hugepages */
>                  if (sscanf(p, "%lu%n", &tmp, &count) != 1)
> @@ -4294,7 +4292,7 @@ static int __init default_hugepagesz_setup(char *s)
>           */
>          if (default_hstate_max_huge_pages) {
>                  default_hstate.max_huge_pages = default_hstate_max_huge_pages;
> -                for (i = 0; i < nr_online_nodes; i++)
> +                for_each_online_node(i)
>                          default_hstate.max_huge_pages_node[i] =
>                                  default_hugepages_in_node[i];
>                  if (hstate_is_gigantic(&default_hstate))
On Sat, 16 Apr 2022 10:35:26 +0000 Peng Liu <liupeng256@huawei.com> wrote:

> Certain systems are designed to have sparse/discontiguous nodes. In
> this case, nr_online_nodes cannot be used to walk through the NUMA
> nodes. Also, a valid node may be greater than nr_online_nodes.
>
> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> all the places that use nr_online_nodes, and repair them one by one.

oops.

What are the user-visible runtime effects of this flaw?
On 2022/4/19 12:03, Andrew Morton wrote:
> On Sat, 16 Apr 2022 10:35:26 +0000 Peng Liu <liupeng256@huawei.com> wrote:
>
>> Certain systems are designed to have sparse/discontiguous nodes. In
>> this case, nr_online_nodes cannot be used to walk through the NUMA
>> nodes. Also, a valid node may be greater than nr_online_nodes.
>>
>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>> all the places that use nr_online_nodes, and repair them one by one.
>
> oops.
>
> What are the user-visible runtime effects of this flaw?

For example, with four nodes 0, 1, 2, 3 where node 1 is offline,
nr_online_nodes = 3:

1) per-node alloc (hugepages=1:2) fails,

2) per-node alloc (hugepages=3:2) fails, but it should succeed.

Apart from that, I assume there are no other user-visible runtime effects.
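To make that failure mode concrete, here is a minimal userspace sketch (not kernel code; `node_online_map`, `old_check`, and `new_check` are illustrative stand-ins) contrasting the old `tmp >= nr_online_nodes` test in hugepages_setup() with the `node_online()`-based test from this patch, using the same topology: nodes 0-3 exist, node 1 is offline, nr_online_nodes == 3.

```c
/* Userspace simulation only; not the kernel's data structures. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_NUMNODES 4

/* Nodes 0, 2, 3 online; node 1 offline, so nr_online_nodes == 3. */
static const bool node_online_map[MAX_NUMNODES] = { true, false, true, true };
static const int nr_online_nodes = 3;

static bool node_online(unsigned long nid)
{
	return nid < MAX_NUMNODES && node_online_map[nid];
}

/* Old check: assumes online nodes are exactly 0..nr_online_nodes-1. */
static bool old_check(unsigned long nid)
{
	return nid < nr_online_nodes;
}

/* New check: validate against the possible range, then the online state. */
static bool new_check(unsigned long nid)
{
	return nid < MAX_NUMNODES && node_online(nid);
}

int main(void)
{
	unsigned long nid;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		printf("node %lu: old check %s, new check %s\n", nid,
		       old_check(nid) ? "accepts" : "rejects",
		       new_check(nid) ? "accepts" : "rejects");
	return 0;
}
```

With this topology the old check accepts the offline node 1 and rejects the online node 3, matching cases 1) and 2) above.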
On 2022/4/19 22:07, Kefeng Wang wrote:
> On 2022/4/19 12:03, Andrew Morton wrote:
>> On Sat, 16 Apr 2022 10:35:26 +0000 Peng Liu <liupeng256@huawei.com>
>> wrote:
>>
>>> Certain systems are designed to have sparse/discontiguous nodes. In
>>> this case, nr_online_nodes cannot be used to walk through the NUMA
>>> nodes. Also, a valid node may be greater than nr_online_nodes.
>>>
>>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>>> all the places that use nr_online_nodes, and repair them one by one.
>> oops.
>>
>> What are the user-visible runtime effects of this flaw?
>
> For example, with four nodes 0, 1, 2, 3 where node 1 is offline,
> nr_online_nodes = 3:
>
> 1) per-node alloc (hugepages=1:2) fails,
>
> 2) per-node alloc (hugepages=3:2) fails, but it should succeed.
>
> Apart from that, I assume there are no other user-visible runtime effects.

Thanks, you are right. I constructed nodes 0, 1, 3, 4 and requested huge
pages as:

hugepagesz=1G hugepages=0:1,4:1 hugepagesz=2M hugepages=0:1024,4:1024

Without this patch:
HugeTLB: Invalid hugepages parameter 4:1
HugeTLB: Invalid hugepages parameter 4:1024
HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages

With this patch:
HugeTLB registered 1.00 GiB page size, pre-allocated 2 pages
HugeTLB registered 2.00 MiB page size, pre-allocated 2048 pages
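For completeness, a small (hypothetical) helper like the one below can read back the per-node preallocation after boot through the standard hugetlb per-node sysfs layout; the node number and page size are taken from the test above and would need adjusting on other machines.

```c
/* Read back per-node huge page preallocation from sysfs (illustrative). */
#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/devices/system/node/node4/hugepages/hugepages-2048kB/nr_hugepages";
	unsigned long nr = 0;
	FILE *f = fopen(path, "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fscanf(f, "%lu", &nr) == 1)
		printf("node 4: %lu huge pages of 2 MiB preallocated\n", nr);
	fclose(f);
	return 0;
}
```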
On 16.04.22 12:35, Peng Liu wrote:
> Certain systems are designed to have sparse/discontiguous nodes. In
> this case, nr_online_nodes cannot be used to walk through the NUMA
> nodes. Also, a valid node may be greater than nr_online_nodes.
>
> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
> all the places that use nr_online_nodes, and repair them one by one.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
> Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
> Signed-off-by: Peng Liu <liupeng256@huawei.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b34f50156f7e..a386c5f94932 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2979,8 +2979,6 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
         struct huge_bootmem_page *m = NULL; /* initialize for clang */
         int nr_nodes, node;
 
-        if (nid != NUMA_NO_NODE && nid >= nr_online_nodes)
-                return 0;
         /* do node specific alloc */
         if (nid != NUMA_NO_NODE) {
                 m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
@@ -3088,7 +3086,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
         }
 
         /* do node specific alloc */
-        for (i = 0; i < nr_online_nodes; i++) {
+        for_each_online_node(i) {
                 if (h->max_huge_pages_node[i] > 0) {
                         hugetlb_hstate_alloc_pages_onenode(h, i);
                         node_specific_alloc = true;
@@ -4049,7 +4047,7 @@ static int __init hugetlb_init(void)
                         default_hstate.max_huge_pages =
                                 default_hstate_max_huge_pages;
 
-                        for (i = 0; i < nr_online_nodes; i++)
+                        for_each_online_node(i)
                                 default_hstate.max_huge_pages_node[i] =
                                         default_hugepages_in_node[i];
                 }
@@ -4164,9 +4162,9 @@ static int __init hugepages_setup(char *s)
                         pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
                         return 0;
                 }
-                if (tmp >= nr_online_nodes)
+                if (tmp >= MAX_NUMNODES || !node_online(tmp))
                         goto invalid;
-                node = array_index_nospec(tmp, nr_online_nodes);
+                node = array_index_nospec(tmp, MAX_NUMNODES);
                 p += count + 1;
                 /* Parse hugepages */
                 if (sscanf(p, "%lu%n", &tmp, &count) != 1)
@@ -4294,7 +4292,7 @@ static int __init default_hugepagesz_setup(char *s)
          */
         if (default_hstate_max_huge_pages) {
                 default_hstate.max_huge_pages = default_hstate_max_huge_pages;
-                for (i = 0; i < nr_online_nodes; i++)
+                for_each_online_node(i)
                         default_hstate.max_huge_pages_node[i] =
                                 default_hugepages_in_node[i];
                 if (hstate_is_gigantic(&default_hstate))
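The common thread in these hunks is replacing index-based node walks with mask-based walks. The sketch below is plain userspace C, with a simplified `for_each_online_node_sim` macro standing in for the kernel's for_each_online_node(); it shows why the two differ on a sparse node map such as the one from the earlier test, where nodes 0, 1, 3, 4 are online and node 2 is absent.

```c
/* Userspace simulation of sparse node iteration; not kernel code. */
#include <stdio.h>

#define MAX_NUMNODES 8

static const unsigned long node_online_mask = 0x1bUL; /* bits 0, 1, 3, 4 */
static const int nr_online_nodes = 4;                  /* popcount of the mask */

/* Simplified analogue of for_each_online_node(): walk set bits of the mask. */
#define for_each_online_node_sim(nid)				\
	for ((nid) = 0; (nid) < MAX_NUMNODES; (nid)++)		\
		if (node_online_mask & (1UL << (nid)))

int main(void)
{
	int nid;

	printf("index loop visits:");
	for (nid = 0; nid < nr_online_nodes; nid++)
		printf(" %d", nid);	/* 0 1 2 3: hits absent node 2, misses node 4 */
	printf("\nmask-based loop visits:");
	for_each_online_node_sim(nid)
		printf(" %d", nid);	/* 0 1 3 4: exactly the online nodes */
	printf("\n");
	return 0;
}
```

The index loop only works when online node IDs happen to be contiguous starting at 0; the mask-based loop visits exactly the online nodes regardless of holes.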
Certain systems are designed to have sparse/discontiguous nodes. In
this case, nr_online_nodes cannot be used to walk through the NUMA
nodes. Also, a valid node may be greater than nr_online_nodes.

However, in hugetlb, it is assumed that nodes are contiguous. Recheck
all the places that use nr_online_nodes, and repair them one by one.

Suggested-by: David Hildenbrand <david@redhat.com>
Fixes: 4178158ef8ca ("hugetlbfs: fix issue of preallocation of gigantic pages can't work")
Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Fixes: e79ce9832316 ("hugetlbfs: fix a truncation issue in hugepages parameter")
Fixes: f9317f77a6e0 ("hugetlb: clean up potential spectre issue warnings")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
v3->v4:
 Make sure nid is valid before using node_online(), and
 __alloc_bootmem_huge_page() no longer needs to check node_online(),
 as suggested by Kefeng.

 mm/hugetlb.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
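One detail worth calling out in the hugepages_setup() hunk: once the range check becomes `tmp >= MAX_NUMNODES || !node_online(tmp)`, the speculation clamp must also use MAX_NUMNODES, because that is now the bound that was actually verified. The sketch below illustrates only the clamping idea with a plain conditional; the kernel's array_index_nospec() achieves the same effect with branch-free, architecture-specific masking.

```c
/* Conceptual illustration of the array_index_nospec() bound; not kernel code. */
#include <stdio.h>

#define MAX_NUMNODES 1024UL

/* Return the index if it is below the checked bound, 0 otherwise. */
static unsigned long index_nospec_sim(unsigned long index, unsigned long size)
{
	return index < size ? index : 0;
}

int main(void)
{
	/* Clamping against MAX_NUMNODES matches the "tmp >= MAX_NUMNODES" check. */
	printf("%lu\n", index_nospec_sim(4, MAX_NUMNODES));    /* 4: kept      */
	printf("%lu\n", index_nospec_sim(5000, MAX_NUMNODES)); /* 0: clamped   */

	/*
	 * Clamping against nr_online_nodes (3 in the earlier example) would be
	 * wrong: online node 3 passes the new validity check but would be
	 * collapsed to 0 by a clamp that uses the smaller, unrelated bound.
	 */
	printf("%lu\n", index_nospec_sim(3, 3));                /* 0: wrong bound */
	return 0;
}
```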