Message ID | 20190405214245.19834-1-jpittman@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | null_blk: prevent crash from bad home_node value | expand |
On 4/5/19 3:42 PM, John Pittman wrote: > At module load, if the selected home_node value is greater than > the available numa nodes, the system will crash in > __alloc_pages_nodemask() due to a bad paging request. Prevent this > user error crash by detecting the bad value, logging an error, and > setting g_home_node back to the default of NUMA_NO_NODE. Applied, thanks.
Hi Jens, I got some unexpected results with this patch this morning. When loading null_blk with no arguments, I got the invalid home_node value error message. I did some debug and found that the values seemed fine: + pr_err("g_home_node is %d", g_home_node); + pr_err("nr_online_nodes is %d", nr_online_nodes); + pr_err("NUMA_NO_NODE is %d", NUMA_NO_NODE); + + if (g_home_node >= nr_online_nodes) { + pr_err("null_blk: invalid home_node value\n"); + g_home_node = NUMA_NO_NODE; + } + if (g_queue_mode == NULL_Q_RQ) { pr_err("null_blk: legacy IO path no longer available\n"); return -EINVAL; [root@localhost linux]# modprobe null_blk [ 296.336473] g_home_node is -1 [ 296.336474] nr_online_nodes is 1 [ 296.336514] NUMA_NO_NODE is -1 [ 296.336558] null_blk: invalid home_node value [ 296.338825] null: module loaded [root@localhost linux]# numactl --hardware available: 1 nodes (0) node 0 cpus: 0 1 2 3 node 0 size: 828 MB node 0 free: 451 MB node distances: node 0 0: 10 Fortunately, and also unexpectedly to me, I was able to fix the issue just by casting nr_online_nodes as an int. + if (g_home_node >= (int)nr_online_nodes) { + pr_err("null_blk: invalid home_node value\n"); + g_home_node = NUMA_NO_NODE; + } + Would you like me to send in a v2 for this? Thanks. On Fri, Apr 5, 2019 at 6:22 PM Jens Axboe <axboe@kernel.dk> wrote: > > On 4/5/19 3:42 PM, John Pittman wrote: > > At module load, if the selected home_node value is greater than > > the available numa nodes, the system will crash in > > __alloc_pages_nodemask() due to a bad paging request. Prevent this > > user error crash by detecting the bad value, logging an error, and > > setting g_home_node back to the default of NUMA_NO_NODE. > > Applied, thanks. > > -- > Jens Axboe >
On 4/6/19 8:47 AM, John Pittman wrote: > Hi Jens, I got some unexpected results with this patch this morning. > When loading null_blk with no arguments, I got the invalid home_node > value error message. I did some debug and found that the values > seemed fine: > > + pr_err("g_home_node is %d", g_home_node); > + pr_err("nr_online_nodes is %d", nr_online_nodes); > + pr_err("NUMA_NO_NODE is %d", NUMA_NO_NODE); > + > + if (g_home_node >= nr_online_nodes) { > + pr_err("null_blk: invalid home_node value\n"); > + g_home_node = NUMA_NO_NODE; > + } > + > if (g_queue_mode == NULL_Q_RQ) { > pr_err("null_blk: legacy IO path no longer available\n"); > return -EINVAL; > > [root@localhost linux]# modprobe null_blk > [ 296.336473] g_home_node is -1 > [ 296.336474] nr_online_nodes is 1 > [ 296.336514] NUMA_NO_NODE is -1 > [ 296.336558] null_blk: invalid home_node value > [ 296.338825] null: module loaded > > [root@localhost linux]# numactl --hardware > available: 1 nodes (0) > node 0 cpus: 0 1 2 3 > node 0 size: 828 MB > node 0 free: 451 MB > node distances: > node 0 > 0: 10 > > Fortunately, and also unexpectedly to me, I was able to fix the issue > just by casting nr_online_nodes as an int. > > + if (g_home_node >= (int)nr_online_nodes) { > + pr_err("null_blk: invalid home_node value\n"); > + g_home_node = NUMA_NO_NODE; > + } > + > > Would you like me to send in a v2 for this? Well, I already added it... I'll just edit it. But I think we should make it: if (g_home_node != NUMA_NO_NODE && g_home_node >= nr_online_nodes) instead.
Yep, yours works better. :) Thanks a lot Jens. On Sat, Apr 6, 2019 at 12:50 PM Jens Axboe <axboe@kernel.dk> wrote: > > On 4/6/19 8:47 AM, John Pittman wrote: > > Hi Jens, I got some unexpected results with this patch this morning. > > When loading null_blk with no arguments, I got the invalid home_node > > value error message. I did some debug and found that the values > > seemed fine: > > > > + pr_err("g_home_node is %d", g_home_node); > > + pr_err("nr_online_nodes is %d", nr_online_nodes); > > + pr_err("NUMA_NO_NODE is %d", NUMA_NO_NODE); > > + > > + if (g_home_node >= nr_online_nodes) { > > + pr_err("null_blk: invalid home_node value\n"); > > + g_home_node = NUMA_NO_NODE; > > + } > > + > > if (g_queue_mode == NULL_Q_RQ) { > > pr_err("null_blk: legacy IO path no longer available\n"); > > return -EINVAL; > > > > [root@localhost linux]# modprobe null_blk > > [ 296.336473] g_home_node is -1 > > [ 296.336474] nr_online_nodes is 1 > > [ 296.336514] NUMA_NO_NODE is -1 > > [ 296.336558] null_blk: invalid home_node value > > [ 296.338825] null: module loaded > > > > [root@localhost linux]# numactl --hardware > > available: 1 nodes (0) > > node 0 cpus: 0 1 2 3 > > node 0 size: 828 MB > > node 0 free: 451 MB > > node distances: > > node 0 > > 0: 10 > > > > Fortunately, and also unexpectedly to me, I was able to fix the issue > > just by casting nr_online_nodes as an int. > > > > + if (g_home_node >= (int)nr_online_nodes) { > > + pr_err("null_blk: invalid home_node value\n"); > > + g_home_node = NUMA_NO_NODE; > > + } > > + > > > > Would you like me to send in a v2 for this? > > Well, I already added it... I'll just edit it. But I think we should > make it: > > if (g_home_node != NUMA_NO_NODE && g_home_node >= nr_online_nodes) > > instead. > > -- > Jens Axboe >
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index 417a9f15c116..c7fb7a133c8d 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -1748,6 +1748,11 @@ static int __init null_init(void) return -EINVAL; } + if (g_home_node >= nr_online_nodes) { + pr_err("null_blk: invalid home_node value\n"); + g_home_node = NUMA_NO_NODE; + } + if (g_queue_mode == NULL_Q_RQ) { pr_err("null_blk: legacy IO path no longer available\n"); return -EINVAL;
At module load, if the selected home_node value is greater than the available numa nodes, the system will crash in __alloc_pages_nodemask() due to a bad paging request. Prevent this user error crash by detecting the bad value, logging an error, and setting g_home_node back to the default of NUMA_NO_NODE. Signed-off-by: John Pittman <jpittman@redhat.com> --- drivers/block/null_blk_main.c | 5 +++++ 1 file changed, 5 insertions(+)