
null_blk: prevent crash from bad home_node value

Message ID 20190405214245.19834-1-jpittman@redhat.com (mailing list archive)
State New, archived

Commit Message

John Pittman April 5, 2019, 9:42 p.m. UTC
At module load, if the selected home_node value is greater than or
equal to the number of available NUMA nodes, the system will crash in
__alloc_pages_nodemask() due to a bad paging request.  Prevent this
user-error crash by detecting the bad value, logging an error, and
setting g_home_node back to the default of NUMA_NO_NODE.

Signed-off-by: John Pittman <jpittman@redhat.com>
---
 drivers/block/null_blk_main.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Jens Axboe April 5, 2019, 10:22 p.m. UTC | #1
On 4/5/19 3:42 PM, John Pittman wrote:
> [...]

Applied, thanks.
John Pittman April 6, 2019, 2:47 p.m. UTC | #2
Hi Jens, I got some unexpected results with this patch this morning.
When loading null_blk with no arguments, I got the "invalid home_node
value" error message.  I did some debugging and found that the values
themselves seemed fine:

+       pr_err("g_home_node is %d", g_home_node);
+       pr_err("nr_online_nodes is %d", nr_online_nodes);
+       pr_err("NUMA_NO_NODE is %d", NUMA_NO_NODE);
+
+       if (g_home_node >= nr_online_nodes) {
+               pr_err("null_blk: invalid home_node value\n");
+               g_home_node = NUMA_NO_NODE;
+       }
+
        if (g_queue_mode == NULL_Q_RQ) {
                pr_err("null_blk: legacy IO path no longer available\n");
                return -EINVAL;

[root@localhost linux]# modprobe null_blk
[  296.336473] g_home_node is -1
[  296.336474] nr_online_nodes is 1
[  296.336514] NUMA_NO_NODE is -1
[  296.336558] null_blk: invalid home_node value
[  296.338825] null: module loaded

[root@localhost linux]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 828 MB
node 0 free: 451 MB
node distances:
node   0
  0:  10

Fortunately, and also unexpectedly to me, I was able to fix the issue
just by casting nr_online_nodes to an int.

+       if (g_home_node >= (int)nr_online_nodes) {
+               pr_err("null_blk: invalid home_node value\n");
+               g_home_node = NUMA_NO_NODE;
+       }
+
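
I think what's actually happening is a signed/unsigned comparison:
nr_online_nodes is unsigned in this build, so the usual arithmetic
conversions promote g_home_node's -1 to a huge unsigned value before
the >= check.  A minimal userspace sketch of the promotion (standalone
C; the kernel names are just borrowed for illustration):

#include <stdio.h>

int main(void)
{
	int g_home_node = -1;             /* NUMA_NO_NODE */
	unsigned int nr_online_nodes = 1; /* single-node box */

	/* -1 converts to UINT_MAX before the comparison, so this
	 * branch is taken even though -1 < 1 numerically. */
	if (g_home_node >= nr_online_nodes)
		printf("promoted: -1 compares as %u\n",
		       (unsigned int)g_home_node);

	/* Casting the unsigned side to int keeps the comparison
	 * signed, so this branch is not taken. */
	if (g_home_node >= (int)nr_online_nodes)
		printf("not printed\n");

	return 0;
}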

Would you like me to send in a v2 for this?
Thanks.

Jens Axboe April 6, 2019, 4:50 p.m. UTC | #3
On 4/6/19 8:47 AM, John Pittman wrote:
> [...]
>
> Fortunately, and also unexpectedly to me, I was able to fix the issue
> just by casting nr_online_nodes to an int.
>
> Would you like me to send in a v2 for this?

Well, I already added it... I'll just edit it. But I think we should
make it:

if (g_home_node != NUMA_NO_NODE && g_home_node >= nr_online_nodes)

instead.
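
That filters out the only legitimate negative value before the
comparison, so the signed-to-unsigned promotion never sees -1 and the
cast becomes unnecessary.  A sketch of how the edited check would read
(reconstructed from this thread, not copied from the tree):

	/* NUMA_NO_NODE (-1) is excluded up front; any remaining value
	 * is non-negative and promotes safely against the (possibly
	 * unsigned) nr_online_nodes. */
	if (g_home_node != NUMA_NO_NODE && g_home_node >= nr_online_nodes) {
		pr_err("null_blk: invalid home_node value\n");
		g_home_node = NUMA_NO_NODE;
	}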
John Pittman April 6, 2019, 10:11 p.m. UTC | #4
Yep, yours works better. :)  Thanks a lot Jens.

Patch

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 417a9f15c116..c7fb7a133c8d 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1748,6 +1748,11 @@ static int __init null_init(void)
 		return -EINVAL;
 	}
 
+	if (g_home_node >= nr_online_nodes) {
+		pr_err("null_blk: invalid home_node value\n");
+		g_home_node = NUMA_NO_NODE;
+	}
+
 	if (g_queue_mode == NULL_Q_RQ) {
 		pr_err("null_blk: legacy IO path no longer available\n");
 		return -EINVAL;
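
For context, the home_node parameter maps onto g_home_node, a signed
int defaulting to NUMA_NO_NODE.  A sketch of the relevant declarations
in null_blk_main.c, quoted from memory (the exact text in the tree may
differ):

static int g_home_node = NUMA_NO_NODE;
module_param_named(home_node, g_home_node, int, 0444);
MODULE_PARM_DESC(home_node, "Home node for the device");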