Message ID | 20190930015213.8865-2-xiubli@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-mq: Avoid memory reclaim when allocating | expand |
On 2019/09/29 18:52, xiubli@redhat.com wrote: > From: Xiubo Li <xiubli@redhat.com> > > For some storage drivers, such as the nbd, when there has new socket > connections added, it will update the hardware queue number by calling > blk_mq_update_nr_hw_queues(), in which it will freeze all the queues > first. And then tries to do the hardware queue updating stuff. > > But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating > memory for tags, it may cause the mm do the memories direct reclaiming, > since the queues has been freezed, so if needs to flush the page cache > to disk, it will stuck in generic_make_request()-->blk_queue_enter() by > waiting the queues to be unfreezed and then cause deadlock here. > > Since the memory size requested here is a small one, which will make > it not that easy to happen with a large size, but in theory this could > happen when the OS is running in pressure and out of memory. > > Gabriel Krisman Bertazi has hit the similar issue by fixing it in > commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping > queues"), but might forget this part. > > Signed-off-by: Xiubo Li <xiubli@redhat.com> > CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> > Reviewed-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-mq-tag.c | 5 +++-- > block/blk-mq-tag.h | 5 ++++- > block/blk-mq.c | 3 ++- > 3 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c > index 008388e82b5c..04ee0e4c3fa1 100644 > --- a/block/blk-mq-tag.c > +++ b/block/blk-mq-tag.c > @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags, > > struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, > unsigned int reserved_tags, > - int node, int alloc_policy) > + int node, int alloc_policy, > + gfp_t flags) > { > struct blk_mq_tags *tags; > > @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, > return NULL; > } > > - tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node); > + tags = kzalloc_node(sizeof(*tags), flags, node); > if (!tags) > return NULL; > > diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h > index 61deab0b5a5a..296e0bc97126 100644 > --- a/block/blk-mq-tag.h > +++ b/block/blk-mq-tag.h > @@ -22,7 +22,10 @@ struct blk_mq_tags { > }; > > > -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy); > +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, > + unsigned int reserved_tags, > + int node, int alloc_policy, > + gfp_t flags); > extern void blk_mq_free_tags(struct blk_mq_tags *tags); > > extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data); > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 240416057f28..9c52e4dfe132 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, > node = set->numa_node; > > tags = blk_mq_init_tags(nr_tags, reserved_tags, node, > - BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags)); > + BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags), > + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY); You added the gfp_t argument to blk_mq_init_tags() but you are only using that argument with a hardcoded value here. So why not simply call kzalloc_node() in that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That would avoid needing to add the "gfp_t flags" argument and still fit with your patch 2 definition of BLK_MQ_GFP_FLAGS. > if (!tags) > return NULL; > >
On 2019/9/30 13:28, Damien Le Moal wrote: > On 2019/09/29 18:52, xiubli@redhat.com wrote: >> From: Xiubo Li <xiubli@redhat.com> >> >> For some storage drivers, such as the nbd, when there has new socket >> connections added, it will update the hardware queue number by calling >> blk_mq_update_nr_hw_queues(), in which it will freeze all the queues >> first. And then tries to do the hardware queue updating stuff. >> >> But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating >> memory for tags, it may cause the mm do the memories direct reclaiming, >> since the queues has been freezed, so if needs to flush the page cache >> to disk, it will stuck in generic_make_request()-->blk_queue_enter() by >> waiting the queues to be unfreezed and then cause deadlock here. >> >> Since the memory size requested here is a small one, which will make >> it not that easy to happen with a large size, but in theory this could >> happen when the OS is running in pressure and out of memory. >> >> Gabriel Krisman Bertazi has hit the similar issue by fixing it in >> commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping >> queues"), but might forget this part. >> >> Signed-off-by: Xiubo Li <xiubli@redhat.com> >> CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> >> Reviewed-by: Ming Lei <ming.lei@redhat.com> >> --- >> block/blk-mq-tag.c | 5 +++-- >> block/blk-mq-tag.h | 5 ++++- >> block/blk-mq.c | 3 ++- >> 3 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c >> index 008388e82b5c..04ee0e4c3fa1 100644 >> --- a/block/blk-mq-tag.c >> +++ b/block/blk-mq-tag.c >> @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags, >> >> struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, >> unsigned int reserved_tags, >> - int node, int alloc_policy) >> + int node, int alloc_policy, >> + gfp_t flags) >> { >> struct blk_mq_tags *tags; >> >> @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, >> return NULL; >> } >> >> - tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node); >> + tags = kzalloc_node(sizeof(*tags), flags, node); >> if (!tags) >> return NULL; >> >> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h >> index 61deab0b5a5a..296e0bc97126 100644 >> --- a/block/blk-mq-tag.h >> +++ b/block/blk-mq-tag.h >> @@ -22,7 +22,10 @@ struct blk_mq_tags { >> }; >> >> >> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy); >> +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, >> + unsigned int reserved_tags, >> + int node, int alloc_policy, >> + gfp_t flags); >> extern void blk_mq_free_tags(struct blk_mq_tags *tags); >> >> extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data); >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 240416057f28..9c52e4dfe132 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, >> node = set->numa_node; >> >> tags = blk_mq_init_tags(nr_tags, reserved_tags, node, >> - BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags)); >> + BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags), >> + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY); > You added the gfp_t argument to blk_mq_init_tags() but you are only using that > argument with a hardcoded value here. So why not simply call kzalloc_node() in > that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That > would avoid needing to add the "gfp_t flags" argument and still fit with your > patch 2 definition of BLK_MQ_GFP_FLAGS. The blk_mq_init_tags() is defined in another separate file, which I think it means to provide a common way of initializing the tags stuff, and currently in this path it needs GFP_NOIO while in others in future it may not. Thanks, BRs >> if (!tags) >> return NULL; >> >> >
On 2019/09/29 22:50, Xiubo Li wrote: > On 2019/9/30 13:28, Damien Le Moal wrote: >> On 2019/09/29 18:52, xiubli@redhat.com wrote: >>> From: Xiubo Li <xiubli@redhat.com> >>> >>> For some storage drivers, such as the nbd, when there has new socket >>> connections added, it will update the hardware queue number by calling >>> blk_mq_update_nr_hw_queues(), in which it will freeze all the queues >>> first. And then tries to do the hardware queue updating stuff. >>> >>> But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating >>> memory for tags, it may cause the mm do the memories direct reclaiming, >>> since the queues has been freezed, so if needs to flush the page cache >>> to disk, it will stuck in generic_make_request()-->blk_queue_enter() by >>> waiting the queues to be unfreezed and then cause deadlock here. >>> >>> Since the memory size requested here is a small one, which will make >>> it not that easy to happen with a large size, but in theory this could >>> happen when the OS is running in pressure and out of memory. >>> >>> Gabriel Krisman Bertazi has hit the similar issue by fixing it in >>> commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping >>> queues"), but might forget this part. >>> >>> Signed-off-by: Xiubo Li <xiubli@redhat.com> >>> CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> >>> Reviewed-by: Ming Lei <ming.lei@redhat.com> >>> --- >>> block/blk-mq-tag.c | 5 +++-- >>> block/blk-mq-tag.h | 5 ++++- >>> block/blk-mq.c | 3 ++- >>> 3 files changed, 9 insertions(+), 4 deletions(-) >>> >>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c >>> index 008388e82b5c..04ee0e4c3fa1 100644 >>> --- a/block/blk-mq-tag.c >>> +++ b/block/blk-mq-tag.c >>> @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags, >>> >>> struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, >>> unsigned int reserved_tags, >>> - int node, int alloc_policy) >>> + int node, int alloc_policy, >>> + gfp_t flags) >>> { >>> struct blk_mq_tags *tags; >>> >>> @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, >>> return NULL; >>> } >>> >>> - tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node); >>> + tags = kzalloc_node(sizeof(*tags), flags, node); >>> if (!tags) >>> return NULL; >>> >>> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h >>> index 61deab0b5a5a..296e0bc97126 100644 >>> --- a/block/blk-mq-tag.h >>> +++ b/block/blk-mq-tag.h >>> @@ -22,7 +22,10 @@ struct blk_mq_tags { >>> }; >>> >>> >>> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy); >>> +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, >>> + unsigned int reserved_tags, >>> + int node, int alloc_policy, >>> + gfp_t flags); >>> extern void blk_mq_free_tags(struct blk_mq_tags *tags); >>> >>> extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data); >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index 240416057f28..9c52e4dfe132 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, >>> node = set->numa_node; >>> >>> tags = blk_mq_init_tags(nr_tags, reserved_tags, node, >>> - BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags)); >>> + BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags), >>> + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY); >> You added the gfp_t argument to blk_mq_init_tags() but you are only using that >> argument with a hardcoded value here. So why not simply call kzalloc_node() in >> that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That >> would avoid needing to add the "gfp_t flags" argument and still fit with your >> patch 2 definition of BLK_MQ_GFP_FLAGS. > > The blk_mq_init_tags() is defined in another separate file, which I think it means to provide a common way of initializing the tags stuff, and currently in this path it needs GFP_NOIO while in others in future it may not. blk_mq_alloc_rq_map() is currently the only user of blk_mq_init_tags(), so I do not see the point in doing this change now. If it is needed "in the future" then do it then. I do not mind the patch going in as is, but I really think that everything can be folded into your patch 2 without the addition of blk_mq_init_tags() additional argument. > > Thanks, > BRs > > >>> if (!tags) >>> return NULL; >>> >>> >> > >
On 2019/9/30 14:20, Damien Le Moal wrote: > On 2019/09/29 22:50, Xiubo Li wrote: >> On 2019/9/30 13:28, Damien Le Moal wrote: >>> On 2019/09/29 18:52, xiubli@redhat.com wrote: >>>> From: Xiubo Li <xiubli@redhat.com> >>>> >>>> For some storage drivers, such as the nbd, when there has new socket >>>> connections added, it will update the hardware queue number by calling >>>> blk_mq_update_nr_hw_queues(), in which it will freeze all the queues >>>> first. And then tries to do the hardware queue updating stuff. >>>> >>>> But int blk_mq_alloc_rq_map()-->blk_mq_init_tags(), when allocating >>>> memory for tags, it may cause the mm do the memories direct reclaiming, >>>> since the queues has been freezed, so if needs to flush the page cache >>>> to disk, it will stuck in generic_make_request()-->blk_queue_enter() by >>>> waiting the queues to be unfreezed and then cause deadlock here. >>>> >>>> Since the memory size requested here is a small one, which will make >>>> it not that easy to happen with a large size, but in theory this could >>>> happen when the OS is running in pressure and out of memory. >>>> >>>> Gabriel Krisman Bertazi has hit the similar issue by fixing it in >>>> commit 36e1f3d10786 ("blk-mq: Avoid memory reclaim when remapping >>>> queues"), but might forget this part. >>>> >>>> Signed-off-by: Xiubo Li <xiubli@redhat.com> >>>> CC: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> >>>> Reviewed-by: Ming Lei <ming.lei@redhat.com> >>>> --- >>>> block/blk-mq-tag.c | 5 +++-- >>>> block/blk-mq-tag.h | 5 ++++- >>>> block/blk-mq.c | 3 ++- >>>> 3 files changed, 9 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c >>>> index 008388e82b5c..04ee0e4c3fa1 100644 >>>> --- a/block/blk-mq-tag.c >>>> +++ b/block/blk-mq-tag.c >>>> @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags, >>>> >>>> struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, >>>> unsigned int reserved_tags, >>>> - int node, int alloc_policy) >>>> + int node, int alloc_policy, >>>> + gfp_t flags) >>>> { >>>> struct blk_mq_tags *tags; >>>> >>>> @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, >>>> return NULL; >>>> } >>>> >>>> - tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node); >>>> + tags = kzalloc_node(sizeof(*tags), flags, node); >>>> if (!tags) >>>> return NULL; >>>> >>>> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h >>>> index 61deab0b5a5a..296e0bc97126 100644 >>>> --- a/block/blk-mq-tag.h >>>> +++ b/block/blk-mq-tag.h >>>> @@ -22,7 +22,10 @@ struct blk_mq_tags { >>>> }; >>>> >>>> >>>> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy); >>>> +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, >>>> + unsigned int reserved_tags, >>>> + int node, int alloc_policy, >>>> + gfp_t flags); >>>> extern void blk_mq_free_tags(struct blk_mq_tags *tags); >>>> >>>> extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data); >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 240416057f28..9c52e4dfe132 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, >>>> node = set->numa_node; >>>> >>>> tags = blk_mq_init_tags(nr_tags, reserved_tags, node, >>>> - BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags)); >>>> + BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags), >>>> + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY); >>> You added the gfp_t argument to blk_mq_init_tags() but you are only using that >>> argument with a hardcoded value here. So why not simply call kzalloc_node() in >>> that function with the flags GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY ? That >>> would avoid needing to add the "gfp_t flags" argument and still fit with your >>> patch 2 definition of BLK_MQ_GFP_FLAGS. >> The blk_mq_init_tags() is defined in another separate file, which I think it means to provide a common way of initializing the tags stuff, and currently in this path it needs GFP_NOIO while in others in future it may not. > blk_mq_alloc_rq_map() is currently the only user of blk_mq_init_tags(), so I do > not see the point in doing this change now. If it is needed "in the future" then > do it then. > > I do not mind the patch going in as is, but I really think that everything can > be folded into your patch 2 without the addition of blk_mq_init_tags() > additional argument. Actually I do not object to folding it into patch 2 :-) Just from the v2 I supposed that it would be easier to be reviewed by splitting it into a separate one. Thanks, BRs >> Thanks, >> BRs >> >> >>>> if (!tags) >>>> return NULL; >>>> >>>> >> >
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 008388e82b5c..04ee0e4c3fa1 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -462,7 +462,8 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags, struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, unsigned int reserved_tags, - int node, int alloc_policy) + int node, int alloc_policy, + gfp_t flags) { struct blk_mq_tags *tags; @@ -471,7 +472,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, return NULL; } - tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node); + tags = kzalloc_node(sizeof(*tags), flags, node); if (!tags) return NULL; diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h index 61deab0b5a5a..296e0bc97126 100644 --- a/block/blk-mq-tag.h +++ b/block/blk-mq-tag.h @@ -22,7 +22,10 @@ struct blk_mq_tags { }; -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy); +extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, + unsigned int reserved_tags, + int node, int alloc_policy, + gfp_t flags); extern void blk_mq_free_tags(struct blk_mq_tags *tags); extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data); diff --git a/block/blk-mq.c b/block/blk-mq.c index 240416057f28..9c52e4dfe132 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2090,7 +2090,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, node = set->numa_node; tags = blk_mq_init_tags(nr_tags, reserved_tags, node, - BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags)); + BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags), + GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY); if (!tags) return NULL;