Message ID | 20210517171722.1266878-2-bfoster@redhat.com (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
Series | iomap: avoid soft lockup warnings on large ioends | expand |
On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) > next = bio->bi_private; > > /* walk each page on bio, ending page IO on them */ > - bio_for_each_segment_all(bv, bio, iter_all) > + bio_for_each_segment_all(bv, bio, iter_all) { > iomap_finish_page_writeback(inode, bv->bv_page, error, > bv->bv_len); > + if (!atomic) > + cond_resched(); > + } I don't know that it makes sense to check after _every_ page. I might go for every segment. Some users check after every thousand pages.
On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote: > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) > > next = bio->bi_private; > > > > /* walk each page on bio, ending page IO on them */ > > - bio_for_each_segment_all(bv, bio, iter_all) > > + bio_for_each_segment_all(bv, bio, iter_all) { > > iomap_finish_page_writeback(inode, bv->bv_page, error, > > bv->bv_len); > > + if (!atomic) > > + cond_resched(); > > + } > > I don't know that it makes sense to check after _every_ page. I might > go for every segment. Some users check after every thousand pages. > The handful of examples I come across on a brief scan (including the other iomap usage) have a similar pattern as used here. I don't doubt there are others, but I think I'd prefer to have more reasoning behind adding more code than might be necessary (i.e. do we expect additional overhead to be measurable here?). As it is, the intent isn't so much to check on every page as much as this just happens to be the common point of the function to cover both long bio chains and single vector bios with large numbers of pages. Brian
On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote: > On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote: > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) > > > next = bio->bi_private; > > > > > > /* walk each page on bio, ending page IO on them */ > > > - bio_for_each_segment_all(bv, bio, iter_all) > > > + bio_for_each_segment_all(bv, bio, iter_all) { > > > iomap_finish_page_writeback(inode, bv->bv_page, error, > > > bv->bv_len); > > > + if (!atomic) > > > + cond_resched(); > > > + } > > > > I don't know that it makes sense to check after _every_ page. I might > > go for every segment. Some users check after every thousand pages. > > > > The handful of examples I come across on a brief scan (including the > other iomap usage) have a similar pattern as used here. I don't doubt > there are others, but I think I'd prefer to have more reasoning behind > adding more code than might be necessary (i.e. do we expect additional > overhead to be measurable here?). As it is, the intent isn't so much to > check on every page as much as this just happens to be the common point > of the function to cover both long bio chains and single vector bios > with large numbers of pages. It's been a while since I waded through the macro hell to find out what cond_resched actually does, but iirc it can do some fairly heavyweight things (disable preemption, call the scheduler, rcu stuff) which is why we're supposed to be a little judicious about amortizing each call over a few thousand pages. --D > Brian >
On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > The iomap ioend mechanism has the ability to construct very large, > contiguous bios and/or bio chains. This has been reported to lead to BTW, it is actually wrong to complete a large bio chains in iomap_finish_ioend(), which may risk in bio allocation deadlock, cause bio_alloc_bioset() relies on bio submission to make forward progress. But it becomes not true when all chained bios are freed just after the whole ioend is done since all chained bios(except for the one embedded in ioend) are allocated from same bioset(fs_bio_set). Thanks, Ming
On Thu, May 20, 2021 at 02:58:58PM -0700, Darrick J. Wong wrote: > On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote: > > On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote: > > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > > > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) > > > > next = bio->bi_private; > > > > > > > > /* walk each page on bio, ending page IO on them */ > > > > - bio_for_each_segment_all(bv, bio, iter_all) > > > > + bio_for_each_segment_all(bv, bio, iter_all) { > > > > iomap_finish_page_writeback(inode, bv->bv_page, error, > > > > bv->bv_len); > > > > + if (!atomic) > > > > + cond_resched(); > > > > + } > > > > > > I don't know that it makes sense to check after _every_ page. I might > > > go for every segment. Some users check after every thousand pages. > > > > > > > The handful of examples I come across on a brief scan (including the > > other iomap usage) have a similar pattern as used here. I don't doubt > > there are others, but I think I'd prefer to have more reasoning behind > > adding more code than might be necessary (i.e. do we expect additional > > overhead to be measurable here?). As it is, the intent isn't so much to > > check on every page as much as this just happens to be the common point > > of the function to cover both long bio chains and single vector bios > > with large numbers of pages. > > It's been a while since I waded through the macro hell to find out what > cond_resched actually does, but iirc it can do some fairly heavyweight > things (disable preemption, call the scheduler, rcu stuff) which is why > we're supposed to be a little judicious about amortizing each call over > a few thousand pages. > It looks to me it just checks some state bit and only does any work if actually necessary. I suppose not doing that less often is cheaper than doing it more, but it's not clear to me it's enough that it really matters and/or warrants more code to filter out calls.. What exactly did you have in mind for logic? I suppose we could always stuff a 'if (!(count++ % 1024)) cond_resched();' or some such in the inner loop, but that might have less of an effect on larger chains constructed of bios with fewer pages (depending on whether that might still be possible). Brian > --D > > > Brian > > >
On Sat, May 22, 2021 at 03:45:11PM +0800, Ming Lei wrote: > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > > The iomap ioend mechanism has the ability to construct very large, > > contiguous bios and/or bio chains. This has been reported to lead to > > BTW, it is actually wrong to complete a large bio chains in > iomap_finish_ioend(), which may risk in bio allocation deadlock, cause > bio_alloc_bioset() relies on bio submission to make forward progress. But > it becomes not true when all chained bios are freed just after the whole > ioend is done since all chained bios(except for the one embedded in ioend) > are allocated from same bioset(fs_bio_set). > Interesting. Do you have a reproducer (or error report) for this? Is it addressed by the next patch, or are further changes required? Brian > > Thanks, > Ming >
On Mon, May 24, 2021 at 07:57:48AM -0400, Brian Foster wrote: > On Sat, May 22, 2021 at 03:45:11PM +0800, Ming Lei wrote: > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > > > The iomap ioend mechanism has the ability to construct very large, > > > contiguous bios and/or bio chains. This has been reported to lead to > > > > BTW, it is actually wrong to complete a large bio chains in > > iomap_finish_ioend(), which may risk in bio allocation deadlock, cause > > bio_alloc_bioset() relies on bio submission to make forward progress. But > > it becomes not true when all chained bios are freed just after the whole > > ioend is done since all chained bios(except for the one embedded in ioend) > > are allocated from same bioset(fs_bio_set). > > > > Interesting. Do you have a reproducer (or error report) for this? Is it No, but the theory has been applied for long time. > addressed by the next patch, or are further changes required? Your patchset can't address the issue. Thanks, Ming
On Mon, May 24, 2021 at 07:57:31AM -0400, Brian Foster wrote: > On Thu, May 20, 2021 at 02:58:58PM -0700, Darrick J. Wong wrote: > > On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote: > > > On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote: > > > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > > > > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) > > > > > next = bio->bi_private; > > > > > > > > > > /* walk each page on bio, ending page IO on them */ > > > > > - bio_for_each_segment_all(bv, bio, iter_all) > > > > > + bio_for_each_segment_all(bv, bio, iter_all) { > > > > > iomap_finish_page_writeback(inode, bv->bv_page, error, > > > > > bv->bv_len); > > > > > + if (!atomic) > > > > > + cond_resched(); > > > > > + } > > > > > > > > I don't know that it makes sense to check after _every_ page. I might > > > > go for every segment. Some users check after every thousand pages. > > > > > > > > > > The handful of examples I come across on a brief scan (including the > > > other iomap usage) have a similar pattern as used here. I don't doubt > > > there are others, but I think I'd prefer to have more reasoning behind > > > adding more code than might be necessary (i.e. do we expect additional > > > overhead to be measurable here?). As it is, the intent isn't so much to > > > check on every page as much as this just happens to be the common point > > > of the function to cover both long bio chains and single vector bios > > > with large numbers of pages. > > > > It's been a while since I waded through the macro hell to find out what > > cond_resched actually does, but iirc it can do some fairly heavyweight > > things (disable preemption, call the scheduler, rcu stuff) which is why > > we're supposed to be a little judicious about amortizing each call over > > a few thousand pages. > > > > It looks to me it just checks some state bit and only does any work if > actually necessary. I suppose not doing that less often is cheaper than > doing it more, but it's not clear to me it's enough that it really > matters and/or warrants more code to filter out calls.. > > What exactly did you have in mind for logic? I suppose we could always > stuff a 'if (!(count++ % 1024)) cond_resched();' or some such in the > inner loop, but that might have less of an effect on larger chains > constructed of bios with fewer pages (depending on whether that might > still be possible). I /was/ thinking about a function level page counter until I noticed that iomap_{write,unshare}_actor call cond_resched for every page it touches. I withdraw the comment. :) --D > > Brian > > > --D > > > > > Brian > > > > > >
On Mon, May 24, 2021 at 09:53:05AM -0700, Darrick J. Wong wrote: > On Mon, May 24, 2021 at 07:57:31AM -0400, Brian Foster wrote: > > On Thu, May 20, 2021 at 02:58:58PM -0700, Darrick J. Wong wrote: > > > On Tue, May 18, 2021 at 07:38:01AM -0400, Brian Foster wrote: > > > > On Mon, May 17, 2021 at 06:54:34PM +0100, Matthew Wilcox wrote: > > > > > On Mon, May 17, 2021 at 01:17:20PM -0400, Brian Foster wrote: > > > > > > @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) > > > > > > next = bio->bi_private; > > > > > > > > > > > > /* walk each page on bio, ending page IO on them */ > > > > > > - bio_for_each_segment_all(bv, bio, iter_all) > > > > > > + bio_for_each_segment_all(bv, bio, iter_all) { > > > > > > iomap_finish_page_writeback(inode, bv->bv_page, error, > > > > > > bv->bv_len); > > > > > > + if (!atomic) > > > > > > + cond_resched(); > > > > > > + } > > > > > > > > > > I don't know that it makes sense to check after _every_ page. I might > > > > > go for every segment. Some users check after every thousand pages. > > > > > > > > > > > > > The handful of examples I come across on a brief scan (including the > > > > other iomap usage) have a similar pattern as used here. I don't doubt > > > > there are others, but I think I'd prefer to have more reasoning behind > > > > adding more code than might be necessary (i.e. do we expect additional > > > > overhead to be measurable here?). As it is, the intent isn't so much to > > > > check on every page as much as this just happens to be the common point > > > > of the function to cover both long bio chains and single vector bios > > > > with large numbers of pages. > > > > > > It's been a while since I waded through the macro hell to find out what > > > cond_resched actually does, but iirc it can do some fairly heavyweight > > > things (disable preemption, call the scheduler, rcu stuff) which is why > > > we're supposed to be a little judicious about amortizing each call over > > > a few thousand pages. > > > > > > > It looks to me it just checks some state bit and only does any work if > > actually necessary. I suppose not doing that less often is cheaper than > > doing it more, but it's not clear to me it's enough that it really > > matters and/or warrants more code to filter out calls.. > > > > What exactly did you have in mind for logic? I suppose we could always > > stuff a 'if (!(count++ % 1024)) cond_resched();' or some such in the > > inner loop, but that might have less of an effect on larger chains > > constructed of bios with fewer pages (depending on whether that might > > still be possible). > > I /was/ thinking about a function level page counter until I noticed > that iomap_{write,unshare}_actor call cond_resched for every page it > touches. I withdraw the comment. :) Oh, also: Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > > --D > > > > > Brian > > > > > --D > > > > > > > Brian > > > > > > > > >
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 414769a6ad11..642422775e4e 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1061,7 +1061,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page, * ioend after this. */ static void -iomap_finish_ioend(struct iomap_ioend *ioend, int error) +iomap_finish_ioend(struct iomap_ioend *ioend, int error, bool atomic) { struct inode *inode = ioend->io_inode; struct bio *bio = &ioend->io_inline_bio; @@ -1084,9 +1084,12 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) next = bio->bi_private; /* walk each page on bio, ending page IO on them */ - bio_for_each_segment_all(bv, bio, iter_all) + bio_for_each_segment_all(bv, bio, iter_all) { iomap_finish_page_writeback(inode, bv->bv_page, error, bv->bv_len); + if (!atomic) + cond_resched(); + } bio_put(bio); } /* The ioend has been freed by bio_put() */ @@ -1099,17 +1102,17 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error) } void -iomap_finish_ioends(struct iomap_ioend *ioend, int error) +iomap_finish_ioends(struct iomap_ioend *ioend, int error, bool atomic) { struct list_head tmp; list_replace_init(&ioend->io_list, &tmp); - iomap_finish_ioend(ioend, error); + iomap_finish_ioend(ioend, error, atomic); while (!list_empty(&tmp)) { ioend = list_first_entry(&tmp, struct iomap_ioend, io_list); list_del_init(&ioend->io_list); - iomap_finish_ioend(ioend, error); + iomap_finish_ioend(ioend, error, atomic); } } EXPORT_SYMBOL_GPL(iomap_finish_ioends); @@ -1178,7 +1181,7 @@ static void iomap_writepage_end_bio(struct bio *bio) { struct iomap_ioend *ioend = bio->bi_private; - iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status)); + iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status), true); } /* diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 9b08db45ce85..84cd6cf46b12 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -123,7 +123,7 @@ xfs_end_ioend( if (!error && xfs_ioend_is_append(ioend)) error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size); done: - iomap_finish_ioends(ioend, error); + iomap_finish_ioends(ioend, error, false); memalloc_nofs_restore(nofs_flag); } diff --git a/include/linux/iomap.h b/include/linux/iomap.h index d202fd2d0f91..07f3f4e69084 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -232,7 +232,7 @@ struct iomap_writepage_ctx { const struct iomap_writeback_ops *ops; }; -void iomap_finish_ioends(struct iomap_ioend *ioend, int error); +void iomap_finish_ioends(struct iomap_ioend *ioend, int error, bool atomic); void iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends, void (*merge_private)(struct iomap_ioend *ioend,
The iomap ioend mechanism has the ability to construct very large, contiguous bios and/or bio chains. This has been reported to lead to soft lockup warnings in bio completion due to the amount of page processing that occurs. Update the ioend completion path with a parameter to indicate atomic context and insert a cond_resched() call to avoid soft lockups in either scenario. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/iomap/buffered-io.c | 15 +++++++++------ fs/xfs/xfs_aops.c | 2 +- include/linux/iomap.h | 2 +- 3 files changed, 11 insertions(+), 8 deletions(-)