Message ID | 1482485418-4190-1-git-send-email-chandan@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Friday, December 23, 2016 03:00:18 PM Chandan Rajendra wrote: > The following deadlock is seen when executing generic/113 test, > > ---------------------------------------------------------+---------------------------------------------------- > Direct I/O task Fast fsync task > ---------------------------------------------------------+---------------------------------------------------- > btrfs_direct_IO > __blockdev_direct_IO > do_blockdev_direct_IO > do_direct_IO > btrfs_get_blocks_direct > while (blocks needs to written) > get_more_blocks (first iteration) > btrfs_get_blocks_direct > btrfs_create_dio_extent > down_read(&BTRFS_I(inode) >dio_sem) > Create and add extent map and ordered extent > up_read(&BTRFS_I(inode) >dio_sem) > btrfs_sync_file > btrfs_log_dentry_safe > btrfs_log_inode_parent > btrfs_log_inode > btrfs_log_changed_extents > down_write(&BTRFS_I(inode) >dio_sem) > Collect new extent maps and ordered extents > wait for ordered extent completion > get_more_blocks (second iteration) > btrfs_get_blocks_direct > btrfs_create_dio_extent > down_read(&BTRFS_I(inode) >dio_sem) > -------------------------------------------------------------------------------------------------------------- > > In the above description, Btrfs direct I/O code path has not yet started > submitting bios for file range covered by the initial ordered > extent. Meanwhile, The fast fsync task obtains the write semaphore and > waits for I/O on the ordered extent to get completed. However, the > Direct I/O task is now blocked on obtaining the read semaphore. > > To resolve the deadlock, this commit modifies the Direct I/O code path > to obtain the read semaphore before invoking > __blockdev_direct_IO(). The semaphore is then given up after > __blockdev_direct_IO() returns. This allows the Direct I/O code to > complete I/O on all the ordered extents it creates. > Btw, I was able to reproduce the issue on kdave/for-next branch with "Merge branch 'for-next-next-4.9-20161125' into for-next-20161125" as the topmost commit. The issue cannot be reproduced yet on latest code available from kdave/for-next branch.
On Friday, December 23, 2016 03:57:40 PM Chandan Rajendra wrote: > On Friday, December 23, 2016 03:00:18 PM Chandan Rajendra wrote: > > The following deadlock is seen when executing generic/113 test, > > > > ---------------------------------------------------------+---------------------------------------------------- > > Direct I/O task Fast fsync task > > ---------------------------------------------------------+---------------------------------------------------- > > btrfs_direct_IO > > __blockdev_direct_IO > > do_blockdev_direct_IO > > do_direct_IO > > btrfs_get_blocks_direct > > while (blocks needs to written) > > get_more_blocks (first iteration) > > btrfs_get_blocks_direct > > btrfs_create_dio_extent > > down_read(&BTRFS_I(inode) >dio_sem) > > Create and add extent map and ordered extent > > up_read(&BTRFS_I(inode) >dio_sem) > > btrfs_sync_file > > btrfs_log_dentry_safe > > btrfs_log_inode_parent > > btrfs_log_inode > > btrfs_log_changed_extents > > down_write(&BTRFS_I(inode) >dio_sem) > > Collect new extent maps and ordered extents > > wait for ordered extent completion > > get_more_blocks (second iteration) > > btrfs_get_blocks_direct > > btrfs_create_dio_extent > > down_read(&BTRFS_I(inode) >dio_sem) > > -------------------------------------------------------------------------------------------------------------- > > > > In the above description, Btrfs direct I/O code path has not yet started > > submitting bios for file range covered by the initial ordered > > extent. Meanwhile, The fast fsync task obtains the write semaphore and > > waits for I/O on the ordered extent to get completed. However, the > > Direct I/O task is now blocked on obtaining the read semaphore. > > > > To resolve the deadlock, this commit modifies the Direct I/O code path > > to obtain the read semaphore before invoking > > __blockdev_direct_IO(). The semaphore is then given up after > > __blockdev_direct_IO() returns. This allows the Direct I/O code to > > complete I/O on all the ordered extents it creates. > > > > Btw, I was able to reproduce the issue on kdave/for-next branch with "Merge > branch 'for-next-next-4.9-20161125' into for-next-20161125" as the topmost > commit. The issue cannot be reproduced yet on latest code available from > kdave/for-next branch. > > Maybe changes in upstream might have masked the issue in the recent kdave/for-next branch. I say that because 'git bisect' resulted in the following commit ... e3597e6090ddf40904dce6d0a5a404e2c490cac6 Author: Chris Mason <clm@fb.com> AuthorDate: Tue Nov 1 12:54:45 2016 -0700 Commit: Chris Mason <clm@fb.com> CommitDate: Tue Nov 1 12:54:45 2016 -0700 Parent: 570dd45 btrfs: fix races on root_log_ctx lists Parent: 9d1032c btrfs: fix WARNING in btrfs_select_ref_head() Merged: btrfs-next-for-linus-4.8 kdave-master linus-v4.7-rc6 local-v4.7-rc4 Containing: direct-io-fsync-deadlock kdave-for-next Follows: v4.8-rc8 (57) Precedes: next-20161219 (30006) Merge branch 'for-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.9 5 files changed, 29 insertions(+), 9 deletions(-) fs/btrfs/extent-tree.c | 3 +++ fs/btrfs/extent_io.c | 8 ++++---- fs/btrfs/inode.c | 13 +++++++++---- fs/btrfs/ioctl.c | 5 +++++ fs/btrfs/relocation.c | 9 ++++++++-
On Fri, Dec 23, 2016 at 05:27:55PM +0530, Chandan Rajendra wrote: > On Friday, December 23, 2016 03:57:40 PM Chandan Rajendra wrote: > > On Friday, December 23, 2016 03:00:18 PM Chandan Rajendra wrote: > > > The following deadlock is seen when executing generic/113 test, > > > > > > ---------------------------------------------------------+---------------------------------------------------- > > > Direct I/O task Fast fsync task > > > ---------------------------------------------------------+---------------------------------------------------- > > > btrfs_direct_IO > > > __blockdev_direct_IO > > > do_blockdev_direct_IO > > > do_direct_IO > > > btrfs_get_blocks_direct > > > while (blocks needs to written) > > > get_more_blocks (first iteration) > > > btrfs_get_blocks_direct > > > btrfs_create_dio_extent > > > down_read(&BTRFS_I(inode) >dio_sem) > > > Create and add extent map and ordered extent > > > up_read(&BTRFS_I(inode) >dio_sem) > > > btrfs_sync_file > > > btrfs_log_dentry_safe > > > btrfs_log_inode_parent > > > btrfs_log_inode > > > btrfs_log_changed_extents > > > down_write(&BTRFS_I(inode) >dio_sem) > > > Collect new extent maps and ordered extents > > > wait for ordered extent completion > > > get_more_blocks (second iteration) > > > btrfs_get_blocks_direct > > > btrfs_create_dio_extent > > > down_read(&BTRFS_I(inode) >dio_sem) > > > -------------------------------------------------------------------------------------------------------------- > > > > > > In the above description, Btrfs direct I/O code path has not yet started > > > submitting bios for file range covered by the initial ordered > > > extent. Meanwhile, The fast fsync task obtains the write semaphore and > > > waits for I/O on the ordered extent to get completed. However, the > > > Direct I/O task is now blocked on obtaining the read semaphore. > > > > > > To resolve the deadlock, this commit modifies the Direct I/O code path > > > to obtain the read semaphore before invoking > > > __blockdev_direct_IO(). The semaphore is then given up after > > > __blockdev_direct_IO() returns. This allows the Direct I/O code to > > > complete I/O on all the ordered extents it creates. > > > > > > > Btw, I was able to reproduce the issue on kdave/for-next branch with "Merge > > branch 'for-next-next-4.9-20161125' into for-next-20161125" as the topmost > > commit. The issue cannot be reproduced yet on latest code available from > > kdave/for-next branch. > > > > > > Maybe changes in upstream might have masked the issue in the recent > kdave/for-next branch. I say that because 'git bisect' resulted in the > following commit ... I guess that the for-next branch didn't revert this patch[1] as upstream did, so that generic/113 would complain, however, even w/o that patch, this fix is still required since the deadlock could be reproduced by running generic/113 with '-ofragment=data' and in fact Filipe has proposed a almost same fix but not a real patch in this thread [2]. [1]: Btrfs: adjust len of writes if following a preallocated extent https://patchwork.kernel.org/patch/9413129/ [2]: https://patchwork.kernel.org/patch/9445231/ Thanks, -liubo > > e3597e6090ddf40904dce6d0a5a404e2c490cac6 > Author: Chris Mason <clm@fb.com> > AuthorDate: Tue Nov 1 12:54:45 2016 -0700 > Commit: Chris Mason <clm@fb.com> > CommitDate: Tue Nov 1 12:54:45 2016 -0700 > > Parent: 570dd45 btrfs: fix races on root_log_ctx lists > Parent: 9d1032c btrfs: fix WARNING in btrfs_select_ref_head() > Merged: btrfs-next-for-linus-4.8 kdave-master linus-v4.7-rc6 local-v4.7-rc4 > Containing: direct-io-fsync-deadlock kdave-for-next > Follows: v4.8-rc8 (57) > Precedes: next-20161219 (30006) > > Merge branch 'for-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.9 > > 5 files changed, 29 insertions(+), 9 deletions(-) > fs/btrfs/extent-tree.c | 3 +++ > fs/btrfs/extent_io.c | 8 ++++---- > fs/btrfs/inode.c | 13 +++++++++---- > fs/btrfs/ioctl.c | 5 +++++ > fs/btrfs/relocation.c | 9 ++++++++- > > -- > chandan > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, December 23, 2016 04:18:00 PM Liu Bo wrote: > On Fri, Dec 23, 2016 at 05:27:55PM +0530, Chandan Rajendra wrote: > > On Friday, December 23, 2016 03:57:40 PM Chandan Rajendra wrote: > > > On Friday, December 23, 2016 03:00:18 PM Chandan Rajendra wrote: > > > > The following deadlock is seen when executing generic/113 test, > > > > > > > > ---------------------------------------------------------+---------------------------------------------------- > > > > Direct I/O task Fast fsync task > > > > ---------------------------------------------------------+---------------------------------------------------- > > > > btrfs_direct_IO > > > > __blockdev_direct_IO > > > > do_blockdev_direct_IO > > > > do_direct_IO > > > > btrfs_get_blocks_direct > > > > while (blocks needs to written) > > > > get_more_blocks (first iteration) > > > > btrfs_get_blocks_direct > > > > btrfs_create_dio_extent > > > > down_read(&BTRFS_I(inode) >dio_sem) > > > > Create and add extent map and ordered extent > > > > up_read(&BTRFS_I(inode) >dio_sem) > > > > btrfs_sync_file > > > > btrfs_log_dentry_safe > > > > btrfs_log_inode_parent > > > > btrfs_log_inode > > > > btrfs_log_changed_extents > > > > down_write(&BTRFS_I(inode) >dio_sem) > > > > Collect new extent maps and ordered extents > > > > wait for ordered extent completion > > > > get_more_blocks (second iteration) > > > > btrfs_get_blocks_direct > > > > btrfs_create_dio_extent > > > > down_read(&BTRFS_I(inode) >dio_sem) > > > > -------------------------------------------------------------------------------------------------------------- > > > > > > > > In the above description, Btrfs direct I/O code path has not yet started > > > > submitting bios for file range covered by the initial ordered > > > > extent. Meanwhile, The fast fsync task obtains the write semaphore and > > > > waits for I/O on the ordered extent to get completed. However, the > > > > Direct I/O task is now blocked on obtaining the read semaphore. > > > > > > > > To resolve the deadlock, this commit modifies the Direct I/O code path > > > > to obtain the read semaphore before invoking > > > > __blockdev_direct_IO(). The semaphore is then given up after > > > > __blockdev_direct_IO() returns. This allows the Direct I/O code to > > > > complete I/O on all the ordered extents it creates. > > > > > > > > > > Btw, I was able to reproduce the issue on kdave/for-next branch with "Merge > > > branch 'for-next-next-4.9-20161125' into for-next-20161125" as the topmost > > > commit. The issue cannot be reproduced yet on latest code available from > > > kdave/for-next branch. > > > > > > > > > > Maybe changes in upstream might have masked the issue in the recent > > kdave/for-next branch. I say that because 'git bisect' resulted in the > > following commit ... > > I guess that the for-next branch didn't revert this patch[1] as upstream > did, so that generic/113 would complain, however, even w/o that patch, > this fix is still required since the deadlock could be reproduced by > running generic/113 with '-ofragment=data' and in fact Filipe has > proposed a almost same fix but not a real patch in this thread [2]. > > [1]: Btrfs: adjust len of writes if following a preallocated extent > https://patchwork.kernel.org/patch/9413129/ > [2]: https://patchwork.kernel.org/patch/9445231/ > Ah ok. Thanks for pointing it out.
On Fri, Dec 23, 2016 at 04:18:00PM -0800, Liu Bo wrote: > > > Btw, I was able to reproduce the issue on kdave/for-next branch with "Merge > > > branch 'for-next-next-4.9-20161125' into for-next-20161125" as the topmost > > > commit. The issue cannot be reproduced yet on latest code available from > > > kdave/for-next branch. > > > > Maybe changes in upstream might have masked the issue in the recent > > kdave/for-next branch. I say that because 'git bisect' resulted in the > > following commit ... > > I guess that the for-next branch didn't revert this patch[1] as upstream > did, [...] > [1]: Btrfs: adjust len of writes if following a preallocated extent > https://patchwork.kernel.org/patch/9413129/ The revert was present in the for-next-20161219 branch, the for-next branches sit on top of recent upstream rc tag and are merged from various dev branches so the patch difference should be minimal in the end. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 23, 2016 at 9:30 AM, Chandan Rajendra <chandan@linux.vnet.ibm.com> wrote: > The following deadlock is seen when executing generic/113 test, > > ---------------------------------------------------------+---------------------------------------------------- > Direct I/O task Fast fsync task > ---------------------------------------------------------+---------------------------------------------------- > btrfs_direct_IO > __blockdev_direct_IO > do_blockdev_direct_IO > do_direct_IO > btrfs_get_blocks_direct > while (blocks needs to written) > get_more_blocks (first iteration) > btrfs_get_blocks_direct > btrfs_create_dio_extent > down_read(&BTRFS_I(inode) >dio_sem) > Create and add extent map and ordered extent > up_read(&BTRFS_I(inode) >dio_sem) > btrfs_sync_file > btrfs_log_dentry_safe > btrfs_log_inode_parent > btrfs_log_inode > btrfs_log_changed_extents > down_write(&BTRFS_I(inode) >dio_sem) > Collect new extent maps and ordered extents > wait for ordered extent completion > get_more_blocks (second iteration) > btrfs_get_blocks_direct > btrfs_create_dio_extent > down_read(&BTRFS_I(inode) >dio_sem) > -------------------------------------------------------------------------------------------------------------- > > In the above description, Btrfs direct I/O code path has not yet started > submitting bios for file range covered by the initial ordered > extent. Meanwhile, The fast fsync task obtains the write semaphore and > waits for I/O on the ordered extent to get completed. However, the > Direct I/O task is now blocked on obtaining the read semaphore. > > To resolve the deadlock, this commit modifies the Direct I/O code path > to obtain the read semaphore before invoking > __blockdev_direct_IO(). The semaphore is then given up after > __blockdev_direct_IO() returns. This allows the Direct I/O code to > complete I/O on all the ordered extents it creates. > > Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> As pointed out by Liu Bo, this was known, discussed and with a proposed fix in another older thread: https://www.marc.info/?l=linux-btrfs&m=147998603528213&w=2 I was on vacations and waiting for Josef's feedback as well (cc'ed but never replied) before making a change log and formalizing the proposed patch in that thread. From my point of view, it's ok and you can have: Reviewed-by: Filipe Manana <fdmanana@suse.com> thanks > --- > fs/btrfs/inode.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 5ca88f0..f796037 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -7325,7 +7325,6 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode, > struct extent_map *em = NULL; > int ret; > > - down_read(&BTRFS_I(inode)->dio_sem); > if (type != BTRFS_ORDERED_NOCOW) { > em = create_pinned_em(inode, start, len, orig_start, > block_start, block_len, orig_block_len, > @@ -7344,7 +7343,6 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode, > em = ERR_PTR(ret); > } > out: > - up_read(&BTRFS_I(inode)->dio_sem); > > return em; > } > @@ -8800,6 +8798,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) > dio_data.unsubmitted_oe_range_start = (u64)offset; > dio_data.unsubmitted_oe_range_end = (u64)offset; > current->journal_info = &dio_data; > + down_read(&BTRFS_I(inode)->dio_sem); > } else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, > &BTRFS_I(inode)->runtime_flags)) { > inode_dio_end(inode); > @@ -8812,6 +8811,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) > iter, btrfs_get_blocks_direct, NULL, > btrfs_submit_direct, flags); > if (iov_iter_rw(iter) == WRITE) { > + up_read(&BTRFS_I(inode)->dio_sem); > current->journal_info = NULL; > if (ret < 0 && ret != -EIOCBQUEUED) { > if (dio_data.reserve) > -- > 2.5.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 06, 2017 at 11:45:46AM +0000, Filipe Manana wrote: > >From my point of view, it's ok and you can have: > > Reviewed-by: Filipe Manana <fdmanana@suse.com> Added to next and queued for 4.10, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5ca88f0..f796037 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7325,7 +7325,6 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode, struct extent_map *em = NULL; int ret; - down_read(&BTRFS_I(inode)->dio_sem); if (type != BTRFS_ORDERED_NOCOW) { em = create_pinned_em(inode, start, len, orig_start, block_start, block_len, orig_block_len, @@ -7344,7 +7343,6 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode, em = ERR_PTR(ret); } out: - up_read(&BTRFS_I(inode)->dio_sem); return em; } @@ -8800,6 +8798,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) dio_data.unsubmitted_oe_range_start = (u64)offset; dio_data.unsubmitted_oe_range_end = (u64)offset; current->journal_info = &dio_data; + down_read(&BTRFS_I(inode)->dio_sem); } else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, &BTRFS_I(inode)->runtime_flags)) { inode_dio_end(inode); @@ -8812,6 +8811,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) iter, btrfs_get_blocks_direct, NULL, btrfs_submit_direct, flags); if (iov_iter_rw(iter) == WRITE) { + up_read(&BTRFS_I(inode)->dio_sem); current->journal_info = NULL; if (ret < 0 && ret != -EIOCBQUEUED) { if (dio_data.reserve)
The following deadlock is seen when executing generic/113 test, ---------------------------------------------------------+---------------------------------------------------- Direct I/O task Fast fsync task ---------------------------------------------------------+---------------------------------------------------- btrfs_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO btrfs_get_blocks_direct while (blocks needs to written) get_more_blocks (first iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) Create and add extent map and ordered extent up_read(&BTRFS_I(inode) >dio_sem) btrfs_sync_file btrfs_log_dentry_safe btrfs_log_inode_parent btrfs_log_inode btrfs_log_changed_extents down_write(&BTRFS_I(inode) >dio_sem) Collect new extent maps and ordered extents wait for ordered extent completion get_more_blocks (second iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) -------------------------------------------------------------------------------------------------------------- In the above description, Btrfs direct I/O code path has not yet started submitting bios for file range covered by the initial ordered extent. Meanwhile, The fast fsync task obtains the write semaphore and waits for I/O on the ordered extent to get completed. However, the Direct I/O task is now blocked on obtaining the read semaphore. To resolve the deadlock, this commit modifies the Direct I/O code path to obtain the read semaphore before invoking __blockdev_direct_IO(). The semaphore is then given up after __blockdev_direct_IO() returns. This allows the Direct I/O code to complete I/O on all the ordered extents it creates. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)