Message ID | 20170920110504.GU8034@eguan.usersys.redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Sep 20, 2017 at 07:05:04PM +0800, Eryu Guan wrote: > On Tue, Sep 19, 2017 at 07:34:06AM -0700, Christoph Hellwig wrote: > > On Tue, Sep 19, 2017 at 10:13:52AM -0400, Brian Foster wrote: > > > Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it > > > update the incore i_size after unwritten extent conversion? Then move > > > (or remove) the associated update from xfs_dio_write_end_io(). > > > > I don't think we even need a flag - all three callers of > > xfs_iomap_write_unwritten want to update the file size. > > I tried this approach, but seems there's some problem in the buffered > aio path, generic/112 (aio fsx) failed quickly. But I haven't digged > into the reason (maybe I screwed it up, not the method is wrong..). > From the generic/112 results: Size error: expected 0x3785b stat 0x38000 seek 0x38000 I suspect the problem is that the offset+size from buffered I/O completion is not based on the inode size. Rather, it is buffer head granularity size of the ioend. Given that, it probably does make sense to skip the update from this path. > Then I tried Brian's suggestion, pass a boolean to > xfs_iomap_write_unwritten() to tell if we want it to update in-core > isize after unwritten extent conversion, and skip the in-core isize > update in xfs_dio_write_end_io() accordingly. This approach seems to > work, it passed the test Eric posted here, and fstests 'aio' group > tests, a run of 'quick' group didn't find any new failure as well. > > I attached the WIP patch (without proper comments) I was testing, if > this looks fine I can format a formal patch and do more testings. > Thanks. This mostly looks reasonable to me... > Thanks, > Eryu > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 29172609f2a3..288da47e9ac5 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -343,7 +343,7 @@ xfs_end_io( > error = xfs_reflink_end_cow(ip, offset, size); > break; > case XFS_IO_UNWRITTEN: > - error = xfs_iomap_write_unwritten(ip, offset, size); > + error = xfs_iomap_write_unwritten(ip, offset, size, false); Maybe add a single line comment here wrt to the above (why we don't update isize). > break; > default: > ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans); > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 350b6d43ba23..f3ad024573e7 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -435,6 +435,7 @@ xfs_dio_write_end_io( > struct xfs_inode *ip = XFS_I(inode); > loff_t offset = iocb->ki_pos; > bool update_size = false; > + bool write_unwritten = (flags & IOMAP_DIO_UNWRITTEN); > int error = 0; > > trace_xfs_end_io_direct_write(ip, offset, size); > @@ -458,7 +459,8 @@ xfs_dio_write_end_io( > */ > spin_lock(&ip->i_flags_lock); > if (offset + size > i_size_read(inode)) { > - i_size_write(inode, offset + size); > + if (!write_unwritten) > + i_size_write(inode, offset + size); > update_size = true; I find the logic a little confusing here. For !write_unwritten, update_size means to update the on-disk size. Otherwise, it instructs iomap_write_unwritten() to also update the in-core size. The latter also checks for appends, however, so it might as well always be true from here. It seems that for such a small function, we should be able to make this a bit easier to follow. ;P Should we ever see an isize update at all on a IOMAP_DIO_COW completion (wouldn't reflinked blocks have to be within eof)? If not, then we can presumably rule out isize updates in that case. I think that just leaves the case where a dio write occurs on a pre-existing block. Hmm, could we just move this whole hunk down to after the iomap_write_unwritten() call and eliminate the need for the flag entirely? E.g., something like: ... if (IOMAP_DIO_COW) { ... } /* unwritten conversion updates isize */ if (IOMAP_DIO_UNWRITTEN) return xfs_iomap_write_unwritten(ip, offset, size, true); if (offset + size > i_size_read(inode)) { i_size_write(inode, offset + size); error = xfs_setfilesize(...); } return error; > } > spin_unlock(&ip->i_flags_lock); > @@ -469,8 +471,8 @@ xfs_dio_write_end_io( > return error; > } > > - if (flags & IOMAP_DIO_UNWRITTEN) > - error = xfs_iomap_write_unwritten(ip, offset, size); > + if (write_unwritten) > + error = xfs_iomap_write_unwritten(ip, offset, size, update_size); > else if (update_size) > error = xfs_setfilesize(ip, offset, size); > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index a1909bc064e9..0a088586371e 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -829,7 +829,8 @@ int > xfs_iomap_write_unwritten( > xfs_inode_t *ip, > xfs_off_t offset, > - xfs_off_t count) > + xfs_off_t count, > + bool update_size) > { > xfs_mount_t *mp = ip->i_mount; > xfs_fileoff_t offset_fsb; > @@ -840,6 +841,7 @@ xfs_iomap_write_unwritten( > xfs_trans_t *tp; > xfs_bmbt_irec_t imap; > struct xfs_defer_ops dfops; > + struct inode *inode = VFS_I(ip); > xfs_fsize_t i_size; > uint resblks; > int error; > @@ -900,6 +902,13 @@ xfs_iomap_write_unwritten( > if (i_size > offset + count) > i_size = offset + count; > > + if (update_size) { > + spin_lock(&ip->i_flags_lock); > + if (i_size > i_size_read(inode)) > + i_size_write(inode, i_size); > + spin_unlock(&ip->i_flags_lock); We have XFS_ILOCK_EXCL here so I don't think the spinlocks are necessary any longer. That means this could probably be condensed to something like if (update_size && i_size > i_size_read(inode)) i_size_write(inode, i_size) > + } > + > i_size = xfs_new_eof(ip, i_size); > if (i_size) { > ip->i_d.di_size = i_size; > diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h > index 00db3ecea084..ee535065c5d0 100644 > --- a/fs/xfs/xfs_iomap.h > +++ b/fs/xfs/xfs_iomap.h > @@ -27,7 +27,7 @@ int xfs_iomap_write_direct(struct xfs_inode *, xfs_off_t, size_t, > struct xfs_bmbt_irec *, int); > int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t, > struct xfs_bmbt_irec *); > -int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t); > +int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool); > > void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *, > struct xfs_bmbt_irec *); > diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c > index 2f2dc3c09ad0..4246876df7b7 100644 > --- a/fs/xfs/xfs_pnfs.c > +++ b/fs/xfs/xfs_pnfs.c > @@ -274,7 +274,7 @@ xfs_fs_commit_blocks( > (end - 1) >> PAGE_SHIFT); > WARN_ON_ONCE(error); > > - error = xfs_iomap_write_unwritten(ip, start, length); > + error = xfs_iomap_write_unwritten(ip, start, length, false); Note that this path does update isize. It runs another transaction to get around the fact that write_unwritten() wouldn't log a new on disk size. There is some validation thing here though, so we might need to check whether it is Ok to run that earlier and whether it should continue to return an error on failure. Christoph? That said, maybe a follow on patch would be better since this one might be stable fodder. Brian > if (error) > goto out_drop_iolock; > } > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 20, 2017 at 08:55:30AM -0400, Brian Foster wrote: > On Wed, Sep 20, 2017 at 07:05:04PM +0800, Eryu Guan wrote: > > On Tue, Sep 19, 2017 at 07:34:06AM -0700, Christoph Hellwig wrote: > > > On Tue, Sep 19, 2017 at 10:13:52AM -0400, Brian Foster wrote: > > > > Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it > > > > update the incore i_size after unwritten extent conversion? Then move > > > > (or remove) the associated update from xfs_dio_write_end_io(). > > > > > > I don't think we even need a flag - all three callers of > > > xfs_iomap_write_unwritten want to update the file size. > > > > I tried this approach, but seems there's some problem in the buffered > > aio path, generic/112 (aio fsx) failed quickly. But I haven't digged > > into the reason (maybe I screwed it up, not the method is wrong..). > > > > From the generic/112 results: > > Size error: expected 0x3785b stat 0x38000 seek 0x38000 > > I suspect the problem is that the offset+size from buffered I/O > completion is not based on the inode size. Rather, it is buffer head > granularity size of the ioend. Given that, it probably does make sense > to skip the update from this path. Yeah, you're right. xfs_io -fc "falloc -k 0 4k" -c "pwrite 0 2k" -c fsync /mnt/xfs/testfile This results in 4k file size (block size is also 4k). > > > Then I tried Brian's suggestion, pass a boolean to > > xfs_iomap_write_unwritten() to tell if we want it to update in-core > > isize after unwritten extent conversion, and skip the in-core isize > > update in xfs_dio_write_end_io() accordingly. This approach seems to > > work, it passed the test Eric posted here, and fstests 'aio' group > > tests, a run of 'quick' group didn't find any new failure as well. > > > > I attached the WIP patch (without proper comments) I was testing, if > > this looks fine I can format a formal patch and do more testings. > > > > Thanks. This mostly looks reasonable to me... > > > Thanks, > > Eryu > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > index 29172609f2a3..288da47e9ac5 100644 > > --- a/fs/xfs/xfs_aops.c > > +++ b/fs/xfs/xfs_aops.c > > @@ -343,7 +343,7 @@ xfs_end_io( > > error = xfs_reflink_end_cow(ip, offset, size); > > break; > > case XFS_IO_UNWRITTEN: > > - error = xfs_iomap_write_unwritten(ip, offset, size); > > + error = xfs_iomap_write_unwritten(ip, offset, size, false); > > Maybe add a single line comment here wrt to the above (why we don't > update isize). Will do. > > > break; > > default: > > ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans); > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index 350b6d43ba23..f3ad024573e7 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -435,6 +435,7 @@ xfs_dio_write_end_io( > > struct xfs_inode *ip = XFS_I(inode); > > loff_t offset = iocb->ki_pos; > > bool update_size = false; > > + bool write_unwritten = (flags & IOMAP_DIO_UNWRITTEN); > > int error = 0; > > > > trace_xfs_end_io_direct_write(ip, offset, size); > > @@ -458,7 +459,8 @@ xfs_dio_write_end_io( > > */ > > spin_lock(&ip->i_flags_lock); > > if (offset + size > i_size_read(inode)) { > > - i_size_write(inode, offset + size); > > + if (!write_unwritten) > > + i_size_write(inode, offset + size); > > update_size = true; > > I find the logic a little confusing here. For !write_unwritten, > update_size means to update the on-disk size. Otherwise, it instructs > iomap_write_unwritten() to also update the in-core size. The latter also > checks for appends, however, so it might as well always be true from > here. It seems that for such a small function, we should be able to make > this a bit easier to follow. ;P Agreed, it looks confusing when update_size serves as indicators of both in-core and on-disk size update. And we can pass 'true' unconditionally to xfs_iomap_write_unwritten() in this case. > > Should we ever see an isize update at all on a IOMAP_DIO_COW completion > (wouldn't reflinked blocks have to be within eof)? If not, then we can > presumably rule out isize updates in that case. I think that just leaves > the case where a dio write occurs on a pre-existing block. Hmm, could we > just move this whole hunk down to after the iomap_write_unwritten() call > and eliminate the need for the flag entirely? E.g., something like: > > ... > if (IOMAP_DIO_COW) { > ... > } > > /* unwritten conversion updates isize */ > if (IOMAP_DIO_UNWRITTEN) > return xfs_iomap_write_unwritten(ip, offset, size, true); > > if (offset + size > i_size_read(inode)) { > i_size_write(inode, offset + size); > error = xfs_setfilesize(...); > } > > return error; This seems fine, tests in 'clone' group all passed without new failures, I'll update patch as you suggested. I thought about something like this too, but I'm not familiar with the CoW path, so I decided to keep the original logic as much as possible. Thanks for your review and suggestions! Eryu > > > } > > spin_unlock(&ip->i_flags_lock); > > @@ -469,8 +471,8 @@ xfs_dio_write_end_io( > > return error; > > } > > > > - if (flags & IOMAP_DIO_UNWRITTEN) > > - error = xfs_iomap_write_unwritten(ip, offset, size); > > + if (write_unwritten) > > + error = xfs_iomap_write_unwritten(ip, offset, size, update_size); > > else if (update_size) > > error = xfs_setfilesize(ip, offset, size); > > > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > > index a1909bc064e9..0a088586371e 100644 > > --- a/fs/xfs/xfs_iomap.c > > +++ b/fs/xfs/xfs_iomap.c > > @@ -829,7 +829,8 @@ int > > xfs_iomap_write_unwritten( > > xfs_inode_t *ip, > > xfs_off_t offset, > > - xfs_off_t count) > > + xfs_off_t count, > > + bool update_size) > > { > > xfs_mount_t *mp = ip->i_mount; > > xfs_fileoff_t offset_fsb; > > @@ -840,6 +841,7 @@ xfs_iomap_write_unwritten( > > xfs_trans_t *tp; > > xfs_bmbt_irec_t imap; > > struct xfs_defer_ops dfops; > > + struct inode *inode = VFS_I(ip); > > xfs_fsize_t i_size; > > uint resblks; > > int error; > > @@ -900,6 +902,13 @@ xfs_iomap_write_unwritten( > > if (i_size > offset + count) > > i_size = offset + count; > > > > + if (update_size) { > > + spin_lock(&ip->i_flags_lock); > > + if (i_size > i_size_read(inode)) > > + i_size_write(inode, i_size); > > + spin_unlock(&ip->i_flags_lock); > > We have XFS_ILOCK_EXCL here so I don't think the spinlocks are > necessary any longer. That means this could probably be condensed to > something like > > if (update_size && i_size > i_size_read(inode)) > i_size_write(inode, i_size) > > > + } > > + > > i_size = xfs_new_eof(ip, i_size); > > if (i_size) { > > ip->i_d.di_size = i_size; > > diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h > > index 00db3ecea084..ee535065c5d0 100644 > > --- a/fs/xfs/xfs_iomap.h > > +++ b/fs/xfs/xfs_iomap.h > > @@ -27,7 +27,7 @@ int xfs_iomap_write_direct(struct xfs_inode *, xfs_off_t, size_t, > > struct xfs_bmbt_irec *, int); > > int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t, > > struct xfs_bmbt_irec *); > > -int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t); > > +int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool); > > > > void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *, > > struct xfs_bmbt_irec *); > > diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c > > index 2f2dc3c09ad0..4246876df7b7 100644 > > --- a/fs/xfs/xfs_pnfs.c > > +++ b/fs/xfs/xfs_pnfs.c > > @@ -274,7 +274,7 @@ xfs_fs_commit_blocks( > > (end - 1) >> PAGE_SHIFT); > > WARN_ON_ONCE(error); > > > > - error = xfs_iomap_write_unwritten(ip, start, length); > > + error = xfs_iomap_write_unwritten(ip, start, length, false); > > Note that this path does update isize. It runs another transaction to > get around the fact that write_unwritten() wouldn't log a new on disk > size. There is some validation thing here though, so we might need to > check whether it is Ok to run that earlier and whether it should > continue to return an error on failure. Christoph? > > That said, maybe a follow on patch would be better since this one might > be stable fodder. > > Brian > > > if (error) > > goto out_drop_iolock; > > } > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 29172609f2a3..288da47e9ac5 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -343,7 +343,7 @@ xfs_end_io( error = xfs_reflink_end_cow(ip, offset, size); break; case XFS_IO_UNWRITTEN: - error = xfs_iomap_write_unwritten(ip, offset, size); + error = xfs_iomap_write_unwritten(ip, offset, size, false); break; default: ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans); diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 350b6d43ba23..f3ad024573e7 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -435,6 +435,7 @@ xfs_dio_write_end_io( struct xfs_inode *ip = XFS_I(inode); loff_t offset = iocb->ki_pos; bool update_size = false; + bool write_unwritten = (flags & IOMAP_DIO_UNWRITTEN); int error = 0; trace_xfs_end_io_direct_write(ip, offset, size); @@ -458,7 +459,8 @@ xfs_dio_write_end_io( */ spin_lock(&ip->i_flags_lock); if (offset + size > i_size_read(inode)) { - i_size_write(inode, offset + size); + if (!write_unwritten) + i_size_write(inode, offset + size); update_size = true; } spin_unlock(&ip->i_flags_lock); @@ -469,8 +471,8 @@ xfs_dio_write_end_io( return error; } - if (flags & IOMAP_DIO_UNWRITTEN) - error = xfs_iomap_write_unwritten(ip, offset, size); + if (write_unwritten) + error = xfs_iomap_write_unwritten(ip, offset, size, update_size); else if (update_size) error = xfs_setfilesize(ip, offset, size); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index a1909bc064e9..0a088586371e 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -829,7 +829,8 @@ int xfs_iomap_write_unwritten( xfs_inode_t *ip, xfs_off_t offset, - xfs_off_t count) + xfs_off_t count, + bool update_size) { xfs_mount_t *mp = ip->i_mount; xfs_fileoff_t offset_fsb; @@ -840,6 +841,7 @@ xfs_iomap_write_unwritten( xfs_trans_t *tp; xfs_bmbt_irec_t imap; struct xfs_defer_ops dfops; + struct inode *inode = VFS_I(ip); xfs_fsize_t i_size; uint resblks; int error; @@ -900,6 +902,13 @@ xfs_iomap_write_unwritten( if (i_size > offset + count) i_size = offset + count; + if (update_size) { + spin_lock(&ip->i_flags_lock); + if (i_size > i_size_read(inode)) + i_size_write(inode, i_size); + spin_unlock(&ip->i_flags_lock); + } + i_size = xfs_new_eof(ip, i_size); if (i_size) { ip->i_d.di_size = i_size; diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h index 00db3ecea084..ee535065c5d0 100644 --- a/fs/xfs/xfs_iomap.h +++ b/fs/xfs/xfs_iomap.h @@ -27,7 +27,7 @@ int xfs_iomap_write_direct(struct xfs_inode *, xfs_off_t, size_t, struct xfs_bmbt_irec *, int); int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t, struct xfs_bmbt_irec *); -int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t); +int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool); void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *, struct xfs_bmbt_irec *); diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c index 2f2dc3c09ad0..4246876df7b7 100644 --- a/fs/xfs/xfs_pnfs.c +++ b/fs/xfs/xfs_pnfs.c @@ -274,7 +274,7 @@ xfs_fs_commit_blocks( (end - 1) >> PAGE_SHIFT); WARN_ON_ONCE(error); - error = xfs_iomap_write_unwritten(ip, start, length); + error = xfs_iomap_write_unwritten(ip, start, length, false); if (error) goto out_drop_iolock; }