Message ID | 1470245586-14068-1-git-send-email-hch@lst.de (mailing list archive) |
---|---|
State | Accepted, archived |
On Wed, Aug 03, 2016 at 07:33:06PM +0200, Christoph Hellwig wrote:
> The space reservation was added without an explanation in commit
>
>     "Add error reporting calls in error paths that return EFSCORRUPTED"
>
> back in 2003. There is no reason to reserve disk blocks in the
> transaction when allocating blocks for delalloc space as we already
> reserved the space when creating the delalloc extent.
>
> With this fix we stop running out of the reserved pool in generic/229,
> which has happened for a long time with small blocksize file systems,
> and has increased in severity with the new buffered write path.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 2114d53..279353c 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -691,7 +691,6 @@ xfs_iomap_write_allocate(
>  	xfs_trans_t	*tp;
>  	int		nimaps;
>  	int		error = 0;
> -	int		nres;
>
>  	/*
>  	 * Make sure that the dquots are there.
> @@ -715,12 +714,15 @@ xfs_iomap_write_allocate(
>  		 * is in the delayed allocation extent on which we sit
>  		 * but before our buffer starts.
>  		 */
> -
>  		nimaps = 0;
>  		while (nimaps == 0) {
> -			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
> -
> -			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, nres,
> +			/*
> +			 * We have already reserved space for the extent and any
> +			 * indirect blocks when creating the delalloc extent,
> +			 * there is no need to reserve space in this transaction
> +			 * again.
> +			 */
> +			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
>  					0, XFS_TRANS_RESERVE, &tp);
>  			if (error)
>  				return error;
> @@ -783,7 +785,7 @@ xfs_iomap_write_allocate(
>  			 */
>  			error = xfs_bmapi_write(tp, ip, map_start_fsb,
>  						count_fsb, 0, &first_block,
> -						nres, imap, &nimaps,
> +						0, imap, &nimaps,
>  						&dfops);

I don't think this part of the fix is correct. nres feeds into
args->total, which is then used during the AGFL fixup checks. If this
is not set correctly, then we'll select AGs that have enough space to
fix up the AGFL, but not enough space to allocate all the BMBT blocks
we require. That then leads to ABBA deadlocks on AGF locks near
ENOSPC - see commit dbd5c8c ("xfs: pass total block res. as total
xfs_bmapi_write() parameter") for the full details.

Since you pointed out the problem, I've been testing a local version
of this fix that still passes nres into xfs_bmapi_write() and I
haven't seen any problems, so I think it is correct to keep nres
here. I'm going to drop this hunk from this patch for the moment in
my tree.

Cheers,

Dave.
On Fri, Aug 05, 2016 at 10:03:54AM +1000, Dave Chinner wrote:
> I don't think this part of the fix is correct. nres feeds into
> args->total, which is then used during the AGFL fixup checks. If this
> is not set correctly, then we'll select AGs that have enough space to
> fix up the AGFL, but not enough space to allocate all the BMBT blocks
> we require. That then leads to ABBA deadlocks on AGF locks near
> ENOSPC - see commit dbd5c8c ("xfs: pass total block res. as total
> xfs_bmapi_write() parameter") for the full details.

I've been going back and forth between both versions and both have
tested fine - I couldn't really convince myself which one is more
correct.

> Since you pointed out the problem, I've been testing a local version
> of this fix that still passes nres into xfs_bmapi_write() and I
> haven't seen any problems, so I think it is correct to keep nres
> here. I'm going to drop this hunk from this patch for the moment in
> my tree.

Ok, sounds fine. If you want a real resend let me know.
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2114d53..279353c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -691,7 +691,6 @@ xfs_iomap_write_allocate(
 	xfs_trans_t	*tp;
 	int		nimaps;
 	int		error = 0;
-	int		nres;
 
 	/*
 	 * Make sure that the dquots are there.
@@ -715,12 +714,15 @@ xfs_iomap_write_allocate(
 		 * is in the delayed allocation extent on which we sit
 		 * but before our buffer starts.
 		 */
-
 		nimaps = 0;
 		while (nimaps == 0) {
-			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
-
-			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, nres,
+			/*
+			 * We have already reserved space for the extent and any
+			 * indirect blocks when creating the delalloc extent,
+			 * there is no need to reserve space in this transaction
+			 * again.
+			 */
+			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0,
 					0, XFS_TRANS_RESERVE, &tp);
 			if (error)
 				return error;
@@ -783,7 +785,7 @@ xfs_iomap_write_allocate(
 			 */
 			error = xfs_bmapi_write(tp, ip, map_start_fsb,
 						count_fsb, 0, &first_block,
-						nres, imap, &nimaps,
+						0, imap, &nimaps,
 						&dfops);
 			if (error)
 				goto trans_cancel;
The space reservation was added without an explanation in commit

    "Add error reporting calls in error paths that return EFSCORRUPTED"

back in 2003. There is no reason to reserve disk blocks in the
transaction when allocating blocks for delalloc space as we already
reserved the space when creating the delalloc extent.

With this fix we stop running out of the reserved pool in generic/229,
which has happened for a long time with small blocksize file systems,
and has increased in severity with the new buffered write path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)