| Message ID | 20160630172239.GA23082@lst.de (mailing list archive) |
|---|---|
| State | New, archived |
On Thu, Jun 30, 2016 at 07:22:39PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 28, 2016 at 10:26:49AM +1000, Dave Chinner wrote:
> > Christoph, it looks like there's an ENOSPC+ENOMEM behavioural
> > regression here. generic/224 on my 1p/1GB RAM VM using a 1k block
> > size filesystem has significantly different behaviour once ENOSPC
> > is hit with this patchset.
> >
> > It ends up with an endless stream of errors like this:
>
> I've spent some time trying to reproduce this. I'm actually getting
> the OOM killer almost reproducibly for for-next without the iomap
> patches as well when just using 1GB of mem. 1400 MB is the minimum
> I can reproducibly finish the test with on either code base.
>
> But with the 1400 MB setup I see a few interesting things. Even
> with the baseline, no-iomap case I see a few errors in the log:
>
> [ 70.407465] Filesystem "vdc": reserve blocks depleted! Consider increasing reserve pool size.
> [ 70.195645] XFS (vdc): page discard on page ffff88005682a988, inode 0xd3, offset 761856.
> [ 70.408079] Buffer I/O error on dev vdc, logical block 1048513, lost async page write
> [ 70.408598] Buffer I/O error on dev vdc, logical block 1048514, lost async page write
>
> 27s
>
> With iomap I also see the spew of page discard errors you see, but while
> I see a lot of them, the rest still finishes after a reasonable time,
> just a few seconds more than the pre-iomap baseline. I also see the
> reserve block depleted message in this case.

The reserve block pool depleted message is normal for me in this
test. We're throwing a thousand concurrent processes at the
filesystem at ENOSPC, and so the metadata reservation for the
delayed allocation totals quite a lot. We only reserve 8192 blocks
for the reserve pool, so a delalloc reservation for one page on each
file (4 blocks per page, which means a couple of blocks for the
metadata reservation via the indlen calculation) is going to consume
the reserve pool quite quickly if the up-front reservation
overshoots the XFS_ALLOC_SET_ASIDE() ENOSPC threshold.

> Digging into the reserve block depleted message - it seems we have
> too many parallel iomap_allocate transactions going on. I suspect
> this might be because the writeback code will not finish a writeback
> context if we have multiple blocks inside a page, which can
> happen easily for this 1k ENOSPC setup.

Right - this test has regularly triggered that warning on this
particular test setup for me - it's not something new to the iomap
patchset.

> I've not had time to fully
> check if this is what really happens, but I did a quick hack (see below)
> to only allocate 1k at a time in iomap_begin, and with that generic/224
> finishes without the warning spew. Of course this isn't a real fix,
> and I need to fully understand what's going on in writeback due to
> different allocation / dirtying patterns from the iomap change.

Which tends to indicate that the multi-block allocation has a larger
indlen reservation, and that's what is causing the code to hit
whatever edge case is leading to it not recovering.

However, I'm still wondering how we are not throwing ENOSPC back to
userspace at the XFS_ALLOC_SET_ASIDE() limits.

Cheers,

Dave.
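To put rough numbers on the reserve-pool pressure described above, here is a back-of-envelope sketch, not kernel code. The writer count, page size, and block size come from the generic/224 setup in the thread; the indlen-per-extent value is an assumed "couple of blocks" for illustration, not the real xfs_bmap_worst_indlen() result.

```c
/*
 * Illustrative arithmetic only: roughly how much delalloc reservation
 * a thousand concurrent generic/224 writers can demand at ENOSPC on a
 * 1k block size filesystem with 4k pages, compared with the default
 * 8192-block reserve pool. The indlen figure is an assumption.
 */
#include <stdio.h>

int main(void)
{
	const long writers = 1000;        /* concurrent generic/224 writers */
	const long blocks_per_page = 4;   /* 4k page / 1k filesystem block */
	const long indlen_per_extent = 2; /* assumed metadata reservation */
	const long reserve_pool = 8192;   /* default XFS reserve pool blocks */

	long demand = writers * (blocks_per_page + indlen_per_extent);

	/* Near ENOSPC, most of this has to come out of the reserve pool. */
	printf("worst-case delalloc demand: %ld blocks\n", demand);
	printf("reserve pool:               %ld blocks (~%.0f%% consumed)\n",
	       reserve_pool, 100.0 * demand / reserve_pool);
	return 0;
}
```

With those assumed numbers, one page of delalloc per writer already eats most of the pool, which is consistent with the warning being "normal" for this test once the up-front reservations overshoot the set-aside threshold.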
On Thu, Jun 30, 2016 at 07:22:39PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 28, 2016 at 10:26:49AM +1000, Dave Chinner wrote:
> > Christoph, it looks like there's an ENOSPC+ENOMEM behavioural
> > regression here. generic/224 on my 1p/1GB RAM VM using a 1k block
> > size filesystem has significantly different behaviour once ENOSPC
> > is hit with this patchset.
> >
> > It ends up with an endless stream of errors like this:
>
> I've spent some time trying to reproduce this. I'm actually getting
> the OOM killer almost reproducibly for for-next without the iomap
> patches as well when just using 1GB of mem. 1400 MB is the minimum
> I can reproducibly finish the test with on either code base.
>
> But with the 1400 MB setup I see a few interesting things. Even
> with the baseline, no-iomap case I see a few errors in the log:
>
> [ 70.407465] Filesystem "vdc": reserve blocks depleted! Consider increasing reserve pool size.
> [ 70.195645] XFS (vdc): page discard on page ffff88005682a988, inode 0xd3, offset 761856.
> [ 70.408079] Buffer I/O error on dev vdc, logical block 1048513, lost async page write
> [ 70.408598] Buffer I/O error on dev vdc, logical block 1048514, lost async page write
>
> 27s
>
> With iomap I also see the spew of page discard errors you see, but while
> I see a lot of them, the rest still finishes after a reasonable time,
> just a few seconds more than the pre-iomap baseline. I also see the
> reserve block depleted message in this case.
>
> Digging into the reserve block depleted message - it seems we have
> too many parallel iomap_allocate transactions going on. I suspect
> this might be because the writeback code will not finish a writeback
> context if we have multiple blocks inside a page, which can
> happen easily for this 1k ENOSPC setup. I've not had time to fully
> check if this is what really happens, but I did a quick hack (see below)
> to only allocate 1k at a time in iomap_begin, and with that generic/224
> finishes without the warning spew. Of course this isn't a real fix,
> and I need to fully understand what's going on in writeback due to
> different allocation / dirtying patterns from the iomap change.

Any progress here, Christoph? The current test run has been running
generic/224 on the 1GB mem test VM for almost 6 hours now, and it's
still discarding pages. This doesn't always happen - sometimes it
takes the normal amount of time to run, but every so often it falls
into this "discard every page" loop and it takes hours to
complete...

Cheers,

Dave.
On Mon, Jul 18, 2016 at 09:14:00PM +1000, Dave Chinner wrote:
> On Thu, Jun 30, 2016 at 07:22:39PM +0200, Christoph Hellwig wrote:
> > On Tue, Jun 28, 2016 at 10:26:49AM +1000, Dave Chinner wrote:
> > > Christoph, it looks like there's an ENOSPC+ENOMEM behavioural
> > > regression here. generic/224 on my 1p/1GB RAM VM using a 1k block
> > > size filesystem has significantly different behaviour once ENOSPC
> > > is hit with this patchset.
> > >
> > > It ends up with an endless stream of errors like this:
> >
> > I've spent some time trying to reproduce this. I'm actually getting
> > the OOM killer almost reproducibly for for-next without the iomap
> > patches as well when just using 1GB of mem. 1400 MB is the minimum
> > I can reproducibly finish the test with on either code base.
> >
> > But with the 1400 MB setup I see a few interesting things. Even
> > with the baseline, no-iomap case I see a few errors in the log:
> >
> > [ 70.407465] Filesystem "vdc": reserve blocks depleted! Consider increasing reserve pool size.
> > [ 70.195645] XFS (vdc): page discard on page ffff88005682a988, inode 0xd3, offset 761856.
> > [ 70.408079] Buffer I/O error on dev vdc, logical block 1048513, lost async page write
> > [ 70.408598] Buffer I/O error on dev vdc, logical block 1048514, lost async page write
> >
> > 27s
> >
> > With iomap I also see the spew of page discard errors you see, but while
> > I see a lot of them, the rest still finishes after a reasonable time,
> > just a few seconds more than the pre-iomap baseline. I also see the
> > reserve block depleted message in this case.
> >
> > Digging into the reserve block depleted message - it seems we have
> > too many parallel iomap_allocate transactions going on. I suspect
> > this might be because the writeback code will not finish a writeback
> > context if we have multiple blocks inside a page, which can
> > happen easily for this 1k ENOSPC setup. I've not had time to fully
> > check if this is what really happens, but I did a quick hack (see below)
> > to only allocate 1k at a time in iomap_begin, and with that generic/224
> > finishes without the warning spew. Of course this isn't a real fix,
> > and I need to fully understand what's going on in writeback due to
> > different allocation / dirtying patterns from the iomap change.
>
> Any progress here, Christoph? The current test run has been running
> generic/224 on the 1GB mem test VM for almost 6 hours now, and it's
> still discarding pages. This doesn't always happen - sometimes it
> takes the normal amount of time to run, but every so often it falls
> into this "discard every page" loop and it takes hours to
> complete...

.... and I've now got a 16p/16GB RAM VM stuck in this loop in
generic/224, so it's not limited to low memory machines....

Cheers,

Dave.
On Mon, Jul 18, 2016 at 09:14:00PM +1000, Dave Chinner wrote:
> Any progress here, Christoph?
Nothing after the last posting yet. Shortly after that I left for Japan
for about two weeks of hiking and a conference, which doesn't help my
ability to spend some quiet hours with the code. It's at the top of
my priority list for non-trivial things, and I should be able to get
back to it tomorrow after finishing the next conference today.
```diff
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 620fc91..d9afba2 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1018,7 +1018,7 @@ xfs_file_iomap_begin(
 	 * Note that the values needs to be less than 32-bits wide until
 	 * the lower level functions are updated.
 	 */
-	length = min_t(loff_t, length, 1024 * PAGE_SIZE);
+	length = min_t(loff_t, length, 1024);
 	if (xfs_get_extsz_hint(ip)) {
 		/*
 		 * xfs_iomap_write_direct() expects the shared lock. It
```
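For a sense of scale, the quick hack above shrinks the clamp on each xfs_file_iomap_begin() delalloc mapping from 1024 pages' worth of bytes down to a single 1k block. The snippet below is purely illustrative arithmetic assuming 4k pages and the 1k block size filesystem used in the test; it is not the kernel code path.

```c
/*
 * Illustrative arithmetic only: the maximum length one
 * xfs_file_iomap_begin() call may map before and after the quick
 * hack, assuming 4k pages and a 1k block size filesystem.
 */
#include <stdio.h>

#define ASSUMED_PAGE_SIZE 4096UL
#define FS_BLOCK_SIZE     1024UL

int main(void)
{
	unsigned long before = 1024 * ASSUMED_PAGE_SIZE; /* original clamp */
	unsigned long after  = 1024;                     /* hacked clamp */

	printf("per-call clamp before: %lu bytes (%lu fs blocks)\n",
	       before, before / FS_BLOCK_SIZE);
	printf("per-call clamp after:  %lu bytes (%lu fs block)\n",
	       after, after / FS_BLOCK_SIZE);
	return 0;
}
```

A single multi-megabyte delalloc reservation carries a much larger up-front indlen estimate than a series of single-block reservations, which lines up with Dave's point that the multi-block allocation's indlen reservation is what trips the edge case.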