Message ID | cover.1713363472.git.josef@toxicpanda.com (mailing list archive) |
---|---|
Headers | show |
Series | btrfs: restrain lock extent usage during writeback | expand |
On Wed, Apr 17, 2024 at 10:35:44AM -0400, Josef Bacik wrote: > discreet set of operations. Being able to clearly define what the extent lock > is protecting will give us a better idea of how to reduce it's usage or possibly > replace it in the future. It should also allow to stop taking it in ->read_folio and ->readahead, which is what is making the btrfs I/O path so weird and incompatbile with the rest of the kernel. I tried to get rid of just that a while ago but spectacularly failed. Maybe doing this in smaller steps and by someone knowning the code better is going to be more successful.
On 22:54 17/04, Christoph Hellwig wrote: > On Wed, Apr 17, 2024 at 10:35:44AM -0400, Josef Bacik wrote: > > discreet set of operations. Being able to clearly define what the extent lock > > is protecting will give us a better idea of how to reduce it's usage or possibly > > replace it in the future. > > It should also allow to stop taking it in ->read_folio and ->readahead, > which is what is making the btrfs I/O path so weird and incompatbile > with the rest of the kernel. I tried to get rid of just that a while > ago but spectacularly failed. Maybe doing this in smaller steps and > by someone knowning the code better is going to be more successful. > The only reason I have encountered for taking extent locks during reads is for checksums. read()s collects checksums before submitting the bio where as writeback() adds the checksums during bio completion. So, there is a small window where a read() performed immediately after writeback+truncate pages would give an EIO because the checksum is not in the checksum tree and does not match the calculated checksum. If we can delay retrieving the checksum or wait for ordered extents to complete before performing the read, I think avoiding extent locks during read is possible.
On Thu, Apr 18, 2024 at 08:45:35AM -0500, Goldwyn Rodrigues wrote: > The only reason I have encountered for taking extent locks during reads > is for checksums. read()s collects checksums before submitting the bio > where as writeback() adds the checksums during bio completion. > > So, there is a small window where a read() performed immediately after > writeback+truncate pages would give an EIO because the checksum is > not in the checksum tree and does not match the calculated checksum. > > If we can delay retrieving the checksum or wait for ordered extents to > complete before performing the read, I think avoiding extent locks > during read is possible. And the fix for that is to only clear the writeback bit once the ordered extent processing has finished, which is the other bit making btrfs I/O so different from the core kernels expectations. It is highly coupled with the extent lock semantics as far as I can tell.
On Thu, Apr 18, 2024 at 07:47:26AM -0700, Christoph Hellwig wrote: > On Thu, Apr 18, 2024 at 08:45:35AM -0500, Goldwyn Rodrigues wrote: > > The only reason I have encountered for taking extent locks during reads > > is for checksums. read()s collects checksums before submitting the bio > > where as writeback() adds the checksums during bio completion. > > > > So, there is a small window where a read() performed immediately after > > writeback+truncate pages would give an EIO because the checksum is > > not in the checksum tree and does not match the calculated checksum. > > > > If we can delay retrieving the checksum or wait for ordered extents to > > complete before performing the read, I think avoiding extent locks > > during read is possible. > > And the fix for that is to only clear the writeback bit once the > ordered extent processing has finished, which is the other bit > making btrfs I/O so different from the core kernels expectations. > It is highly coupled with the extent lock semantics as far as I > can tell. > They aren't coupled, but they are related. My follow-up work is to make this change and see what breaks, and then tackle the read side. This should pave the way for a straightforward iomap conversion. Thanks, Josef