Message ID | d879704250b5f890a755873aefe3171cbd193ae9.1675093524.git.ritesh.list@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | iomap: Add support for subpage dirty state tracking to improve write performance | expand |
On Mon, Jan 30, 2023 at 09:44:11PM +0530, Ritesh Harjani (IBM) wrote: > The problem is that commit[1] moved iop creation later i.e. after checking for > whether the folio is uptodate. And if the folio is uptodate, it simply > returns and doesn't allocate a iop. > Now what can happen is that during __iomap_write_begin() for bs < ps, > there can be a folio which is marked uptodate but does not have a iomap_page > structure allocated. > (I think one of the reason it can happen is due to memory pressure, we > can end up freeing folio->private resource). > > Thus the iop structure will only gets allocated at the time of writeback > in iomap_writepage_map(). This I think, was a not problem till now since > we anyway only track uptodate status in iop (no support of tracking > dirty bitmap status which later patches will add), and we also end up > setting all the bits in iomap_page_create(), if the page is uptodate. delayed iop allocation is a feature and not a bug. We might have to refine the criteria for sub-page dirty tracking, but in general having the iop allocates is a memory and performance overhead and should be avoided as much as possible. In fact I still have some unfinished work to allocate it even more lazily.
On 23/01/30 09:02AM, Christoph Hellwig wrote: > On Mon, Jan 30, 2023 at 09:44:11PM +0530, Ritesh Harjani (IBM) wrote: > > The problem is that commit[1] moved iop creation later i.e. after checking for > > whether the folio is uptodate. And if the folio is uptodate, it simply > > returns and doesn't allocate a iop. > > Now what can happen is that during __iomap_write_begin() for bs < ps, > > there can be a folio which is marked uptodate but does not have a iomap_page > > structure allocated. > > (I think one of the reason it can happen is due to memory pressure, we > > can end up freeing folio->private resource). > > > > Thus the iop structure will only gets allocated at the time of writeback > > in iomap_writepage_map(). This I think, was a not problem till now since > > we anyway only track uptodate status in iop (no support of tracking > > dirty bitmap status which later patches will add), and we also end up > > setting all the bits in iomap_page_create(), if the page is uptodate. > > delayed iop allocation is a feature and not a bug. We might have to > refine the criteria for sub-page dirty tracking, but in general having > the iop allocates is a memory and performance overhead and should be > avoided as much as possible. In fact I still have some unfinished > work to allocate it even more lazily. So, what I meant here was that the commit[1] chaged the behavior/functionality without indenting to. I agree it's not a bug. But when I added dirty bitmap tracking support, I couldn't understand for sometime on why were we allocating iop only at the time of writeback. And it was due to a small line change which somehow slipped into this commit [1]. Hence I made this as a seperate patch so that it doesn't slip through again w/o getting noticed/review. Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but with subpage dirty tracking in iop->state[], if we end up allocating iop only at the time of writeback, than that might cause some performance degradation compared to, if we allocat iop at ->write_begin() and mark the required dirty bit ranges in ->write_end(). Like how we do in this patch series. (Ofcourse it is true only for bs < ps use case). [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/ Thanks again for your quick review!! -ritesh
On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote: > > > Thus the iop structure will only gets allocated at the time of writeback > > > in iomap_writepage_map(). This I think, was a not problem till now since > > > we anyway only track uptodate status in iop (no support of tracking > > > dirty bitmap status which later patches will add), and we also end up > > > setting all the bits in iomap_page_create(), if the page is uptodate. > > > > delayed iop allocation is a feature and not a bug. We might have to > > refine the criteria for sub-page dirty tracking, but in general having > > the iop allocates is a memory and performance overhead and should be > > avoided as much as possible. In fact I still have some unfinished > > work to allocate it even more lazily. > > So, what I meant here was that the commit[1] chaged the behavior/functionality > without indenting to. I agree it's not a bug. It didn't change the behaviour or functionality. It broke your patches, but it certainly doesn't deserve its own commit reverting it -- because it's not wrong. > But when I added dirty bitmap tracking support, I couldn't understand for > sometime on why were we allocating iop only at the time of writeback. > And it was due to a small line change which somehow slipped into this commit [1]. > Hence I made this as a seperate patch so that it doesn't slip through again w/o > getting noticed/review. It didn't "slip through". It was intended. > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but > with subpage dirty tracking in iop->state[], if we end up allocating iop only > at the time of writeback, than that might cause some performance degradation > compared to, if we allocat iop at ->write_begin() and mark the required dirty > bit ranges in ->write_end(). Like how we do in this patch series. > (Ofcourse it is true only for bs < ps use case). > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/ You absolutely can allocate it in iomap_write_begin, but you can avoid allocating it until writeback time if (pos, len) entirely overlap the folio. ie: if (pos > folio_pos(folio) || pos + len < folio_pos(folio) + folio_size(folio)) iop = iomap_page_create(iter->inode, folio, iter->flags, false);
On 23/01/30 09:00PM, Matthew Wilcox wrote: > On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote: > > > > Thus the iop structure will only gets allocated at the time of writeback > > > > in iomap_writepage_map(). This I think, was a not problem till now since > > > > we anyway only track uptodate status in iop (no support of tracking > > > > dirty bitmap status which later patches will add), and we also end up > > > > setting all the bits in iomap_page_create(), if the page is uptodate. > > > > > > delayed iop allocation is a feature and not a bug. We might have to > > > refine the criteria for sub-page dirty tracking, but in general having > > > the iop allocates is a memory and performance overhead and should be > > > avoided as much as possible. In fact I still have some unfinished > > > work to allocate it even more lazily. > > > > So, what I meant here was that the commit[1] chaged the behavior/functionality > > without indenting to. I agree it's not a bug. > > It didn't change the behaviour or functionality. It broke your patches, > but it certainly doesn't deserve its own commit reverting it -- because > it's not wrong. > > > But when I added dirty bitmap tracking support, I couldn't understand for > > sometime on why were we allocating iop only at the time of writeback. > > And it was due to a small line change which somehow slipped into this commit [1]. > > Hence I made this as a seperate patch so that it doesn't slip through again w/o > > getting noticed/review. > > It didn't "slip through". It was intended. > > > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but > > with subpage dirty tracking in iop->state[], if we end up allocating iop only > > at the time of writeback, than that might cause some performance degradation > > compared to, if we allocat iop at ->write_begin() and mark the required dirty > > bit ranges in ->write_end(). Like how we do in this patch series. > > (Ofcourse it is true only for bs < ps use case). > > > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/ > > You absolutely can allocate it in iomap_write_begin, but you can avoid > allocating it until writeback time if (pos, len) entirely overlap the > folio. ie: > > if (pos > folio_pos(folio) || > pos + len < folio_pos(folio) + folio_size(folio)) > iop = iomap_page_create(iter->inode, folio, iter->flags, false); Thanks for the suggestion. However do you think it will be better if this is introduced along with lazy allocation changes which Christoph was mentioning about? Why I am thinking that is because, with above approach we delay the allocation of iop until writeback, for entire folio overlap case. But then later in __iomap_write_begin(), we require iop if folio is not uptodate. Hence we again will have to do some checks to see if the iop is not allocated then allocate it (which is for entire folio overlap case). That somehow looked like an overkill for a very little gain in the context of this patch series. Kindly let me know your thoughts on this. -ritesh
On Wed, Feb 01, 2023 at 12:07:25AM +0530, Ritesh Harjani (IBM) wrote: > On 23/01/30 09:00PM, Matthew Wilcox wrote: > > On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote: > > > > > Thus the iop structure will only gets allocated at the time of writeback > > > > > in iomap_writepage_map(). This I think, was a not problem till now since > > > > > we anyway only track uptodate status in iop (no support of tracking > > > > > dirty bitmap status which later patches will add), and we also end up > > > > > setting all the bits in iomap_page_create(), if the page is uptodate. > > > > > > > > delayed iop allocation is a feature and not a bug. We might have to > > > > refine the criteria for sub-page dirty tracking, but in general having > > > > the iop allocates is a memory and performance overhead and should be > > > > avoided as much as possible. In fact I still have some unfinished > > > > work to allocate it even more lazily. > > > > > > So, what I meant here was that the commit[1] chaged the behavior/functionality > > > without indenting to. I agree it's not a bug. > > > > It didn't change the behaviour or functionality. It broke your patches, > > but it certainly doesn't deserve its own commit reverting it -- because > > it's not wrong. > > > > > But when I added dirty bitmap tracking support, I couldn't understand for > > > sometime on why were we allocating iop only at the time of writeback. > > > And it was due to a small line change which somehow slipped into this commit [1]. > > > Hence I made this as a seperate patch so that it doesn't slip through again w/o > > > getting noticed/review. > > > > It didn't "slip through". It was intended. > > > > > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but > > > with subpage dirty tracking in iop->state[], if we end up allocating iop only > > > at the time of writeback, than that might cause some performance degradation > > > compared to, if we allocat iop at ->write_begin() and mark the required dirty > > > bit ranges in ->write_end(). Like how we do in this patch series. > > > (Ofcourse it is true only for bs < ps use case). > > > > > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/ > > > > You absolutely can allocate it in iomap_write_begin, but you can avoid > > allocating it until writeback time if (pos, len) entirely overlap the > > folio. ie: > > > > if (pos > folio_pos(folio) || > > pos + len < folio_pos(folio) + folio_size(folio)) > > iop = iomap_page_create(iter->inode, folio, iter->flags, false); > > Thanks for the suggestion. However do you think it will be better if this is > introduced along with lazy allocation changes which Christoph was mentioning > about? > Why I am thinking that is because, with above approach we delay the allocation > of iop until writeback, for entire folio overlap case. But then later > in __iomap_write_begin(), we require iop if folio is not uptodate. > Hence we again will have to do some checks to see if the iop is not allocated > then allocate it (which is for entire folio overlap case). > That somehow looked like an overkill for a very little gain in the context of > this patch series. Kindly let me know your thoughts on this. Look at *why* __iomap_write_begin() allocates an iop. It's to read in the blocks which are going to be partially-overwritten by the write. If the write overlaps the entire folio, there are no parts which need to be read in, and we can simply return. Maybe we should make that more obvious: if (folio_test_uptodate(folio)) return 0; if (pos <= folio_pos(folio) && pos + len >= folio_pos(folio) + folio_size(folio)) return 0; folio_clear_error(folio); (I think pos must always be >= folio_pos(), so that <= could be ==, but it doesn't hurt anything to use <=)
On 23/01/31 06:48PM, Matthew Wilcox wrote: > On Wed, Feb 01, 2023 at 12:07:25AM +0530, Ritesh Harjani (IBM) wrote: > > On 23/01/30 09:00PM, Matthew Wilcox wrote: > > > On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote: > > > > > > Thus the iop structure will only gets allocated at the time of writeback > > > > > > in iomap_writepage_map(). This I think, was a not problem till now since > > > > > > we anyway only track uptodate status in iop (no support of tracking > > > > > > dirty bitmap status which later patches will add), and we also end up > > > > > > setting all the bits in iomap_page_create(), if the page is uptodate. > > > > > > > > > > delayed iop allocation is a feature and not a bug. We might have to > > > > > refine the criteria for sub-page dirty tracking, but in general having > > > > > the iop allocates is a memory and performance overhead and should be > > > > > avoided as much as possible. In fact I still have some unfinished > > > > > work to allocate it even more lazily. > > > > > > > > So, what I meant here was that the commit[1] chaged the behavior/functionality > > > > without indenting to. I agree it's not a bug. > > > > > > It didn't change the behaviour or functionality. It broke your patches, > > > but it certainly doesn't deserve its own commit reverting it -- because > > > it's not wrong. > > > > > > > But when I added dirty bitmap tracking support, I couldn't understand for > > > > sometime on why were we allocating iop only at the time of writeback. > > > > And it was due to a small line change which somehow slipped into this commit [1]. > > > > Hence I made this as a seperate patch so that it doesn't slip through again w/o > > > > getting noticed/review. > > > > > > It didn't "slip through". It was intended. > > > > > > > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but > > > > with subpage dirty tracking in iop->state[], if we end up allocating iop only > > > > at the time of writeback, than that might cause some performance degradation > > > > compared to, if we allocat iop at ->write_begin() and mark the required dirty > > > > bit ranges in ->write_end(). Like how we do in this patch series. > > > > (Ofcourse it is true only for bs < ps use case). > > > > > > > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/ > > > > > > You absolutely can allocate it in iomap_write_begin, but you can avoid > > > allocating it until writeback time if (pos, len) entirely overlap the > > > folio. ie: > > > > > > if (pos > folio_pos(folio) || > > > pos + len < folio_pos(folio) + folio_size(folio)) > > > iop = iomap_page_create(iter->inode, folio, iter->flags, false); > > > > Thanks for the suggestion. However do you think it will be better if this is > > introduced along with lazy allocation changes which Christoph was mentioning > > about? > > Why I am thinking that is because, with above approach we delay the allocation > > of iop until writeback, for entire folio overlap case. But then later > > in __iomap_write_begin(), we require iop if folio is not uptodate. > > Hence we again will have to do some checks to see if the iop is not allocated > > then allocate it (which is for entire folio overlap case). > > That somehow looked like an overkill for a very little gain in the context of > > this patch series. Kindly let me know your thoughts on this. > > Look at *why* __iomap_write_begin() allocates an iop. It's to read in the > blocks which are going to be partially-overwritten by the write. If the > write overlaps the entire folio, there are no parts which need to be read > in, and we can simply return. Yes that make sense. > Maybe we should make that more obvious: Yes, I think this maybe required. Because otherwise we might end up using uninitialized iop. generic/156 (funshare), can easily trigger that. Will spend sometime on the unshare path of iomap. > > if (folio_test_uptodate(folio)) > return 0; > if (pos <= folio_pos(folio) && > pos + len >= folio_pos(folio) + folio_size(folio)) > return 0; > folio_clear_error(folio); > > (I think pos must always be >= folio_pos(), so that <= could be ==, but > it doesn't hurt anything to use <=) Thanks for sharing this. -ritesh
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 356193e44cf0..e9c85fcf7a1f 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -527,7 +527,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos, size_t len, struct folio *folio) { const struct iomap *srcmap = iomap_iter_srcmap(iter); - struct iomap_page *iop; + struct iomap_page *iop = iomap_page_create(iter->inode, folio, + iter->flags); loff_t block_size = i_blocksize(iter->inode); loff_t block_start = round_down(pos, block_size); loff_t block_end = round_up(pos + len, block_size); @@ -539,7 +540,6 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos, return 0; folio_clear_error(folio); - iop = iomap_page_create(iter->inode, folio, iter->flags); if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1) return -EAGAIN;
Before this commit[1], we used to call iomap_page_create() before checking folio_test_uptodate() in __iomap_write_begin(). The problem is that commit[1] moved iop creation later i.e. after checking for whether the folio is uptodate. And if the folio is uptodate, it simply returns and doesn't allocate a iop. Now what can happen is that during __iomap_write_begin() for bs < ps, there can be a folio which is marked uptodate but does not have a iomap_page structure allocated. (I think one of the reason it can happen is due to memory pressure, we can end up freeing folio->private resource). Thus the iop structure will only gets allocated at the time of writeback in iomap_writepage_map(). This I think, was a not problem till now since we anyway only track uptodate status in iop (no support of tracking dirty bitmap status which later patches will add), and we also end up setting all the bits in iomap_page_create(), if the page is uptodate. [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/ Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/iomap/buffered-io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)