diff mbox series

[1/2] iomap: Do not unshare exents beyond EOF

Message ID 20240905102425.1106040-1-sunjunchao2870@gmail.com (mailing list archive)
State New
Headers show
Series [1/2] iomap: Do not unshare exents beyond EOF | expand

Commit Message

Julian Sun Sept. 5, 2024, 10:24 a.m. UTC
Attempting to unshare extents beyond EOF will trigger
the need zeroing case, which in turn triggers a warning.
Therefore, let's skip the unshare process if extents are
beyond EOF.

Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
Inspired-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
---
 fs/iomap/buffered-io.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Brian Foster Sept. 6, 2024, 7:11 p.m. UTC | #1
On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> Attempting to unshare extents beyond EOF will trigger
> the need zeroing case, which in turn triggers a warning.
> Therefore, let's skip the unshare process if extents are
> beyond EOF.
> 
> Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> Inspired-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> ---
>  fs/iomap/buffered-io.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f420c53d86ac..8898d5ec606f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  	/* don't bother with holes or unwritten extents */
>  	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return length;
> +	/* don't try to unshare any extents beyond EOF. */
> +	if (pos > i_size_read(iter->inode))
> +		return length;

Hi Julian,

What about if pos starts within EOF and the operation extends beyond it?
I ask because I think I've reproduced this scenario, though it is a bit
tricky and has dependencies...

For one, it seems to depend on the cowblocks patch I recently posted
here [1] (though I don't think this is necessarily a problem with the
patch, it just depends on keeping COW fork blocks around after the
unshare). With that, I reproduce via fsx with unshare range support [2]
using the ops file appended below [3] on a -bsize=1k XFS fs.

I haven't quite characterized the full sequence other than it looks like
the unshare walks across EOF with a shared data fork block and COW fork
delalloc, presumably finds the post-eof part of the folio !uptodate (so
iomap_adjust_read_range() doesn't skip it), and then trips over the
warning and error return associated with the folio zeroing in
__iomap_write_begin().

This all kind of has me wondering.. do we know the purpose of this
warning/error in the first place? It seems like it's more of an
"unexpected situation" than a specific problem. E.g., assuming the same
page were mmap'd, I _think_ the read fault path would do the same sort
of zeroing such that the unshare would see a fully uptodate folio and
carry on as normal. I added the mapread op to the opsfile below to give
that a quick test (remove the "skip" text to enable it), and it seems to
prevent the error, but I've not confirmed whether that theory is
actually what's happening.

FWIW, I also wonder if another way to handle this would be to just
restrict the range of iomap_file_unshare() to within EOF. IOW if a
caller passes a range beyond EOF, just process whatever part of the
range falls within EOF. It seems iomap isn't responsible for the file
extending aspect of the fallocate unshare command anyways.

Thoughts?

Brian

[1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
[2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
[3] fsx ops file:

fallocate 0x3bc00 0x400 0
write 0x3bc00 0x800 0x3c000
clone_range 0x3bc00 0x400 0x0 0x3c400
skip mapread 0x3c000 0x400 0x3c400
fallocate 0x3bc00 0xc00 0x3c400 unshare
Julian Sun Sept. 9, 2024, 12:15 p.m. UTC | #2
Hi Brian,

Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
>
> On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > Attempting to unshare extents beyond EOF will trigger
> > the need zeroing case, which in turn triggers a warning.
> > Therefore, let's skip the unshare process if extents are
> > beyond EOF.
> >
> > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > Inspired-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > ---
> >  fs/iomap/buffered-io.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index f420c53d86ac..8898d5ec606f 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> >       /* don't bother with holes or unwritten extents */
> >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> >               return length;
> > +     /* don't try to unshare any extents beyond EOF. */
> > +     if (pos > i_size_read(iter->inode))
> > +             return length;
>
> Hi Julian,
>
>
> > What about if pos starts within EOF and the operation extends beyond it?

Extents within EOF will be unshared as usual. Details are below.
>
> > I ask because I think I've reproduced this scenario, though it is a bit
> > tricky and has dependencies...
> >
> > For one, it seems to depend on the cowblocks patch I recently posted
> > here [1] (though I don't think this is necessarily a problem with the
> > patch, it just depends on keeping COW fork blocks around after the
> > unshare). With that, I reproduce via fsx with unshare range support [2]
> > using the ops file appended below [3] on a -bsize=1k XFS fs.
> >
> > I haven't quite characterized the full sequence other than it looks like
> > the unshare walks across EOF with a shared data fork block and COW fork
> > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > warning and error return associated with the folio zeroing in
> > __iomap_write_begin().

The scenario has already been reproduced by syzbot[1]. The reproducer
provided by syzbot constructed the following extent maps for a file of
size 0xE00 before fallocate unshare:

0 - 4k: shared between two files
4k - 6k: hole beyond EOF, not shared
6k - 8k: delalloc extends

Then the reproducer attempted to unshare the extent between 0 and
0x2000 bytes, but the file size is 0xE00. This is likely the scenario
you were referring to?

Eventually the unshare code does:
first map: 0 - 4k - unshare successfully.
second map: 4k - 6k - hole, skip. Beyond EOF.
third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
Fires warnings because UNSHARE.

During the first call to iomap_unshare_iter(), iomap_length() returned
4k, so 4k bytes were unshared.
See discuss here[2] for more details.
>
> This all kind of has me wondering.. do we know the purpose of this
> warning/error in the first place? It seems like it's more of an
> "unexpected situation" than a specific problem. E.g., assuming the same
> page were mmap'd, I _think_ the read fault path would do the same sort
> of zeroing such that the unshare would see a fully uptodate folio and
> carry on as normal. I added the mapread op to the opsfile below to give
> that a quick test (remove the "skip" text to enable it), and it seems to
> prevent the error, but I've not confirmed whether that theory is
> actually what's happening.
>
>
> > FWIW, I also wonder if another way to handle this would be to just
> > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > caller passes a range beyond EOF, just process whatever part of the
> > range falls within EOF. It seems iomap isn't responsible for the file
> > extending aspect of the fallocate unshare command anyways.

It already does 'just process whatever part of the range falls within EOF'.
Check the above case.

I'm not sure if I fully understand what you mean. This patch does not
prevent unsharing extents within the EOF. This patch checks if pos is
beyond EOF, instead of checking if pos + length is beyond EOF. So the
extents within EOF should be unshared as usual.

BTW, maybe the check here should be
                  if (pos >= i_size_read(iter->inode))

If there is any misunderstanding, please let me know, thanks.

[1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
[2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/

>
> Thoughts?
>
> Brian
>
> [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> [3] fsx ops file:
>
> fallocate 0x3bc00 0x400 0
> write 0x3bc00 0x800 0x3c000
> clone_range 0x3bc00 0x400 0x0 0x3c400
> skip mapread 0x3c000 0x400 0x3c400
> fallocate 0x3bc00 0xc00 0x3c400 unshare
>


Thanks,
Brian Foster Sept. 9, 2024, 1:28 p.m. UTC | #3
On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> Hi Brian,
> 
> Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> >
> > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > > Attempting to unshare extents beyond EOF will trigger
> > > the need zeroing case, which in turn triggers a warning.
> > > Therefore, let's skip the unshare process if extents are
> > > beyond EOF.
> > >
> > > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > ---
> > >  fs/iomap/buffered-io.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index f420c53d86ac..8898d5ec606f 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> > >       /* don't bother with holes or unwritten extents */
> > >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > >               return length;
> > > +     /* don't try to unshare any extents beyond EOF. */
> > > +     if (pos > i_size_read(iter->inode))
> > > +             return length;
> >
> > Hi Julian,
> >
> >
> > > What about if pos starts within EOF and the operation extends beyond it?
> 
> Extents within EOF will be unshared as usual. Details are below.
> >
> > > I ask because I think I've reproduced this scenario, though it is a bit
> > > tricky and has dependencies...
> > >
> > > For one, it seems to depend on the cowblocks patch I recently posted
> > > here [1] (though I don't think this is necessarily a problem with the
> > > patch, it just depends on keeping COW fork blocks around after the
> > > unshare). With that, I reproduce via fsx with unshare range support [2]
> > > using the ops file appended below [3] on a -bsize=1k XFS fs.
> > >
> > > I haven't quite characterized the full sequence other than it looks like
> > > the unshare walks across EOF with a shared data fork block and COW fork
> > > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > > warning and error return associated with the folio zeroing in
> > > __iomap_write_begin().
> 
> The scenario has already been reproduced by syzbot[1]. The reproducer
> provided by syzbot constructed the following extent maps for a file of
> size 0xE00 before fallocate unshare:
> 
> 0 - 4k: shared between two files
> 4k - 6k: hole beyond EOF, not shared
> 6k - 8k: delalloc extends
> 
> Then the reproducer attempted to unshare the extent between 0 and
> 0x2000 bytes, but the file size is 0xE00. This is likely the scenario
> you were referring to?
> 

Yes, sort of..

> Eventually the unshare code does:
> first map: 0 - 4k - unshare successfully.
> second map: 4k - 6k - hole, skip. Beyond EOF.
> third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> Fires warnings because UNSHARE.
> 
> During the first call to iomap_unshare_iter(), iomap_length() returned
> 4k, so 4k bytes were unshared.
> See discuss here[2] for more details.
> >
> > This all kind of has me wondering.. do we know the purpose of this
> > warning/error in the first place? It seems like it's more of an
> > "unexpected situation" than a specific problem. E.g., assuming the same
> > page were mmap'd, I _think_ the read fault path would do the same sort
> > of zeroing such that the unshare would see a fully uptodate folio and
> > carry on as normal. I added the mapread op to the opsfile below to give
> > that a quick test (remove the "skip" text to enable it), and it seems to
> > prevent the error, but I've not confirmed whether that theory is
> > actually what's happening.
> >
> >
> > > FWIW, I also wonder if another way to handle this would be to just
> > > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > > caller passes a range beyond EOF, just process whatever part of the
> > > range falls within EOF. It seems iomap isn't responsible for the file
> > > extending aspect of the fallocate unshare command anyways.
> 
> It already does 'just process whatever part of the range falls within EOF'.
> Check the above case.
> 
> I'm not sure if I fully understand what you mean. This patch does not
> prevent unsharing extents within the EOF. This patch checks if pos is
> beyond EOF, instead of checking if pos + length is beyond EOF. So the
> extents within EOF should be unshared as usual.
> 

I'm not concerned about preventing unsharing. I'm concerned that this
patch doesn't always prevent attempts to unshare post-eof ranges. I
think the difference here is that in the variant I was hitting, we end
up with a mapping that starts within EOF and ends beyond EOF, whereas
the syzbot variant produces a scenario where the problematic mapping
always starts beyond EOF.

So IOW, this patch works for the syzbot variant because it happens to
reproduce a situation where pos will be beyond EOF, but that is an
assumption that might not always be true. The fsx generated variant runs
a sequence that produces a mapping that spans across EOF, which means
that pos is within EOF at the start of unshare, so unshare proceeds to
walk across the EOF boundary, the corresponding EOF folio is not fully
uptodate, and thus write begin wants to do partial zeroing and
fails/warns.

I suspect that if the higher level range were trimmed to be within EOF
in iomap_file_unshare(), that would prevent this problem in either case.
Note that this was on a -bsize=1k fs, so what I'm not totally sure about
is whether skipping zeroing as such would be a problem with larger FSBs.
My initial thinking was this might not be possible since the EOF folio
should be fully uptodate in that case, but that probably requires some
thought/review/testing.

Brian

> BTW, maybe the check here should be
>                   if (pos >= i_size_read(iter->inode))
> 
> If there is any misunderstanding, please let me know, thanks.
> 
> [1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
> [2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/
> 
> >
> > Thoughts?
> >
> > Brian
> >
> > [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> > [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> > [3] fsx ops file:
> >
> > fallocate 0x3bc00 0x400 0
> > write 0x3bc00 0x800 0x3c000
> > clone_range 0x3bc00 0x400 0x0 0x3c400
> > skip mapread 0x3c000 0x400 0x3c400
> > fallocate 0x3bc00 0xc00 0x3c400 unshare
> >
> 
> 
> Thanks,
> -- 
> Julian Sun <sunjunchao2870@gmail.com>
>
Julian Sun Sept. 9, 2024, 5:40 p.m. UTC | #4
Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
>
> On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > Hi Brian,
> >
> > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > >
> > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > > > Attempting to unshare extents beyond EOF will trigger
> > > > the need zeroing case, which in turn triggers a warning.
> > > > Therefore, let's skip the unshare process if extents are
> > > > beyond EOF.
> > > >
> > > > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > > > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > ---
> > > >  fs/iomap/buffered-io.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index f420c53d86ac..8898d5ec606f 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> > > >       /* don't bother with holes or unwritten extents */
> > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > > >               return length;
> > > > +     /* don't try to unshare any extents beyond EOF. */
> > > > +     if (pos > i_size_read(iter->inode))
> > > > +             return length;
> > >
> > > Hi Julian,
> > >
> > >
> > > > What about if pos starts within EOF and the operation extends beyond it?
> >
> > Extents within EOF will be unshared as usual. Details are below.
> > >
> > > > I ask because I think I've reproduced this scenario, though it is a bit
> > > > tricky and has dependencies...
> > > >
> > > > For one, it seems to depend on the cowblocks patch I recently posted
> > > > here [1] (though I don't think this is necessarily a problem with the
> > > > patch, it just depends on keeping COW fork blocks around after the
> > > > unshare). With that, I reproduce via fsx with unshare range support [2]
> > > > using the ops file appended below [3] on a -bsize=1k XFS fs.
> > > >
> > > > I haven't quite characterized the full sequence other than it looks like
> > > > the unshare walks across EOF with a shared data fork block and COW fork
> > > > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > > > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > > > warning and error return associated with the folio zeroing in
> > > > __iomap_write_begin().
> >
> > The scenario has already been reproduced by syzbot[1]. The reproducer
> > provided by syzbot constructed the following extent maps for a file of
> > size 0xE00 before fallocate unshare:
> >
> > 0 - 4k: shared between two files
> > 4k - 6k: hole beyond EOF, not shared
> > 6k - 8k: delalloc extends
> >
> > Then the reproducer attempted to unshare the extent between 0 and
> > 0x2000 bytes, but the file size is 0xE00. This is likely the scenario
> > you were referring to?
> >
>
> Yes, sort of..
>
> > Eventually the unshare code does:
> > first map: 0 - 4k - unshare successfully.
> > second map: 4k - 6k - hole, skip. Beyond EOF.
> > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > Fires warnings because UNSHARE.
> >
> > During the first call to iomap_unshare_iter(), iomap_length() returned
> > 4k, so 4k bytes were unshared.
> > See discuss here[2] for more details.
> > >
> > > This all kind of has me wondering.. do we know the purpose of this
> > > warning/error in the first place? It seems like it's more of an
> > > "unexpected situation" than a specific problem. E.g., assuming the same
> > > page were mmap'd, I _think_ the read fault path would do the same sort
> > > of zeroing such that the unshare would see a fully uptodate folio and
> > > carry on as normal. I added the mapread op to the opsfile below to give
> > > that a quick test (remove the "skip" text to enable it), and it seems to
> > > prevent the error, but I've not confirmed whether that theory is
> > > actually what's happening.
> > >
> > >
> > > > FWIW, I also wonder if another way to handle this would be to just
> > > > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > > > caller passes a range beyond EOF, just process whatever part of the
> > > > range falls within EOF. It seems iomap isn't responsible for the file
> > > > extending aspect of the fallocate unshare command anyways.
> >
> > It already does 'just process whatever part of the range falls within EOF'.
> > Check the above case.
> >
> > I'm not sure if I fully understand what you mean. This patch does not
> > prevent unsharing extents within the EOF. This patch checks if pos is
> > beyond EOF, instead of checking if pos + length is beyond EOF. So the
> > extents within EOF should be unshared as usual.
> >
>
> I'm not concerned about preventing unsharing. I'm concerned that this
> patch doesn't always prevent attempts to unshare post-eof ranges. I
> think the difference here is that in the variant I was hitting, we end
>
> > up with a mapping that starts within EOF and ends beyond EOF, whereas
> > the syzbot variant produces a scenario where the problematic mapping
> > always starts beyond EOF.

This is not true. In the case above, the syzbot did indeed unshare the
extents between 0-4k, which were started within EOF and ended beyond
EOF. The specific variants here are: pos:0 len:0x1000 EOF: 0xE00. And
the unshare code successfully unshared extents between 0 and 4k.

During the next loop in iomap_file_unshare(), the pos became 0x1000,
which is beyond EOF.  What this patch does is to skip the unshare
during the second loop.
Is there anything I misunderstand?
>
> So IOW, this patch works for the syzbot variant because it happens to
> reproduce a situation where pos will be beyond EOF, but that is an
>
> > assumption that might not always be true. The fsx generated variant runs
> > a sequence that produces a mapping that spans across EOF, which means
> > that pos is within EOF at the start of unshare, so unshare proceeds to
> > walk across the EOF boundary, the corresponding EOF folio is not fully
> > uptodate, and thus write begin wants to do partial zeroing and
> > fails/warns.

Yeah, it's exactly what the syzbot does.
>
> I suspect that if the higher level range were trimmed to be within EOF
> in iomap_file_unshare(), that would prevent this problem in either case.
> Note that this was on a -bsize=1k fs, so what I'm not totally sure about
> is whether skipping zeroing as such would be a problem with larger FSBs.
> My initial thinking was this might not be possible since the EOF folio
> should be fully uptodate in that case, but that probably requires some
> thought/review/testing.
>
> Brian
>
> > BTW, maybe the check here should be
> >                   if (pos >= i_size_read(iter->inode))
> >
> > If there is any misunderstanding, please let me know, thanks.
> >
> > [1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
> > [2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/
> >
> > >
> > > Thoughts?
> > >
> > > Brian
> > >
> > > [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> > > [3] fsx ops file:
> > >
> > > fallocate 0x3bc00 0x400 0
> > > write 0x3bc00 0x800 0x3c000
> > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > skip mapread 0x3c000 0x400 0x3c400
> > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > >
> >
> >
> > Thanks,
> > --
> > Julian Sun <sunjunchao2870@gmail.com>
> >
>

Thanks,
Brian Foster Sept. 9, 2024, 7:29 p.m. UTC | #5
On Tue, Sep 10, 2024 at 01:40:24AM +0800, Julian Sun wrote:
> Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
> >
> > On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > > Hi Brian,
> > >
> > > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > > >
> > > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > > > > Attempting to unshare extents beyond EOF will trigger
> > > > > the need zeroing case, which in turn triggers a warning.
> > > > > Therefore, let's skip the unshare process if extents are
> > > > > beyond EOF.
> > > > >
> > > > > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > > > > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > > ---
> > > > >  fs/iomap/buffered-io.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > > index f420c53d86ac..8898d5ec606f 100644
> > > > > --- a/fs/iomap/buffered-io.c
> > > > > +++ b/fs/iomap/buffered-io.c
> > > > > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> > > > >       /* don't bother with holes or unwritten extents */
> > > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > > > >               return length;
> > > > > +     /* don't try to unshare any extents beyond EOF. */
> > > > > +     if (pos > i_size_read(iter->inode))
> > > > > +             return length;
> > > >
> > > > Hi Julian,
> > > >
> > > >
> > > > > What about if pos starts within EOF and the operation extends beyond it?
> > >
> > > Extents within EOF will be unshared as usual. Details are below.
> > > >
> > > > > I ask because I think I've reproduced this scenario, though it is a bit
> > > > > tricky and has dependencies...
> > > > >
> > > > > For one, it seems to depend on the cowblocks patch I recently posted
> > > > > here [1] (though I don't think this is necessarily a problem with the
> > > > > patch, it just depends on keeping COW fork blocks around after the
> > > > > unshare). With that, I reproduce via fsx with unshare range support [2]
> > > > > using the ops file appended below [3] on a -bsize=1k XFS fs.
> > > > >
> > > > > I haven't quite characterized the full sequence other than it looks like
> > > > > the unshare walks across EOF with a shared data fork block and COW fork
> > > > > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > > > > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > > > > warning and error return associated with the folio zeroing in
> > > > > __iomap_write_begin().
> > >
> > > The scenario has already been reproduced by syzbot[1]. The reproducer
> > > provided by syzbot constructed the following extent maps for a file of
> > > size 0xE00 before fallocate unshare:
> > >
> > > 0 - 4k: shared between two files
> > > 4k - 6k: hole beyond EOF, not shared
> > > 6k - 8k: delalloc extends
> > >
> > > Then the reproducer attempted to unshare the extent between 0 and
> > > 0x2000 bytes, but the file size is 0xE00. This is likely the scenario
> > > you were referring to?
> > >
> >
> > Yes, sort of..
> >
> > > Eventually the unshare code does:
> > > first map: 0 - 4k - unshare successfully.
> > > second map: 4k - 6k - hole, skip. Beyond EOF.
> > > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > > Fires warnings because UNSHARE.
> > >
> > > During the first call to iomap_unshare_iter(), iomap_length() returned
> > > 4k, so 4k bytes were unshared.
> > > See discuss here[2] for more details.
> > > >
> > > > This all kind of has me wondering.. do we know the purpose of this
> > > > warning/error in the first place? It seems like it's more of an
> > > > "unexpected situation" than a specific problem. E.g., assuming the same
> > > > page were mmap'd, I _think_ the read fault path would do the same sort
> > > > of zeroing such that the unshare would see a fully uptodate folio and
> > > > carry on as normal. I added the mapread op to the opsfile below to give
> > > > that a quick test (remove the "skip" text to enable it), and it seems to
> > > > prevent the error, but I've not confirmed whether that theory is
> > > > actually what's happening.
> > > >
> > > >
> > > > > FWIW, I also wonder if another way to handle this would be to just
> > > > > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > > > > caller passes a range beyond EOF, just process whatever part of the
> > > > > range falls within EOF. It seems iomap isn't responsible for the file
> > > > > extending aspect of the fallocate unshare command anyways.
> > >
> > > It already does 'just process whatever part of the range falls within EOF'.
> > > Check the above case.
> > >
> > > I'm not sure if I fully understand what you mean. This patch does not
> > > prevent unsharing extents within the EOF. This patch checks if pos is
> > > beyond EOF, instead of checking if pos + length is beyond EOF. So the
> > > extents within EOF should be unshared as usual.
> > >
> >
> > I'm not concerned about preventing unsharing. I'm concerned that this
> > patch doesn't always prevent attempts to unshare post-eof ranges. I
> > think the difference here is that in the variant I was hitting, we end
> >
> > > up with a mapping that starts within EOF and ends beyond EOF, whereas
> > > the syzbot variant produces a scenario where the problematic mapping
> > > always starts beyond EOF.
> 
> This is not true. In the case above, the syzbot did indeed unshare the
> extents between 0-4k, which were started within EOF and ended beyond
> EOF. The specific variants here are: pos:0 len:0x1000 EOF: 0xE00. And
> the unshare code successfully unshared extents between 0 and 4k.
> 
> During the next loop in iomap_file_unshare(), the pos became 0x1000,
> which is beyond EOF.  What this patch does is to skip the unshare
> during the second loop.
> Is there anything I misunderstand?

Hmm, what block size? Does the associated mapping have at least one full
block beyond EOF? If you have a map at offset 0, length 0x1000 and EOF
at 0xE00, then unless you have 512b blocks it sounds like the EOF block
actually starts within EOF.

The variant I'm seeing is more like this.. consider a -bsize=1k fs, a
file size of 0x3c400, and an EOF mapping of (offset 0x3c000, length
0x4000). The EOF folio in this case is 4k in size and starts at the same
0x3c000 offset as the EOF mapping.

So with 1k blocks, the EOF mapping starts one block before EOF and
extends well beyond it. What happens in the test case is that
iomap_unshare_iter() is called with the EOF folio, pos 0x3c000, length
0x800, and where the block at offset 0x3c400 is not marked uptodate. pos
is thus within EOF, but the while loop in __iomap_write_begin() walks
past it and attempts to process one block beyond EOF.

Brian

> >
> > So IOW, this patch works for the syzbot variant because it happens to
> > reproduce a situation where pos will be beyond EOF, but that is an
> >
> > > assumption that might not always be true. The fsx generated variant runs
> > > a sequence that produces a mapping that spans across EOF, which means
> > > that pos is within EOF at the start of unshare, so unshare proceeds to
> > > walk across the EOF boundary, the corresponding EOF folio is not fully
> > > uptodate, and thus write begin wants to do partial zeroing and
> > > fails/warns.
> 
> Yeah, it's exactly what the syzbot does.
> >
> > I suspect that if the higher level range were trimmed to be within EOF
> > in iomap_file_unshare(), that would prevent this problem in either case.
> > Note that this was on a -bsize=1k fs, so what I'm not totally sure about
> > is whether skipping zeroing as such would be a problem with larger FSBs.
> > My initial thinking was this might not be possible since the EOF folio
> > should be fully uptodate in that case, but that probably requires some
> > thought/review/testing.
> >
> > Brian
> >
> > > BTW, maybe the check here should be
> > >                   if (pos >= i_size_read(iter->inode))
> > >
> > > If there is any misunderstanding, please let me know, thanks.
> > >
> > > [1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
> > > [2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/
> > >
> > > >
> > > > Thoughts?
> > > >
> > > > Brian
> > > >
> > > > [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > > [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> > > > [3] fsx ops file:
> > > >
> > > > fallocate 0x3bc00 0x400 0
> > > > write 0x3bc00 0x800 0x3c000
> > > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > > skip mapread 0x3c000 0x400 0x3c400
> > > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > > >
> > >
> > >
> > > Thanks,
> > > --
> > > Julian Sun <sunjunchao2870@gmail.com>
> > >
> >
> 
> Thanks,
> -- 
> Julian Sun <sunjunchao2870@gmail.com>
>
Dave Chinner Sept. 10, 2024, 12:44 a.m. UTC | #6
On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> Attempting to unshare extents beyond EOF will trigger
> the need zeroing case, which in turn triggers a warning.
> Therefore, let's skip the unshare process if extents are
> beyond EOF.
> 
> Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> Inspired-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> ---
>  fs/iomap/buffered-io.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f420c53d86ac..8898d5ec606f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  	/* don't bother with holes or unwritten extents */
>  	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return length;
> +	/* don't try to unshare any extents beyond EOF. */
> +	if (pos > i_size_read(iter->inode))
> +		return length;
>  
>  	do {
>  		struct folio *folio;

iomap isn't the place to do this. The high level fallocate code in
the filesystem should be trimming unshare length to EOF long before
we get anywhere near the iomap layer.

-Dave.
Julian Sun Sept. 10, 2024, 7:03 a.m. UTC | #7
On Mon, 2024-09-09 at 15:29 -0400, Brian Foster wrote:
> On Tue, Sep 10, 2024 at 01:40:24AM +0800, Julian Sun wrote:
> > Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
> > > 
> > > On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > > > Hi Brian,
> > > > 
> > > > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > > > > 
> > > > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > > > > > Attempting to unshare extents beyond EOF will trigger
> > > > > > the need zeroing case, which in turn triggers a warning.
> > > > > > Therefore, let's skip the unshare process if extents are
> > > > > > beyond EOF.
> > > > > > 
> > > > > > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > > > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > > > > > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > > > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > > > ---
> > > > > >  fs/iomap/buffered-io.c | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > > 
> > > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > > > index f420c53d86ac..8898d5ec606f 100644
> > > > > > --- a/fs/iomap/buffered-io.c
> > > > > > +++ b/fs/iomap/buffered-io.c
> > > > > > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> > > > > >       /* don't bother with holes or unwritten extents */
> > > > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > > > > >               return length;
> > > > > > +     /* don't try to unshare any extents beyond EOF. */
> > > > > > +     if (pos > i_size_read(iter->inode))
> > > > > > +             return length;
> > > > > 
> > > > > Hi Julian,
> > > > > 
> > > > > 
> > > > > > What about if pos starts within EOF and the operation extends beyond it?
> > > > 
> > > > Extents within EOF will be unshared as usual. Details are below.
> > > > > 
> > > > > > I ask because I think I've reproduced this scenario, though it is a bit
> > > > > > tricky and has dependencies...
> > > > > > 
> > > > > > For one, it seems to depend on the cowblocks patch I recently posted
> > > > > > here [1] (though I don't think this is necessarily a problem with the
> > > > > > patch, it just depends on keeping COW fork blocks around after the
> > > > > > unshare). With that, I reproduce via fsx with unshare range support [2]
> > > > > > using the ops file appended below [3] on a -bsize=1k XFS fs.
> > > > > > 
> > > > > > I haven't quite characterized the full sequence other than it looks like
> > > > > > the unshare walks across EOF with a shared data fork block and COW fork
> > > > > > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > > > > > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > > > > > warning and error return associated with the folio zeroing in
> > > > > > __iomap_write_begin().
> > > > 
> > > > The scenario has already been reproduced by syzbot[1]. The reproducer
> > > > provided by syzbot constructed the following extent maps for a file of
> > > > size 0xE00 before fallocate unshare:
> > > > 
> > > > 0 - 4k: shared between two files
> > > > 4k - 6k: hole beyond EOF, not shared
> > > > 6k - 8k: delalloc extends
> > > > 
> > > > Then the reproducer attempted to unshare the extent between 0 and
> > > > 0x2000 bytes, but the file size is 0xE00. This is likely the scenario
> > > > you were referring to?
> > > > 
> > > 
> > > Yes, sort of..
> > > 
> > > > Eventually the unshare code does:
> > > > first map: 0 - 4k - unshare successfully.
> > > > second map: 4k - 6k - hole, skip. Beyond EOF.
> > > > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > > > Fires warnings because UNSHARE.
> > > > 
> > > > During the first call to iomap_unshare_iter(), iomap_length() returned
> > > > 4k, so 4k bytes were unshared.
> > > > See discuss here[2] for more details.
> > > > > 
> > > > > This all kind of has me wondering.. do we know the purpose of this
> > > > > warning/error in the first place? It seems like it's more of an
> > > > > "unexpected situation" than a specific problem. E.g., assuming the same
> > > > > page were mmap'd, I _think_ the read fault path would do the same sort
> > > > > of zeroing such that the unshare would see a fully uptodate folio and
> > > > > carry on as normal. I added the mapread op to the opsfile below to give
> > > > > that a quick test (remove the "skip" text to enable it), and it seems to
> > > > > prevent the error, but I've not confirmed whether that theory is
> > > > > actually what's happening.
> > > > > 
> > > > > 
> > > > > > FWIW, I also wonder if another way to handle this would be to just
> > > > > > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > > > > > caller passes a range beyond EOF, just process whatever part of the
> > > > > > range falls within EOF. It seems iomap isn't responsible for the file
> > > > > > extending aspect of the fallocate unshare command anyways.
> > > > 
> > > > It already does 'just process whatever part of the range falls within EOF'.
> > > > Check the above case.
> > > > 
> > > > I'm not sure if I fully understand what you mean. This patch does not
> > > > prevent unsharing extents within the EOF. This patch checks if pos is
> > > > beyond EOF, instead of checking if pos + length is beyond EOF. So the
> > > > extents within EOF should be unshared as usual.
> > > > 
> > > 
> > > I'm not concerned about preventing unsharing. I'm concerned that this
> > > patch doesn't always prevent attempts to unshare post-eof ranges. I
> > > think the difference here is that in the variant I was hitting, we end
> > > 
> > > > up with a mapping that starts within EOF and ends beyond EOF, whereas
> > > > the syzbot variant produces a scenario where the problematic mapping
> > > > always starts beyond EOF.
> > 
> > This is not true. In the case above, the syzbot did indeed unshare the
> > extents between 0-4k, which were started within EOF and ended beyond
> > EOF. The specific variants here are: pos:0 len:0x1000 EOF: 0xE00. And
> > the unshare code successfully unshared extents between 0 and 4k.
> > 
> > During the next loop in iomap_file_unshare(), the pos became 0x1000,
> > which is beyond EOF.  What this patch does is to skip the unshare
> > during the second loop.
> > Is there anything I misunderstand?
> 
> Hmm, what block size? Does the associated mapping have at least one full
> block beyond EOF? If you have a map at offset 0, length 0x1000 and EOF
> at 0xE00, then unless you have 512b blocks it sounds like the EOF block
> actually starts within EOF.

The block size here is 2k, and there isn't a full block beyond EOF within 
this extent map.
> 
> The variant I'm seeing is more like this.. consider a -bsize=1k fs, a
> file size of 0x3c400, and an EOF mapping of (offset 0x3c000, length
> 0x4000). The EOF folio in this case is 4k in size and starts at the same
> 0x3c000 offset as the EOF mapping.
> 
> So with 1k blocks, the EOF mapping starts one block before EOF and
> extends well beyond it. What happens in the test case is that
> iomap_unshare_iter() is called with the EOF folio, pos 0x3c000, length
> 0x800, and where the block at offset 0x3c400 is not marked uptodate. pos
> is thus within EOF, but the while loop in __iomap_write_begin() walks
> past it and attempts to process one block beyond EOF.

Ok, so the key point here is that there is a full block beyond EOF within
the associated extent map, which is different with the scenario reproduced 
by syzbot.
According to the Dave's comments, the trimming behavior seems like should 
be done in filesystem(e.g.,xfs), instead of iomap. I will reconsider this scenario.

Thanks for your comments and review.
> 
> Brian
> 
> > > 
> > > So IOW, this patch works for the syzbot variant because it happens to
> > > reproduce a situation where pos will be beyond EOF, but that is an
> > > 
> > > > assumption that might not always be true. The fsx generated variant runs
> > > > a sequence that produces a mapping that spans across EOF, which means
> > > > that pos is within EOF at the start of unshare, so unshare proceeds to
> > > > walk across the EOF boundary, the corresponding EOF folio is not fully
> > > > uptodate, and thus write begin wants to do partial zeroing and
> > > > fails/warns.
> > 
> > Yeah, it's exactly what the syzbot does.
> > > 
> > > I suspect that if the higher level range were trimmed to be within EOF
> > > in iomap_file_unshare(), that would prevent this problem in either case.
> > > Note that this was on a -bsize=1k fs, so what I'm not totally sure about
> > > is whether skipping zeroing as such would be a problem with larger FSBs.
> > > My initial thinking was this might not be possible since the EOF folio
> > > should be fully uptodate in that case, but that probably requires some
> > > thought/review/testing.
> > > 
> > > Brian
> > > 
> > > > BTW, maybe the check here should be
> > > >                   if (pos >= i_size_read(iter->inode))
> > > > 
> > > > If there is any misunderstanding, please let me know, thanks.
> > > > 
> > > > [1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
> > > > [2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/
> > > > 
> > > > > 
> > > > > Thoughts?
> > > > > 
> > > > > Brian
> > > > > 
> > > > > [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > > > [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> > > > > [3] fsx ops file:
> > > > > 
> > > > > fallocate 0x3bc00 0x400 0
> > > > > write 0x3bc00 0x800 0x3c000
> > > > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > > > skip mapread 0x3c000 0x400 0x3c400
> > > > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > > > > 
> > > > 
> > > > 
> > > > Thanks,
> > > > --
> > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > 
> > > 
> > 
> > Thanks,
> > -- 
> > Julian Sun <sunjunchao2870@gmail.com>
> > 
> 

Thanks,
Brian Foster Sept. 10, 2024, 12:30 p.m. UTC | #8
On Tue, Sep 10, 2024 at 03:03:18PM +0800, Julian Sun wrote:
> On Mon, 2024-09-09 at 15:29 -0400, Brian Foster wrote:
> > On Tue, Sep 10, 2024 at 01:40:24AM +0800, Julian Sun wrote:
> > > Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
> > > > 
> > > > On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > > > > Hi Brian,
> > > > > 
> > > > > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > > > > > 
> > > > > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > > > > > > Attempting to unshare extents beyond EOF will trigger
> > > > > > > the need zeroing case, which in turn triggers a warning.
> > > > > > > Therefore, let's skip the unshare process if extents are
> > > > > > > beyond EOF.
> > > > > > > 
> > > > > > > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > > > > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > > > > > > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > > > > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > ---
> > > > > > >  fs/iomap/buffered-io.c | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > > > > index f420c53d86ac..8898d5ec606f 100644
> > > > > > > --- a/fs/iomap/buffered-io.c
> > > > > > > +++ b/fs/iomap/buffered-io.c
> > > > > > > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> > > > > > >       /* don't bother with holes or unwritten extents */
> > > > > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > > > > > >               return length;
> > > > > > > +     /* don't try to unshare any extents beyond EOF. */
> > > > > > > +     if (pos > i_size_read(iter->inode))
> > > > > > > +             return length;
> > > > > > 
> > > > > > Hi Julian,
> > > > > > 
> > > > > > 
> > > > > > > What about if pos starts within EOF and the operation extends beyond it?
> > > > > 
> > > > > Extents within EOF will be unshared as usual. Details are below.
> > > > > > 
> > > > > > > I ask because I think I've reproduced this scenario, though it is a bit
> > > > > > > tricky and has dependencies...
> > > > > > > 
> > > > > > > For one, it seems to depend on the cowblocks patch I recently posted
> > > > > > > here [1] (though I don't think this is necessarily a problem with the
> > > > > > > patch, it just depends on keeping COW fork blocks around after the
> > > > > > > unshare). With that, I reproduce via fsx with unshare range support [2]
> > > > > > > using the ops file appended below [3] on a -bsize=1k XFS fs.
> > > > > > > 
> > > > > > > I haven't quite characterized the full sequence other than it looks like
> > > > > > > the unshare walks across EOF with a shared data fork block and COW fork
> > > > > > > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > > > > > > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > > > > > > warning and error return associated with the folio zeroing in
> > > > > > > __iomap_write_begin().
> > > > > 
> > > > > The scenario has already been reproduced by syzbot[1]. The reproducer
> > > > > provided by syzbot constructed the following extent maps for a file of
> > > > > size 0xE00 before fallocate unshare:
> > > > > 
> > > > > 0 - 4k: shared between two files
> > > > > 4k - 6k: hole beyond EOF, not shared
> > > > > 6k - 8k: delalloc extends
> > > > > 
> > > > > Then the reproducer attempted to unshare the extent between 0 and
> > > > > 0x2000 bytes, but the file size is 0xE00. This is likely the scenario
> > > > > you were referring to?
> > > > > 
> > > > 
> > > > Yes, sort of..
> > > > 
> > > > > Eventually the unshare code does:
> > > > > first map: 0 - 4k - unshare successfully.
> > > > > second map: 4k - 6k - hole, skip. Beyond EOF.
> > > > > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > > > > Fires warnings because UNSHARE.
> > > > > 
> > > > > During the first call to iomap_unshare_iter(), iomap_length() returned
> > > > > 4k, so 4k bytes were unshared.
> > > > > See discuss here[2] for more details.
> > > > > > 
> > > > > > This all kind of has me wondering.. do we know the purpose of this
> > > > > > warning/error in the first place? It seems like it's more of an
> > > > > > "unexpected situation" than a specific problem. E.g., assuming the same
> > > > > > page were mmap'd, I _think_ the read fault path would do the same sort
> > > > > > of zeroing such that the unshare would see a fully uptodate folio and
> > > > > > carry on as normal. I added the mapread op to the opsfile below to give
> > > > > > that a quick test (remove the "skip" text to enable it), and it seems to
> > > > > > prevent the error, but I've not confirmed whether that theory is
> > > > > > actually what's happening.
> > > > > > 
> > > > > > 
> > > > > > > FWIW, I also wonder if another way to handle this would be to just
> > > > > > > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > > > > > > caller passes a range beyond EOF, just process whatever part of the
> > > > > > > range falls within EOF. It seems iomap isn't responsible for the file
> > > > > > > extending aspect of the fallocate unshare command anyways.
> > > > > 
> > > > > It already does 'just process whatever part of the range falls within EOF'.
> > > > > Check the above case.
> > > > > 
> > > > > I'm not sure if I fully understand what you mean. This patch does not
> > > > > prevent unsharing extents within the EOF. This patch checks if pos is
> > > > > beyond EOF, instead of checking if pos + length is beyond EOF. So the
> > > > > extents within EOF should be unshared as usual.
> > > > > 
> > > > 
> > > > I'm not concerned about preventing unsharing. I'm concerned that this
> > > > patch doesn't always prevent attempts to unshare post-eof ranges. I
> > > > think the difference here is that in the variant I was hitting, we end
> > > > 
> > > > > up with a mapping that starts within EOF and ends beyond EOF, whereas
> > > > > the syzbot variant produces a scenario where the problematic mapping
> > > > > always starts beyond EOF.
> > > 
> > > This is not true. In the case above, the syzbot did indeed unshare the
> > > extents between 0-4k, which were started within EOF and ended beyond
> > > EOF. The specific variants here are: pos:0 len:0x1000 EOF: 0xE00. And
> > > the unshare code successfully unshared extents between 0 and 4k.
> > > 
> > > During the next loop in iomap_file_unshare(), the pos became 0x1000,
> > > which is beyond EOF.  What this patch does is to skip the unshare
> > > during the second loop.
> > > Is there anything I misunderstand?
> > 
> > Hmm, what block size? Does the associated mapping have at least one full
> > block beyond EOF? If you have a map at offset 0, length 0x1000 and EOF
> > at 0xE00, then unless you have 512b blocks it sounds like the EOF block
> > actually starts within EOF.
> 
> The block size here is 2k, and there isn't a full block beyond EOF within 
> this extent map.

Ok. That likely explains the difference in behavior. The fsx variant has
a mapping that starts within EOF and has at least one post-EOF block
that unshare attempts to process.

> > 
> > The variant I'm seeing is more like this.. consider a -bsize=1k fs, a
> > file size of 0x3c400, and an EOF mapping of (offset 0x3c000, length
> > 0x4000). The EOF folio in this case is 4k in size and starts at the same
> > 0x3c000 offset as the EOF mapping.
> > 
> > So with 1k blocks, the EOF mapping starts one block before EOF and
> > extends well beyond it. What happens in the test case is that
> > iomap_unshare_iter() is called with the EOF folio, pos 0x3c000, length
> > 0x800, and where the block at offset 0x3c400 is not marked uptodate. pos
> > is thus within EOF, but the while loop in __iomap_write_begin() walks
> > past it and attempts to process one block beyond EOF.
> 
> Ok, so the key point here is that there is a full block beyond EOF within
> the associated extent map, which is different with the scenario reproduced 
> by syzbot.
> According to the Dave's comments, the trimming behavior seems like should 
> be done in filesystem(e.g.,xfs), instead of iomap. I will reconsider this scenario.
> 

Seems reasonable, but I don't agree that is a suitable fix for iomap. To
be clear, it's perfectly fine for the fs to trim the range however it
sees fit (i.e. no shared blocks beyond EOF in XFS), but we should also
recognize that iomap is a generic layer and unshare is currently
implemented to trip over itself, warn and fail if passed a post-eof
range.

I still suspect that just making unshare work correctly around i_size
might be the more elegant long term solution, but that is not totally
clear. IMO, as long as unshare is written to intentionally trip over
itself for post-eof ranges, it should either trim the range or check for
valid parameters and document the limitation. Otherwise we just leave a
landmine for the next caller to have to work through the same problems,
which is particularly subtle since the higher level fallocate unshare
mode supports post-eof ranges. Just my .02.

Brian

> Thanks for your comments and review.
> > 
> > Brian
> > 
> > > > 
> > > > So IOW, this patch works for the syzbot variant because it happens to
> > > > reproduce a situation where pos will be beyond EOF, but that is an
> > > > 
> > > > > assumption that might not always be true. The fsx generated variant runs
> > > > > a sequence that produces a mapping that spans across EOF, which means
> > > > > that pos is within EOF at the start of unshare, so unshare proceeds to
> > > > > walk across the EOF boundary, the corresponding EOF folio is not fully
> > > > > uptodate, and thus write begin wants to do partial zeroing and
> > > > > fails/warns.
> > > 
> > > Yeah, it's exactly what the syzbot does.
> > > > 
> > > > I suspect that if the higher level range were trimmed to be within EOF
> > > > in iomap_file_unshare(), that would prevent this problem in either case.
> > > > Note that this was on a -bsize=1k fs, so what I'm not totally sure about
> > > > is whether skipping zeroing as such would be a problem with larger FSBs.
> > > > My initial thinking was this might not be possible since the EOF folio
> > > > should be fully uptodate in that case, but that probably requires some
> > > > thought/review/testing.
> > > > 
> > > > Brian
> > > > 
> > > > > BTW, maybe the check here should be
> > > > >                   if (pos >= i_size_read(iter->inode))
> > > > > 
> > > > > If there is any misunderstanding, please let me know, thanks.
> > > > > 
> > > > > [1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
> > > > > [2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/
> > > > > 
> > > > > > 
> > > > > > Thoughts?
> > > > > > 
> > > > > > Brian
> > > > > > 
> > > > > > [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > > > > [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> > > > > > [3] fsx ops file:
> > > > > > 
> > > > > > fallocate 0x3bc00 0x400 0
> > > > > > write 0x3bc00 0x800 0x3c000
> > > > > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > > > > skip mapread 0x3c000 0x400 0x3c400
> > > > > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > > > > > 
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > --
> > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > 
> > > > 
> > > 
> > > Thanks,
> > > -- 
> > > Julian Sun <sunjunchao2870@gmail.com>
> > > 
> > 
> 
> Thanks,
> -- 
> Julian Sun <sunjunchao2870@gmail.com>
>
Julian Sun Sept. 10, 2024, 1:25 p.m. UTC | #9
Brian Foster <bfoster@redhat.com> 于2024年9月10日周二 20:29写道:
>
> On Tue, Sep 10, 2024 at 03:03:18PM +0800, Julian Sun wrote:
> > On Mon, 2024-09-09 at 15:29 -0400, Brian Foster wrote:
> > > On Tue, Sep 10, 2024 at 01:40:24AM +0800, Julian Sun wrote:
> > > > Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
> > > > >
> > > > > On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > > > > > Hi Brian,
> > > > > >
> > > > > > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > > > > > >
> > > > > > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > > > > > > > Attempting to unshare extents beyond EOF will trigger
> > > > > > > > the need zeroing case, which in turn triggers a warning.
> > > > > > > > Therefore, let's skip the unshare process if extents are
> > > > > > > > beyond EOF.
> > > > > > > >
> > > > > > > > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > > > > > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > > > > > > > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > > > > > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > > > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > > ---
> > > > > > > >  fs/iomap/buffered-io.c | 3 +++
> > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > > > > > index f420c53d86ac..8898d5ec606f 100644
> > > > > > > > --- a/fs/iomap/buffered-io.c
> > > > > > > > +++ b/fs/iomap/buffered-io.c
> > > > > > > > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> > > > > > > >       /* don't bother with holes or unwritten extents */
> > > > > > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > > > > > > >               return length;
> > > > > > > > +     /* don't try to unshare any extents beyond EOF. */
> > > > > > > > +     if (pos > i_size_read(iter->inode))
> > > > > > > > +             return length;
> > > > > > >
> > > > > > > Hi Julian,
> > > > > > >
> > > > > > >
> > > > > > > > What about if pos starts within EOF and the operation extends beyond it?
> > > > > >
> > > > > > Extents within EOF will be unshared as usual. Details are below.
> > > > > > >
> > > > > > > > I ask because I think I've reproduced this scenario, though it is a bit
> > > > > > > > tricky and has dependencies...
> > > > > > > >
> > > > > > > > For one, it seems to depend on the cowblocks patch I recently posted
> > > > > > > > here [1] (though I don't think this is necessarily a problem with the
> > > > > > > > patch, it just depends on keeping COW fork blocks around after the
> > > > > > > > unshare). With that, I reproduce via fsx with unshare range support [2]
> > > > > > > > using the ops file appended below [3] on a -bsize=1k XFS fs.
> > > > > > > >
> > > > > > > > I haven't quite characterized the full sequence other than it looks like
> > > > > > > > the unshare walks across EOF with a shared data fork block and COW fork
> > > > > > > > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > > > > > > > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > > > > > > > warning and error return associated with the folio zeroing in
> > > > > > > > __iomap_write_begin().
> > > > > >
> > > > > > The scenario has already been reproduced by syzbot[1]. The reproducer
> > > > > > provided by syzbot constructed the following extent maps for a file of
> > > > > > size 0xE00 before fallocate unshare:
> > > > > >
> > > > > > 0 - 4k: shared between two files
> > > > > > 4k - 6k: hole beyond EOF, not shared
> > > > > > 6k - 8k: delalloc extends
> > > > > >
> > > > > > Then the reproducer attempted to unshare the extent between 0 and
> > > > > > 0x2000 bytes, but the file size is 0xE00. This is likely the scenario
> > > > > > you were referring to?
> > > > > >
> > > > >
> > > > > Yes, sort of..
> > > > >
> > > > > > Eventually the unshare code does:
> > > > > > first map: 0 - 4k - unshare successfully.
> > > > > > second map: 4k - 6k - hole, skip. Beyond EOF.
> > > > > > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > > > > > Fires warnings because UNSHARE.
> > > > > >
> > > > > > During the first call to iomap_unshare_iter(), iomap_length() returned
> > > > > > 4k, so 4k bytes were unshared.
> > > > > > See discuss here[2] for more details.
> > > > > > >
> > > > > > > This all kind of has me wondering.. do we know the purpose of this
> > > > > > > warning/error in the first place? It seems like it's more of an
> > > > > > > "unexpected situation" than a specific problem. E.g., assuming the same
> > > > > > > page were mmap'd, I _think_ the read fault path would do the same sort
> > > > > > > of zeroing such that the unshare would see a fully uptodate folio and
> > > > > > > carry on as normal. I added the mapread op to the opsfile below to give
> > > > > > > that a quick test (remove the "skip" text to enable it), and it seems to
> > > > > > > prevent the error, but I've not confirmed whether that theory is
> > > > > > > actually what's happening.
> > > > > > >
> > > > > > >
> > > > > > > > FWIW, I also wonder if another way to handle this would be to just
> > > > > > > > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > > > > > > > caller passes a range beyond EOF, just process whatever part of the
> > > > > > > > range falls within EOF. It seems iomap isn't responsible for the file
> > > > > > > > extending aspect of the fallocate unshare command anyways.
> > > > > >
> > > > > > It already does 'just process whatever part of the range falls within EOF'.
> > > > > > Check the above case.
> > > > > >
> > > > > > I'm not sure if I fully understand what you mean. This patch does not
> > > > > > prevent unsharing extents within the EOF. This patch checks if pos is
> > > > > > beyond EOF, instead of checking if pos + length is beyond EOF. So the
> > > > > > extents within EOF should be unshared as usual.
> > > > > >
> > > > >
> > > > > I'm not concerned about preventing unsharing. I'm concerned that this
> > > > > patch doesn't always prevent attempts to unshare post-eof ranges. I
> > > > > think the difference here is that in the variant I was hitting, we end
> > > > >
> > > > > > up with a mapping that starts within EOF and ends beyond EOF, whereas
> > > > > > the syzbot variant produces a scenario where the problematic mapping
> > > > > > always starts beyond EOF.
> > > >
> > > > This is not true. In the case above, the syzbot did indeed unshare the
> > > > extents between 0-4k, which were started within EOF and ended beyond
> > > > EOF. The specific variants here are: pos:0 len:0x1000 EOF: 0xE00. And
> > > > the unshare code successfully unshared extents between 0 and 4k.
> > > >
> > > > During the next loop in iomap_file_unshare(), the pos became 0x1000,
> > > > which is beyond EOF.  What this patch does is to skip the unshare
> > > > during the second loop.
> > > > Is there anything I misunderstand?
> > >
> > > Hmm, what block size? Does the associated mapping have at least one full
> > > block beyond EOF? If you have a map at offset 0, length 0x1000 and EOF
> > > at 0xE00, then unless you have 512b blocks it sounds like the EOF block
> > > actually starts within EOF.
> >
> > The block size here is 2k, and there isn't a full block beyond EOF within
> > this extent map.
>
> Ok. That likely explains the difference in behavior. The fsx variant has
> a mapping that starts within EOF and has at least one post-EOF block
> that unshare attempts to process.
>
> > >
> > > The variant I'm seeing is more like this.. consider a -bsize=1k fs, a
> > > file size of 0x3c400, and an EOF mapping of (offset 0x3c000, length
> > > 0x4000). The EOF folio in this case is 4k in size and starts at the same
> > > 0x3c000 offset as the EOF mapping.
> > >
> > > So with 1k blocks, the EOF mapping starts one block before EOF and
> > > extends well beyond it. What happens in the test case is that
> > > iomap_unshare_iter() is called with the EOF folio, pos 0x3c000, length
> > > 0x800, and where the block at offset 0x3c400 is not marked uptodate. pos
> > > is thus within EOF, but the while loop in __iomap_write_begin() walks
> > > past it and attempts to process one block beyond EOF.
> >
> > Ok, so the key point here is that there is a full block beyond EOF within
> > the associated extent map, which is different with the scenario reproduced
> > by syzbot.
> > According to the Dave's comments, the trimming behavior seems like should
> > be done in filesystem(e.g.,xfs), instead of iomap. I will reconsider this scenario.
> >
>
> Seems reasonable, but I don't agree that is a suitable fix for iomap. To
> be clear, it's perfectly fine for the fs to trim the range however it
> sees fit (i.e. no shared blocks beyond EOF in XFS), but we should also
> recognize that iomap is a generic layer and unshare is currently
> implemented to trip over itself, warn and fail if passed a post-eof
> range.
>
>
> > I still suspect that just making unshare work correctly around i_size
> > might be the more elegant long term solution, but that is not totally
> > clear. IMO, as long as unshare is written to intentionally trip over
> > itself for post-eof ranges, it should either trim the range or check for
> > valid parameters and document the limitation. Otherwise we just leave a
> > landmine for the next caller to have to work through the same problems,
> > which is particularly subtle since the higher level fallocate unshare
> > mode supports post-eof ranges. Just my .02.

Yeah, totally agreed. I prefer to do the trimming in the vfs layer
just like what generic_copy_file_checks() does, instead of a specific
file system. Maybe we need a helper function
generic_fallocate_checks(). But it requires more thought and testing.

>
> Brian
>
> > Thanks for your comments and review.
> > >
> > > Brian
> > >
> > > > >
> > > > > So IOW, this patch works for the syzbot variant because it happens to
> > > > > reproduce a situation where pos will be beyond EOF, but that is an
> > > > >
> > > > > > assumption that might not always be true. The fsx generated variant runs
> > > > > > a sequence that produces a mapping that spans across EOF, which means
> > > > > > that pos is within EOF at the start of unshare, so unshare proceeds to
> > > > > > walk across the EOF boundary, the corresponding EOF folio is not fully
> > > > > > uptodate, and thus write begin wants to do partial zeroing and
> > > > > > fails/warns.
> > > >
> > > > Yeah, it's exactly what the syzbot does.
> > > > >
> > > > > I suspect that if the higher level range were trimmed to be within EOF
> > > > > in iomap_file_unshare(), that would prevent this problem in either case.
> > > > > Note that this was on a -bsize=1k fs, so what I'm not totally sure about
> > > > > is whether skipping zeroing as such would be a problem with larger FSBs.
> > > > > My initial thinking was this might not be possible since the EOF folio
> > > > > should be fully uptodate in that case, but that probably requires some
> > > > > thought/review/testing.
> > > > >
> > > > > Brian
> > > > >
> > > > > > BTW, maybe the check here should be
> > > > > >                   if (pos >= i_size_read(iter->inode))
> > > > > >
> > > > > > If there is any misunderstanding, please let me know, thanks.
> > > > > >
> > > > > > [1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
> > > > > > [2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/
> > > > > >
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > Brian
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > > > > > [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> > > > > > > [3] fsx ops file:
> > > > > > >
> > > > > > > fallocate 0x3bc00 0x400 0
> > > > > > > write 0x3bc00 0x800 0x3c000
> > > > > > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > > > > > skip mapread 0x3c000 0x400 0x3c400
> > > > > > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > >
> > > > >
> > > >
> > > > Thanks,
> > > > --
> > > > Julian Sun <sunjunchao2870@gmail.com>
> > > >
> > >
> >
> > Thanks,
> > --
> > Julian Sun <sunjunchao2870@gmail.com>
> >
>
Brian Foster Sept. 10, 2024, 1:56 p.m. UTC | #10
On Tue, Sep 10, 2024 at 09:25:41PM +0800, Julian Sun wrote:
> Brian Foster <bfoster@redhat.com> 于2024年9月10日周二 20:29写道:
> >
> > On Tue, Sep 10, 2024 at 03:03:18PM +0800, Julian Sun wrote:
> > > On Mon, 2024-09-09 at 15:29 -0400, Brian Foster wrote:
> > > > On Tue, Sep 10, 2024 at 01:40:24AM +0800, Julian Sun wrote:
> > > > > Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
> > > > > >
> > > > > > On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > > > > > > Hi Brian,
> > > > > > >
> > > > > > > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > > > > > > >
> > > > > > > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun wrote:
> > > > > > > > > Attempting to unshare extents beyond EOF will trigger
> > > > > > > > > the need zeroing case, which in turn triggers a warning.
> > > > > > > > > Therefore, let's skip the unshare process if extents are
> > > > > > > > > beyond EOF.
> > > > > > > > >
> > > > > > > > > Reported-and-tested-by: syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > > > > > > Closes: https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf306e5a0
> > > > > > > > > Fixes: 32a38a499104 ("iomap: use write_begin to read pages to unshare")
> > > > > > > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > > > > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > > > ---
> > > > > > > > >  fs/iomap/buffered-io.c | 3 +++
> > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > > > > > > index f420c53d86ac..8898d5ec606f 100644
> > > > > > > > > --- a/fs/iomap/buffered-io.c
> > > > > > > > > +++ b/fs/iomap/buffered-io.c
> > > > > > > > > @@ -1340,6 +1340,9 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> > > > > > > > >       /* don't bother with holes or unwritten extents */
> > > > > > > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> > > > > > > > >               return length;
> > > > > > > > > +     /* don't try to unshare any extents beyond EOF. */
> > > > > > > > > +     if (pos > i_size_read(iter->inode))
> > > > > > > > > +             return length;
> > > > > > > >
> > > > > > > > Hi Julian,
> > > > > > > >
> > > > > > > >
> > > > > > > > > What about if pos starts within EOF and the operation extends beyond it?
> > > > > > >
> > > > > > > Extents within EOF will be unshared as usual. Details are below.
> > > > > > > >
> > > > > > > > > I ask because I think I've reproduced this scenario, though it is a bit
> > > > > > > > > tricky and has dependencies...
> > > > > > > > >
> > > > > > > > > For one, it seems to depend on the cowblocks patch I recently posted
> > > > > > > > > here [1] (though I don't think this is necessarily a problem with the
> > > > > > > > > patch, it just depends on keeping COW fork blocks around after the
> > > > > > > > > unshare). With that, I reproduce via fsx with unshare range support [2]
> > > > > > > > > using the ops file appended below [3] on a -bsize=1k XFS fs.
> > > > > > > > >
> > > > > > > > > I haven't quite characterized the full sequence other than it looks like
> > > > > > > > > the unshare walks across EOF with a shared data fork block and COW fork
> > > > > > > > > delalloc, presumably finds the post-eof part of the folio !uptodate (so
> > > > > > > > > iomap_adjust_read_range() doesn't skip it), and then trips over the
> > > > > > > > > warning and error return associated with the folio zeroing in
> > > > > > > > > __iomap_write_begin().
> > > > > > >
> > > > > > > The scenario has already been reproduced by syzbot[1]. The reproducer
> > > > > > > provided by syzbot constructed the following extent maps for a file of
> > > > > > > size 0xE00 before fallocate unshare:
> > > > > > >
> > > > > > > 0 - 4k: shared between two files
> > > > > > > 4k - 6k: hole beyond EOF, not shared
> > > > > > > 6k - 8k: delalloc extends
> > > > > > >
> > > > > > > Then the reproducer attempted to unshare the extent between 0 and
> > > > > > > 0x2000 bytes, but the file size is 0xE00. This is likely the scenario
> > > > > > > you were referring to?
> > > > > > >
> > > > > >
> > > > > > Yes, sort of..
> > > > > >
> > > > > > > Eventually the unshare code does:
> > > > > > > first map: 0 - 4k - unshare successfully.
> > > > > > > second map: 4k - 6k - hole, skip. Beyond EOF.
> > > > > > > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > > > > > > Fires warnings because UNSHARE.
> > > > > > >
> > > > > > > During the first call to iomap_unshare_iter(), iomap_length() returned
> > > > > > > 4k, so 4k bytes were unshared.
> > > > > > > See discuss here[2] for more details.
> > > > > > > >
> > > > > > > > This all kind of has me wondering.. do we know the purpose of this
> > > > > > > > warning/error in the first place? It seems like it's more of an
> > > > > > > > "unexpected situation" than a specific problem. E.g., assuming the same
> > > > > > > > page were mmap'd, I _think_ the read fault path would do the same sort
> > > > > > > > of zeroing such that the unshare would see a fully uptodate folio and
> > > > > > > > carry on as normal. I added the mapread op to the opsfile below to give
> > > > > > > > that a quick test (remove the "skip" text to enable it), and it seems to
> > > > > > > > prevent the error, but I've not confirmed whether that theory is
> > > > > > > > actually what's happening.
> > > > > > > >
> > > > > > > >
> > > > > > > > > FWIW, I also wonder if another way to handle this would be to just
> > > > > > > > > restrict the range of iomap_file_unshare() to within EOF. IOW if a
> > > > > > > > > caller passes a range beyond EOF, just process whatever part of the
> > > > > > > > > range falls within EOF. It seems iomap isn't responsible for the file
> > > > > > > > > extending aspect of the fallocate unshare command anyways.
> > > > > > >
> > > > > > > It already does 'just process whatever part of the range falls within EOF'.
> > > > > > > Check the above case.
> > > > > > >
> > > > > > > I'm not sure if I fully understand what you mean. This patch does not
> > > > > > > prevent unsharing extents within the EOF. This patch checks if pos is
> > > > > > > beyond EOF, instead of checking if pos + length is beyond EOF. So the
> > > > > > > extents within EOF should be unshared as usual.
> > > > > > >
> > > > > >
> > > > > > I'm not concerned about preventing unsharing. I'm concerned that this
> > > > > > patch doesn't always prevent attempts to unshare post-eof ranges. I
> > > > > > think the difference here is that in the variant I was hitting, we end
> > > > > >
> > > > > > > up with a mapping that starts within EOF and ends beyond EOF, whereas
> > > > > > > the syzbot variant produces a scenario where the problematic mapping
> > > > > > > always starts beyond EOF.
> > > > >
> > > > > This is not true. In the case above, the syzbot did indeed unshare the
> > > > > extents between 0-4k, which were started within EOF and ended beyond
> > > > > EOF. The specific variants here are: pos:0 len:0x1000 EOF: 0xE00. And
> > > > > the unshare code successfully unshared extents between 0 and 4k.
> > > > >
> > > > > During the next loop in iomap_file_unshare(), the pos became 0x1000,
> > > > > which is beyond EOF.  What this patch does is to skip the unshare
> > > > > during the second loop.
> > > > > Is there anything I misunderstand?
> > > >
> > > > Hmm, what block size? Does the associated mapping have at least one full
> > > > block beyond EOF? If you have a map at offset 0, length 0x1000 and EOF
> > > > at 0xE00, then unless you have 512b blocks it sounds like the EOF block
> > > > actually starts within EOF.
> > >
> > > The block size here is 2k, and there isn't a full block beyond EOF within
> > > this extent map.
> >
> > Ok. That likely explains the difference in behavior. The fsx variant has
> > a mapping that starts within EOF and has at least one post-EOF block
> > that unshare attempts to process.
> >
> > > >
> > > > The variant I'm seeing is more like this.. consider a -bsize=1k fs, a
> > > > file size of 0x3c400, and an EOF mapping of (offset 0x3c000, length
> > > > 0x4000). The EOF folio in this case is 4k in size and starts at the same
> > > > 0x3c000 offset as the EOF mapping.
> > > >
> > > > So with 1k blocks, the EOF mapping starts one block before EOF and
> > > > extends well beyond it. What happens in the test case is that
> > > > iomap_unshare_iter() is called with the EOF folio, pos 0x3c000, length
> > > > 0x800, and where the block at offset 0x3c400 is not marked uptodate. pos
> > > > is thus within EOF, but the while loop in __iomap_write_begin() walks
> > > > past it and attempts to process one block beyond EOF.
> > >
> > > Ok, so the key point here is that there is a full block beyond EOF within
> > > the associated extent map, which is different with the scenario reproduced
> > > by syzbot.
> > > According to the Dave's comments, the trimming behavior seems like should
> > > be done in filesystem(e.g.,xfs), instead of iomap. I will reconsider this scenario.
> > >
> >
> > Seems reasonable, but I don't agree that is a suitable fix for iomap. To
> > be clear, it's perfectly fine for the fs to trim the range however it
> > sees fit (i.e. no shared blocks beyond EOF in XFS), but we should also
> > recognize that iomap is a generic layer and unshare is currently
> > implemented to trip over itself, warn and fail if passed a post-eof
> > range.
> >
> >
> > > I still suspect that just making unshare work correctly around i_size
> > > might be the more elegant long term solution, but that is not totally
> > > clear. IMO, as long as unshare is written to intentionally trip over
> > > itself for post-eof ranges, it should either trim the range or check for
> > > valid parameters and document the limitation. Otherwise we just leave a
> > > landmine for the next caller to have to work through the same problems,
> > > which is particularly subtle since the higher level fallocate unshare
> > > mode supports post-eof ranges. Just my .02.
> 
> Yeah, totally agreed. I prefer to do the trimming in the vfs layer
> just like what generic_copy_file_checks() does, instead of a specific
> file system. Maybe we need a helper function
> generic_fallocate_checks(). But it requires more thought and testing.
> 

I would agree with that if not for the subtle difference between
fallocate unshare and the iomap implementation. fallocate unshare is
basically just a behavior modifier for how shared blocks are handled by
mode=0 preallocation, so AIUI it fully supports post-eof ranges. It can
extend the file size or leave it unchanged (based on whether KEEP_SIZE
is set) and preallocate beyond EOF.

IOW, it might be perfectly valid for a caller to run an fallocate
unshare across EOF that happens to unshare various shared blocks within
EOF, and then preallocate and extend the file size as part of the same
command. The functional responsibility just happens to be split between
iomap and the fs.

Brian

> >
> > Brian
> >
> > > Thanks for your comments and review.
> > > >
> > > > Brian
> > > >
> > > > > >
> > > > > > So IOW, this patch works for the syzbot variant because it happens to
> > > > > > reproduce a situation where pos will be beyond EOF, but that is an
> > > > > >
> > > > > > > assumption that might not always be true. The fsx generated variant runs
> > > > > > > a sequence that produces a mapping that spans across EOF, which means
> > > > > > > that pos is within EOF at the start of unshare, so unshare proceeds to
> > > > > > > walk across the EOF boundary, the corresponding EOF folio is not fully
> > > > > > > uptodate, and thus write begin wants to do partial zeroing and
> > > > > > > fails/warns.
> > > > >
> > > > > Yeah, it's exactly what the syzbot does.
> > > > > >
> > > > > > I suspect that if the higher level range were trimmed to be within EOF
> > > > > > in iomap_file_unshare(), that would prevent this problem in either case.
> > > > > > Note that this was on a -bsize=1k fs, so what I'm not totally sure about
> > > > > > is whether skipping zeroing as such would be a problem with larger FSBs.
> > > > > > My initial thinking was this might not be possible since the EOF folio
> > > > > > should be fully uptodate in that case, but that probably requires some
> > > > > > thought/review/testing.
> > > > > >
> > > > > > Brian
> > > > > >
> > > > > > > BTW, maybe the check here should be
> > > > > > >                   if (pos >= i_size_read(iter->inode))
> > > > > > >
> > > > > > > If there is any misunderstanding, please let me know, thanks.
> > > > > > >
> > > > > > > [1]: https://lore.kernel.org/all/0000000000008964f1061f8c32b6@google.com/T/
> > > > > > > [2]: https://lore.kernel.org/all/20240903054808.126799-1-sunjunchao2870@gmail.com/
> > > > > > >
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > Brian
> > > > > > > >
> > > > > > > > [1] https://lore.kernel.org/linux-xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > > > > > > [2] https://lore.kernel.org/fstests/20240906185606.136402-1-bfoster@redhat.com/
> > > > > > > > [3] fsx ops file:
> > > > > > > >
> > > > > > > > fallocate 0x3bc00 0x400 0
> > > > > > > > write 0x3bc00 0x800 0x3c000
> > > > > > > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > > > > > > skip mapread 0x3c000 0x400 0x3c400
> > > > > > > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > --
> > > > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > > >
> > > > > >
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > >
> > > >
> > >
> > > Thanks,
> > > --
> > > Julian Sun <sunjunchao2870@gmail.com>
> > >
> >
> 
> 
> -- 
> Julian Sun <sunjunchao2870@gmail.com>
>
Julian Sun Sept. 11, 2024, 4:01 a.m. UTC | #11
On Tue, 2024-09-10 at 09:56 -0400, Brian Foster wrote:
> On Tue, Sep 10, 2024 at 09:25:41PM +0800, Julian Sun wrote:
> > Brian Foster <bfoster@redhat.com> 于2024年9月10日周二 20:29写道:
> > > 
> > > On Tue, Sep 10, 2024 at 03:03:18PM +0800, Julian Sun wrote:
> > > > On Mon, 2024-09-09 at 15:29 -0400, Brian Foster wrote:
> > > > > On Tue, Sep 10, 2024 at 01:40:24AM +0800, Julian Sun wrote:
> > > > > > Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
> > > > > > > 
> > > > > > > On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > > > > > > > Hi Brian,
> > > > > > > > 
> > > > > > > > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > > > > > > > > 
> > > > > > > > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun
> > > > > > > > > wrote:
> > > > > > > > > > Attempting to unshare extents beyond EOF will trigger
> > > > > > > > > > the need zeroing case, which in turn triggers a
> > > > > > > > > > warning.
> > > > > > > > > > Therefore, let's skip the unshare process if extents
> > > > > > > > > > are
> > > > > > > > > > beyond EOF.
> > > > > > > > > > 
> > > > > > > > > > Reported-and-tested-by:
> > > > > > > > > > syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > > > > > > > Closes:
> > > > > > > > > > https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf30
> > > > > > > > > > 6e5a0
> > > > > > > > > > Fixes: 32a38a499104 ("iomap: use write_begin to read
> > > > > > > > > > pages to unshare")
> > > > > > > > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > > > > > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > > > > ---
> > > > > > > > > >  fs/iomap/buffered-io.c | 3 +++
> > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/fs/iomap/buffered-io.c
> > > > > > > > > > b/fs/iomap/buffered-io.c
> > > > > > > > > > index f420c53d86ac..8898d5ec606f 100644
> > > > > > > > > > --- a/fs/iomap/buffered-io.c
> > > > > > > > > > +++ b/fs/iomap/buffered-io.c
> > > > > > > > > > @@ -1340,6 +1340,9 @@ static loff_t
> > > > > > > > > > iomap_unshare_iter(struct iomap_iter *iter)
> > > > > > > > > >       /* don't bother with holes or unwritten extents
> > > > > > > > > > */
> > > > > > > > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type ==
> > > > > > > > > > IOMAP_UNWRITTEN)
> > > > > > > > > >               return length;
> > > > > > > > > > +     /* don't try to unshare any extents beyond EOF.
> > > > > > > > > > */
> > > > > > > > > > +     if (pos > i_size_read(iter->inode))
> > > > > > > > > > +             return length;
> > > > > > > > > 
> > > > > > > > > Hi Julian,
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > What about if pos starts within EOF and the operation
> > > > > > > > > > extends beyond it?
> > > > > > > > 
> > > > > > > > Extents within EOF will be unshared as usual. Details are
> > > > > > > > below.
> > > > > > > > > 
> > > > > > > > > > I ask because I think I've reproduced this scenario,
> > > > > > > > > > though it is a bit
> > > > > > > > > > tricky and has dependencies...
> > > > > > > > > > 
> > > > > > > > > > For one, it seems to depend on the cowblocks patch I
> > > > > > > > > > recently posted
> > > > > > > > > > here [1] (though I don't think this is necessarily a
> > > > > > > > > > problem with the
> > > > > > > > > > patch, it just depends on keeping COW fork blocks
> > > > > > > > > > around after the
> > > > > > > > > > unshare). With that, I reproduce via fsx with unshare
> > > > > > > > > > range support [2]
> > > > > > > > > > using the ops file appended below [3] on a -bsize=1k
> > > > > > > > > > XFS fs.
> > > > > > > > > > 
> > > > > > > > > > I haven't quite characterized the full sequence other
> > > > > > > > > > than it looks like
> > > > > > > > > > the unshare walks across EOF with a shared data fork
> > > > > > > > > > block and COW fork
> > > > > > > > > > delalloc, presumably finds the post-eof part of the
> > > > > > > > > > folio !uptodate (so
> > > > > > > > > > iomap_adjust_read_range() doesn't skip it), and then
> > > > > > > > > > trips over the
> > > > > > > > > > warning and error return associated with the folio
> > > > > > > > > > zeroing in
> > > > > > > > > > __iomap_write_begin().
> > > > > > > > 
> > > > > > > > The scenario has already been reproduced by syzbot[1]. The
> > > > > > > > reproducer
> > > > > > > > provided by syzbot constructed the following extent maps
> > > > > > > > for a file of
> > > > > > > > size 0xE00 before fallocate unshare:
> > > > > > > > 
> > > > > > > > 0 - 4k: shared between two files
> > > > > > > > 4k - 6k: hole beyond EOF, not shared
> > > > > > > > 6k - 8k: delalloc extends
> > > > > > > > 
> > > > > > > > Then the reproducer attempted to unshare the extent between
> > > > > > > > 0 and
> > > > > > > > 0x2000 bytes, but the file size is 0xE00. This is likely
> > > > > > > > the scenario
> > > > > > > > you were referring to?
> > > > > > > > 
> > > > > > > 
> > > > > > > Yes, sort of..
> > > > > > > 
> > > > > > > > Eventually the unshare code does:
> > > > > > > > first map: 0 - 4k - unshare successfully.
> > > > > > > > second map: 4k - 6k - hole, skip. Beyond EOF.
> > > > > > > > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > > > > > > > Fires warnings because UNSHARE.
> > > > > > > > 
> > > > > > > > During the first call to iomap_unshare_iter(),
> > > > > > > > iomap_length() returned
> > > > > > > > 4k, so 4k bytes were unshared.
> > > > > > > > See discuss here[2] for more details.
> > > > > > > > > 
> > > > > > > > > This all kind of has me wondering.. do we know the
> > > > > > > > > purpose of this
> > > > > > > > > warning/error in the first place? It seems like it's more
> > > > > > > > > of an
> > > > > > > > > "unexpected situation" than a specific problem. E.g.,
> > > > > > > > > assuming the same
> > > > > > > > > page were mmap'd, I _think_ the read fault path would do
> > > > > > > > > the same sort
> > > > > > > > > of zeroing such that the unshare would see a fully
> > > > > > > > > uptodate folio and
> > > > > > > > > carry on as normal. I added the mapread op to the opsfile
> > > > > > > > > below to give
> > > > > > > > > that a quick test (remove the "skip" text to enable it),
> > > > > > > > > and it seems to
> > > > > > > > > prevent the error, but I've not confirmed whether that
> > > > > > > > > theory is
> > > > > > > > > actually what's happening.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > FWIW, I also wonder if another way to handle this would
> > > > > > > > > > be to just
> > > > > > > > > > restrict the range of iomap_file_unshare() to within
> > > > > > > > > > EOF. IOW if a
> > > > > > > > > > caller passes a range beyond EOF, just process whatever
> > > > > > > > > > part of the
> > > > > > > > > > range falls within EOF. It seems iomap isn't
> > > > > > > > > > responsible for the file
> > > > > > > > > > extending aspect of the fallocate unshare command
> > > > > > > > > > anyways.
> > > > > > > > 
> > > > > > > > It already does 'just process whatever part of the range
> > > > > > > > falls within EOF'.
> > > > > > > > Check the above case.
> > > > > > > > 
> > > > > > > > I'm not sure if I fully understand what you mean. This
> > > > > > > > patch does not
> > > > > > > > prevent unsharing extents within the EOF. This patch checks
> > > > > > > > if pos is
> > > > > > > > beyond EOF, instead of checking if pos + length is beyond
> > > > > > > > EOF. So the
> > > > > > > > extents within EOF should be unshared as usual.
> > > > > > > > 
> > > > > > > 
> > > > > > > I'm not concerned about preventing unsharing. I'm concerned
> > > > > > > that this
> > > > > > > patch doesn't always prevent attempts to unshare post-eof
> > > > > > > ranges. I
> > > > > > > think the difference here is that in the variant I was
> > > > > > > hitting, we end
> > > > > > > 
> > > > > > > > up with a mapping that starts within EOF and ends beyond
> > > > > > > > EOF, whereas
> > > > > > > > the syzbot variant produces a scenario where the
> > > > > > > > problematic mapping
> > > > > > > > always starts beyond EOF.
> > > > > > 
> > > > > > This is not true. In the case above, the syzbot did indeed
> > > > > > unshare the
> > > > > > extents between 0-4k, which were started within EOF and ended
> > > > > > beyond
> > > > > > EOF. The specific variants here are: pos:0 len:0x1000 EOF:
> > > > > > 0xE00. And
> > > > > > the unshare code successfully unshared extents between 0 and
> > > > > > 4k.
> > > > > > 
> > > > > > During the next loop in iomap_file_unshare(), the pos became
> > > > > > 0x1000,
> > > > > > which is beyond EOF.  What this patch does is to skip the
> > > > > > unshare
> > > > > > during the second loop.
> > > > > > Is there anything I misunderstand?
> > > > > 
> > > > > Hmm, what block size? Does the associated mapping have at least
> > > > > one full
> > > > > block beyond EOF? If you have a map at offset 0, length 0x1000
> > > > > and EOF
> > > > > at 0xE00, then unless you have 512b blocks it sounds like the EOF
> > > > > block
> > > > > actually starts within EOF.
> > > > 
> > > > The block size here is 2k, and there isn't a full block beyond EOF
> > > > within
> > > > this extent map.
> > > 
> > > Ok. That likely explains the difference in behavior. The fsx variant
> > > has
> > > a mapping that starts within EOF and has at least one post-EOF block
> > > that unshare attempts to process.
> > > 
> > > > > 
> > > > > The variant I'm seeing is more like this.. consider a -bsize=1k
> > > > > fs, a
> > > > > file size of 0x3c400, and an EOF mapping of (offset 0x3c000,
> > > > > length
> > > > > 0x4000). The EOF folio in this case is 4k in size and starts at
> > > > > the same
> > > > > 0x3c000 offset as the EOF mapping.
> > > > > 
> > > > > So with 1k blocks, the EOF mapping starts one block before EOF
> > > > > and
> > > > > extends well beyond it. What happens in the test case is that
> > > > > iomap_unshare_iter() is called with the EOF folio, pos 0x3c000,
> > > > > length
> > > > > 0x800, and where the block at offset 0x3c400 is not marked
> > > > > uptodate. pos
> > > > > is thus within EOF, but the while loop in __iomap_write_begin()
> > > > > walks
> > > > > past it and attempts to process one block beyond EOF.
> > > > 
> > > > Ok, so the key point here is that there is a full block beyond EOF
> > > > within
> > > > the associated extent map, which is different with the scenario
> > > > reproduced
> > > > by syzbot.
> > > > According to the Dave's comments, the trimming behavior seems like
> > > > should
> > > > be done in filesystem(e.g.,xfs), instead of iomap. I will
> > > > reconsider this scenario.
> > > > 
> > > 
> > > Seems reasonable, but I don't agree that is a suitable fix for iomap.
> > > To
> > > be clear, it's perfectly fine for the fs to trim the range however it
> > > sees fit (i.e. no shared blocks beyond EOF in XFS), but we should
> > > also
> > > recognize that iomap is a generic layer and unshare is currently
> > > implemented to trip over itself, warn and fail if passed a post-eof
> > > range.
> > > 
> > > 
> > > > I still suspect that just making unshare work correctly around
> > > > i_size
> > > > might be the more elegant long term solution, but that is not
> > > > totally
> > > > clear. IMO, as long as unshare is written to intentionally trip
> > > > over
> > > > itself for post-eof ranges, it should either trim the range or
> > > > check for
> > > > valid parameters and document the limitation. Otherwise we just
> > > > leave a
> > > > landmine for the next caller to have to work through the same
> > > > problems,
> > > > which is particularly subtle since the higher level fallocate
> > > > unshare
> > > > mode supports post-eof ranges. Just my .02.
> > 
> > Yeah, totally agreed. I prefer to do the trimming in the vfs layer
> > just like what generic_copy_file_checks() does, instead of a specific
> > file system. Maybe we need a helper function
> > generic_fallocate_checks(). But it requires more thought and testing.
> > 
> 
> I would agree with that if not for the subtle difference between
> fallocate unshare and the iomap implementation. fallocate unshare is
> basically just a behavior modifier for how shared blocks are handled by
> mode=0 preallocation, so AIUI it fully supports post-eof ranges. It can
> extend the file size or leave it unchanged (based on whether KEEP_SIZE
> is set) and preallocate beyond EOF.
> 
> IOW, it might be perfectly valid for a caller to run an fallocate
> unshare across EOF that happens to unshare various shared blocks within
> EOF, and then preallocate and extend the file size as part of the same
> command. The functional responsibility just happens to be split between
> iomap and the fs.

Yeah, precisely. It seems that fallocate unshare do support post-eof
ranges, but iomap_unshare_iter() doesn't expect any blocks beyond eof.
Maybe the semantics here is that any blocks beyond eof shouldn't be shared,
but I can not confirm that. Also, it's possible that blocks beyond eof
which may belong to an extent acrossing eof may be passed to
iomap_unshare_iter() in some cases. It seems we should figure out the
purpose of this warning. 

Hi, Christoph. Do you remember why the warning was introduced? Looking
forward to your reply after you return from vacation.


> 
> Brian
> 
> > > 
> > > Brian
> > > 
> > > > Thanks for your comments and review.
> > > > > 
> > > > > Brian
> > > > > 
> > > > > > > 
> > > > > > > So IOW, this patch works for the syzbot variant because it
> > > > > > > happens to
> > > > > > > reproduce a situation where pos will be beyond EOF, but that
> > > > > > > is an
> > > > > > > 
> > > > > > > > assumption that might not always be true. The fsx generated
> > > > > > > > variant runs
> > > > > > > > a sequence that produces a mapping that spans across EOF,
> > > > > > > > which means
> > > > > > > > that pos is within EOF at the start of unshare, so unshare
> > > > > > > > proceeds to
> > > > > > > > walk across the EOF boundary, the corresponding EOF folio
> > > > > > > > is not fully
> > > > > > > > uptodate, and thus write begin wants to do partial zeroing
> > > > > > > > and
> > > > > > > > fails/warns.
> > > > > > 
> > > > > > Yeah, it's exactly what the syzbot does.
> > > > > > > 
> > > > > > > I suspect that if the higher level range were trimmed to be
> > > > > > > within EOF
> > > > > > > in iomap_file_unshare(), that would prevent this problem in
> > > > > > > either case.
> > > > > > > Note that this was on a -bsize=1k fs, so what I'm not totally
> > > > > > > sure about
> > > > > > > is whether skipping zeroing as such would be a problem with
> > > > > > > larger FSBs.
> > > > > > > My initial thinking was this might not be possible since the
> > > > > > > EOF folio
> > > > > > > should be fully uptodate in that case, but that probably
> > > > > > > requires some
> > > > > > > thought/review/testing.
> > > > > > > 
> > > > > > > Brian
> > > > > > > 
> > > > > > > > BTW, maybe the check here should be
> > > > > > > >                   if (pos >= i_size_read(iter->inode))
> > > > > > > > 
> > > > > > > > If there is any misunderstanding, please let me know,
> > > > > > > > thanks.
> > > > > > > > 
> > > > > > > > [1]:
> > > > > > > > https://lore.kernel.org/all/0000000000008964f1061f8c32b6@go
> > > > > > > > ogle.com/T/
> > > > > > > > [2]: https://lore.kernel.org/all/20240903054808.126799-1-
> > > > > > > > sunjunchao2870@gmail.com/
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Thoughts?
> > > > > > > > > 
> > > > > > > > > Brian
> > > > > > > > > 
> > > > > > > > > [1] https://lore.kernel.org/linux-
> > > > > > > > > xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > > > > > > > [2]
> > > > > > > > > https://lore.kernel.org/fstests/20240906185606.136402-1-
> > > > > > > > > bfoster@redhat.com/
> > > > > > > > > [3] fsx ops file:
> > > > > > > > > 
> > > > > > > > > fallocate 0x3bc00 0x400 0
> > > > > > > > > write 0x3bc00 0x800 0x3c000
> > > > > > > > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > > > > > > > skip mapread 0x3c000 0x400 0x3c400
> > > > > > > > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > --
> > > > > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > Thanks,
> > > > > > --
> > > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > > 
> > > > > 
> > > > 
> > > > Thanks,
> > > > --
> > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > 
> > > 
> > 
> > 
> > -- 
> > Julian Sun <sunjunchao2870@gmail.com>
> > 
> 

Thanks,
Darrick J. Wong Sept. 17, 2024, 8:39 p.m. UTC | #12
On Wed, Sep 11, 2024 at 12:01:34PM +0800, Julian Sun wrote:
> On Tue, 2024-09-10 at 09:56 -0400, Brian Foster wrote:
> > On Tue, Sep 10, 2024 at 09:25:41PM +0800, Julian Sun wrote:
> > > Brian Foster <bfoster@redhat.com> 于2024年9月10日周二 20:29写道:
> > > > 
> > > > On Tue, Sep 10, 2024 at 03:03:18PM +0800, Julian Sun wrote:
> > > > > On Mon, 2024-09-09 at 15:29 -0400, Brian Foster wrote:
> > > > > > On Tue, Sep 10, 2024 at 01:40:24AM +0800, Julian Sun wrote:
> > > > > > > Brian Foster <bfoster@redhat.com> 于2024年9月9日周一 21:27写道:
> > > > > > > > 
> > > > > > > > On Mon, Sep 09, 2024 at 08:15:43PM +0800, Julian Sun wrote:
> > > > > > > > > Hi Brian,
> > > > > > > > > 
> > > > > > > > > Brian Foster <bfoster@redhat.com> 于2024年9月7日周六 03:11写道:
> > > > > > > > > > 
> > > > > > > > > > On Thu, Sep 05, 2024 at 06:24:24PM +0800, Julian Sun
> > > > > > > > > > wrote:
> > > > > > > > > > > Attempting to unshare extents beyond EOF will trigger
> > > > > > > > > > > the need zeroing case, which in turn triggers a
> > > > > > > > > > > warning.
> > > > > > > > > > > Therefore, let's skip the unshare process if extents
> > > > > > > > > > > are
> > > > > > > > > > > beyond EOF.
> > > > > > > > > > > 
> > > > > > > > > > > Reported-and-tested-by:
> > > > > > > > > > > syzbot+296b1c84b9cbf306e5a0@syzkaller.appspotmail.com
> > > > > > > > > > > Closes:
> > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=296b1c84b9cbf30
> > > > > > > > > > > 6e5a0
> > > > > > > > > > > Fixes: 32a38a499104 ("iomap: use write_begin to read
> > > > > > > > > > > pages to unshare")
> > > > > > > > > > > Inspired-by: Dave Chinner <david@fromorbit.com>
> > > > > > > > > > > Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  fs/iomap/buffered-io.c | 3 +++
> > > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/fs/iomap/buffered-io.c
> > > > > > > > > > > b/fs/iomap/buffered-io.c
> > > > > > > > > > > index f420c53d86ac..8898d5ec606f 100644
> > > > > > > > > > > --- a/fs/iomap/buffered-io.c
> > > > > > > > > > > +++ b/fs/iomap/buffered-io.c
> > > > > > > > > > > @@ -1340,6 +1340,9 @@ static loff_t
> > > > > > > > > > > iomap_unshare_iter(struct iomap_iter *iter)
> > > > > > > > > > >       /* don't bother with holes or unwritten extents
> > > > > > > > > > > */
> > > > > > > > > > >       if (srcmap->type == IOMAP_HOLE || srcmap->type ==
> > > > > > > > > > > IOMAP_UNWRITTEN)
> > > > > > > > > > >               return length;
> > > > > > > > > > > +     /* don't try to unshare any extents beyond EOF.
> > > > > > > > > > > */
> > > > > > > > > > > +     if (pos > i_size_read(iter->inode))
> > > > > > > > > > > +             return length;
> > > > > > > > > > 
> > > > > > > > > > Hi Julian,
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > What about if pos starts within EOF and the operation
> > > > > > > > > > > extends beyond it?
> > > > > > > > > 
> > > > > > > > > Extents within EOF will be unshared as usual. Details are
> > > > > > > > > below.
> > > > > > > > > > 
> > > > > > > > > > > I ask because I think I've reproduced this scenario,
> > > > > > > > > > > though it is a bit
> > > > > > > > > > > tricky and has dependencies...
> > > > > > > > > > > 
> > > > > > > > > > > For one, it seems to depend on the cowblocks patch I
> > > > > > > > > > > recently posted
> > > > > > > > > > > here [1] (though I don't think this is necessarily a
> > > > > > > > > > > problem with the
> > > > > > > > > > > patch, it just depends on keeping COW fork blocks
> > > > > > > > > > > around after the
> > > > > > > > > > > unshare). With that, I reproduce via fsx with unshare
> > > > > > > > > > > range support [2]
> > > > > > > > > > > using the ops file appended below [3] on a -bsize=1k
> > > > > > > > > > > XFS fs.
> > > > > > > > > > > 
> > > > > > > > > > > I haven't quite characterized the full sequence other
> > > > > > > > > > > than it looks like
> > > > > > > > > > > the unshare walks across EOF with a shared data fork
> > > > > > > > > > > block and COW fork
> > > > > > > > > > > delalloc, presumably finds the post-eof part of the
> > > > > > > > > > > folio !uptodate (so
> > > > > > > > > > > iomap_adjust_read_range() doesn't skip it), and then
> > > > > > > > > > > trips over the
> > > > > > > > > > > warning and error return associated with the folio
> > > > > > > > > > > zeroing in
> > > > > > > > > > > __iomap_write_begin().
> > > > > > > > > 
> > > > > > > > > The scenario has already been reproduced by syzbot[1]. The
> > > > > > > > > reproducer
> > > > > > > > > provided by syzbot constructed the following extent maps
> > > > > > > > > for a file of
> > > > > > > > > size 0xE00 before fallocate unshare:
> > > > > > > > > 
> > > > > > > > > 0 - 4k: shared between two files
> > > > > > > > > 4k - 6k: hole beyond EOF, not shared
> > > > > > > > > 6k - 8k: delalloc extends
> > > > > > > > > 
> > > > > > > > > Then the reproducer attempted to unshare the extent between
> > > > > > > > > 0 and
> > > > > > > > > 0x2000 bytes, but the file size is 0xE00. This is likely
> > > > > > > > > the scenario
> > > > > > > > > you were referring to?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Yes, sort of..
> > > > > > > > 
> > > > > > > > > Eventually the unshare code does:
> > > > > > > > > first map: 0 - 4k - unshare successfully.
> > > > > > > > > second map: 4k - 6k - hole, skip. Beyond EOF.
> > > > > > > > > third map: 6k - 8k - delalloc, beyond EOF so needs zeroing.
> > > > > > > > > Fires warnings because UNSHARE.
> > > > > > > > > 
> > > > > > > > > During the first call to iomap_unshare_iter(),
> > > > > > > > > iomap_length() returned
> > > > > > > > > 4k, so 4k bytes were unshared.
> > > > > > > > > See discuss here[2] for more details.
> > > > > > > > > > 
> > > > > > > > > > This all kind of has me wondering.. do we know the
> > > > > > > > > > purpose of this
> > > > > > > > > > warning/error in the first place? It seems like it's more
> > > > > > > > > > of an
> > > > > > > > > > "unexpected situation" than a specific problem. E.g.,
> > > > > > > > > > assuming the same
> > > > > > > > > > page were mmap'd, I _think_ the read fault path would do
> > > > > > > > > > the same sort
> > > > > > > > > > of zeroing such that the unshare would see a fully
> > > > > > > > > > uptodate folio and
> > > > > > > > > > carry on as normal. I added the mapread op to the opsfile
> > > > > > > > > > below to give
> > > > > > > > > > that a quick test (remove the "skip" text to enable it),
> > > > > > > > > > and it seems to
> > > > > > > > > > prevent the error, but I've not confirmed whether that
> > > > > > > > > > theory is
> > > > > > > > > > actually what's happening.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > FWIW, I also wonder if another way to handle this would
> > > > > > > > > > > be to just
> > > > > > > > > > > restrict the range of iomap_file_unshare() to within
> > > > > > > > > > > EOF. IOW if a
> > > > > > > > > > > caller passes a range beyond EOF, just process whatever
> > > > > > > > > > > part of the
> > > > > > > > > > > range falls within EOF. It seems iomap isn't
> > > > > > > > > > > responsible for the file
> > > > > > > > > > > extending aspect of the fallocate unshare command
> > > > > > > > > > > anyways.
> > > > > > > > > 
> > > > > > > > > It already does 'just process whatever part of the range
> > > > > > > > > falls within EOF'.
> > > > > > > > > Check the above case.
> > > > > > > > > 
> > > > > > > > > I'm not sure if I fully understand what you mean. This
> > > > > > > > > patch does not
> > > > > > > > > prevent unsharing extents within the EOF. This patch checks
> > > > > > > > > if pos is
> > > > > > > > > beyond EOF, instead of checking if pos + length is beyond
> > > > > > > > > EOF. So the
> > > > > > > > > extents within EOF should be unshared as usual.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > I'm not concerned about preventing unsharing. I'm concerned
> > > > > > > > that this
> > > > > > > > patch doesn't always prevent attempts to unshare post-eof
> > > > > > > > ranges. I
> > > > > > > > think the difference here is that in the variant I was
> > > > > > > > hitting, we end
> > > > > > > > 
> > > > > > > > > up with a mapping that starts within EOF and ends beyond
> > > > > > > > > EOF, whereas
> > > > > > > > > the syzbot variant produces a scenario where the
> > > > > > > > > problematic mapping
> > > > > > > > > always starts beyond EOF.
> > > > > > > 
> > > > > > > This is not true. In the case above, the syzbot did indeed
> > > > > > > unshare the
> > > > > > > extents between 0-4k, which were started within EOF and ended
> > > > > > > beyond
> > > > > > > EOF. The specific variants here are: pos:0 len:0x1000 EOF:
> > > > > > > 0xE00. And
> > > > > > > the unshare code successfully unshared extents between 0 and
> > > > > > > 4k.
> > > > > > > 
> > > > > > > During the next loop in iomap_file_unshare(), the pos became
> > > > > > > 0x1000,
> > > > > > > which is beyond EOF.  What this patch does is to skip the
> > > > > > > unshare
> > > > > > > during the second loop.
> > > > > > > Is there anything I misunderstand?
> > > > > > 
> > > > > > Hmm, what block size? Does the associated mapping have at least
> > > > > > one full
> > > > > > block beyond EOF? If you have a map at offset 0, length 0x1000
> > > > > > and EOF
> > > > > > at 0xE00, then unless you have 512b blocks it sounds like the EOF
> > > > > > block
> > > > > > actually starts within EOF.
> > > > > 
> > > > > The block size here is 2k, and there isn't a full block beyond EOF
> > > > > within
> > > > > this extent map.
> > > > 
> > > > Ok. That likely explains the difference in behavior. The fsx variant
> > > > has
> > > > a mapping that starts within EOF and has at least one post-EOF block
> > > > that unshare attempts to process.
> > > > 
> > > > > > 
> > > > > > The variant I'm seeing is more like this.. consider a -bsize=1k
> > > > > > fs, a
> > > > > > file size of 0x3c400, and an EOF mapping of (offset 0x3c000,
> > > > > > length
> > > > > > 0x4000). The EOF folio in this case is 4k in size and starts at
> > > > > > the same
> > > > > > 0x3c000 offset as the EOF mapping.
> > > > > > 
> > > > > > So with 1k blocks, the EOF mapping starts one block before EOF
> > > > > > and
> > > > > > extends well beyond it. What happens in the test case is that
> > > > > > iomap_unshare_iter() is called with the EOF folio, pos 0x3c000,
> > > > > > length
> > > > > > 0x800, and where the block at offset 0x3c400 is not marked
> > > > > > uptodate. pos
> > > > > > is thus within EOF, but the while loop in __iomap_write_begin()
> > > > > > walks
> > > > > > past it and attempts to process one block beyond EOF.
> > > > > 
> > > > > Ok, so the key point here is that there is a full block beyond EOF
> > > > > within
> > > > > the associated extent map, which is different with the scenario
> > > > > reproduced
> > > > > by syzbot.
> > > > > According to the Dave's comments, the trimming behavior seems like
> > > > > should
> > > > > be done in filesystem(e.g.,xfs), instead of iomap. I will
> > > > > reconsider this scenario.
> > > > > 
> > > > 
> > > > Seems reasonable, but I don't agree that is a suitable fix for iomap.
> > > > To
> > > > be clear, it's perfectly fine for the fs to trim the range however it
> > > > sees fit (i.e. no shared blocks beyond EOF in XFS), but we should
> > > > also
> > > > recognize that iomap is a generic layer and unshare is currently
> > > > implemented to trip over itself, warn and fail if passed a post-eof
> > > > range.
> > > > 
> > > > 
> > > > > I still suspect that just making unshare work correctly around
> > > > > i_size
> > > > > might be the more elegant long term solution, but that is not
> > > > > totally
> > > > > clear. IMO, as long as unshare is written to intentionally trip
> > > > > over
> > > > > itself for post-eof ranges, it should either trim the range or
> > > > > check for
> > > > > valid parameters and document the limitation. Otherwise we just
> > > > > leave a
> > > > > landmine for the next caller to have to work through the same
> > > > > problems,
> > > > > which is particularly subtle since the higher level fallocate
> > > > > unshare
> > > > > mode supports post-eof ranges. Just my .02.
> > > 
> > > Yeah, totally agreed. I prefer to do the trimming in the vfs layer
> > > just like what generic_copy_file_checks() does, instead of a specific
> > > file system. Maybe we need a helper function
> > > generic_fallocate_checks(). But it requires more thought and testing.
> > > 
> > 
> > I would agree with that if not for the subtle difference between
> > fallocate unshare and the iomap implementation. fallocate unshare is
> > basically just a behavior modifier for how shared blocks are handled by
> > mode=0 preallocation, so AIUI it fully supports post-eof ranges. It can
> > extend the file size or leave it unchanged (based on whether KEEP_SIZE
> > is set) and preallocate beyond EOF.
> > 
> > IOW, it might be perfectly valid for a caller to run an fallocate
> > unshare across EOF that happens to unshare various shared blocks within
> > EOF, and then preallocate and extend the file size as part of the same
> > command. The functional responsibility just happens to be split between
> > iomap and the fs.
> 
> Yeah, precisely. It seems that fallocate unshare do support post-eof
> ranges,

Correct.

>         but iomap_unshare_iter() doesn't expect any blocks beyond eof.
> Maybe the semantics here is that any blocks beyond eof shouldn't be shared,

Correct, there cannot be blocks completely beyond EOF that are also
shared.

> but I can not confirm that. Also, it's possible that blocks beyond eof
> which may belong to an extent acrossing eof may be passed to
> iomap_unshare_iter() in some cases.

Yes, this is possible, but pos and len passed to iomap_file_unshare
should never cross EOF.

> iomap_unshare_iter() in some cases. It seems we should figure out the
> purpose of this warning. 
> 
> Hi, Christoph. Do you remember why the warning was introduced? Looking
> forward to your reply after you return from vacation.

I'm not hch, but hth.

--D

> 
> > 
> > Brian
> > 
> > > > 
> > > > Brian
> > > > 
> > > > > Thanks for your comments and review.
> > > > > > 
> > > > > > Brian
> > > > > > 
> > > > > > > > 
> > > > > > > > So IOW, this patch works for the syzbot variant because it
> > > > > > > > happens to
> > > > > > > > reproduce a situation where pos will be beyond EOF, but that
> > > > > > > > is an
> > > > > > > > 
> > > > > > > > > assumption that might not always be true. The fsx generated
> > > > > > > > > variant runs
> > > > > > > > > a sequence that produces a mapping that spans across EOF,
> > > > > > > > > which means
> > > > > > > > > that pos is within EOF at the start of unshare, so unshare
> > > > > > > > > proceeds to
> > > > > > > > > walk across the EOF boundary, the corresponding EOF folio
> > > > > > > > > is not fully
> > > > > > > > > uptodate, and thus write begin wants to do partial zeroing
> > > > > > > > > and
> > > > > > > > > fails/warns.
> > > > > > > 
> > > > > > > Yeah, it's exactly what the syzbot does.
> > > > > > > > 
> > > > > > > > I suspect that if the higher level range were trimmed to be
> > > > > > > > within EOF
> > > > > > > > in iomap_file_unshare(), that would prevent this problem in
> > > > > > > > either case.
> > > > > > > > Note that this was on a -bsize=1k fs, so what I'm not totally
> > > > > > > > sure about
> > > > > > > > is whether skipping zeroing as such would be a problem with
> > > > > > > > larger FSBs.
> > > > > > > > My initial thinking was this might not be possible since the
> > > > > > > > EOF folio
> > > > > > > > should be fully uptodate in that case, but that probably
> > > > > > > > requires some
> > > > > > > > thought/review/testing.
> > > > > > > > 
> > > > > > > > Brian
> > > > > > > > 
> > > > > > > > > BTW, maybe the check here should be
> > > > > > > > >                   if (pos >= i_size_read(iter->inode))
> > > > > > > > > 
> > > > > > > > > If there is any misunderstanding, please let me know,
> > > > > > > > > thanks.
> > > > > > > > > 
> > > > > > > > > [1]:
> > > > > > > > > https://lore.kernel.org/all/0000000000008964f1061f8c32b6@go
> > > > > > > > > ogle.com/T/
> > > > > > > > > [2]: https://lore.kernel.org/all/20240903054808.126799-1-
> > > > > > > > > sunjunchao2870@gmail.com/
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Thoughts?
> > > > > > > > > > 
> > > > > > > > > > Brian
> > > > > > > > > > 
> > > > > > > > > > [1] https://lore.kernel.org/linux-
> > > > > > > > > > xfs/20240906114051.120743-1-bfoster@redhat.com/
> > > > > > > > > > [2]
> > > > > > > > > > https://lore.kernel.org/fstests/20240906185606.136402-1-
> > > > > > > > > > bfoster@redhat.com/
> > > > > > > > > > [3] fsx ops file:
> > > > > > > > > > 
> > > > > > > > > > fallocate 0x3bc00 0x400 0
> > > > > > > > > > write 0x3bc00 0x800 0x3c000
> > > > > > > > > > clone_range 0x3bc00 0x400 0x0 0x3c400
> > > > > > > > > > skip mapread 0x3c000 0x400 0x3c400
> > > > > > > > > > fallocate 0x3bc00 0xc00 0x3c400 unshare
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > --
> > > > > > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > --
> > > > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > > > 
> > > > > > 
> > > > > 
> > > > > Thanks,
> > > > > --
> > > > > Julian Sun <sunjunchao2870@gmail.com>
> > > > > 
> > > > 
> > > 
> > > 
> > > -- 
> > > Julian Sun <sunjunchao2870@gmail.com>
> > > 
> > 
> 
> Thanks,
> -- 
> Julian Sun <sunjunchao2870@gmail.com>
>
diff mbox series

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86ac..8898d5ec606f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1340,6 +1340,9 @@  static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 	/* don't bother with holes or unwritten extents */
 	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
 		return length;
+	/* don't try to unshare any extents beyond EOF. */
+	if (pos > i_size_read(iter->inode))
+		return length;
 
 	do {
 		struct folio *folio;