diff mbox series

Question about commit b450672fb66b ("iomap: sub-block dio needs to zeroout beyond EOF")

Message ID 8b1ba3a1-7ecc-6e1f-c944-26a51baa9747@huawei.com (mailing list archive)
State Accepted, archived
Headers show
Series Question about commit b450672fb66b ("iomap: sub-block dio needs to zeroout beyond EOF") | expand

Commit Message

Hou Tao May 17, 2019, 2:41 a.m. UTC
Hi,

I don't understand why the commit b450672fb66b ("iomap: sub-block dio needs to zeroout beyond EOF") is needed here:


If need_zeroout is false, it means the block neither is a unwritten block nor
a newly-mapped block, but that also means the block must had been a unwritten block
or a newly-mapped block before this write, so the block must have been zeroed, correct ?

It also introduces unnecessary sub-block zeroing if we repeat the same sub-block write.

I also have tried to reproduce the problem by using fsx as noted in the commit message,
but cann't reproduce it. Maybe I do it in the wrong way:

$ ./ltp/fsx -d -g H -H -z -C -I -w 1024 -F -r 1024 -t 4096 -Z /tmp/xfs/fsx

The XFS related with /tmp/xfs is formatted with "-b size=4096". I also try "-b size=1024",
but still no luck.

Could someone explain the scenario in which the extra block zeroing is needed ? Thanks.

Regards,
Tao

Comments

Dave Chinner May 17, 2019, 6:59 a.m. UTC | #1
On Fri, May 17, 2019 at 10:41:44AM +0800, Hou Tao wrote:
> Hi,
> 
> I don't understand why the commit b450672fb66b ("iomap: sub-block dio needs to zeroout beyond EOF") is needed here:
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 72f3864a2e6b..77c214194edf 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -1677,7 +1677,14 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
>                 dio->submit.cookie = submit_bio(bio);
>         } while (nr_pages);
> 
> -       if (need_zeroout) {
> +       /*
> +        * We need to zeroout the tail of a sub-block write if the extent type
> +        * requires zeroing or the write extends beyond EOF. If we don't zero
> +        * the block tail in the latter case, we can expose stale data via mmap
> +        * reads of the EOF block.
> +        */
> +       if (need_zeroout ||
> +           ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) {
>                 /* zero out from the end of the write to the end of the block */
>                 pad = pos & (fs_block_size - 1);
>                 if (pad)
> 
> If need_zeroout is false, it means the block neither is a unwritten block nor
> a newly-mapped block, but that also means the block must had been a unwritten block
> or a newly-mapped block before this write, so the block must have been zeroed, correct ?

No. One the contrary, it's a direct IO write to beyond the end of
the file which means the block has not been zeroed at all. If it is
an unwritten extent, it most definitely does not contain zeroes
(unwritten extents are a flag in the extent metadata, not zeroed
disk space) and so it doesn't matter it is written or unwritten we
must zero it before we update the file size.

Why? Because if we then mmap the page that spans EOF, whatever is on
disk beyond EOF is exposed to the user process. Hence if we don't
zero the tail of the block beyond EOF during DIO writes then we can
leak stale information to unprivileged users....

Cheers,

Dave.
Hou Tao May 17, 2019, 12:56 p.m. UTC | #2
Hi,

On 2019/5/17 14:59, Dave Chinner wrote:
> On Fri, May 17, 2019 at 10:41:44AM +0800, Hou Tao wrote:
>> Hi,
>>
>> I don't understand why the commit b450672fb66b ("iomap: sub-block dio needs to zeroout beyond EOF") is needed here:
>>
>> diff --git a/fs/iomap.c b/fs/iomap.c
>> index 72f3864a2e6b..77c214194edf 100644
>> --- a/fs/iomap.c
>> +++ b/fs/iomap.c
>> @@ -1677,7 +1677,14 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
>>                 dio->submit.cookie = submit_bio(bio);
>>         } while (nr_pages);
>>
>> -       if (need_zeroout) {
>> +       /*
>> +        * We need to zeroout the tail of a sub-block write if the extent type
>> +        * requires zeroing or the write extends beyond EOF. If we don't zero
>> +        * the block tail in the latter case, we can expose stale data via mmap
>> +        * reads of the EOF block.
>> +        */
>> +       if (need_zeroout ||
>> +           ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) {
>>                 /* zero out from the end of the write to the end of the block */
>>                 pad = pos & (fs_block_size - 1);
>>                 if (pad)
>>
>> If need_zeroout is false, it means the block neither is a unwritten block nor
>> a newly-mapped block, but that also means the block must had been a unwritten block
>> or a newly-mapped block before this write, so the block must have been zeroed, correct ?
> 
> No. One the contrary, it's a direct IO write to beyond the end of
> the file which means the block has not been zeroed at all. If it is
> an unwritten extent, it most definitely does not contain zeroes
> (unwritten extents are a flag in the extent metadata, not zeroed
> disk space) and so it doesn't matter it is written or unwritten we
> must zero it before we update the file size.
> 
Ah, I still can not understand the reason why "the block has not been zeroed at all".
Do you mean the scenario in which the past-EOF write tries to write an already mapped
block and the past-EOF part of this block has not been zeroed ?

Because if the past-EOF write tries to write to a new allocated block, then IOMAP_F_NEW
must have been set in iomap->flags and need_zeroout will be true. If it tries to write
to an unwritten block, need_zeroout will also be true.

If it tries to write a block in which the past-EOF part has not been zeroed, even without
the past-EOF direct write, the data exposure is still possible, right ?
If not, could you please give a specific example on how this happens ? Thanks.

Regards,
Tao

> Why? Because if we then mmap the page that spans EOF, whatever is on
> disk beyond EOF is exposed to the user process. Hence if we don't
> zero the tail of the block beyond EOF during DIO writes then we can
> leak stale information to unprivileged users....
> 
> Cheers,
> 
> Dave.
>
Dave Chinner May 18, 2019, 3:10 a.m. UTC | #3
On Fri, May 17, 2019 at 08:56:35PM +0800, Hou Tao wrote:
> Hi,
> 
> On 2019/5/17 14:59, Dave Chinner wrote:
> > On Fri, May 17, 2019 at 10:41:44AM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> I don't understand why the commit b450672fb66b ("iomap: sub-block dio needs to zeroout beyond EOF") is needed here:
> >>
> >> diff --git a/fs/iomap.c b/fs/iomap.c
> >> index 72f3864a2e6b..77c214194edf 100644
> >> --- a/fs/iomap.c
> >> +++ b/fs/iomap.c
> >> @@ -1677,7 +1677,14 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
> >>                 dio->submit.cookie = submit_bio(bio);
> >>         } while (nr_pages);
> >>
> >> -       if (need_zeroout) {
> >> +       /*
> >> +        * We need to zeroout the tail of a sub-block write if the extent type
> >> +        * requires zeroing or the write extends beyond EOF. If we don't zero
> >> +        * the block tail in the latter case, we can expose stale data via mmap
> >> +        * reads of the EOF block.
> >> +        */
> >> +       if (need_zeroout ||
> >> +           ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) {
> >>                 /* zero out from the end of the write to the end of the block */
> >>                 pad = pos & (fs_block_size - 1);
> >>                 if (pad)
> >>
> >> If need_zeroout is false, it means the block neither is a unwritten block nor
> >> a newly-mapped block, but that also means the block must had been a unwritten block
> >> or a newly-mapped block before this write, so the block must have been zeroed, correct ?
> > 
> > No. One the contrary, it's a direct IO write to beyond the end of
> > the file which means the block has not been zeroed at all. If it is
> > an unwritten extent, it most definitely does not contain zeroes
> > (unwritten extents are a flag in the extent metadata, not zeroed
> > disk space) and so it doesn't matter it is written or unwritten we
> > must zero it before we update the file size.
> > 
> Ah, I still can not understand the reason why "the block has not been zeroed at all".
> Do you mean the scenario in which the past-EOF write tries to write an already mapped
> block and the past-EOF part of this block has not been zeroed ?

Yup. Speculative delayed allocation beyond EOF triggered by mmap()
or buffered writes() that has then been allocated by writeback, then
we do a dio that extends EOF into that space.

> Because if the past-EOF write tries to write to a new allocated block, then IOMAP_F_NEW
> must have been set in iomap->flags and need_zeroout will be true. If it tries to write
> to an unwritten block, need_zeroout will also be true.

But if it tries to write to an already allocated, written block,
need_zeroout is not true and it will need to zero.
> 
> If it tries to write a block in which the past-EOF part has not been zeroed, even without
> the past-EOF direct write, the data exposure is still possible, right ?

Not if we zero both sides of the dio correctly before we extend the
file size on disk.

Cheers,

Dave.
diff mbox series

Patch

diff --git a/fs/iomap.c b/fs/iomap.c
index 72f3864a2e6b..77c214194edf 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1677,7 +1677,14 @@  iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
                dio->submit.cookie = submit_bio(bio);
        } while (nr_pages);

-       if (need_zeroout) {
+       /*
+        * We need to zeroout the tail of a sub-block write if the extent type
+        * requires zeroing or the write extends beyond EOF. If we don't zero
+        * the block tail in the latter case, we can expose stale data via mmap
+        * reads of the EOF block.
+        */
+       if (need_zeroout ||
+           ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) {
                /* zero out from the end of the write to the end of the block */
                pad = pos & (fs_block_size - 1);
                if (pad)