Message ID | 20190731141245.7230-5-cmaiolino@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | New ->fiemap infrastructure and ->bmap removal | expand |
On Wed, Jul 31, 2019 at 04:12:40PM +0200, Carlos Maiolino wrote: > Now we have the possibility of proper error return in bmap, use bmap() > function in ioctl_fibmap() instead of calling ->bmap method directly. > > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> > --- > > Changelog: > > V4: > - Ensure ioctl_fibmap() returns 0 in case of error returned from > bmap(). Otherwise we'll be changing the user interface (which > returns 0 in case of error) > V3: > - Rename usr_blk to ur_block > > V2: > - Use a local sector_t variable to asign the block number > instead of using direct casting. > > fs/ioctl.c | 29 +++++++++++++++++++---------- > 1 file changed, 19 insertions(+), 10 deletions(-) > > diff --git a/fs/ioctl.c b/fs/ioctl.c > index fef3a6bf7c78..6b589c873bc2 100644 > --- a/fs/ioctl.c > +++ b/fs/ioctl.c > @@ -53,19 +53,28 @@ EXPORT_SYMBOL(vfs_ioctl); > > static int ioctl_fibmap(struct file *filp, int __user *p) > { > - struct address_space *mapping = filp->f_mapping; > - int res, block; > + struct inode *inode = file_inode(filp); > + int error, ur_block; > + sector_t block; > > - /* do we support this mess? */ > - if (!mapping->a_ops->bmap) > - return -EINVAL; > if (!capable(CAP_SYS_RAWIO)) > return -EPERM; > - res = get_user(block, p); > - if (res) > - return res; > - res = mapping->a_ops->bmap(mapping, block); > - return put_user(res, p); > + > + error = get_user(ur_block, p); > + if (error) > + return error; > + > + block = ur_block; > + error = bmap(inode, &block); > + > + if (error) > + ur_block = 0; > + else > + ur_block = block; What happens if ur_block > INT_MAX? Shouldn't we return zero (i.e. error) instead of truncating the value? Maybe the code does this somewhere else? Here seemed like the obvious place for an overflow check as we go from sector_t to int. --D > + > + error = put_user(ur_block, p); > + > + return error; > } > > /** > -- > 2.20.1 >
Hi Darrick. > > + return error; > > + > > + block = ur_block; > > + error = bmap(inode, &block); > > + > > + if (error) > > + ur_block = 0; > > + else > > + ur_block = block; > > What happens if ur_block > INT_MAX? Shouldn't we return zero (i.e. > error) instead of truncating the value? Maybe the code does this > somewhere else? Here seemed like the obvious place for an overflow > check as we go from sector_t to int. > The behavior should still be the same. It will get truncated, unfortunately. I don't think we can actually change this behavior and return zero instead of truncating it. > --D > > > + > > + error = put_user(ur_block, p); > > + > > + return error; > > } > > > > /** > > -- > > 2.20.1 > >
On Fri, Aug 02, 2019 at 11:19:39AM +0200, Carlos Maiolino wrote: > Hi Darrick. > > > > + return error; > > > + > > > + block = ur_block; > > > + error = bmap(inode, &block); > > > + > > > + if (error) > > > + ur_block = 0; > > > + else > > > + ur_block = block; > > > > What happens if ur_block > INT_MAX? Shouldn't we return zero (i.e. > > error) instead of truncating the value? Maybe the code does this > > somewhere else? Here seemed like the obvious place for an overflow > > check as we go from sector_t to int. > > > > The behavior should still be the same. It will get truncated, unfortunately. I > don't think we can actually change this behavior and return zero instead of > truncating it. But that's even worse, because the programs that rely on FIBMAP will now receive *incorrect* results that may point at a different file and definitely do not point at the correct file block. Note also that the iomap (and therefore xfs) implementation WARNs on integer overflow and returns 0 (error) to prevent an incorrect access. --D > > --D > > > > > + > > > + error = put_user(ur_block, p); > > > + > > > + return error; > > > } > > > > > > /** > > > -- > > > 2.20.1 > > > > > -- > Carlos
On Fri, Aug 02, 2019 at 08:14:00AM -0700, Darrick J. Wong wrote: > On Fri, Aug 02, 2019 at 11:19:39AM +0200, Carlos Maiolino wrote: > > Hi Darrick. > > > > > > + return error; > > > > + > > > > + block = ur_block; > > > > + error = bmap(inode, &block); > > > > + > > > > + if (error) > > > > + ur_block = 0; > > > > + else > > > > + ur_block = block; > > > > > > What happens if ur_block > INT_MAX? Shouldn't we return zero (i.e. > > > error) instead of truncating the value? Maybe the code does this > > > somewhere else? Here seemed like the obvious place for an overflow > > > check as we go from sector_t to int. > > > > > > > The behavior should still be the same. It will get truncated, unfortunately. I > > don't think we can actually change this behavior and return zero instead of > > truncating it. > > But that's even worse, because the programs that rely on FIBMAP will now > receive *incorrect* results that may point at a different file and > definitely do not point at the correct file block. How is this worse? This is exactly what happens today, on the original FIBMAP implementation. Maybe I am not seeing something or having a different thinking you have, but this is the behavior we have now, without my patches. And we can't really change it; the user view of this implementation. That's why I didn't try to change the result, so the truncation still happens. > > Note also that the iomap (and therefore xfs) implementation WARNs on > integer overflow and returns 0 (error) to prevent an incorrect access. It does not really prevent anything. It just issue a warning saying the result will be truncated, in an attempt to notify the FIBMAP interface user that he/she can't trust the result, but it does not prevent a truncated result to be returned. And IIRC, iomap is the only interface now that cares about issuing a warning. I think the *best* we could do here, is to make the new bmap() to issue the same kind of WARN() iomap does, but we can't really change the end result. > > --D > > > > --D > > > > > > > + > > > > + error = put_user(ur_block, p); > > > > + > > > > + return error; > > > > } > > > > > > > > /** > > > > -- > > > > 2.20.1 > > > > > > > > -- > > Carlos
On Mon, Aug 05, 2019 at 12:27:30PM +0200, Carlos Maiolino wrote: > On Fri, Aug 02, 2019 at 08:14:00AM -0700, Darrick J. Wong wrote: > > On Fri, Aug 02, 2019 at 11:19:39AM +0200, Carlos Maiolino wrote: > > > Hi Darrick. > > > > > > > > + return error; > > > > > + > > > > > + block = ur_block; > > > > > + error = bmap(inode, &block); > > > > > + > > > > > + if (error) > > > > > + ur_block = 0; > > > > > + else > > > > > + ur_block = block; > > > > > > > > What happens if ur_block > INT_MAX? Shouldn't we return zero (i.e. > > > > error) instead of truncating the value? Maybe the code does this > > > > somewhere else? Here seemed like the obvious place for an overflow > > > > check as we go from sector_t to int. > > > > > > > > > > The behavior should still be the same. It will get truncated, unfortunately. I > > > don't think we can actually change this behavior and return zero instead of > > > truncating it. > > > > But that's even worse, because the programs that rely on FIBMAP will now > > receive *incorrect* results that may point at a different file and > > definitely do not point at the correct file block. > > How is this worse? This is exactly what happens today, on the original FIBMAP > implementation. Ok, I wasn't being 110% careful with my words. Delete "will now" from the sentence above. > Maybe I am not seeing something or having a different thinking you have, but > this is the behavior we have now, without my patches. And we can't really change > it; the user view of this implementation. > That's why I didn't try to change the result, so the truncation still happens. I understand that we're not generally supposed to change existing userspace interfaces, but the fact remains that allowing truncated responses causes *filesystem corruption*. We know that the most well known FIBMAP callers are bootloaders, and we know what they do with the information they get -- they use it to record the block map of boot files. So if the IPL/grub/whatever installer queries the boot file and the boot file is at block 12345678901 (a 34-bit number), this interface truncates that to 3755744309 (a 32-bit number) and that's where the bootloader will think its boot files are. The installation succeeds, the user reboots and *kaboom* the system no longer boots because the contents of block 3755744309 is not a bootloader. Worse yet, grub1 used FIBMAP data to record the location of the grub environment file and installed itself between the MBR and the start of partition 1. If the environment file is at offset 1234578901, grub will write status data to its environment file (which it thinks is at 3755744309) and *KABOOM* we've just destroyed whatever was in that block. Far better for the bootloader installation script to hit an error and force the admin to deal with the situation than for the system to become unbootable. That's *why* the (newer) iomap bmap implementation does not return truncated mappings, even though the classic implementation does. The classic code returning truncated results is a broken behavior. > > Note also that the iomap (and therefore xfs) implementation WARNs on > > integer overflow and returns 0 (error) to prevent an incorrect access. > > It does not really prevent anything. It just issue a warning saying the result > will be truncated, in an attempt to notify the FIBMAP interface user that he/she > can't trust the result, but it does not prevent a truncated result to be I disagree; the iomap bmap implementation /does/ prevent truncated responses: : static loff_t : iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length, : void *data, struct iomap *iomap) : { : sector_t *bno = data, addr; : : if (iomap->type == IOMAP_MAPPED) { : addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits; : if (addr > INT_MAX) : WARN(1, "would truncate bmap result\n"); Notice how we don't set *bno here? : else : *bno = addr; And only set it in the case that there isn't an integer overflow? : } : return 0; : } : : /* legacy ->bmap interface. 0 is the error return (!) */ : sector_t : iomap_bmap(struct address_space *mapping, sector_t bno, : const struct iomap_ops *ops) : { : struct inode *inode = mapping->host; : loff_t pos = bno << inode->i_blkbits; : unsigned blocksize = i_blocksize(inode); : : if (filemap_write_and_wait(mapping)) : return 0; : : bno = 0; We initialize bno to zero here... : iomap_apply(inode, pos, blocksize, 0, ops, &bno, iomap_bmap_actor); ...then pass bno's address to the apply function to pass to iomap_bmap_actor, so either the _actor function set bno or in the case of overflow it left it set to zero. : return bno; : } > returned. And IIRC, iomap is the only interface now that cares about issuing a > warning. > > I think the *best* we could do here, is to make the new bmap() to issue the same > kind of WARN() iomap does, but we can't really change the end result. I'd rather we break legacy code than corrupt filesystems. --D > > > > > --D > > > > > > --D > > > > > > > > > + > > > > > + error = put_user(ur_block, p); > > > > > + > > > > > + return error; > > > > > } > > > > > > > > > > /** > > > > > -- > > > > > 2.20.1 > > > > > > > > > > > -- > > > Carlos > > -- > Carlos
On Mon, Aug 05, 2019 at 08:12:58AM -0700, Darrick J. Wong wrote: > > returned. And IIRC, iomap is the only interface now that cares about issuing a > > warning. > > > > I think the *best* we could do here, is to make the new bmap() to issue the same > > kind of WARN() iomap does, but we can't really change the end result. > > I'd rather we break legacy code than corrupt filesystems. This particular patch should keep existing behavior as is, as the intent is to not change functionality. Throwing in another patch to have saner error behavior now that we have a saner in-kernel interface that cleary documents what it is breaking and why on the other hand sounds like a very good idea.
I'll keep my reply to your email short here, because hch's follow-up email summarizes what I wanted to say, so I'll reply to his email directly. > : return bno; > : } > > > returned. And IIRC, iomap is the only interface now that cares about issuing a > > warning. > > > > I think the *best* we could do here, is to make the new bmap() to issue the same > > kind of WARN() iomap does, but we can't really change the end result. > > I'd rather we break legacy code than corrupt filesystems. By now, I wish to apologize Darrick, it was my fault to not pay enough attention to iomap's bmap implementation as you mentioned. Will keep the discussion topic on next e-mail.
On Tue, Aug 06, 2019 at 07:38:40AM +0200, Christoph Hellwig wrote: > On Mon, Aug 05, 2019 at 08:12:58AM -0700, Darrick J. Wong wrote: > > > returned. And IIRC, iomap is the only interface now that cares about issuing a > > > warning. > > > > > > I think the *best* we could do here, is to make the new bmap() to issue the same > > > kind of WARN() iomap does, but we can't really change the end result. > > > > I'd rather we break legacy code than corrupt filesystems. > Yes, I have the same feeling, but this patchset does not have the goal to fix the broken api. > This particular patch should keep existing behavior as is, as the intent > is to not change functionality. Throwing in another patch to have saner > error behavior now that we have a saner in-kernel interface that cleary > documents what it is breaking and why on the other hand sounds like a > very good idea. I totally agree here, and to be honest, I think such change should be in a different patchset rather than a new patch in this series. I can do it for sure, but this discussion IMHO should be done not only here in linux-fsdevel, but also in linux-api, which well, I don't think cc'ing this whole patchset there will do any good other than keep the change discussion more complicated than it should be. I'd rather finish the design and implementation of this patchset, and I'll follow-up it, once it's all set, with a new patch to change the truncation behavior, it will make the discussion way easier than mixing up subjects. What you guys think? Cheers
On Tue, Aug 06, 2019 at 02:07:24PM +0200, Carlos Maiolino wrote: > On Tue, Aug 06, 2019 at 07:38:40AM +0200, Christoph Hellwig wrote: > > On Mon, Aug 05, 2019 at 08:12:58AM -0700, Darrick J. Wong wrote: > > > > returned. And IIRC, iomap is the only interface now that cares about issuing a > > > > warning. > > > > > > > > I think the *best* we could do here, is to make the new bmap() to issue the same > > > > kind of WARN() iomap does, but we can't really change the end result. > > > > > > I'd rather we break legacy code than corrupt filesystems. > > > > Yes, I have the same feeling, but this patchset does not have the goal to fix > the broken api. > > > This particular patch should keep existing behavior as is, as the intent > > is to not change functionality. Throwing in another patch to have saner > > error behavior now that we have a saner in-kernel interface that cleary > > documents what it is breaking and why on the other hand sounds like a > > very good idea. > > I totally agree here, and to be honest, I think such change should be in a > different patchset rather than a new patch in this series. I can do it for sure, > but this discussion IMHO should be done not only here in linux-fsdevel, but also > in linux-api, which well, I don't think cc'ing this whole patchset there will do > any good other than keep the change discussion more complicated than it should > be. I'd rather finish the design and implementation of this patchset, and I'll > follow-up it, once it's all set, with a new patch to change the truncation > behavior, it will make the discussion way easier than mixing up subjects. What > you guys think? I probably would've fixed the truncation behavior in the old code and based the fiemap-fibmap conversion on that so that anyone who wants to backport the behavior change to an old kernel has an easier time of it. But afterwards probably works just as well since I don't feel like tying ourselves in more knots over an old interface. ;) --D > Cheers > > > -- > Carlos
On Mon, Aug 05, 2019 at 08:12:58AM -0700, Darrick J. Wong wrote: > On Mon, Aug 05, 2019 at 12:27:30PM +0200, Carlos Maiolino wrote: > > On Fri, Aug 02, 2019 at 08:14:00AM -0700, Darrick J. Wong wrote: > > > On Fri, Aug 02, 2019 at 11:19:39AM +0200, Carlos Maiolino wrote: > > > > Hi Darrick. > > > > > > > > > > + return error; > > > > > > + > > > > > > + block = ur_block; > > > > > > + error = bmap(inode, &block); > > > > > > + > > > > > > + if (error) > > > > > > + ur_block = 0; > > > > > > + else > > > > > > + ur_block = block; > > > > > > > > > > What happens if ur_block > INT_MAX? Shouldn't we return zero (i.e. > > > > > error) instead of truncating the value? Maybe the code does this > > > > > somewhere else? Here seemed like the obvious place for an overflow > > > > > check as we go from sector_t to int. > > > > > > > > > > > > > The behavior should still be the same. It will get truncated, unfortunately. I > > > > don't think we can actually change this behavior and return zero instead of > > > > truncating it. > > > > > > But that's even worse, because the programs that rely on FIBMAP will now > > > receive *incorrect* results that may point at a different file and > > > definitely do not point at the correct file block. > > > > How is this worse? This is exactly what happens today, on the original FIBMAP > > implementation. > > Ok, I wasn't being 110% careful with my words. Delete "will now" from > the sentence above. > > > Maybe I am not seeing something or having a different thinking you have, but > > this is the behavior we have now, without my patches. And we can't really change > > it; the user view of this implementation. > > That's why I didn't try to change the result, so the truncation still happens. > > I understand that we're not generally supposed to change existing > userspace interfaces, but the fact remains that allowing truncated > responses causes *filesystem corruption*. > > We know that the most well known FIBMAP callers are bootloaders, and we > know what they do with the information they get -- they use it to record > the block map of boot files. So if the IPL/grub/whatever installer > queries the boot file and the boot file is at block 12345678901 (a > 34-bit number), this interface truncates that to 3755744309 (a 32-bit > number) and that's where the bootloader will think its boot files are. > The installation succeeds, the user reboots and *kaboom* the system no > longer boots because the contents of block 3755744309 is not a bootloader. > > Worse yet, grub1 used FIBMAP data to record the location of the grub > environment file and installed itself between the MBR and the start of > partition 1. If the environment file is at offset 1234578901, grub will > write status data to its environment file (which it thinks is at > 3755744309) and *KABOOM* we've just destroyed whatever was in that > block. > > Far better for the bootloader installation script to hit an error and > force the admin to deal with the situation than for the system to become > unbootable. That's *why* the (newer) iomap bmap implementation does not > return truncated mappings, even though the classic implementation does. > > The classic code returning truncated results is a broken behavior. How long as it been broken for? And if we do fix it, I'd just like for a nice commit lot describing potential risks of not applying it. *If* the issue exists as-is today, the above contains a lot of information for addressing potential issues, even if theoretical. Luis
On Tue, Aug 06, 2019 at 10:41:38PM +0000, Luis Chamberlain wrote: > On Mon, Aug 05, 2019 at 08:12:58AM -0700, Darrick J. Wong wrote: > > On Mon, Aug 05, 2019 at 12:27:30PM +0200, Carlos Maiolino wrote: > > > On Fri, Aug 02, 2019 at 08:14:00AM -0700, Darrick J. Wong wrote: > > > > On Fri, Aug 02, 2019 at 11:19:39AM +0200, Carlos Maiolino wrote: > > > > > Hi Darrick. > > > > > > > > > > > > + return error; > > > > > > > + > > > > > > > + block = ur_block; > > > > > > > + error = bmap(inode, &block); > > > > > > > + > > > > > > > + if (error) > > > > > > > + ur_block = 0; > > > > > > > + else > > > > > > > + ur_block = block; > > > > > > > > > > > > What happens if ur_block > INT_MAX? Shouldn't we return zero (i.e. > > > > > > error) instead of truncating the value? Maybe the code does this > > > > > > somewhere else? Here seemed like the obvious place for an overflow > > > > > > check as we go from sector_t to int. > > > > > > > > > > > > > > > > The behavior should still be the same. It will get truncated, unfortunately. I > > > > > don't think we can actually change this behavior and return zero instead of > > > > > truncating it. > > > > > > > > But that's even worse, because the programs that rely on FIBMAP will now > > > > receive *incorrect* results that may point at a different file and > > > > definitely do not point at the correct file block. > > > > > > How is this worse? This is exactly what happens today, on the original FIBMAP > > > implementation. > > > > Ok, I wasn't being 110% careful with my words. Delete "will now" from > > the sentence above. > > > > > Maybe I am not seeing something or having a different thinking you have, but > > > this is the behavior we have now, without my patches. And we can't really change > > > it; the user view of this implementation. > > > That's why I didn't try to change the result, so the truncation still happens. > > > > I understand that we're not generally supposed to change existing > > userspace interfaces, but the fact remains that allowing truncated > > responses causes *filesystem corruption*. > > > > We know that the most well known FIBMAP callers are bootloaders, and we > > know what they do with the information they get -- they use it to record > > the block map of boot files. So if the IPL/grub/whatever installer > > queries the boot file and the boot file is at block 12345678901 (a > > 34-bit number), this interface truncates that to 3755744309 (a 32-bit > > number) and that's where the bootloader will think its boot files are. > > The installation succeeds, the user reboots and *kaboom* the system no > > longer boots because the contents of block 3755744309 is not a bootloader. > > > > Worse yet, grub1 used FIBMAP data to record the location of the grub > > environment file and installed itself between the MBR and the start of > > partition 1. If the environment file is at offset 1234578901, grub will > > write status data to its environment file (which it thinks is at > > 3755744309) and *KABOOM* we've just destroyed whatever was in that > > block. > > > > Far better for the bootloader installation script to hit an error and > > force the admin to deal with the situation than for the system to become > > unbootable. That's *why* the (newer) iomap bmap implementation does not > > return truncated mappings, even though the classic implementation does. > > > > The classic code returning truncated results is a broken behavior. > > How long as it been broken for? Probably since the beginning (ext2). > And if we do fix it, I'd just like for > a nice commit lot describing potential risks of not applying it. *If* > the issue exists as-is today, the above contains a lot of information > for addressing potential issues, even if theoretical. I think a lot of the filesystems avoid the problem either by not supporting > INT_MAX blocks in the first place or by detecting the truncation in the fs-specific ->bmap method, so that might be why we haven't been deluged by corruption reports. --D > Luis
> > > > > Maybe I am not seeing something or having a different thinking you have, but > > > this is the behavior we have now, without my patches. And we can't really change > > > it; the user view of this implementation. > > > That's why I didn't try to change the result, so the truncation still happens. > > > > I understand that we're not generally supposed to change existing > > userspace interfaces, but the fact remains that allowing truncated > > responses causes *filesystem corruption*. > > > > We know that the most well known FIBMAP callers are bootloaders, and we > > know what they do with the information they get -- they use it to record > > the block map of boot files. So if the IPL/grub/whatever installer > > queries the boot file and the boot file is at block 12345678901 (a > > 34-bit number), this interface truncates that to 3755744309 (a 32-bit > > number) and that's where the bootloader will think its boot files are. > > The installation succeeds, the user reboots and *kaboom* the system no > > longer boots because the contents of block 3755744309 is not a bootloader. > > > > Worse yet, grub1 used FIBMAP data to record the location of the grub > > environment file and installed itself between the MBR and the start of > > partition 1. If the environment file is at offset 1234578901, grub will > > write status data to its environment file (which it thinks is at > > 3755744309) and *KABOOM* we've just destroyed whatever was in that > > block. > > > > Far better for the bootloader installation script to hit an error and > > force the admin to deal with the situation than for the system to become > > unbootable. That's *why* the (newer) iomap bmap implementation does not > > return truncated mappings, even though the classic implementation does. > > > > The classic code returning truncated results is a broken behavior. > > How long as it been broken for? And if we do fix it, I'd just like for > a nice commit lot describing potential risks of not applying it. *If* > the issue exists as-is today, the above contains a lot of information > for addressing potential issues, even if theoretical. > It's broken since forever. This has always been the FIBMAP behavior. > Luis
On Tue, Aug 06, 2019 at 07:48:00AM -0700, Darrick J. Wong wrote: > On Tue, Aug 06, 2019 at 02:07:24PM +0200, Carlos Maiolino wrote: > > On Tue, Aug 06, 2019 at 07:38:40AM +0200, Christoph Hellwig wrote: > > > On Mon, Aug 05, 2019 at 08:12:58AM -0700, Darrick J. Wong wrote: > > > > > returned. And IIRC, iomap is the only interface now that cares about issuing a > > > > > warning. > > > > > > > > > > I think the *best* we could do here, is to make the new bmap() to issue the same > > > > > kind of WARN() iomap does, but we can't really change the end result. > > > > > > > > I'd rather we break legacy code than corrupt filesystems. > > > > > > > Yes, I have the same feeling, but this patchset does not have the goal to fix > > the broken api. > > > > > This particular patch should keep existing behavior as is, as the intent > > > is to not change functionality. Throwing in another patch to have saner > > > error behavior now that we have a saner in-kernel interface that cleary > > > documents what it is breaking and why on the other hand sounds like a > > > very good idea. > > > > I totally agree here, and to be honest, I think such change should be in a > > different patchset rather than a new patch in this series. I can do it for sure, > > but this discussion IMHO should be done not only here in linux-fsdevel, but also > > in linux-api, which well, I don't think cc'ing this whole patchset there will do > > any good other than keep the change discussion more complicated than it should > > be. I'd rather finish the design and implementation of this patchset, and I'll > > follow-up it, once it's all set, with a new patch to change the truncation > > behavior, it will make the discussion way easier than mixing up subjects. What > > you guys think? > > I probably would've fixed the truncation behavior in the old code and > based the fiemap-fibmap conversion on that so that anyone who wants to > backport the behavior change to an old kernel has an easier time of it. > Well, another problem in fixing it in the old code, is that bmap() can't properly return errors :P After this patchset, bmap() will be able to return errors, so we can easily fix it, once we won't need to 'guess' what a zero return mean from bmap() > But afterwards probably works just as well since I don't feel like tying > ourselves in more knots over an old interface. ;) > > --D > > > Cheers > > > > > > -- > > Carlos
On Aug 8, 2019, at 1:12 AM, Carlos Maiolino <cmaiolino@redhat.com> wrote: > >>> >>>> Maybe I am not seeing something or having a different thinking you have, but >>>> this is the behavior we have now, without my patches. And we can't really change >>>> it; the user view of this implementation. >>>> That's why I didn't try to change the result, so the truncation still happens. >>> >>> I understand that we're not generally supposed to change existing >>> userspace interfaces, but the fact remains that allowing truncated >>> responses causes *filesystem corruption*. >>> >>> We know that the most well known FIBMAP callers are bootloaders, and we >>> know what they do with the information they get -- they use it to record >>> the block map of boot files. So if the IPL/grub/whatever installer >>> queries the boot file and the boot file is at block 12345678901 (a >>> 34-bit number), this interface truncates that to 3755744309 (a 32-bit >>> number) and that's where the bootloader will think its boot files are. >>> The installation succeeds, the user reboots and *kaboom* the system no >>> longer boots because the contents of block 3755744309 is not a bootloader. >>> >>> Worse yet, grub1 used FIBMAP data to record the location of the grub >>> environment file and installed itself between the MBR and the start of >>> partition 1. If the environment file is at offset 1234578901, grub will >>> write status data to its environment file (which it thinks is at >>> 3755744309) and *KABOOM* we've just destroyed whatever was in that >>> block. >>> >>> Far better for the bootloader installation script to hit an error and >>> force the admin to deal with the situation than for the system to become >>> unbootable. That's *why* the (newer) iomap bmap implementation does not >>> return truncated mappings, even though the classic implementation does. >>> >>> The classic code returning truncated results is a broken behavior. >> >> How long as it been broken for? And if we do fix it, I'd just like for >> a nice commit lot describing potential risks of not applying it. *If* >> the issue exists as-is today, the above contains a lot of information >> for addressing potential issues, even if theoretical. >> > > It's broken since forever. This has always been the FIBMAP behavior. It's been broken since forever, but only for filesystems larger than 4TB or 16TB (2^32 blocks), which are only becoming commonplace for root disks recently. Also, doesn't LILO have a limit on the location of the kernel image, in the first 1GB or similar? So maybe this is not an issue that FIBMAP users ever hit in practise anyway, but I agree that it doesn't make sense to return bad data (32-bit wrapped block numbers) and 0 should be returned in such cases. Cheers, Andreas
Meh... Sorry andreas, your reply became disconnected from the thread, and I think I didn't reply. On Thu, Aug 08, 2019 at 12:53:25PM -0600, Andreas Dilger wrote: > On Aug 8, 2019, at 1:12 AM, Carlos Maiolino <cmaiolino@redhat.com> wrote: > > > >>> > >>>> Maybe I am not seeing something or having a different thinking you have, but > >>>> this is the behavior we have now, without my patches. And we can't really change > >>>> it; the user view of this implementation. > >>>> That's why I didn't try to change the result, so the truncation still happens. > >>> > >>> I understand that we're not generally supposed to change existing > >>> userspace interfaces, but the fact remains that allowing truncated > >>> responses causes *filesystem corruption*. > >>> > >>> We know that the most well known FIBMAP callers are bootloaders, and we > >>> know what they do with the information they get -- they use it to record > >>> the block map of boot files. So if the IPL/grub/whatever installer > >>> queries the boot file and the boot file is at block 12345678901 (a > >>> 34-bit number), this interface truncates that to 3755744309 (a 32-bit > >>> number) and that's where the bootloader will think its boot files are. > >>> The installation succeeds, the user reboots and *kaboom* the system no > >>> longer boots because the contents of block 3755744309 is not a bootloader. > >>> > >>> Worse yet, grub1 used FIBMAP data to record the location of the grub > >>> environment file and installed itself between the MBR and the start of > >>> partition 1. If the environment file is at offset 1234578901, grub will > >>> write status data to its environment file (which it thinks is at > >>> 3755744309) and *KABOOM* we've just destroyed whatever was in that > >>> block. > >>> > >>> Far better for the bootloader installation script to hit an error and > >>> force the admin to deal with the situation than for the system to become > >>> unbootable. That's *why* the (newer) iomap bmap implementation does not > >>> return truncated mappings, even though the classic implementation does. > >>> > >>> The classic code returning truncated results is a broken behavior. > >> > >> How long as it been broken for? And if we do fix it, I'd just like for > >> a nice commit lot describing potential risks of not applying it. *If* > >> the issue exists as-is today, the above contains a lot of information > >> for addressing potential issues, even if theoretical. > >> > > > > It's broken since forever. This has always been the FIBMAP behavior. > > It's been broken since forever, but only for filesystems larger than 4TB or > 16TB (2^32 blocks), which are only becoming commonplace for root disks recently. > Also, doesn't LILO have a limit on the location of the kernel image, in the > first 1GB or similar? > > So maybe this is not an issue that FIBMAP users ever hit in practise anyway, > but I agree that it doesn't make sense to return bad data (32-bit wrapped block > numbers) and 0 should be returned in such cases. > Thanks for the input, but TBH I don't use LILO for a long time, and I don't remember exactly how it works. Anyway, I have 2 bugs to fix in this code, after I can get this series in, one is the overflow we'll probably need kernel-api approval, and another one is the acceptance of negative values into FIBMAP, which we have no protection at all. I'll fix both once I can get the main series in. Cheers > > Cheers, Andreas > > > > >
diff --git a/fs/ioctl.c b/fs/ioctl.c index fef3a6bf7c78..6b589c873bc2 100644 --- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -53,19 +53,28 @@ EXPORT_SYMBOL(vfs_ioctl); static int ioctl_fibmap(struct file *filp, int __user *p) { - struct address_space *mapping = filp->f_mapping; - int res, block; + struct inode *inode = file_inode(filp); + int error, ur_block; + sector_t block; - /* do we support this mess? */ - if (!mapping->a_ops->bmap) - return -EINVAL; if (!capable(CAP_SYS_RAWIO)) return -EPERM; - res = get_user(block, p); - if (res) - return res; - res = mapping->a_ops->bmap(mapping, block); - return put_user(res, p); + + error = get_user(ur_block, p); + if (error) + return error; + + block = ur_block; + error = bmap(inode, &block); + + if (error) + ur_block = 0; + else + ur_block = block; + + error = put_user(ur_block, p); + + return error; } /**
Now we have the possibility of proper error return in bmap, use bmap() function in ioctl_fibmap() instead of calling ->bmap method directly. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> --- Changelog: V4: - Ensure ioctl_fibmap() returns 0 in case of error returned from bmap(). Otherwise we'll be changing the user interface (which returns 0 in case of error) V3: - Rename usr_blk to ur_block V2: - Use a local sector_t variable to asign the block number instead of using direct casting. fs/ioctl.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-)