Message ID | 20190723155020.17338-1-lhenriques@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] ceph: fix directories inode i_blkbits initialization | expand |
On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote: > When filling an inode with info from the MDS, i_blkbits is being > initialized using fl_stripe_unit, which contains the stripe unit in > bytes. Unfortunately, this doesn't make sense for directories as they > have fl_stripe_unit set to '0'. This means that i_blkbits will be set > to 0xff, causing an UBSAN undefined behaviour in i_blocksize(): > > UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 > shift exponent 255 is too large for 32-bit type 'int' > > Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit > is zero. > > Signed-off-by: Luis Henriques <lhenriques@suse.com> > --- > fs/ceph/inode.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > Hi Jeff, > > To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use > here, but for sure the one currently being used isn't correct if the > inode is a directory. Using stripe units seems to be a bug that has > been there since the beginning, but it definitely became bigger problem > after commit 69448867abcb ("fs: shave 8 bytes off of struct inode"). > > This fix could also be moved into the 'switch' statement later in that > function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph: > set special inode's blocksize to page size"). Let me know which version > you would prefer. > What happens with (e.g.) named pipes or symlinks? Do those inodes also get this bogus value? Assuming that they do, I'd probably prefer this patch since it'd fix things for all inode types, not just directories. > Cheers, > -- > Luis > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c > index 791f84a13bb8..0e6d6db848b7 100644 > --- a/fs/ceph/inode.c > +++ b/fs/ceph/inode.c > @@ -800,7 +800,12 @@ static int fill_inode(struct inode *inode, struct page *locked_page, > > /* update inode */ > inode->i_rdev = le32_to_cpu(info->rdev); > - inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; > + /* directories have fl_stripe_unit set to zero */ > + if (le32_to_cpu(info->layout.fl_stripe_unit)) > + inode->i_blkbits = > + fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; > + else > + inode->i_blkbits = CEPH_BLOCK_SHIFT; > > __ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files); >
"Jeff Layton" <jlayton@kernel.org> writes: > On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote: >> When filling an inode with info from the MDS, i_blkbits is being >> initialized using fl_stripe_unit, which contains the stripe unit in >> bytes. Unfortunately, this doesn't make sense for directories as they >> have fl_stripe_unit set to '0'. This means that i_blkbits will be set >> to 0xff, causing an UBSAN undefined behaviour in i_blocksize(): >> >> UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 >> shift exponent 255 is too large for 32-bit type 'int' >> >> Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit >> is zero. >> >> Signed-off-by: Luis Henriques <lhenriques@suse.com> >> --- >> fs/ceph/inode.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> Hi Jeff, >> >> To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use >> here, but for sure the one currently being used isn't correct if the >> inode is a directory. Using stripe units seems to be a bug that has >> been there since the beginning, but it definitely became bigger problem >> after commit 69448867abcb ("fs: shave 8 bytes off of struct inode"). >> >> This fix could also be moved into the 'switch' statement later in that >> function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph: >> set special inode's blocksize to page size"). Let me know which version >> you would prefer. >> > > What happens with (e.g.) named pipes or symlinks? Do those inodes also > get this bogus value? Assuming that they do, I'd probably prefer this > patch since it'd fix things for all inode types, not just directories. I tested symlinks and they seem to be handled correctly (i.e. the stripe units seems to be the same as the target file). Regarding pipes, I didn't test them, but from the code it should be set to PAGE_SHIFT (see the above mentioned commit 5ba72e607cdb). Anyway, I can change the code to do *all* the i_blkbits initialization inside the switch statement. Something like: switch (inode->i_mode & S_IFMT) { case S_IFIFO: case S_IFBLK: case S_IFCHR: case S_IFSOCK: inode->i_blkbits = PAGE_SHIFT; ... case S_IFREG: inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; ... case S_IFLNK: inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; ... case S_IFDIR: inode->i_blkbits = CEPH_BLOCK_SHIFT; ... default: pr_err(); ... } This would add some code duplication (S_IFREG and S_IFLNK cases), but maybe it's a bit more clear. The other option would be obviously to leave the initialization outside the switch and only change the i_blkbits value in the S_IF{IFO,BLK,CHR,SOCK,DIR} cases. Cheers,
Luis Henriques <lhenriques@suse.com> writes: > "Jeff Layton" <jlayton@kernel.org> writes: > >> On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote: >>> When filling an inode with info from the MDS, i_blkbits is being >>> initialized using fl_stripe_unit, which contains the stripe unit in >>> bytes. Unfortunately, this doesn't make sense for directories as they >>> have fl_stripe_unit set to '0'. This means that i_blkbits will be set >>> to 0xff, causing an UBSAN undefined behaviour in i_blocksize(): >>> >>> UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 >>> shift exponent 255 is too large for 32-bit type 'int' >>> >>> Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit >>> is zero. >>> >>> Signed-off-by: Luis Henriques <lhenriques@suse.com> >>> --- >>> fs/ceph/inode.c | 7 ++++++- >>> 1 file changed, 6 insertions(+), 1 deletion(-) >>> >>> Hi Jeff, >>> >>> To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use >>> here, but for sure the one currently being used isn't correct if the >>> inode is a directory. Using stripe units seems to be a bug that has >>> been there since the beginning, but it definitely became bigger problem >>> after commit 69448867abcb ("fs: shave 8 bytes off of struct inode"). >>> >>> This fix could also be moved into the 'switch' statement later in that >>> function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph: >>> set special inode's blocksize to page size"). Let me know which version >>> you would prefer. >>> >> >> What happens with (e.g.) named pipes or symlinks? Do those inodes also >> get this bogus value? Assuming that they do, I'd probably prefer this >> patch since it'd fix things for all inode types, not just directories. > > I tested symlinks and they seem to be handled correctly (i.e. the stripe > units seems to be the same as the target file). Regarding pipes, I > didn't test them, but from the code it should be set to PAGE_SHIFT (see > the above mentioned commit 5ba72e607cdb). Ok, after looking closer at the other inode types and running a few tests with extra debug code, it all seems to be sane -- only directories (root dir is an exception) will cause problems with i_blkbits being set to a bogus value. So, I'm sticking with my original RFC patch approach, which should be easy to apply to stable kernels. Cheers,
On Wed, 2019-07-24 at 11:04 +0100, Luis Henriques wrote: > Luis Henriques <lhenriques@suse.com> writes: > > > "Jeff Layton" <jlayton@kernel.org> writes: > > > > > On Tue, 2019-07-23 at 16:50 +0100, Luis Henriques wrote: > > > > When filling an inode with info from the MDS, i_blkbits is being > > > > initialized using fl_stripe_unit, which contains the stripe unit in > > > > bytes. Unfortunately, this doesn't make sense for directories as they > > > > have fl_stripe_unit set to '0'. This means that i_blkbits will be set > > > > to 0xff, causing an UBSAN undefined behaviour in i_blocksize(): > > > > > > > > UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 > > > > shift exponent 255 is too large for 32-bit type 'int' > > > > > > > > Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit > > > > is zero. > > > > > > > > Signed-off-by: Luis Henriques <lhenriques@suse.com> > > > > --- > > > > fs/ceph/inode.c | 7 ++++++- > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > Hi Jeff, > > > > > > > > To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use > > > > here, but for sure the one currently being used isn't correct if the > > > > inode is a directory. Using stripe units seems to be a bug that has > > > > been there since the beginning, but it definitely became bigger problem > > > > after commit 69448867abcb ("fs: shave 8 bytes off of struct inode"). > > > > > > > > This fix could also be moved into the 'switch' statement later in that > > > > function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph: > > > > set special inode's blocksize to page size"). Let me know which version > > > > you would prefer. > > > > > > > > > > What happens with (e.g.) named pipes or symlinks? Do those inodes also > > > get this bogus value? Assuming that they do, I'd probably prefer this > > > patch since it'd fix things for all inode types, not just directories. > > > > I tested symlinks and they seem to be handled correctly (i.e. the stripe > > units seems to be the same as the target file). Regarding pipes, I > > didn't test them, but from the code it should be set to PAGE_SHIFT (see > > the above mentioned commit 5ba72e607cdb). > > Ok, after looking closer at the other inode types and running a few > tests with extra debug code, it all seems to be sane -- only directories > (root dir is an exception) will cause problems with i_blkbits being set > to a bogus value. So, I'm sticking with my original RFC patch approach, > which should be easy to apply to stable kernels. > > Cheers, Sounds good. I'll just plan to merge your RFC patch, after I run some more tests on it. Thanks!
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 791f84a13bb8..0e6d6db848b7 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -800,7 +800,12 @@ static int fill_inode(struct inode *inode, struct page *locked_page, /* update inode */ inode->i_rdev = le32_to_cpu(info->rdev); - inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; + /* directories have fl_stripe_unit set to zero */ + if (le32_to_cpu(info->layout.fl_stripe_unit)) + inode->i_blkbits = + fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1; + else + inode->i_blkbits = CEPH_BLOCK_SHIFT; __ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files);
When filling an inode with info from the MDS, i_blkbits is being initialized using fl_stripe_unit, which contains the stripe unit in bytes. Unfortunately, this doesn't make sense for directories as they have fl_stripe_unit set to '0'. This means that i_blkbits will be set to 0xff, causing an UBSAN undefined behaviour in i_blocksize(): UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 shift exponent 255 is too large for 32-bit type 'int' Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit is zero. Signed-off-by: Luis Henriques <lhenriques@suse.com> --- fs/ceph/inode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) Hi Jeff, To be honest, I'm not sure CEPH_BLOCK_SHIFT is the right value to use here, but for sure the one currently being used isn't correct if the inode is a directory. Using stripe units seems to be a bug that has been there since the beginning, but it definitely became bigger problem after commit 69448867abcb ("fs: shave 8 bytes off of struct inode"). This fix could also be moved into the 'switch' statement later in that function, in the S_IFDIR case, similar to commit 5ba72e607cdb ("ceph: set special inode's blocksize to page size"). Let me know which version you would prefer. Cheers, -- Luis