Message ID | ae9306ab10ce3d794c13b1836f5473e89562b98c.1578225806.git.chris@chrisdown.name (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: inode: shmem: Reduce risk of inum overflow | expand |
On Sun, Jan 05, 2020 at 12:06:05PM +0000, Chris Down wrote: > The default is still set to inode32 for backwards compatibility, but > system administrators can opt in to the new 64-bit inode numbers by > either: > > 1. Passing inode64 on the command line when mounting, or > 2. Configuring the kernel with CONFIG_TMPFS_INODE64=y > > The inode64 and inode32 names are used based on existing precedent from > XFS. Please don't copy this misfeature of XFS. The inode32/inode64 XFS options were a horrible hack made more than 20 years ago when NFSv2 was still in use and 64 bit inodes could not be used for NFSv2 exports. It was then continued to be used because 32bit NFSv3 clients were unable to handle 64 bit inodes. It took 15 years for us to be able to essentially deprecate inode32 (inode64 is the default behaviour), and we were very happy to get that albatross off our necks. In reality, almost everything out there in the world handles 64 bit inodes correctly including 32 bit machines and 32bit binaries on 64 bit machines. And, IMNSHO, there no excuse these days for 32 bit binaries that don't using the *64() syscall variants directly and hence support 64 bit inodes correctlyi out of the box on all platforms. I don't think we should be repeating past mistakes by trying to cater for broken 32 bit applications on 64 bit machines in this day and age. Cheers, Dave.
Dave Chinner writes: >It took 15 years for us to be able to essentially deprecate >inode32 (inode64 is the default behaviour), and we were very happy >to get that albatross off our necks. In reality, almost everything >out there in the world handles 64 bit inodes correctly >including 32 bit machines and 32bit binaries on 64 bit machines. >And, IMNSHO, there no excuse these days for 32 bit binaries that >don't using the *64() syscall variants directly and hence support >64 bit inodes correctlyi out of the box on all platforms. > >I don't think we should be repeating past mistakes by trying to >cater for broken 32 bit applications on 64 bit machines in this day >and age. I'm very glad to hear that. I strongly support moving to 64-bit inums in all cases if there is precedent that it's not a compatibility issue, but from the comments on my original[0] patch (especially that they strayed from the original patches' change to use ino_t directly into slab reuse), I'd been given the impression that it was known to be one. From my perspective I have no evidence that inode32 is needed other than the comment from Jeff above get_next_ino. If that turns out not to be a problem, I am more than happy to just wholesale migrate 64-bit inodes per-sb in tmpfs. 0: https://lore.kernel.org/patchwork/patch/1170963/
On Tue, Jan 07, 2020 at 12:16:43AM +0000, Chris Down wrote: > Dave Chinner writes: > > It took 15 years for us to be able to essentially deprecate > > inode32 (inode64 is the default behaviour), and we were very happy > > to get that albatross off our necks. In reality, almost everything > > out there in the world handles 64 bit inodes correctly > > including 32 bit machines and 32bit binaries on 64 bit machines. > > And, IMNSHO, there no excuse these days for 32 bit binaries that > > don't using the *64() syscall variants directly and hence support > > 64 bit inodes correctlyi out of the box on all platforms. > > > > I don't think we should be repeating past mistakes by trying to > > cater for broken 32 bit applications on 64 bit machines in this day > > and age. > > I'm very glad to hear that. I strongly support moving to 64-bit inums in all > cases if there is precedent that it's not a compatibility issue, but from > the comments on my original[0] patch (especially that they strayed from the > original patches' change to use ino_t directly into slab reuse), I'd been > given the impression that it was known to be one. > > From my perspective I have no evidence that inode32 is needed other than the > comment from Jeff above get_next_ino. If that turns out not to be a problem, > I am more than happy to just wholesale migrate 64-bit inodes per-sb in > tmpfs. Well, that's my comment above about 32 bit apps using non-LFS compliant interfaces in this day and age. It's essentially a legacy interface these days, and anyone trying to access a modern linux filesystem (btrfs, XFS, ext4, etc) ion 64 bit systems need to handle 64 bit inodes because they all can create >32bit inode numbers in their default configurations. Cheers, Dave.
On Tue, Jan 7, 2020 at 2:40 AM Dave Chinner <david@fromorbit.com> wrote: > > On Tue, Jan 07, 2020 at 12:16:43AM +0000, Chris Down wrote: > > Dave Chinner writes: > > > It took 15 years for us to be able to essentially deprecate > > > inode32 (inode64 is the default behaviour), and we were very happy > > > to get that albatross off our necks. In reality, almost everything > > > out there in the world handles 64 bit inodes correctly > > > including 32 bit machines and 32bit binaries on 64 bit machines. > > > And, IMNSHO, there no excuse these days for 32 bit binaries that > > > don't using the *64() syscall variants directly and hence support > > > 64 bit inodes correctlyi out of the box on all platforms. > > > > > > I don't think we should be repeating past mistakes by trying to > > > cater for broken 32 bit applications on 64 bit machines in this day > > > and age. > > > > I'm very glad to hear that. I strongly support moving to 64-bit inums in all > > cases if there is precedent that it's not a compatibility issue, but from > > the comments on my original[0] patch (especially that they strayed from the > > original patches' change to use ino_t directly into slab reuse), I'd been > > given the impression that it was known to be one. > > > > From my perspective I have no evidence that inode32 is needed other than the > > comment from Jeff above get_next_ino. If that turns out not to be a problem, > > I am more than happy to just wholesale migrate 64-bit inodes per-sb in > > tmpfs. > > Well, that's my comment above about 32 bit apps using non-LFS > compliant interfaces in this day and age. It's essentially a legacy > interface these days, and anyone trying to access a modern linux > filesystem (btrfs, XFS, ext4, etc) ion 64 bit systems need to handle > 64 bit inodes because they all can create >32bit inode numbers > in their default configurations. > Chris, Following Dave's comment, let's keep the config option, but make it default to Y and remove the mount option for now. If somebody shouts, mount option could be re-added later. This way at least we leave an option for workaround for an unlikely breakage. Thanks, Amir.
On Tue, 7 Jan 2020, Amir Goldstein wrote: > On Tue, Jan 7, 2020 at 2:40 AM Dave Chinner <david@fromorbit.com> wrote: > > On Tue, Jan 07, 2020 at 12:16:43AM +0000, Chris Down wrote: > > > Dave Chinner writes: > > > > It took 15 years for us to be able to essentially deprecate > > > > inode32 (inode64 is the default behaviour), and we were very happy > > > > to get that albatross off our necks. In reality, almost everything > > > > out there in the world handles 64 bit inodes correctly > > > > including 32 bit machines and 32bit binaries on 64 bit machines. > > > > And, IMNSHO, there no excuse these days for 32 bit binaries that > > > > don't using the *64() syscall variants directly and hence support > > > > 64 bit inodes correctlyi out of the box on all platforms. Interesting take on it. I'd all along imagined we would have to resort to a mount option for safety, but Dave is right that I was too focused on avoiding tmpfs regressions, without properly realizing that people were very unlikely to have written such tools for tmpfs in particular, but written them for all filesystems, and already encountered and fixed such EOVERFLOWs for other filesystems. Hmm, though how readily does XFS actually reach the high inos on ordinary users' systems? > > > > > > > > I don't think we should be repeating past mistakes by trying to > > > > cater for broken 32 bit applications on 64 bit machines in this day > > > > and age. > > > > > > I'm very glad to hear that. I strongly support moving to 64-bit inums in all > > > cases if there is precedent that it's not a compatibility issue, but from > > > the comments on my original[0] patch (especially that they strayed from the > > > original patches' change to use ino_t directly into slab reuse), I'd been > > > given the impression that it was known to be one. > > > > > > From my perspective I have no evidence that inode32 is needed other than the > > > comment from Jeff above get_next_ino. If that turns out not to be a problem, > > > I am more than happy to just wholesale migrate 64-bit inodes per-sb in > > > tmpfs. > > > > Well, that's my comment above about 32 bit apps using non-LFS > > compliant interfaces in this day and age. It's essentially a legacy > > interface these days, and anyone trying to access a modern linux > > filesystem (btrfs, XFS, ext4, etc) ion 64 bit systems need to handle > > 64 bit inodes because they all can create >32bit inode numbers > > in their default configurations. I thought ext4 still deals in 32-bit inos, so ext4-world would not necessarily have caught up? Though, arguing the other way, IIUC 64-bit ino support comes bundled with file sizes over 2GB (on all architectures?), and it's hard to imagine who gets by with a 2GB file size limit nowadays. > > > > Chris, > > Following Dave's comment, let's keep the config option, but make it > default to Y and remove the mount option for now. > If somebody shouts, mount option could be re-added later. > This way at least we leave an option for workaround for an unlikely > breakage. Though I don't share Dave's strength of aversion to the inode32 and inode64 mount options, I do agree that it's preferable not to offer them if they're so very unlikely to be necessary. Do we even need the config option? We certainly do need to hold the patch with config option and mount options "in our back pocket", so that if a regression does get reported, then upstream and stable can respond to fix it quickly; but do we need more than that? My main concern is that a new EOVERFLOW might not be reported as such, and waste a lot of someone's time to track down. Hugh
On Tue, Jan 7, 2020 at 10:36 AM Hugh Dickins <hughd@google.com> wrote: > > On Tue, 7 Jan 2020, Amir Goldstein wrote: > > On Tue, Jan 7, 2020 at 2:40 AM Dave Chinner <david@fromorbit.com> wrote: > > > On Tue, Jan 07, 2020 at 12:16:43AM +0000, Chris Down wrote: > > > > Dave Chinner writes: > > > > > It took 15 years for us to be able to essentially deprecate > > > > > inode32 (inode64 is the default behaviour), and we were very happy > > > > > to get that albatross off our necks. In reality, almost everything > > > > > out there in the world handles 64 bit inodes correctly > > > > > including 32 bit machines and 32bit binaries on 64 bit machines. > > > > > And, IMNSHO, there no excuse these days for 32 bit binaries that > > > > > don't using the *64() syscall variants directly and hence support > > > > > 64 bit inodes correctlyi out of the box on all platforms. > > Interesting take on it. I'd all along imagined we would have to resort > to a mount option for safety, but Dave is right that I was too focused on > avoiding tmpfs regressions, without properly realizing that people were > very unlikely to have written such tools for tmpfs in particular, but > written them for all filesystems, and already encountered and fixed > such EOVERFLOWs for other filesystems. > > Hmm, though how readily does XFS actually reach the high inos on > ordinary users' systems? > Define 'ordinary' I my calculations are correct, with default mkfs.xfs any inode allocated from logical offset > 2TB on a volume has high ino bits set. Besides, a deployment with more than 4G inodes shouldn't be hard to find. Thanks, Amir.
On Tue, Jan 07, 2020 at 12:36:25AM -0800, Hugh Dickins wrote: > On Tue, 7 Jan 2020, Amir Goldstein wrote: > > On Tue, Jan 7, 2020 at 2:40 AM Dave Chinner <david@fromorbit.com> wrote: > > > On Tue, Jan 07, 2020 at 12:16:43AM +0000, Chris Down wrote: > > > > Dave Chinner writes: > > > > > It took 15 years for us to be able to essentially deprecate > > > > > inode32 (inode64 is the default behaviour), and we were very happy > > > > > to get that albatross off our necks. In reality, almost everything > > > > > out there in the world handles 64 bit inodes correctly > > > > > including 32 bit machines and 32bit binaries on 64 bit machines. > > > > > And, IMNSHO, there no excuse these days for 32 bit binaries that > > > > > don't using the *64() syscall variants directly and hence support > > > > > 64 bit inodes correctlyi out of the box on all platforms. > > Interesting take on it. I'd all along imagined we would have to resort > to a mount option for safety, but Dave is right that I was too focused on > avoiding tmpfs regressions, without properly realizing that people were > very unlikely to have written such tools for tmpfs in particular, but > written them for all filesystems, and already encountered and fixed > such EOVERFLOWs for other filesystems. > > Hmm, though how readily does XFS actually reach the high inos on > ordinary users' systems? Quite frequently these days - the threshold for exceeding 32 bits in the inode number is dependent on the inode size. Old v4 filesystems use 256 byte inodes by default, so they overflow 32bits when the filesystem size is >1TB. Current v5 filesystems use 512 byte inodes, so they overflow 32 bits on filesytsems >2TB. > > > > I'm very glad to hear that. I strongly support moving to 64-bit inums in all > > > > cases if there is precedent that it's not a compatibility issue, but from > > > > the comments on my original[0] patch (especially that they strayed from the > > > > original patches' change to use ino_t directly into slab reuse), I'd been > > > > given the impression that it was known to be one. > > > > > > > > From my perspective I have no evidence that inode32 is needed other than the > > > > comment from Jeff above get_next_ino. If that turns out not to be a problem, > > > > I am more than happy to just wholesale migrate 64-bit inodes per-sb in > > > > tmpfs. > > > > > > Well, that's my comment above about 32 bit apps using non-LFS > > > compliant interfaces in this day and age. It's essentially a legacy > > > interface these days, and anyone trying to access a modern linux > > > filesystem (btrfs, XFS, ext4, etc) ion 64 bit systems need to handle > > > 64 bit inodes because they all can create >32bit inode numbers > > > in their default configurations. > > I thought ext4 still deals in 32-bit inos, so ext4-world would not > necessarily have caught up? Hmmm - I might have got my wires crossed there - there *was* a project some time ago to increase the ext4 inode number size for large filesystems. Ah, yeah, there we go: https://lore.kernel.org/linux-ext4/20171101212455.47964-1-artem.blagodarenko@gmail.com/ I thought it had been rolled in with all the other "make stuff larger" features that ext4 has... > Though, arguing the other way, IIUC 64-bit > ino support comes bundled with file sizes over 2GB (on all architectures?), > and it's hard to imagine who gets by with a 2GB file size limit nowadays. Right, you need to define LFS support for >2GB file support and you get 64 bit inode support with that for free. It's only legacy binaries that haven't been rebuilt in the past 15 years that are an issue here, but there are so many other things that can go trivially wrong with such binaries I think that inode numbers are the least of the worries here... Cheers, Dave.
On Tue, Jan 07, 2020 at 12:12:00PM +0200, Amir Goldstein wrote: > On Tue, Jan 7, 2020 at 10:36 AM Hugh Dickins <hughd@google.com> wrote: > > > > On Tue, 7 Jan 2020, Amir Goldstein wrote: > > > On Tue, Jan 7, 2020 at 2:40 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Tue, Jan 07, 2020 at 12:16:43AM +0000, Chris Down wrote: > > > > > Dave Chinner writes: > > > > > > It took 15 years for us to be able to essentially deprecate > > > > > > inode32 (inode64 is the default behaviour), and we were very happy > > > > > > to get that albatross off our necks. In reality, almost everything > > > > > > out there in the world handles 64 bit inodes correctly > > > > > > including 32 bit machines and 32bit binaries on 64 bit machines. > > > > > > And, IMNSHO, there no excuse these days for 32 bit binaries that > > > > > > don't using the *64() syscall variants directly and hence support > > > > > > 64 bit inodes correctlyi out of the box on all platforms. > > > > Interesting take on it. I'd all along imagined we would have to resort > > to a mount option for safety, but Dave is right that I was too focused on > > avoiding tmpfs regressions, without properly realizing that people were > > very unlikely to have written such tools for tmpfs in particular, but > > written them for all filesystems, and already encountered and fixed > > such EOVERFLOWs for other filesystems. > > > > Hmm, though how readily does XFS actually reach the high inos on > > ordinary users' systems? > > > > Define 'ordinary' > I my calculations are correct, with default mkfs.xfs any inode allocated > from logical offset > 2TB on a volume has high ino bits set. > Besides, a deployment with more than 4G inodes shouldn't be hard to find. You don't need to allocate 4 billion inodes to get >32bit inodes in XFS - the inode number is an encoding of the physical location of the inode in the filesystem. Hence we just have to allocate the inode at a disk address higher than 2TB into the device and we overflow 32bits. e.g. make a sparse fs image file of 10TB, mount it, create 50 subdirs, then start creating zero length files spread across the 50 separate subdirectories. You should see >32bit inode numbers almost immediately. (i.e. as soon as we allocate inodes in AG 2 or higher) IOWs, there are *lots* of 64bit inode numbers out there on XFS filesystems.... Cheers, Dave.
On 7 Jan 2020, at 16:07, Dave Chinner wrote: > IOWs, there are *lots* of 64bit inode numbers out there on XFS > filesystems.... It's less likely in btrfs but +1 to all of Dave's comments. I'm happy to run a scan on machines in the fleet and see how many have 64 bit inodes (either buttery or x-y), but it's going to be a lot. -chris
On Tue, 7 Jan 2020, Chris Mason wrote: > On 7 Jan 2020, at 16:07, Dave Chinner wrote: > > > IOWs, there are *lots* of 64bit inode numbers out there on XFS > > filesystems.... > > It's less likely in btrfs but +1 to all of Dave's comments. I'm happy > to run a scan on machines in the fleet and see how many have 64 bit > inodes (either buttery or x-y), but it's going to be a lot. Dave, Amir, Chris, many thanks for the info you've filled in - and absolutely no need to run any scan on your fleet for this, I think we can be confident that even if fb had some 15-year-old tool in use on its fleet of 2GB-file filesystems, it would not be the one to insist on a kernel revert of 64-bit tmpfs inos. The picture looks clear now: while ChrisD does need to hold on to his config option and inode32/inode64 mount option patch, it is much better left out of the kernel until (very unlikely) proved necessary. Thanks, Hugh
Dave Chinner writes: > On Sun, Jan 05, 2020 at 12:06:05PM +0000, Chris Down wrote: > > The default is still set to inode32 for backwards compatibility, but > > system administrators can opt in to the new 64-bit inode numbers by > > either: > > > > 1. Passing inode64 on the command line when mounting, or > > 2. Configuring the kernel with CONFIG_TMPFS_INODE64=y > > > > The inode64 and inode32 names are used based on existing precedent from > > XFS. > > Please don't copy this misfeature of XFS. > > The inode32/inode64 XFS options were a horrible hack made more than > 20 years ago when NFSv2 was still in use and 64 bit inodes could > not be used for NFSv2 exports. It was then continued to be used > because 32bit NFSv3 clients were unable to handle 64 bit inodes. > > It took 15 years for us to be able to essentially deprecate > inode32 (inode64 is the default behaviour), and we were very happy > to get that albatross off our necks. In reality, almost everything > out there in the world handles 64 bit inodes correctly > including 32 bit machines and 32bit binaries on 64 bit machines. > And, IMNSHO, there no excuse these days for 32 bit binaries that > don't using the *64() syscall variants directly and hence support > 64 bit inodes correctlyi out of the box on all platforms. > > I don't think we should be repeating past mistakes by trying to > cater for broken 32 bit applications on 64 bit machines in this day > and age. Hi, It's unfortunately not true that everything handles this correctly. 32-bit binaries for games on Steam that use stat() without the 64 is so prevalent that I got tired of adding the LD_PRELOAD wrapper script and just patched out the EOVERFLOW return from glibc instead. (They obviously don't care about the inode value at all, and I don't use any other 32-bit binaries that do). This is probably a class of binaries you don't care very much about, and not very likely to be installed on a tmpfs that has wrapped around, but I thought it was worth mentioning that they do exist anyway.
On Wed, 2020-01-08 at 03:24 -0800, Hugh Dickins wrote: > On Tue, 7 Jan 2020, Chris Mason wrote: > > On 7 Jan 2020, at 16:07, Dave Chinner wrote: > > > > > IOWs, there are *lots* of 64bit inode numbers out there on XFS > > > filesystems.... > > > > It's less likely in btrfs but +1 to all of Dave's comments. I'm happy > > to run a scan on machines in the fleet and see how many have 64 bit > > inodes (either buttery or x-y), but it's going to be a lot. > > Dave, Amir, Chris, many thanks for the info you've filled in - > and absolutely no need to run any scan on your fleet for this, > I think we can be confident that even if fb had some 15-year-old tool > in use on its fleet of 2GB-file filesystems, it would not be the one > to insist on a kernel revert of 64-bit tmpfs inos. > > The picture looks clear now: while ChrisD does need to hold on to his > config option and inode32/inode64 mount option patch, it is much better > left out of the kernel until (very unlikely) proved necessary. This approach seems like the best course to me. FWIW, at the time we capped this at 32-bits (2007), 64-bit machines were really just becoming widely available, and it was quite common to run 32-bit, non-LFS apps on a 64-bit kernel. Users were hitting spurious EOVERFLOW errors all over the place so this seemed like the best way to address it. The world has changed a lot since then though, and one would hope that almost everything these days is compiled with FILE_OFFSET_BITS=64. Fingers crossed!
Hi Hugh, Dave, Hugh Dickins writes: >Dave, Amir, Chris, many thanks for the info you've filled in - >and absolutely no need to run any scan on your fleet for this, >I think we can be confident that even if fb had some 15-year-old tool >in use on its fleet of 2GB-file filesystems, it would not be the one >to insist on a kernel revert of 64-bit tmpfs inos. > >The picture looks clear now: while ChrisD does need to hold on to his >config option and inode32/inode64 mount option patch, it is much better >left out of the kernel until (very unlikely) proved necessary. Based on Mikael's comment above about Steam binaries, and the lack of likelihood that they can be rebuilt, I'm inclined to still keep inode{64,32}, but make legacy behaviour require explicit opt-in. That is: - Default it to inode64 - Remove the Kconfig option - Only print it as an option if tmpfs was explicitly mounted with inode32 The reason I suggest keeping this is that I'm mildly concerned that the kind of users who might be impacted by this change due to 32-bit _FILE_OFFSET_BITS -- like the not-too-uncommon case that Mikael brings up -- seem unlikely to be the kind of people that would find it in an rc. Other than that, the first patch could be similar to how it is now, incorporating Hugh's improvements to the first patch to put everything under the same stat_lock in shmem_reserve_inode. What do you folks think? Thanks, Chris
On Wed, 8 Jan 2020, Mikael Magnusson wrote: > > It's unfortunately not true that everything handles this correctly. > 32-bit binaries for games on Steam that use stat() without the 64 is > so prevalent that I got tired of adding the LD_PRELOAD wrapper script > and just patched out the EOVERFLOW return from glibc instead. (They > obviously don't care about the inode value at all, and I don't use any > other 32-bit binaries that do). This is probably a class of binaries > you don't care very much about, and not very likely to be installed on > a tmpfs that has wrapped around, but I thought it was worth mentioning > that they do exist anyway. Thank you for alerting us to reality, Mikael: not what any of us wanted to hear, but we do care about them, and it was well worth mentioning. Hugh
On Fri, 10 Jan 2020, Chris Down wrote: > Hugh Dickins writes: > > Dave, Amir, Chris, many thanks for the info you've filled in - > > and absolutely no need to run any scan on your fleet for this, > > I think we can be confident that even if fb had some 15-year-old tool > > in use on its fleet of 2GB-file filesystems, it would not be the one > > to insist on a kernel revert of 64-bit tmpfs inos. > > > > The picture looks clear now: while ChrisD does need to hold on to his > > config option and inode32/inode64 mount option patch, it is much better > > left out of the kernel until (very unlikely) proved necessary. > > Based on Mikael's comment above about Steam binaries, and the lack of > likelihood that they can be rebuilt, I'm inclined to still keep inode{64,32}, > but make legacy behaviour require explicit opt-in. That is: > > - Default it to inode64 > - Remove the Kconfig option > - Only print it as an option if tmpfs was explicitly mounted with inode32 > > The reason I suggest keeping this is that I'm mildly concerned that the kind > of users who might be impacted by this change due to 32-bit _FILE_OFFSET_BITS > -- like the not-too-uncommon case that Mikael brings up -- seem unlikely to > be the kind of people that would find it in an rc. Okay. None of us are thrilled with it, but I agree that Mikael's observation should override our developer's preference. So the "inode64" option will be accepted but redundant on mounting, but exists for use as a remount option after mounting or remounting with "inode32": allowing the admin to switch temporarily to mask off the high ino bits with "inode32" when needing to run a limited binary. Documentation and commit message to alert Andrew and Linus and distros that we are risking some breakage with this, but supplying the antidote (not breakage of any distros themselves, no doubt they're all good; but breakage of what some users might run on them). > > Other than that, the first patch could be similar to how it is now, > incorporating Hugh's improvements to the first patch to put everything under > the same stat_lock in shmem_reserve_inode. So, I persuaded Amir to the other aspects my version, but did not persuade you? Well, I can live with that (or if not, can send mods on top of yours): but please read again why I was uncomfortable with yours, to check that you still prefer it (I agree that your patch is simpler, and none of my discomfort decisive). Thanks, Hugh
Hi Hugh, Sorry this response took so long, I had some non-work issues that took a lot of time last week. Hugh Dickins writes: >On Fri, 10 Jan 2020, Chris Down wrote: >> Hugh Dickins writes: >> > Dave, Amir, Chris, many thanks for the info you've filled in - >> > and absolutely no need to run any scan on your fleet for this, >> > I think we can be confident that even if fb had some 15-year-old tool >> > in use on its fleet of 2GB-file filesystems, it would not be the one >> > to insist on a kernel revert of 64-bit tmpfs inos. >> > >> > The picture looks clear now: while ChrisD does need to hold on to his >> > config option and inode32/inode64 mount option patch, it is much better >> > left out of the kernel until (very unlikely) proved necessary. >> >> Based on Mikael's comment above about Steam binaries, and the lack of >> likelihood that they can be rebuilt, I'm inclined to still keep inode{64,32}, >> but make legacy behaviour require explicit opt-in. That is: >> >> - Default it to inode64 >> - Remove the Kconfig option >> - Only print it as an option if tmpfs was explicitly mounted with inode32 >> >> The reason I suggest keeping this is that I'm mildly concerned that the kind >> of users who might be impacted by this change due to 32-bit _FILE_OFFSET_BITS >> -- like the not-too-uncommon case that Mikael brings up -- seem unlikely to >> be the kind of people that would find it in an rc. > >Okay. None of us are thrilled with it, but I agree that >Mikael's observation should override our developer's preference. > >So the "inode64" option will be accepted but redundant on mounting, >but exists for use as a remount option after mounting or remounting >with "inode32": allowing the admin to switch temporarily to mask off >the high ino bits with "inode32" when needing to run a limited binary. > >Documentation and commit message to alert Andrew and Linus and distros >that we are risking some breakage with this, but supplying the antidote >(not breakage of any distros themselves, no doubt they're all good; >but breakage of what some users might run on them). Sounds good. >> >> Other than that, the first patch could be similar to how it is now, >> incorporating Hugh's improvements to the first patch to put everything under >> the same stat_lock in shmem_reserve_inode. > >So, I persuaded Amir to the other aspects my version, but did not >persuade you? Well, I can live with that (or if not, can send mods >on top of yours): but please read again why I was uncomfortable with >yours, to check that you still prefer it (I agree that your patch is >simpler, and none of my discomfort decisive). Hmm, which bit were you thinking of? The lack of batching, shmem_encode_fh(), or the fact that nr_inodes can now be 0 on non-internal mounts? For batching, I'm neutral. I'm happy to use the approach from your patch and integrate it (and credit you, of course). For shmem_encode_fh, I'm not totally sure I understand the concern, if that's what you mean. For nr_inodes, I agree that intentional or unintentional, we should at least handle this case for now and can adjust later if the behaviour changes. Thanks again, Chris
On Mon, 20 Jan 2020, Chris Down wrote: > Hi Hugh, > > Sorry this response took so long, I had some non-work issues that took a lot > of time last week. No, clearly it's I who must apologize to you for very slow response. > > Hugh Dickins writes: > > > > So the "inode64" option will be accepted but redundant on mounting, > > but exists for use as a remount option after mounting or remounting > > with "inode32": allowing the admin to switch temporarily to mask off > > the high ino bits with "inode32" when needing to run a limited binary. > > > > Documentation and commit message to alert Andrew and Linus and distros > > that we are risking some breakage with this, but supplying the antidote > > (not breakage of any distros themselves, no doubt they're all good; > > but breakage of what some users might run on them). > > Sounds good. > > > > > > > Other than that, the first patch could be similar to how it is now, > > > incorporating Hugh's improvements to the first patch to put everything > > > under > > > the same stat_lock in shmem_reserve_inode. > > > > So, I persuaded Amir to the other aspects my version, but did not > > persuade you? Well, I can live with that (or if not, can send mods > > on top of yours): but please read again why I was uncomfortable with > > yours, to check that you still prefer it (I agree that your patch is > > simpler, and none of my discomfort decisive). > > Hmm, which bit were you thinking of? The lack of batching, shmem_encode_fh(), > or the fact that nr_inodes can now be 0 on non-internal mounts? I was uncomfortable with tmpfs depleting get_next_ino()'s pool in some configurations, and wanted the (get_next_ino-like) per-cpu but per-sb batching for nr_inodes=0, to minimize its stat_lock contention. I did not have a patch to shmem_encode_fh(), had gone through thinking that 64-bit inos made an additional fix there necessary; but Amir then educated us that it is safe as is, though could be cleaned up later. nr_inodes can be 0 on non-internal mounts, for many years. > > For batching, I'm neutral. I'm happy to use the approach from your patch and > integrate it (and credit you, of course). Credit not important, but you may well want to blame me for that complication :) > > For shmem_encode_fh, I'm not totally sure I understand the concern, if that's > what you mean. My concern had been that shmem_encode_fh() builds up an fh from i_ino and more, looks well prepared for a 64-bit ino, but appeared to be announcing a 32-bit ino in its return value: Amir reassures us that that return value does not matter. > > For nr_inodes, I agree that intentional or unintentional, we should at least > handle this case for now and can adjust later if the behaviour changes. nr_inodes=0 is an intentional configuration (but 0 denoting infinity: not very clean, and I've sometimes regretted that choice). If there's any behavior change, that's a separate matter from these 64-bit ino patches; maybe I mentioned it in passing and confused us - that we seem to have recently allowed a remounting limited<->unlimited that was not permitted before, and might or might not need a fix patch. Sorry again for delaying you, Chris: not at all what I'd wanted to do. Hugh
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 5ecbc03e6b2f..203e12a684c9 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -136,6 +136,15 @@ These options do not have any effect on remount. You can change these parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem. +tmpfs has a mount option to select whether it will wrap at 32- or 64-bit inode +numbers: + +inode64 Use 64-bit inode numbers +inode32 Use 32-bit inode numbers + +On 64-bit, the default is set by CONFIG_TMPFS_INODE64. On 32-bit, inode64 is +not legal and will produce an error at mount time. + So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' will give you tmpfs instance on /mytmpfs which can allocate 10GB RAM/SWAP in 10240 inodes and it is only accessible by root. @@ -147,3 +156,5 @@ Updated: Hugh Dickins, 4 June 2007 Updated: KOSAKI Motohiro, 16 Mar 2010 +Updated: + Chris Down, 2 Jan 2020 diff --git a/fs/Kconfig b/fs/Kconfig index 7b623e9fc1b0..e74f127df22a 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -199,6 +199,21 @@ config TMPFS_XATTR If unsure, say N. +config TMPFS_INODE64 + bool "Use 64-bit ino_t by default in tmpfs" + depends on TMPFS && 64BIT + default n + help + tmpfs has historically used only inode numbers as wide as an unsigned + int. In some cases this can cause wraparound, potentially resulting in + multiple files with the same inode number on a single device. This option + makes tmpfs use the full width of ino_t by default, similarly to the + inode64 mount option. + + To override this default, use the inode32 or inode64 mount options. + + If unsure, say N. + config HUGETLBFS bool "HugeTLB file system support" depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \ diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 7fac91f490dc..8925eb774518 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -35,6 +35,7 @@ struct shmem_sb_info { unsigned char huge; /* Whether to try for hugepages */ kuid_t uid; /* Mount uid for root directory */ kgid_t gid; /* Mount gid for root directory */ + bool full_inums; /* If i_ino should be uint or ino_t */ ino_t next_ino; /* The next per-sb inode number to use */ struct mempolicy *mpol; /* default memory policy for mappings */ spinlock_t shrinklist_lock; /* Protects shrinklist */ diff --git a/mm/shmem.c b/mm/shmem.c index 9e97ba972225..05de65112bed 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -115,11 +115,13 @@ struct shmem_options { kuid_t uid; kgid_t gid; umode_t mode; + bool full_inums; int huge; int seen; #define SHMEM_SEEN_BLOCKS 1 #define SHMEM_SEEN_INODES 2 #define SHMEM_SEEN_HUGE 4 +#define SHMEM_SEEN_INUMS 8 }; #ifdef CONFIG_TMPFS @@ -2261,15 +2263,24 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode * since max_inodes is always 0, and is called from * potentially unknown contexts. As such, use the global * allocator which doesn't require the per-sb stat_lock. + * + * No special behaviour is needed for + * sbinfo->full_inums, because it's not possible to + * manually set on callers of this type, and + * CONFIG_TMPFS_INODE64 only applies to user-visible + * mounts. */ inode->i_ino = get_next_ino(); } else { spin_lock(&sbinfo->stat_lock); - if (unlikely(sbinfo->next_ino > UINT_MAX)) { + if (unlikely(!sbinfo->full_inums && + sbinfo->next_ino > UINT_MAX)) { /* * Emulate get_next_ino uint wraparound for * compatibility */ + pr_warn("%s: inode number overflow on device %d, consider using inode64 mount option\n", + __func__, MINOR(sb->s_dev)); sbinfo->next_ino = 1; } inode->i_ino = sbinfo->next_ino++; @@ -3406,6 +3417,8 @@ enum shmem_param { Opt_nr_inodes, Opt_size, Opt_uid, + Opt_inode32, + Opt_inode64, }; static const struct fs_parameter_spec shmem_param_specs[] = { @@ -3417,6 +3430,8 @@ static const struct fs_parameter_spec shmem_param_specs[] = { fsparam_string("nr_inodes", Opt_nr_inodes), fsparam_string("size", Opt_size), fsparam_u32 ("uid", Opt_uid), + fsparam_flag ("inode32", Opt_inode32), + fsparam_flag ("inode64", Opt_inode64), {} }; @@ -3441,6 +3456,7 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) unsigned long long size; char *rest; int opt; + const char *err; opt = fs_parse(fc, &shmem_fs_parameters, param, &result); if (opt < 0) @@ -3502,6 +3518,18 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) break; } goto unsupported_parameter; + case Opt_inode32: + ctx->full_inums = false; + ctx->seen |= SHMEM_SEEN_INUMS; + break; + case Opt_inode64: + if (sizeof(ino_t) < 8) { + err = "Cannot use inode64 with <64bit inums in kernel"; + goto err_msg; + } + ctx->full_inums = true; + ctx->seen |= SHMEM_SEEN_INUMS; + break; } return 0; @@ -3509,6 +3537,8 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) return invalf(fc, "tmpfs: Unsupported parameter '%s'", param->key); bad_value: return invalf(fc, "tmpfs: Bad value for '%s'", param->key); +err_msg: + return invalf(fc, "tmpfs: %s", err); } static int shmem_parse_options(struct fs_context *fc, void *data) @@ -3593,8 +3623,16 @@ static int shmem_reconfigure(struct fs_context *fc) } } + if ((ctx->seen & SHMEM_SEEN_INUMS) && !ctx->full_inums && + sbinfo->next_ino > UINT_MAX) { + err = "Current inum too high to switch to 32-bit inums"; + goto out; + } + if (ctx->seen & SHMEM_SEEN_HUGE) sbinfo->huge = ctx->huge; + if (ctx->seen & SHMEM_SEEN_INUMS) + sbinfo->full_inums = ctx->full_inums; if (ctx->seen & SHMEM_SEEN_BLOCKS) sbinfo->max_blocks = ctx->blocks; if (ctx->seen & SHMEM_SEEN_INODES) { @@ -3634,6 +3672,29 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root) if (!gid_eq(sbinfo->gid, GLOBAL_ROOT_GID)) seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, sbinfo->gid)); + + /* + * Showing inode{64,32} might be useful even if it's the system default, + * since then people don't have to resort to checking both here and + * /proc/config.gz to confirm 64-bit inums were successfully applied + * (which may not even exist if IKCONFIG_PROC isn't enabled). + * + * We hide it when inode64 isn't the default and we are using 32-bit + * inodes, since that probably just means the feature isn't even under + * consideration. + * + * As such: + * + * +-----------------+-----------------+ + * | TMPFS_INODE64=y | TMPFS_INODE64=n | + * +------------------+-----------------+-----------------+ + * | full_inums=true | show | show | + * | full_inums=false | show | hide | + * +------------------+-----------------+-----------------+ + * + */ + if (IS_ENABLED(CONFIG_TMPFS_INODE64) || sbinfo->full_inums) + seq_printf(seq, ",inode%d", (sbinfo->full_inums ? 64 : 32)); #ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE /* Rightly or wrongly, show huge mount option unmasked by shmem_huge */ if (sbinfo->huge) @@ -3681,6 +3742,8 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc) ctx->blocks = shmem_default_max_blocks(); if (!(ctx->seen & SHMEM_SEEN_INODES)) ctx->inodes = shmem_default_max_inodes(); + if (!(ctx->seen & SHMEM_SEEN_INUMS)) + ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64); } else { sb->s_flags |= SB_NOUSER; } @@ -3694,6 +3757,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc) sbinfo->free_inodes = sbinfo->max_inodes = ctx->inodes; sbinfo->uid = ctx->uid; sbinfo->gid = ctx->gid; + sbinfo->full_inums = ctx->full_inums; sbinfo->mode = ctx->mode; sbinfo->huge = ctx->huge; sbinfo->mpol = ctx->mpol;