Message ID | 1425429420-13142-1-git-send-email-dehrenberg@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Mar 03, 2015 at 04:37:00PM -0800, Dan Ehrenberg wrote: > Previously, stat on 32-bit platforms used old-style encoding and > validation of major-minor pairs, with 8:8 bits. However, dev_t is > 32-bits on these platforms, and coreutils seems to be treating the > values as 12:20 new-style values. The only reason to use the old > version is in the implementation of a legacy filesystem format which > only has 16 bits of space. Communicating a 12:20 value to userspace > when sizeof dev_t == 4 can be done on either a 32-bit or 64-bit > platform. > > This patch removes the artificial restriction in major:minor size. > The only backwards incompatibility which results is sometimes stat > succeeds if -EOVERFLOW would be returned otherwise. The legacy > old-style stat call is retained as-is. Umm... You do realize that coreutils are _very_ far from being the only userland programs handling the data returned by stat(2), right? What's to guarantee that your ABI change won't break any of those? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 3, 2015 at 4:53 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Mar 03, 2015 at 04:37:00PM -0800, Dan Ehrenberg wrote: >> Previously, stat on 32-bit platforms used old-style encoding and >> validation of major-minor pairs, with 8:8 bits. However, dev_t is >> 32-bits on these platforms, and coreutils seems to be treating the >> values as 12:20 new-style values. The only reason to use the old >> version is in the implementation of a legacy filesystem format which >> only has 16 bits of space. Communicating a 12:20 value to userspace >> when sizeof dev_t == 4 can be done on either a 32-bit or 64-bit >> platform. >> >> This patch removes the artificial restriction in major:minor size. >> The only backwards incompatibility which results is sometimes stat >> succeeds if -EOVERFLOW would be returned otherwise. The legacy >> old-style stat call is retained as-is. > > Umm... You do realize that coreutils are _very_ far from being the only > userland programs handling the data returned by stat(2), right? > > What's to guarantee that your ABI change won't break any of those? I guess it's impossible to guarantee, but if there is an error, it'll be that an -EOVERFLOW error is suppressed and the high bits of the major:minor pair are shaved off by the userspace program. I would suspect that this would just make debugging harder, rather than actually break an automated program which counts on getting EOVERFLOW from a huge block device, but no way to know. The block device has to actually exist for this to happen, and all we're talking about is stat failing. So it's replacing an an error code with erroneous way to get data about a device node (erroneous just because userspace ignores some of the bits with the answer). How about this instead: a new config option to switch on the 12:20 major:minor numbers on 32-bit platforms? Dan -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 03, 2015 at 05:10:22PM -0800, Daniel Ehrenberg wrote: > > What's to guarantee that your ABI change won't break any of those? > > I guess it's impossible to guarantee, but if there is an error, it'll > be that an -EOVERFLOW error is suppressed and the high bits of the > major:minor pair are shaved off by the userspace program. I would > suspect that this would just make debugging harder, rather than > actually break an automated program which counts on getting EOVERFLOW > from a huge block device, but no way to know. The block device has to > actually exist for this to happen, and all we're talking about is stat > failing. So it's replacing an an error code with erroneous way to get > data about a device node (erroneous just because userspace ignores > some of the bits with the answer). All it takes is more than 16 SCSI disks, AFAICS, and use of open-coded MINOR() somewhere in old userland code... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
At least, to base the device format on whether we are running on a 32 bit or 64 bit architecture does not make sense. If a tool calling stat(2) can not handle 12 bit major/20 bits minor, it would already break or about to break when running on a 64 bit machine. Regarding SCSI, the 17th disk will use SCSI_DISK1_MAJOR (65). Only the 257th disk will use the first scsi major (8) again and need a minor greater than 256. (see sd_major() in drivers/scsi/sd.c for details). Gwendal. On Tue, Mar 3, 2015 at 5:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Mar 03, 2015 at 05:10:22PM -0800, Daniel Ehrenberg wrote: >> > What's to guarantee that your ABI change won't break any of those? >> >> I guess it's impossible to guarantee, but if there is an error, it'll >> be that an -EOVERFLOW error is suppressed and the high bits of the >> major:minor pair are shaved off by the userspace program. I would >> suspect that this would just make debugging harder, rather than >> actually break an automated program which counts on getting EOVERFLOW >> from a huge block device, but no way to know. The block device has to >> actually exist for this to happen, and all we're talking about is stat >> failing. So it's replacing an an error code with erroneous way to get >> data about a device node (erroneous just because userspace ignores >> some of the bits with the answer). > > All it takes is more than 16 SCSI disks, AFAICS, and use of open-coded > MINOR() somewhere in old userland code... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 03, 2015 at 05:37:31PM -0800, Gwendal Grignou wrote: > At least, to base the device format on whether we are running on a 32 > bit or 64 bit architecture does not make sense. Yes, it does. Note that on 32bit ones stat64(2) *will* return an arbitrary value. On 64bit ones stat(2) will. > If a tool calling stat(2) can not handle 12 bit major/20 bits minor, > it would already break or about to break when running on a 64 bit > machine. > > Regarding SCSI, the 17th disk will use SCSI_DISK1_MAJOR (65). Only the > 257th disk will use the first scsi major (8) again and need a minor > greater than 256. (see sd_major() in drivers/scsi/sd.c for details). *nod* It's been years since I last looked at sd.c, TBH... Said that, with NFS it's definitely a minor per superblock, and it's not the only set_anon_super() user. Having a bunch of filesystems mounted over NFS will suffice... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 3, 2015 at 5:47 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Mar 03, 2015 at 05:37:31PM -0800, Gwendal Grignou wrote: >> At least, to base the device format on whether we are running on a 32 >> bit or 64 bit architecture does not make sense. > > Yes, it does. Note that on 32bit ones stat64(2) *will* return an arbitrary > value. On 64bit ones stat(2) will. I see, we - just - have to replace stat() calls with stat64(). It will work because on 64bit architecture, both stat and stat64 map to sys_newstat and on 32 bits, stat64 maps to sys_stat64 which does the right thing. It is a little like we are now using BLKGETSIZE64 instead of BLKGETSIZE because disks can be larger than 2TB. > >> If a tool calling stat(2) can not handle 12 bit major/20 bits minor, >> it would already break or about to break when running on a 64 bit >> machine. >> >> Regarding SCSI, the 17th disk will use SCSI_DISK1_MAJOR (65). Only the >> 257th disk will use the first scsi major (8) again and need a minor >> greater than 256. (see sd_major() in drivers/scsi/sd.c for details). > > *nod* > > It's been years since I last looked at sd.c, TBH... > > Said that, with NFS it's definitely a minor per superblock, and it's not the > only set_anon_super() user. Having a bunch of filesystems mounted over NFS > will suffice... or a bunch of ISCSI targets. Thanks for your insight, Gwendal. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/stat.c b/fs/stat.c index ae0c3ce..0a08e83 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -213,15 +213,6 @@ SYSCALL_DEFINE2(fstat, unsigned int, fd, struct __old_kernel_stat __user *, stat #endif /* __ARCH_WANT_OLD_STAT */ -#if BITS_PER_LONG == 32 -# define choose_32_64(a,b) a -#else -# define choose_32_64(a,b) b -#endif - -#define valid_dev(x) choose_32_64(old_valid_dev,new_valid_dev)(x) -#define encode_dev(x) choose_32_64(old_encode_dev,new_encode_dev)(x) - #ifndef INIT_STRUCT_STAT_PADDING # define INIT_STRUCT_STAT_PADDING(st) memset(&st, 0, sizeof(st)) #endif @@ -230,7 +221,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) { struct stat tmp; - if (!valid_dev(stat->dev) || !valid_dev(stat->rdev)) + if (!new_valid_dev(stat->dev) || !new_valid_dev(stat->rdev)) return -EOVERFLOW; #if BITS_PER_LONG == 32 if (stat->size > MAX_NON_LFS) @@ -238,7 +229,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) #endif INIT_STRUCT_STAT_PADDING(tmp); - tmp.st_dev = encode_dev(stat->dev); + tmp.st_dev = new_encode_dev(stat->dev); tmp.st_ino = stat->ino; if (sizeof(tmp.st_ino) < sizeof(stat->ino) && tmp.st_ino != stat->ino) return -EOVERFLOW; @@ -248,7 +239,7 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf) return -EOVERFLOW; SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid)); SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid)); - tmp.st_rdev = encode_dev(stat->rdev); + tmp.st_rdev = new_encode_dev(stat->rdev); tmp.st_size = stat->size; tmp.st_atime = stat->atime.tv_sec; tmp.st_mtime = stat->mtime.tv_sec;
Previously, stat on 32-bit platforms used old-style encoding and validation of major-minor pairs, with 8:8 bits. However, dev_t is 32-bits on these platforms, and coreutils seems to be treating the values as 12:20 new-style values. The only reason to use the old version is in the implementation of a legacy filesystem format which only has 16 bits of space. Communicating a 12:20 value to userspace when sizeof dev_t == 4 can be done on either a 32-bit or 64-bit platform. This patch removes the artificial restriction in major:minor size. The only backwards incompatibility which results is sometimes stat succeeds if -EOVERFLOW would be returned otherwise. The legacy old-style stat call is retained as-is. Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org> --- fs/stat.c | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-)