Message ID | 164329971128.5879.15718457509790221509.stgit@bazille.1015granger.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | NFSD size, offset, and count sanity | expand |
On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote: > iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value > in there. If that value is larger than S64_MAX, then ia_size has > underflowed. > > In this case the negative size is passed through to the VFS and > underlying filesystems. I've observed XFS behavior: it returns > EIO but still attempts to access past the end of the device. What attempts to access beyond the end of the device? A file offset is not a disk offset, and the filesystem cannot allocate blocks for IO that are outside the device boundaries. So I don't understand how setting an inode size of >LLONGMAX can cause the filesystem to access blocks outside the range it can allocate and map IO to. If this falls through to trying to access data outside the range the filesystem is allowed to access then we've got a bug that needs to be fixed. Can you please clarify the behaviour that is occurring here (stack traces demonstrating the IO path that leads to access past the end of device would be useful) so we can look into this further? > IOW it assumes the caller has already sanity-checked the value. Every filesystem assumes that the iattr that is passed to ->setattr by notify_change() has been sanity checked and the parameters are within the valid VFS supported ranges, not just XFS. Perhaps this check should be in notify_change, not in the callers? Cheers, Dave.
> On Jan 27, 2022, at 7:36 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote: >> iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value >> in there. If that value is larger than S64_MAX, then ia_size has >> underflowed. >> >> In this case the negative size is passed through to the VFS and >> underlying filesystems. I've observed XFS behavior: it returns >> EIO but still attempts to access past the end of the device. > > What attempts to access beyond the end of the device? A file offset > is not a disk offset, and the filesystem cannot allocate blocks for > IO that are outside the device boundaries. So I don't understand how > setting an inode size of >LLONGMAX can cause the filesystem to > access blocks outside the range it can allocate and map IO to. If > this falls through to trying to access data outside the range the > filesystem is allowed to access then we've got a bug that needs to > be fixed. > > Can you please clarify the behaviour that is occurring here (stack > traces demonstrating the IO path that leads to access past the end > of device would be useful) so we can look into this further? Reproducer: I constructed a pynfs test that sends an NFSv4.0 SETATTR request to set the file length to U64_MAX. That test was applied to pynfs today. git://git.linux-nfs.org/projects/bfields/pynfs.git I will note that I tried this test against a tmpfs export as well. No crash, but a subsequent GETATTR returned U64_MAX, which is surprising. There's really no checking in that path either. Note below: md0 is the device where this filesystem resides. It's a pair of 3TB PCIe NVMe cards striped together. Kernel at the time on this system was 5.17-rc1 + a few NFSD patches. Jan 26 16:01:26 klimt.1015granger.net rpc.mountd[972]: v4.0 client attached: 0x61bb6d4261eef9f6 from "192.168.1.67:53636" Jan 26 16:01:26 klimt.1015granger.net kernel: ------------[ cut here ]------------ Jan 26 16:01:26 klimt.1015granger.net kernel: WARNING: CPU: 2 PID: 1009 at fs/iomap/iter.c:33 iomap_iter+0x1b5/0x272 Jan 26 16:01:26 klimt.1015granger.net kernel: Modules linked in: rfkill rpcrdma rdma_ucm ib_srpt ib_umad ib_isert ib_ipoib iscsi_target_mod ib_> Jan 26 16:01:26 klimt.1015granger.net kernel: CPU: 2 PID: 1009 Comm: nfsd Not tainted 5.17.0-rc1-00165-g2785fad9b745 #3338 Jan 26 16:01:26 klimt.1015granger.net kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.3 10/28/2020 Jan 26 16:01:26 klimt.1015granger.net kernel: RIP: 0010:iomap_iter+0x1b5/0x272 Jan 26 16:01:26 klimt.1015granger.net kernel: Code: 8b 73 08 49 8b 04 24 4d 89 e9 4d 89 f0 48 8b 3b e8 c0 79 8e 00 85 c0 0f 88 c1 00 00 00 48 8> Jan 26 16:01:26 klimt.1015granger.net kernel: RSP: 0018:ffffa65701ea3a80 EFLAGS: 00010213 Jan 26 16:01:26 klimt.1015granger.net kernel: RAX: 0000000000000000 RBX: ffffa65701ea3ad0 RCX: ffff8ada8cf0f840 Jan 26 16:01:26 klimt.1015granger.net kernel: RDX: ffffffffffffffff RSI: ffff8ada49411000 RDI: ffff8ada8cf0f840 Jan 26 16:01:26 klimt.1015granger.net kernel: RBP: ffffa65701ea3aa0 R08: ffff8ada4a56b000 R09: ffffa65701ea3b40 Jan 26 16:01:26 klimt.1015granger.net kernel: R10: ffffa65701ea39a0 R11: 000000000efc25f5 R12: ffffffffc06d6100 Jan 26 16:01:26 klimt.1015granger.net kernel: R13: ffffa65701ea3b40 R14: ffffa65701ea3af8 R15: ffff8ada4a56b000 Jan 26 16:01:26 klimt.1015granger.net kernel: FS: 0000000000000000(0000) GS:ffff8ae97fd00000(0000) knlGS:0000000000000000 Jan 26 16:01:26 klimt.1015granger.net kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 26 16:01:26 klimt.1015granger.net kernel: CR2: 00007ffff8126260 CR3: 00000001a16dc001 CR4: 00000000001706e0 Jan 26 16:01:26 klimt.1015granger.net kernel: Call Trace: Jan 26 16:01:26 klimt.1015granger.net kernel: <TASK> Jan 26 16:01:26 klimt.1015granger.net kernel: iomap_zero_range+0x6c/0x1a9 Jan 26 16:01:26 klimt.1015granger.net kernel: ? __radix_tree_lookup+0x2f/0xac Jan 26 16:01:26 klimt.1015granger.net kernel: iomap_truncate_page+0x31/0x36 Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_truncate_page+0x39/0x3b [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_setattr_size+0x11a/0x306 [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_vn_setattr_size+0x4e/0x57 [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_vn_setattr+0x67/0xb1 [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: notify_change+0x2ac/0x3a2 Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd_setattr+0x200/0x268 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: ? nfsd_setattr+0x200/0x268 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd4_setattr+0xf1/0x130 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd4_proc_compound+0x337/0x474 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd_dispatch+0x1a9/0x260 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: svc_process_common+0x331/0x4bc [sunrpc] Jan 26 16:01:26 klimt.1015granger.net kernel: ? nfsd_svc+0x2f5/0x2f5 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: svc_process+0xc8/0xe7 [sunrpc] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd+0xdd/0x160 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: kthread+0xf7/0xff Jan 26 16:01:26 klimt.1015granger.net kernel: ? nfsd_shutdown_threads+0x65/0x65 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: ? kthread_complete_and_exit+0x20/0x20 Jan 26 16:01:26 klimt.1015granger.net kernel: ret_from_fork+0x22/0x30 Jan 26 16:01:26 klimt.1015granger.net kernel: </TASK> Jan 26 16:01:26 klimt.1015granger.net kernel: ---[ end trace 0000000000000000 ]--- Jan 26 16:01:26 klimt.1015granger.net kernel: ------------[ cut here ]------------ Jan 26 16:01:26 klimt.1015granger.net kernel: WARNING: CPU: 2 PID: 1009 at fs/iomap/iter.c:35 iomap_iter+0x1ca/0x272 Jan 26 16:01:26 klimt.1015granger.net kernel: Modules linked in: rfkill rpcrdma rdma_ucm ib_srpt ib_umad ib_isert ib_ipoib iscsi_target_mod ib_> Jan 26 16:01:26 klimt.1015granger.net kernel: CPU: 2 PID: 1009 Comm: nfsd Tainted: G W 5.17.0-rc1-00165-g2785fad9b745 #3338 Jan 26 16:01:26 klimt.1015granger.net kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.3 10/28/2020 Jan 26 16:01:26 klimt.1015granger.net kernel: RIP: 0010:iomap_iter+0x1ca/0x272 Jan 26 16:01:26 klimt.1015granger.net kernel: Code: 85 c0 0f 88 c1 00 00 00 48 8b 43 30 48 8b 53 08 48 39 d0 7e 02 0f 0b 48 8b 4b 38 48 85 c9 7> Jan 26 16:01:26 klimt.1015granger.net kernel: RSP: 0018:ffffa65701ea3a80 EFLAGS: 00010293 Jan 26 16:01:26 klimt.1015granger.net kernel: RAX: f8ada41a253c0000 RBX: ffffa65701ea3ad0 RCX: f8ada41a253c0000 Jan 26 16:01:26 klimt.1015granger.net kernel: RDX: ffffffffffffffff RSI: ffff8ada49411000 RDI: ffff8ada8cf0f840 Jan 26 16:01:26 klimt.1015granger.net kernel: RBP: ffffa65701ea3aa0 R08: ffff8ada4a56b000 R09: ffffa65701ea3b40 Jan 26 16:01:26 klimt.1015granger.net kernel: R10: ffffa65701ea39a0 R11: 000000000efc25f5 R12: ffffffffc06d6100 Jan 26 16:01:26 klimt.1015granger.net kernel: R13: ffffa65701ea3b40 R14: ffffa65701ea3af8 R15: ffff8ada4a56b000 Jan 26 16:01:26 klimt.1015granger.net kernel: FS: 0000000000000000(0000) GS:ffff8ae97fd00000(0000) knlGS:0000000000000000 Jan 26 16:01:26 klimt.1015granger.net kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 26 16:01:26 klimt.1015granger.net kernel: CR2: 00007ffff8126260 CR3: 00000001a16dc001 CR4: 00000000001706e0 Jan 26 16:01:26 klimt.1015granger.net kernel: Call Trace: Jan 26 16:01:26 klimt.1015granger.net kernel: <TASK> Jan 26 16:01:26 klimt.1015granger.net kernel: iomap_zero_range+0x6c/0x1a9 Jan 26 16:01:26 klimt.1015granger.net kernel: ? __radix_tree_lookup+0x2f/0xac Jan 26 16:01:26 klimt.1015granger.net kernel: iomap_truncate_page+0x31/0x36 Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_truncate_page+0x39/0x3b [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_setattr_size+0x11a/0x306 [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_vn_setattr_size+0x4e/0x57 [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: xfs_vn_setattr+0x67/0xb1 [xfs] Jan 26 16:01:26 klimt.1015granger.net kernel: notify_change+0x2ac/0x3a2 Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd_setattr+0x200/0x268 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: ? nfsd_setattr+0x200/0x268 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd4_setattr+0xf1/0x130 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd4_proc_compound+0x337/0x474 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd_dispatch+0x1a9/0x260 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: svc_process_common+0x331/0x4bc [sunrpc] Jan 26 16:01:26 klimt.1015granger.net kernel: ? nfsd_svc+0x2f5/0x2f5 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: svc_process+0xc8/0xe7 [sunrpc] Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd+0xdd/0x160 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: kthread+0xf7/0xff Jan 26 16:01:26 klimt.1015granger.net kernel: ? nfsd_shutdown_threads+0x65/0x65 [nfsd] Jan 26 16:01:26 klimt.1015granger.net kernel: ? kthread_complete_and_exit+0x20/0x20 Jan 26 16:01:26 klimt.1015granger.net kernel: ret_from_fork+0x22/0x30 Jan 26 16:01:26 klimt.1015granger.net kernel: </TASK> Jan 26 16:01:26 klimt.1015granger.net kernel: ---[ end trace 0000000000000000 ]--- Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd: attempt to access beyond end of device md0: rw=2048, want=19907765165852672, limit=12501942272 >> IOW it assumes the caller has already sanity-checked the value. > > Every filesystem assumes that the iattr that is passed to ->setattr > by notify_change() has been sanity checked and the parameters are > within the valid VFS supported ranges, not just XFS. Perhaps this > check should be in notify_change, not in the callers? My (limited) understanding of the VFS code is that functions at the notify_change() level expect that its callers will have already sanitized the input -- those callers are largely the system call routines. That's why I chose to address this in NFSD. Maybe inode_newsize_ok() needs to check for negative size values? -- Chuck Lever
On Fri, Jan 28, 2022 at 01:48:48AM +0000, Chuck Lever III wrote: > > > > On Jan 27, 2022, at 7:36 PM, Dave Chinner <david@fromorbit.com> wrote: > > > > On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote: > >> IOW it assumes the caller has already sanity-checked the value. > > > > Every filesystem assumes that the iattr that is passed to ->setattr > > by notify_change() has been sanity checked and the parameters are > > within the valid VFS supported ranges, not just XFS. Perhaps this > > check should be in notify_change, not in the callers? > > My (limited) understanding of the VFS code is that functions at > the notify_change() level expect that its callers will have > already sanitized the input -- those callers are largely the > system call routines. That's why I chose to address this in NFSD. > > Maybe inode_newsize_ok() needs to check for negative size values? Yeah, that would seem reasonable - the size passed to it is a loff_t, and it's not checked for overflows/negative values. So if it checked for offset < 0 if would catch this.... Cheers, Dave.
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index ed1ee25647be..b8ac2b9bce74 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -972,6 +972,9 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, int err; if (setattr->sa_iattr.ia_valid & ATTR_SIZE) { + if (setattr->sa_iattr.ia_size < 0) + return nfserr_fbig; + status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &setattr->sa_stateid, WR_STATE, NULL, NULL);