Message ID | 20220614033802.1606738-1-asmadeus@codewreck.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | 9p: fix EBADF errors in cached mode | expand |
Dominique Martinet wrote on Tue, Jun 14, 2022 at 12:38:02PM +0900: > cached operations sometimes need to do invalid operations (e.g. read > on a write only file) > Historic fscache had added a "writeback fid" for this, but the conversion > to new fscache somehow lost usage of it: use the writeback fid instead > of normal one. > > Note that the way this works (writeback fid being linked to inode) means > we might use overprivileged fid for some operations, e.g. write as root > when we shouldn't. > Ideally we should keep both fids handy, and only use the writeback fid > when really required e.g. reads to a write-only file to fill in the page > cache (read-modify-write); but this is the situation we've always had > and this commit only fixes an issue we've had for too long. > > Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") > Cc: stable@vger.kernel.org > Cc: David Howells <dhowells@redhat.com> > Reported-By: Christian Schoenebeck <linux_oss@crudebyte.com> > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> > --- > Ok so finally had time to look at this, and it's not a lot so this is > the most straight forward way to do: just reverting to how the old > fscache worked. > > This appears to work from quick testing, Chiristian could you test it? > > I think the warnings you added in p9_client_read/write that check > fid->mode might a lot of sense, if you care to resend it as > WARN_ON((fid->mode & ACCMODE) == O_xyz); > instead I'll queue that for 5.20 > > > @Stable people, I've checked it applies to 5.17 and 5.18 so should be > good to grab once I submit it for inclusion (that commit was included in > 5.16, which is no longer stable) > > > fs/9p/vfs_addr.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > index 7382c5227e94..262968d02f55 100644 > --- a/fs/9p/vfs_addr.c > +++ b/fs/9p/vfs_addr.c > @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) > */ > static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) > { > - struct p9_fid *fid = file->private_data; > + struct inode *inode = file_inode(file); > + struct v9fs_inode *v9inode = V9FS_I(inode); > + struct p9_fid *fid = v9inode->writeback_fid; > + Sorry for mails back-to-back (grmbl I hate git commit --amend not warning I only have unstaged changes), this is missing the following here: + /* If there is no writeback fid this file only ever has had + * read-only opens, so we can use file's fid which should + * always be set instead */ + if (!fid) + fid = file->private_data; Christian, you can find it here to test: https://github.com/martinetd/linux/commit/a6e033c41cc9f0ec105f5d208b0a820118e2bda8 > + BUG_ON(!fid); > > p9_fid_get(fid); > rreq->netfs_priv = fid; Thanks,
On Dienstag, 14. Juni 2022 05:41:40 CEST Dominique Martinet wrote: > Dominique Martinet wrote on Tue, Jun 14, 2022 at 12:38:02PM +0900: > > cached operations sometimes need to do invalid operations (e.g. read > > on a write only file) > > Historic fscache had added a "writeback fid" for this, but the conversion > > to new fscache somehow lost usage of it: use the writeback fid instead > > of normal one. > > > > Note that the way this works (writeback fid being linked to inode) means > > we might use overprivileged fid for some operations, e.g. write as root > > when we shouldn't. > > Ideally we should keep both fids handy, and only use the writeback fid > > when really required e.g. reads to a write-only file to fill in the page > > cache (read-modify-write); but this is the situation we've always had > > and this commit only fixes an issue we've had for too long. > > > > Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do > > reads and caching") Cc: stable@vger.kernel.org > > Cc: David Howells <dhowells@redhat.com> > > Reported-By: Christian Schoenebeck <linux_oss@crudebyte.com> > > Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> > > --- > > Ok so finally had time to look at this, and it's not a lot so this is > > the most straight forward way to do: just reverting to how the old > > fscache worked. > > > > This appears to work from quick testing, Chiristian could you test it? > > > > I think the warnings you added in p9_client_read/write that check > > fid->mode might a lot of sense, if you care to resend it as > > WARN_ON((fid->mode & ACCMODE) == O_xyz); > > instead I'll queue that for 5.20 > > > > > > @Stable people, I've checked it applies to 5.17 and 5.18 so should be > > good to grab once I submit it for inclusion (that commit was included in > > 5.16, which is no longer stable) > > > > fs/9p/vfs_addr.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > > index 7382c5227e94..262968d02f55 100644 > > --- a/fs/9p/vfs_addr.c > > +++ b/fs/9p/vfs_addr.c > > @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest > > *subreq)> > > */ > > > > static int v9fs_init_request(struct netfs_io_request *rreq, struct file > > *file) { > > > > - struct p9_fid *fid = file->private_data; > > + struct inode *inode = file_inode(file); > > + struct v9fs_inode *v9inode = V9FS_I(inode); > > + struct p9_fid *fid = v9inode->writeback_fid; > > + > > Sorry for mails back-to-back (grmbl I hate git commit --amend not > warning I only have unstaged changes), this is missing the following > here: I think git does actually. It shows you staged and unstaged changes as comment below the commit log text inside the editor. Not as a big fat warning, but the info is there. > + /* If there is no writeback fid this file only ever has had > + * read-only opens, so we can use file's fid which should > + * always be set instead */ > + if (!fid) > + fid = file->private_data; > > Christian, you can find it here to test: > https://github.com/martinetd/linux/commit/a6e033c41cc9f0ec105f5d208b0a820118 > e2bda8 > > + BUG_ON(!fid); > > > > p9_fid_get(fid); > > rreq->netfs_priv = fid; It definitely goes into the right direction, but I think it's going a bit too far by using writeback_fid also in cases where it is not necessary and wasn't used before in the past. What about something like this in v9fs_init_request() (yet untested): /* writeback_fid is always opened O_RDWR (instead of just O_WRONLY) * explicitly for this case: partial write backs that require a read * prior to actual write and therefore requires a fid with read * capability. */ if (rreq->origin == NETFS_READ_FOR_WRITE) fid = v9inode->writeback_fid; If desired, this could be further constrained later on like: if (rreq->origin == NETFS_READ_FOR_WRITE && (fid->mode & O_ACCMODE) == O_WRONLY) { fid = v9inode->writeback_fid; } I will definitely give these options some test spins here, a short feedback ahead would be appreciated though. Thanks Dominique! Best regards, Christian Schoenebeck
Christian Schoenebeck wrote on Tue, Jun 14, 2022 at 02:10:01PM +0200: > It definitely goes into the right direction, but I think it's going a bit too > far by using writeback_fid also in cases where it is not necessary and wasn't > used before in the past. Would help if I had an idea of what was used where in the past.. :) From a quick look at the code, checking out v5.10, v9fs_vfs_writepage_locked() used the writeback fid always for all writes v9fs_vfs_readpages is a bit more complex but only seems to be using the "direct" private_data fid for reads... It took me a bit of time but I think the reads you were seeing on writeback fid come from v9fs_write_begin that does some readpage on the writeback fid to populate the page before a non-filling write happens. > What about something like this in v9fs_init_request() (yet untested): > > /* writeback_fid is always opened O_RDWR (instead of just O_WRONLY) > * explicitly for this case: partial write backs that require a read > * prior to actual write and therefore requires a fid with read > * capability. > */ > if (rreq->origin == NETFS_READ_FOR_WRITE) > fid = v9inode->writeback_fid; ... Which seems to be exactly what this origin is about, so if that works I'm all for it. > If desired, this could be further constrained later on like: > > if (rreq->origin == NETFS_READ_FOR_WRITE && > (fid->mode & O_ACCMODE) == O_WRONLY) > { > fid = v9inode->writeback_fid; > } That also makes sense, if the fid mode has read permissions we might as well use these as the writeback fid would needlessly be doing root IOs. > I will definitely give these options some test spins here, a short feedback > ahead would be appreciated though. Please let me know how that works out, I'd be happy to use either of your versions instead of mine. If I can be greedy though I'd like to post it together with the other couple of fixes next week, so having something before the end of the week would be great -- I think even my first overkill version early and building on it would make sense at this point. But I think you've got the right end, so hopefully won't be needing to delay Cheers,
On Dienstag, 14. Juni 2022 14:45:38 CEST Dominique Martinet wrote: > Christian Schoenebeck wrote on Tue, Jun 14, 2022 at 02:10:01PM +0200: > > It definitely goes into the right direction, but I think it's going a bit > > too far by using writeback_fid also in cases where it is not necessary > > and wasn't used before in the past. > > Would help if I had an idea of what was used where in the past.. :) > > From a quick look at the code, checking out v5.10, > v9fs_vfs_writepage_locked() used the writeback fid always for all writes > v9fs_vfs_readpages is a bit more complex but only seems to be using the > "direct" private_data fid for reads... > It took me a bit of time but I think the reads you were seeing on > writeback fid come from v9fs_write_begin that does some readpage on the > writeback fid to populate the page before a non-filling write happens. Yes, the overall picture in the past was not clear to me either. To be more specific, I was reading your patch as if it would e.g. also use the writeback_fid if somebody explicitly called read() (i.e. not an implied read caused by a partial write back), and was concerned about a potential privilege escalation. Maybe it's just a theoretical issue, as this case is probably already catched on a higher, general fs handling level, but worth consideration. > > What about something like this in v9fs_init_request() (yet untested): > > /* writeback_fid is always opened O_RDWR (instead of just O_WRONLY) > > > > * explicitly for this case: partial write backs that require a read > > * prior to actual write and therefore requires a fid with read > > * capability. > > */ > > > > if (rreq->origin == NETFS_READ_FOR_WRITE) > > > > fid = v9inode->writeback_fid; > > ... Which seems to be exactly what this origin is about, so if that > works I'm all for it. > > > If desired, this could be further constrained later on like: > > if (rreq->origin == NETFS_READ_FOR_WRITE && > > > > (fid->mode & O_ACCMODE) == O_WRONLY) > > > > { > > > > fid = v9inode->writeback_fid; > > > > } > > That also makes sense, if the fid mode has read permissions we might as > well use these as the writeback fid would needlessly be doing root IOs. > > > I will definitely give these options some test spins here, a short > > feedback > > ahead would be appreciated though. > > Please let me know how that works out, I'd be happy to use either of > your versions instead of mine. > If I can be greedy though I'd like to post it together with the other > couple of fixes next week, so having something before the end of the > week would be great -- I think even my first overkill version early and > building on it would make sense at this point. > > But I think you've got the right end, so hopefully won't be needing to > delay I need a day or two for testing, then I will report back for sure. So it should perfectly fit into your intended schedule. Thanks! Best regards, Christian Schoenebeck
On Dienstag, 14. Juni 2022 16:11:35 CEST Christian Schoenebeck wrote: > On Dienstag, 14. Juni 2022 14:45:38 CEST Dominique Martinet wrote: [...] > > Please let me know how that works out, I'd be happy to use either of > > your versions instead of mine. > > If I can be greedy though I'd like to post it together with the other > > couple of fixes next week, so having something before the end of the > > week would be great -- I think even my first overkill version early and > > building on it would make sense at this point. > > > > But I think you've got the right end, so hopefully won't be needing to > > delay > > I need a day or two for testing, then I will report back for sure. So it > should perfectly fit into your intended schedule. Two things: 1. your EBADF patch is based on you recent get/put refactoring patch, so it won't apply on stable. 2. I fixed the conflict and gave your patch a test spin, and it triggers the BUG_ON(!fid); that you added with that patch. Backtrace based on 30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."): [ 2.211473] kernel BUG at fs/9p/vfs_addr.c:65! ... [ 2.244415] netfs_alloc_request (fs/netfs/objects.c:42) netfs [ 2.245438] netfs_readahead (fs/netfs/buffered_read.c:166) netfs [ 2.246392] read_pages (./include/linux/pagemap.h:1264 ./include/linux/pagemap.h:1306 mm/readahead.c:164) [ 2.247120] ? folio_add_lru (./arch/x86/include/asm/preempt.h:103 mm/swap.c:468) [ 2.247911] page_cache_ra_unbounded (./include/linux/fs.h:808 mm/readahead.c:264) [ 2.248875] filemap_get_pages (mm/filemap.c:2594) [ 2.249723] filemap_read (mm/filemap.c:2679) [ 2.250478] ? ptep_set_access_flags (./arch/x86/include/asm/paravirt.h:441 arch/x86/mm/pgtable.c:493) [ 2.251417] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:186) [ 2.252253] ? do_wp_page (mm/memory.c:3293 mm/memory.c:3393) [ 2.253012] ? aa_file_perm (security/apparmor/file.c:604) [ 2.253824] new_sync_read (fs/read_write.c:402 (discriminator 1)) [ 2.254616] vfs_read (fs/read_write.c:482) [ 2.255313] ksys_read (fs/read_write.c:620) [ 2.256000] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 2.256764] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) Did your patch work there for you? I mean I have not applied the other pending 9p patches, but they should not really make difference, right? I won't have time today, but I will continue to look at it tomorrow. If you already had some thoughts on this, that would be great of course. Best regards, Christian Schoenebeck
Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 03:35:59PM +0200: > 2. I fixed the conflict and gave your patch a test spin, and it triggers > the BUG_ON(!fid); that you added with that patch. Backtrace based on > 30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."): hm, that's probably the version I sent without the fallback to private_data fid if writeback fid was sent (I've only commented without sending a v2) > 1. your EBADF patch is based on you recent get/put refactoring patch, so it won't apply on stable. ugh, you are correct, that was wrong as well in the version I sent by mail... I've hurried that way too much. The patch that's currently on the tip of my 9p-next branch should be alright though, I'll resend it now so you can apply cleanly if you don't want to fetch https://github.com/martinetd/linux/commits/9p-next > Did your patch work there for you? I mean I have not applied the other pending > 9p patches, but they should not really make difference, right? I won't have > time today, but I will continue to look at it tomorrow. If you already had > some thoughts on this, that would be great of course. Yes, my version passes basic tests at least, and I could no longer reproduce the problem. Without the if (!fid) fid = file->private_data though it does fail horribly like you've found out. -- Dominique
Dominique Martinet wrote on Thu, Jun 16, 2022 at 10:51:31PM +0900: > > Did your patch work there for you? I mean I have not applied the other pending > > 9p patches, but they should not really make difference, right? I won't have > > time today, but I will continue to look at it tomorrow. If you already had > > some thoughts on this, that would be great of course. > > Yes, my version passes basic tests at least, and I could no longer > reproduce the problem. For what it's worth I've also tested a version of your patch: ----- diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a8f512b44a85..d0833fa69faf 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) { + struct inode *inode = file_inode(file); + struct v9fs_inode *v9inode = V9FS_I(inode); struct p9_fid *fid = file->private_data; + BUG_ON(!fid); + + /* we might need to read from a fid that was opened write-only + * for read-modify-write of page cache, use the writeback fid + * for that */ + if (rreq->origin == NETFS_READ_FOR_WRITE && + (fid->mode & O_ACCMODE) == O_WRONLY) { + fid = v9inode->writeback_fid; + BUG_ON(!fid); + } + refcount_inc(&fid->count); rreq->netfs_priv = fid; return 0; ----- And this also seems to work alright. I was about to ask why the original code did writes with the writeback fid, but I'm noticing now the current code still does (through v9fs_vfs_write_folio_locked()), so that part hasn't changed from the old code, and init_request will only be getting reads? Which actually makes sense now I'm thinking about it because I recall David saying he's working on netfs writes now... So that minimal version is probably what we want, give or take style adjustments (only initializing inode/v9inode in the if case or not) -- I sure hope compilers optimizes it away when not needed. I'll let you test one or both versions and will fixup the commit message again/credit you/resend if we go with this version, unless you want to send it. -- Dominique
On Donnerstag, 16. Juni 2022 15:51:31 CEST Dominique Martinet wrote: > Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 03:35:59PM +0200: > > 2. I fixed the conflict and gave your patch a test spin, and it triggers > > the BUG_ON(!fid); that you added with that patch. Backtrace based on > > > 30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."): > hm, that's probably the version I sent without the fallback to > private_data fid if writeback fid was sent (I've only commented without > sending a v2) Right, I forgot that you queued another version, sorry. With your already queued patch (today's v2) that's fine now. On Donnerstag, 16. Juni 2022 16:11:16 CEST Dominique Martinet wrote: > Dominique Martinet wrote on Thu, Jun 16, 2022 at 10:51:31PM +0900: > > > Did your patch work there for you? I mean I have not applied the other > > > pending 9p patches, but they should not really make difference, right? > > > I won't have time today, but I will continue to look at it tomorrow. If > > > you already had some thoughts on this, that would be great of course. > > > > Yes, my version passes basic tests at least, and I could no longer > > reproduce the problem. > > For what it's worth I've also tested a version of your patch: > > ----- > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > index a8f512b44a85..d0833fa69faf 100644 > --- a/fs/9p/vfs_addr.c > +++ b/fs/9p/vfs_addr.c > @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest > *subreq) */ > static int v9fs_init_request(struct netfs_io_request *rreq, struct file > *file) { > + struct inode *inode = file_inode(file); > + struct v9fs_inode *v9inode = V9FS_I(inode); > struct p9_fid *fid = file->private_data; > > + BUG_ON(!fid); > + > + /* we might need to read from a fid that was opened write-only > + * for read-modify-write of page cache, use the writeback fid > + * for that */ > + if (rreq->origin == NETFS_READ_FOR_WRITE && > + (fid->mode & O_ACCMODE) == O_WRONLY) { > + fid = v9inode->writeback_fid; > + BUG_ON(!fid); > + } > + > refcount_inc(&fid->count); > rreq->netfs_priv = fid; > return 0; > ----- > > And this also seems to work alright. > > I was about to ask why the original code did writes with the writeback > fid, but I'm noticing now the current code still does (through > v9fs_vfs_write_folio_locked()), so that part hasn't changed from the old > code, and init_request will only be getting reads? Which actually makes > sense now I'm thinking about it because I recall David saying he's > working on netfs writes now... > > So that minimal version is probably what we want, give or take style > adjustments (only initializing inode/v9inode in the if case or not) -- I > sure hope compilers optimizes it away when not needed. > > > I'll let you test one or both versions and will fixup the commit message > again/credit you/resend if we go with this version, unless you want to > send it. > > -- > Dominique I tested all 3 variants today, and they were all behaving correctly (no EBADF errors anymore, no other side effects observed). The minimalistic version (i.e. your initial suggestion) performed 20% slower in my tests, but that could be due to the fact that it was simply the 1st version I tested, so caching on host side might be the reason. If necessary I can check the performance aspect more thoroughly. Personally I would at least use the NETFS_READ_FOR_WRITE version, but that's up to you. On doubt, clarify with David's plans. Feel free to add my RB and TB tags to any of the 3 version(s) you end up queuing: Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com> Best regards, Christian Schoenebeck
Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 10:14:16PM +0200: > I tested all 3 variants today, and they were all behaving correctly (no EBADF > errors anymore, no other side effects observed). Thanks! > The minimalistic version (i.e. your initial suggestion) performed 20% slower > in my tests, but that could be due to the fact that it was simply the 1st > version I tested, so caching on host side might be the reason. If necessary I > can check the performance aspect more thoroughly. hmm, yeah we open the writeback fids anyway so I'm not sure what would be really different performance-wise, but I'd tend to go with the most restricted change anyway. > Personally I would at least use the NETFS_READ_FOR_WRITE version, but that's > up to you. On doubt, clarify with David's plans. > > Feel free to add my RB and TB tags to any of the 3 version(s) you end up > queuing: > > Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> > Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com> Thanks, I'll add these and resend the last version for archival on the list / commit message wording check. At last that issue closed... -- Dominique
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 7382c5227e94..262968d02f55 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) { - struct p9_fid *fid = file->private_data; + struct inode *inode = file_inode(file); + struct v9fs_inode *v9inode = V9FS_I(inode); + struct p9_fid *fid = v9inode->writeback_fid; + + BUG_ON(!fid); p9_fid_get(fid); rreq->netfs_priv = fid;
cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid" for this, but the conversion to new fscache somehow lost usage of it: use the writeback fid instead of normal one. Note that the way this works (writeback fid being linked to inode) means we might use overprivileged fid for some operations, e.g. write as root when we shouldn't. Ideally we should keep both fids handy, and only use the writeback fid when really required e.g. reads to a write-only file to fill in the page cache (read-modify-write); but this is the situation we've always had and this commit only fixes an issue we've had for too long. Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Reported-By: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> --- Ok so finally had time to look at this, and it's not a lot so this is the most straight forward way to do: just reverting to how the old fscache worked. This appears to work from quick testing, Chiristian could you test it? I think the warnings you added in p9_client_read/write that check fid->mode might a lot of sense, if you care to resend it as WARN_ON((fid->mode & ACCMODE) == O_xyz); instead I'll queue that for 5.20 @Stable people, I've checked it applies to 5.17 and 5.18 so should be good to grab once I submit it for inclusion (that commit was included in 5.16, which is no longer stable) fs/9p/vfs_addr.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)