Message ID | 20230220135225.91b0f28344c01d5306c31230@linux-foundation.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] MM updates for 6.3-rc1 | expand |
On Mon, 20 Feb 2023 13:52:25 -0800 Andrew Morton <akpm@linux-foundation.org> wrote: > There are a lot of conflicts in your future, mainly because of the > ongoing folio conversion work. This will hopefully come to an end > fairly soon. Forthcoming conflicts which are known about, along with > Stephen's fixes are: > ... And... I failed to mention a conflict which didn't generate a reject, thanks to a post-6.2-rc4 gfs2 patch. Stephen's fix for this is at https://lkml.kernel.org/r/20230127173638.1efbe423@canb.auug.org.au Or, From: Andrew Morton <akpm@linux-foundation.org> Subject: fs/gfs2/log.c: fix build in __gfs2_writepage() Date: Tue Feb 21 03:23:08 PM PST 2023 mm-stable was based on 6.2-rc4 and hence its patch d585bdbeb79a ("fs: convert writepage_t callback to pass a folio") didn't know about the post-rc4 95ecbd0f162f ("Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one""). Net result is that d585bdbeb79a failed to convert fs/gfs2/log.c. The fix is from Andreas. Fixes: d585bdbeb79a ("fs: convert writepage_t callback to pass a folio") Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lkml.kernel.org/r/20230203105401.3362277-1-agruenba@redhat.com Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- --- a/fs/gfs2/log.c~fs-gfs2-logc-fix-build-in-__gfs2_writepage +++ a/fs/gfs2/log.c @@ -80,11 +80,11 @@ void gfs2_remove_from_ail(struct gfs2_bu brelse(bd->bd_bh); } -static int __gfs2_writepage(struct page *page, struct writeback_control *wbc, - void *data) +static int __gfs2_writepage(struct folio *folio, struct writeback_control *wbc, + void *data) { struct address_space *mapping = data; - int ret = mapping->a_ops->writepage(page, wbc); + int ret = mapping->a_ops->writepage(&folio->page, wbc); mapping_set_error(mapping, ret); return ret; }
The pull request you sent on Mon, 20 Feb 2023 13:52:25 -0800:
> https://lkml.kernel.org/r/20230106125915.60c8b547@canb.auug.org.au drivers/infiniband/hw/hfi1/file_ops.c
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3822a7c40997dc86b1458766a3f146d62393f084
Thank you!
On Mon, Feb 20, 2023 at 1:52 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > Linus, please merge this cycle's MM updates. Ok, finally got around to this one, and I am not thrilled about it. A few notes: - my fs/udf/inode.c is "minimal". Ugly, ugly, I think udf_adinicb_writepage() should just be made to actually deal with folios inherently, but I did *NOT* do that, and just did struct page *page = &folio->page; at the top, and left it at that. I'm not proud of it, but I hope Jan might want to do this properly. That conflict wasn't mentioned, and now I wonder if the UDF changes were in -next at all? - the fs/cifs/file.c conflict was a nightmare. Yes, I saw David Howells resolution suggestion. I think that one was buggy. It would wait for a page under writeback, and then go on to the *next* one without writing it back. I don't thin kthat was right. That said, I'm not at all convinced my version is right either. I can't test it, and that means I probably messed up. It looked sane to me when I did it, and it builds cleanly, but I honestly doubt myself. I think that code should probably use xas_for_each_marked(), but instead I kept the old model, added a loop like DavidH did, and then munged the end result until I thought it was palatable. NOTE! Don't even try to look at the conflict diff. It makes no sense at all. But somebody (I'd hope all of DavidH, SteveF and Willy) should really take a look at my end result. - gcc 12.2.1 quite reasonable complains about some of the new MM code: mm/migrate.c: In function ‘__migrate_folio_extract’: mm/migrate.c:1050:20: note: randstruct: casting between randomized structure pointer types (ssa): ‘struct anon_vma’ and ‘struct address_space’ 1050 | *anon_vmap = (void *)dst->mapping; | ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~ and while this doesn't cause a build failure ("note" is different from "warning"), I do think something needs to be done. Gcc is right. This code seems to *work* simply because it's intentionally mis-casting pointers, but I think it needs to be seriously looked at and something done to make gcc happy (and a *LARGE* comment about it). That last note is not some merge result, it's purely about the new MM code. Anyway, the merge is done and pushed out, I just am not very confident about the end result at all. That cifs thing really needs somebody competent looking at it. I think I went through three different iterations of my resolution before I was happy with my approach, and "happy" may end up being more about having exhausted my will to live, than about the end result actually being any good. I saw some noise about ext4 being a nightmare too, but I haven't gotten that pull request yet. I'll tackle the non-MM pull next, but I'm taking a break first. Alcohol may have to be involved. Linus
On Thu, 23 Feb 2023 17:33:37 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Feb 20, 2023 at 1:52 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > Linus, please merge this cycle's MM updates. > > Ok, finally got around to this one, and I am not thrilled about it. > > A few notes: > > - my fs/udf/inode.c is "minimal". Ugly, ugly, I think > udf_adinicb_writepage() should just be made to actually deal with > folios inherently, but I did *NOT* do that, and just did > > struct page *page = &folio->page; > > at the top, and left it at that. I'm not proud of it, but I hope > Jan might want to do this properly. > > That conflict wasn't mentioned, and now I wonder if the UDF changes > were in -next at all? This was a mea culpa, sorry. Stephen did encounter and resolve this (https://lkml.kernel.org/r/20230127165912.0e4a7b66@canb.auug.org.au) but I was fixated on his "linux-next: manual merge of the mm-stable tree with the XXX tree" emails and not his "linux-next: build failure after merge of the mm tree" emails. Self-LART has been applied.
Linus Torvalds <torvalds@linux-foundation.org> writes: > > - gcc 12.2.1 quite reasonable complains about some of the new MM code: > > mm/migrate.c: In function ‘__migrate_folio_extract’: > mm/migrate.c:1050:20: note: randstruct: casting between randomized > structure pointer types (ssa): ‘struct anon_vma’ and ‘struct > address_space’ > > 1050 | *anon_vmap = (void *)dst->mapping; > | ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~ > > and while this doesn't cause a build failure ("note" is different > from "warning"), I do think something needs to be done. Gcc is right. > This code seems to *work* simply because it's intentionally > mis-casting pointers, Yes. The mis-casting is intentional. I just need some place to hold the data temporarily (save in __migrate_folio_record() and clear in __migrate_folio_extract()). And "dst" is newly allocated folio. > but I think it needs to be seriously looked at and something done to > make gcc happy (and a *LARGE* comment about it). Sure. I will check whether there's some way to make gcc happy and add some comments about that. There's some comments for __migrate_folio_extract(), but that's isn't enough apprently.) > That last note is not some merge result, it's purely about the new MM code. > [snip] Best Regards, Huang, Ying
Linus Torvalds <torvalds@linux-foundation.org> wrote: > Yes, I saw David Howells resolution suggestion. I think that one > was buggy. It would wait for a page under writeback, and then go on to > the *next* one without writing it back. I don't thin kthat was right. You're right. Vishal's patch introduced it into afs and I copied it across and didn't notice it either then or on review of Vishal's patch. He inserted the extra for-loop as he's now extracting a batch, but kept the continue that used to repeat the extraction - except now it continues the wrong loop. So afs will need fixing too. The simplest ways I think are to just decrement the loop counter before continuing or to stick a goto in back to the beginning of the loop (which is what you did in cifs). But I'm not sure that's the correct thing to do. The previous code dropped the found folio and then repeated the search in case the folio got truncated, migrated or punched. I suspect that's probably what we should do. Also, thinking about it again, I'm not sure whether fetching a batch with filemap_get_folios_tag() like this in {afs,cifs}_writepages_region() is necessarily the right thing to do. There are three cases I'm thinking of: (1) A single folio is returned. This is trivial. (2) A run of contiguous folios are returned - {afs,cifs}_extend_writeback() is likely to write them back, in which case the batch is probably not useful. Note that *_extend_writeback() walks the xarray directly itself as it wants contiguous folios and doesn't want to extract any folio it's not going to use. (3) A list of scattered folios is returned. Granted this is more efficient if nothing else interferes - but there could be other writes in the gaps that we then skip over, other flushes that render some of our list clean or page invalidations. This is a change in behaviour, but I'm not sure that matters too much since a flush/sync can only be expected to write back what's modified at the time it is initiated. Further, processing each entry in the list is potentially very slow because we're doing a write across the network for each one (cifs might bump this into the background, but it might also have to (re)open a file handle on the server and wait for credits first to even begin the transaction). Which means all of the folios in the batch may then get pinned for a long period of time - up to 14x for the last folio in the batch - which could prevent things like page migration. Further, we might not get to write out all the folios in the batch as *_extend_writeback() might hit the wbc limit first. > That said, I'm not at all convinced my version is right either. I > can't test it, and that means I probably messed up. It looked sane to > me when I did it, and it builds cleanly, but I honestly doubt myself. It doesn't seem to work. A write seems to end in lots of: CIFS: VFS: No writable handle in writepages rc=-9 being emitted. I'll poke further into it - there's always the possibility that some other patch is interfering. David
The problem appears to be here: if (folio_mapping(folio) != mapping || !folio_test_dirty(folio)) { folio_unlock(folio); goto skip_write; } in my version it's: if (folio->mapping != mapping || !folio_test_dirty(folio)) { start += folio_size(folio); folio_unlock(folio); continue; } In your version, skip_write doesn't advance start if it too many skips occur. Changing your version to match fixes the problem - or, at least, the symptoms. I'm not sure exactly why it's occurring, but the -EBADF (-9) is coming from cifs_get_writable_file() not finding an open file. I think this is a Steve question. With Vishal's change to use filemap_get_folios_tag() instead of find_get_pages_range_tag(), the most common file write case (open, write, write, ..., write, close) in which all the data is added to the pagecache in one contiguous lump without seeking, hits this a lot. Unfortunately, unlike find_get_pages_range_tag(), filemap_get_folios_tag() doesn't allow you to set a limit on the number of pages it will return. David
On Mon, Feb 20, 2023 at 1:52 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > Linus, please merge this cycle's MM updates. Hmm. Just a note to let you know that I'm bisecting google-chrome dumping core, and it seems to be due to something in this pull. I've got a couple of hundred commits to go, so let's see what it bisects to, but right now all the bad state came from this pull, afaik. Maybe it's a false alarm, but it seems to be consistent so far. Will update when I've bisected closer. Linus
On Sat, Feb 25, 2023 at 6:43 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Hmm. Just a note to let you know that I'm bisecting google-chrome > dumping core, and it seems to be due to something in this pull. > > I've got a couple of hundred commits to go, so let's see what it > bisects to, but right now all the bad state came from this pull, > afaik. > > Maybe it's a false alarm, but it seems to be consistent so far. > > Will update when I've bisected closer. It's solidly in the "change XYZ to use vma iterator" chunk now. Commit 0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()") is good, and commit 37598f5a9d8b ("mlock: convert mlock to vma iterator") is bad. Will bisect further, but adding Liam to the participants because it's now narrowed down to his changes. Linus
On Sat, Feb 25, 2023 at 7:27 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Will bisect further, but adding Liam to the participants because it's > now narrowed down to his changes. Ok, it's commit 2286a6914c77 ("mm: change mprotect_fixup to vma iterator") It was entirely consistent, and bisected right to that all the way from my current git tip. Without that commit, google-chrome works fine. With that commit, I get "Aww snap" and a traps: ThreadPoolForeg[4337] trap invalid opcode ip:55d5542363ee sp:7fa5e04f1f80 error:0 in chrome[55d5537d3000+a14c000] message in the kernel dumps (and core dump noise in journalctl). The commit before is fine. Sadly, it doesn't revert cleanly on my current top-of-tree (or even _remotely_ cleanly_ because of all the other vma changes), so I can't test just reverting that on the current state. Also, it's not like I can debug google-chrome very much. It presumably does complex vma's and unusual mprotect() stuff to trigger this, when nothing else seems to care. Liam? Linus --- 2286a6914c776ec34cd97e4573b1466d055cb9de is the first bad commit commit 2286a6914c776ec34cd97e4573b1466d055cb9de Author: Liam R. Howlett <Liam.Howlett@Oracle.com> Date: Fri Jan 20 11:26:18 2023 -0500 mm: change mprotect_fixup to vma iterator Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-18-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> fs/exec.c | 5 ++++- include/linux/mm.h | 6 +++--- mm/mprotect.c | 47 ++++++++++++++++++++++------------------------- 3 files changed, 29 insertions(+), 29 deletions(-)
On Sat, 25 Feb 2023 19:53:04 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Sat, Feb 25, 2023 at 7:27 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Will bisect further, but adding Liam to the participants because it's > > now narrowed down to his changes. > > Ok, it's commit 2286a6914c77 ("mm: change mprotect_fixup to vma iterator") Liam sent us a fix yesterday, hopefully this? From: "Liam R. Howlett" <Liam.Howlett@oracle.com> Subject: mm/mprotect: Fix successful vma_merge() of next in do_mprotect_pkey() Date: Fri, 24 Feb 2023 16:20:55 -0500 If mprotect_fixup() successfully calls vma_merge() and replaces vma and the next vma, then the tmp variable in the do_mprotect_pkey() is not updated to point to the new vma end. This results in the loop detecting a gap between VMAs that does not exist. Fix the faulty value of tmp by setting it to the end location of the vma iterator at the end of the loop. Link: https://lkml.kernel.org/r/20230224212055.1786100-1-Liam.Howlett@oracle.com Fixes: 2286a6914c77 ("mm: change mprotect_fixup to vma iterator") Link: https://lore.kernel.org/linux-mm/20230223120407.729110a6ecd1416ac59d9cb0@linux-foundation.org/ Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Bert Karwatzki <spasswolf@web.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217061 Tested-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- --- a/mm/mprotect.c~mm-mprotect-fix-successful-vma_merge-of-next-in-do_mprotect_pkey +++ b/mm/mprotect.c @@ -832,6 +832,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, if (error) break; + tmp = vma_iter_end(&vmi); nstart = tmp; prot = reqprot; }
On Sat, Feb 25, 2023 at 7:57 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > Liam sent us a fix yesterday, hopefully this? Will test right away. It's certainly right in that suspicious area. Linus
On Sat, Feb 25, 2023 at 8:03 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Will test right away. It's certainly right in that suspicious area. Yup. That fixes it for me. I'll just apply it directly. Linus