Message ID | rdjwihb4vl62psonhbowazcd6tsv7jp6wbfkku76ze3m3uaxt3@nfe3ywdphf52 (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] bcachefs fixes for 6.12-rc5 | expand |
On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote: > >The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1: > > bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400) > >are available in the Git repository at: > > https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22 Hi Linus, There was a sub-thread on the linus-next discussion around improving telemetry around -next/lore w.r.t soaking time and mailing list reviews (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/). I've prototyped a set of scripts based on suggestions in the thread, and wanted to see if you'd find it useful. A great way to test it out is with a random pull request you'd review anyway :) Is the below useful in any way? Or do you already do something like this locally and I'm just wasting your time? If it's useful, is bot reply to PRs the best way to share this? Any other information that would be useful? Here it goes: Days in -next: ---------------------------------------- 0 | ███████████ (5) 1 | 2 | █████████████████████████████████████████████████ (21) 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14+ | Commits that didn't spend time in -next: -------------------- a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path e04ee8608914d bcachefs: Mark more errors as AUTOFIX f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning Commits that weren't found on lore.kernel.org/all: -------------------- e04ee8608914d bcachefs: Mark more errors as AUTOFIX f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key() dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot() 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text() 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion
On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: > On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote: > > > > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1: > > > > bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400) > > > > are available in the Git repository at: > > > > https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22 > > Hi Linus, > > There was a sub-thread on the linus-next discussion around improving > telemetry around -next/lore w.r.t soaking time and mailing list reviews > (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/). > > I've prototyped a set of scripts based on suggestions in the thread, and > wanted to see if you'd find it useful. A great way to test it out is with > a random pull request you'd review anyway :) This looks really nice to me! Maybe add top-level grade ("B-") based on the stats available. You can call it the Nominal Acceptance Grade bot (NAGbot). Can so that everyone will pay attention, declare that it is an LLM (Large Linus Model). > Is the below useful in any way? Or do you already do something like this > locally and I'm just wasting your time? > > If it's useful, is bot reply to PRs the best way to share this? Any > other information that would be useful? > > Here it goes: > > > Days in -next: > ---------------------------------------- > 0 | ███████████ (5) > 1 | > 2 | █████████████████████████████████████████████████ (21) > 3 | > 4 | > 5 | > 6 | > 7 | > 8 | > 9 | > 10 | > 11 | > 12 | > 13 | > 14+ | > > Commits that didn't spend time in -next: I'd include a count summary "(5 of 26: 19%)" > -------------------- > a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path > e04ee8608914d bcachefs: Mark more errors as AUTOFIX > f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations > 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery > eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning And then maybe limit this to 5 or 10 (imagine a huge PR like netdev or drm). > > > Commits that weren't found on lore.kernel.org/all: "(9 of 26: 35%)" > -------------------- > e04ee8608914d bcachefs: Mark more errors as AUTOFIX > f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations > bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key() > dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot() > 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type > d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text() > 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked > b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors > 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion Nice work!
On Tue, Oct 22, 2024 at 12:30:38PM -0700, Kees Cook wrote: > On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: > > -------------------- > > a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path > > e04ee8608914d bcachefs: Mark more errors as AUTOFIX > > f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations > > 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery > > eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning > And then maybe limit this to 5 or 10 (imagine a huge PR like netdev or > drm). OTOH since this is supposed to be only for commits that didn't spend time in -next perhaps exploding badly is getting the message over clearly? Or at least putting something in that makes it stand out that truncation happened. It's a constant tradeoff with this sort of thing. > Nice work! Indeed.
On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: > On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote: > > > > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1: > > > > bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400) > > > > are available in the Git repository at: > > > > https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22 > > Hi Linus, > > There was a sub-thread on the linus-next discussion around improving > telemetry around -next/lore w.r.t soaking time and mailing list reviews > (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/). > > I've prototyped a set of scripts based on suggestions in the thread, and > wanted to see if you'd find it useful. A great way to test it out is with > a random pull request you'd review anyway :) > > Is the below useful in any way? Or do you already do something like this > locally and I'm just wasting your time? > > If it's useful, is bot reply to PRs the best way to share this? Any > other information that would be useful? As a maintainer I probably would've found this to be annoying, but with all my other outside observer / participant hats on, I think it's very good to have a bot to expose maintainers not following the process. > Here it goes: > > > Days in -next: > ---------------------------------------- > 0 | ███████████ (5) > 1 | > 2 | █████████████████████████████████████████████████ (21) > 3 | > 4 | > 5 | > 6 | > 7 | > 8 | > 9 | > 10 | > 11 | > 12 | > 13 | > 14+ | > > Commits that didn't spend time in -next: > -------------------- > a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path > e04ee8608914d bcachefs: Mark more errors as AUTOFIX > f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations > 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery > eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning > > > Commits that weren't found on lore.kernel.org/all: > -------------------- > e04ee8608914d bcachefs: Mark more errors as AUTOFIX > f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations > bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key() > dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot() > 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type > d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text() > 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked > b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors > 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion Especially since there were already two whole roarings about this! This was a very good demonstration! PS: Would you be willing to share the part that searches lore? There's a few other git.kernel.org repos that might be interesting. --D > > -- > Thanks, > Sasha >
On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote: >On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: >> other information that would be useful? > >As a maintainer I probably would've found this to be annoying, but with >all my other outside observer / participant hats on, I think it's very >good to have a bot to expose maintainers not following the process. This was my thinking too. Maybe it makes sense for the bot to shut up if things look good (i.e. >N days in stable, everything on the mailing list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."? >> Commits that weren't found on lore.kernel.org/all: >> -------------------- >> e04ee8608914d bcachefs: Mark more errors as AUTOFIX >> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations >> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key() >> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot() >> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type >> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text() >> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked >> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors >> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion > >Especially since there were already two whole roarings about this! >This was a very good demonstration! > >PS: Would you be willing to share the part that searches lore? There's >a few other git.kernel.org repos that might be interesting. It's all at https://git.kernel.org/pub/scm/linux/kernel/git/sashal/next-analysis.git/ In particular, the query-lore.sh script.
On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote: > On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: > > On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote: > > > > > > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1: > > > > > > bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400) > > > > > > are available in the Git repository at: > > > > > > https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22 > > > > Hi Linus, > > > > There was a sub-thread on the linus-next discussion around improving > > telemetry around -next/lore w.r.t soaking time and mailing list reviews > > (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/). > > > > I've prototyped a set of scripts based on suggestions in the thread, and > > wanted to see if you'd find it useful. A great way to test it out is with > > a random pull request you'd review anyway :) > > > > Is the below useful in any way? Or do you already do something like this > > locally and I'm just wasting your time? > > > > If it's useful, is bot reply to PRs the best way to share this? Any > > other information that would be useful? > > As a maintainer I probably would've found this to be annoying, but with > all my other outside observer / participant hats on, I think it's very > good to have a bot to expose maintainers not following the process. That's the interesting about face... Personally, I'm curious what people are trying to achieve here.
Hi Sasha, This is awesome. Sasha Levin <sashal@kernel.org> writes: > On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote: >>On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: >>> other information that would be useful? >> >>As a maintainer I probably would've found this to be annoying, but with >>all my other outside observer / participant hats on, I think it's very >>good to have a bot to expose maintainers not following the process. > > This was my thinking too. Maybe it makes sense for the bot to shut up if > things look good (i.e. >N days in stable, everything on the mailing > list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."? I think it has to reply with something, otherwise folks will wonder if the bot has broken or missed their pull request. But if all commits were in in linux-next and posted to a list, then the only content is the "Days in linux-next" histogram, which is not that long and is useful information IMHO. It would be nice if you could trim the tail of the histogram below the last populated row, that would make it more concise. For fixes pulls it is sometimes legitimate for commits not to have been in linux-next. But I think it's still good for the bot to highlight those, ideally fixes that miss linux-next are either very urgent or minor. >>> Commits that weren't found on lore.kernel.org/all: >>> -------------------- >>> e04ee8608914d bcachefs: Mark more errors as AUTOFIX >>> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations >>> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key() >>> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot() >>> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type >>> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text() >>> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked >>> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors >>> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion Are you searching by message id, or subject? I sometimes edit subjects when applying patches, so a subject search could miss some. cheers
On Tue, Oct 22, 2024 at 12:30:38PM -0700, Kees Cook wrote: >On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: >> On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote: >> > >> > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1: >> > >> > bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400) >> > >> > are available in the Git repository at: >> > >> > https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22 >> >> Hi Linus, >> >> There was a sub-thread on the linus-next discussion around improving >> telemetry around -next/lore w.r.t soaking time and mailing list reviews >> (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/). >> >> I've prototyped a set of scripts based on suggestions in the thread, and >> wanted to see if you'd find it useful. A great way to test it out is with >> a random pull request you'd review anyway :) > >This looks really nice to me! Maybe add top-level grade ("B-") based on >the stats available. You can call it the Nominal Acceptance Grade bot >(NAGbot). Can so that everyone will pay attention, declare that it is an >LLM (Large Linus Model). This is something we need to figure out. If we do do this, then we need to figure out how to score, and scoring should be different between merge window and -rc cycles (heck, it should probably be different between -rc cycles themselves). >I'd include a count summary "(5 of 26: 19%)" Ack >> -------------------- >> a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path >> e04ee8608914d bcachefs: Mark more errors as AUTOFIX >> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations >> 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery >> eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning > >And then maybe limit this to 5 or 10 (imagine a huge PR like netdev or >drm). They should never have a huge number of commits here. I think it's better to just let it explode. >> >> >> Commits that weren't found on lore.kernel.org/all: > >"(9 of 26: 35%)" Ack. >Nice work! Thanks!
On Wed, Oct 23, 2024 at 09:42:59PM +1100, Michael Ellerman wrote: >Hi Sasha, > >This is awesome. > >Sasha Levin <sashal@kernel.org> writes: >> On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote: >>>On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: >>>> other information that would be useful? >>> >>>As a maintainer I probably would've found this to be annoying, but with >>>all my other outside observer / participant hats on, I think it's very >>>good to have a bot to expose maintainers not following the process. >> >> This was my thinking too. Maybe it makes sense for the bot to shut up if >> things look good (i.e. >N days in stable, everything on the mailing >> list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."? > >I think it has to reply with something, otherwise folks will wonder if >the bot has broken or missed their pull request. > >But if all commits were in in linux-next and posted to a list, then the >only content is the "Days in linux-next" histogram, which is not that long >and is useful information IMHO. > >It would be nice if you could trim the tail of the histogram below the >last populated row, that would make it more concise. Makes sense, I'll do that. >For fixes pulls it is sometimes legitimate for commits not to have been >in linux-next. But I think it's still good for the bot to highlight >those, ideally fixes that miss linux-next are either very urgent or >minor. Right, and Linus said he's okay with those. This is not a "shame" list but rather "look a little closer" list. >>>> Commits that weren't found on lore.kernel.org/all: >>>> -------------------- >>>> e04ee8608914d bcachefs: Mark more errors as AUTOFIX >>>> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations >>>> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key() >>>> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot() >>>> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type >>>> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text() >>>> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked >>>> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors >>>> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion > >Are you searching by message id, or subject? I sometimes edit subjects >when applying patches, so a subject search could miss some. Both, and also by patch id, so if you just change the subject it should still be okay. We'll see when you send your next PR :)
The pull request you sent on Tue, 22 Oct 2024 13:39:10 -0400:
> https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c1e822754cc7f28b98c6897d62e8b47b4001e422
Thank you!
On Wed, Oct 23, 2024 at 07:41:32AM -0400, Sasha Levin wrote: > On Wed, Oct 23, 2024 at 09:42:59PM +1100, Michael Ellerman wrote: > > Hi Sasha, > > > > This is awesome. > > > > Sasha Levin <sashal@kernel.org> writes: > > > On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote: > > > > On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote: > > > > > other information that would be useful? > > > > > > > > As a maintainer I probably would've found this to be annoying, but with > > > > all my other outside observer / participant hats on, I think it's very > > > > good to have a bot to expose maintainers not following the process. > > > > > > This was my thinking too. Maybe it makes sense for the bot to shut up if > > > things look good (i.e. >N days in stable, everything on the mailing > > > list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."? > > > > I think it has to reply with something, otherwise folks will wonder if > > the bot has broken or missed their pull request. > > > > But if all commits were in in linux-next and posted to a list, then the > > only content is the "Days in linux-next" histogram, which is not that long > > and is useful information IMHO. > > > > It would be nice if you could trim the tail of the histogram below the > > last populated row, that would make it more concise. > > Makes sense, I'll do that. > > > For fixes pulls it is sometimes legitimate for commits not to have been > > in linux-next. But I think it's still good for the bot to highlight > > those, ideally fixes that miss linux-next are either very urgent or > > minor. > > Right, and Linus said he's okay with those. This is not a "shame" list > but rather "look a little closer" list. Ok, that makes me feel better about this. I've got stuff that I hold back for weeks (or months), and others that I'm fine with sending the next day, once it's passed my CI. I'm going to try to be better about talking about which patches have risks (and why that risk is justified, else I wouldn't be sending it), or which patches look more involved but I've got reason to be confident about - that can get quite subtle. That's good for everyone down the line in terms of knowing what to expect. I wonder if also some of this is motivated by people concerned about things in bcachefs moving too fast, and running the risk of regressions? That's a justifiable concern, and priorities might be worth talking about a bit. I'm not currently seeing anything that makes me too concerned about regressions: users in general aren't complaining about regressions (previous pull request a user chimed in that he had been seeing less stability past few kernel releases, and he tried switching back to btrfs and the issues were still there) and my test dashboard is steadily improving. I do still have fairly critical (i.e. downtime causing) user reported issues coming in that are taking most of my time, although they're getting off into the weeds - one I've been working on the past few days was reported by a user with a ~20 drive filesystem where we're overflowing the maximum number of pointers in an extent, due to keeping too many cached copies around, and his filesystem goes emergency read-only. And there still seems to be something not-quite-right with snapshots and unlinked inodes, possibly a repair issue. Test dashboard still has a long ways to go before it's anywhere near as clean as I want (and it needs to be, so that I can easily spot regressions), but number of test failures have been steadily dropping and the results are getting more consistent, and none of the test failures there are scary ones that need to be jumped on. https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing (We were at 150-160 failures per full run a few weeks ago, now 100-110. The full runs also run fstests in 8 different configurations, so lots of duplicates). There have been a lot of new syzbot reports recently, and some of those do look more concerning. I don't think this indicates regressions - this looks to be like syzbot getting better with its code-coverage-guided testing at finding interesting codepaths, and for the concerning ones I don't think the code changed in the right timeframe for it to be a regressions. I think several of the recent syzbot reports are all due to the same bug, where it looks like we're not going read-only correctly and interior btree updates are still happening - I suspect that's been there for ages and an assert I recently added is making it more visible. I am still heavily in triage mode, and filesystem repair/recovery bugs take top priority. In general, my priorities are - critical user reported bugs first - failures from my automated test suite second (unless they're regressions) - syzbot last, because there's been basically zero overlap with syzbot bugs and user-affecting bugs so far, and because that's been an easy place for new people to jump in and help.