[GIT,PULL] bcachefs fixes for 6.12-rc5

Message ID	rdjwihb4vl62psonhbowazcd6tsv7jp6wbfkku76ze3m3uaxt3@nfe3ywdphf52 (mailing list archive)
State	New
Headers	show Received: from out-171.mta1.migadu.com (out-171.mta1.migadu.com [95.215.58.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 05C581BDA9F for <linux-fsdevel@vger.kernel.org>; Tue, 22 Oct 2024 17:39:18 +0000 (UTC) Date: Tue, 22 Oct 2024 13:39:10 -0400 From: Kent Overstreet <kent.overstreet@linux.dev> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [GIT PULL] bcachefs fixes for 6.12-rc5 Message-ID: <rdjwihb4vl62psonhbowazcd6tsv7jp6wbfkku76ze3m3uaxt3@nfe3ywdphf52> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
Series	[GIT,PULL] bcachefs fixes for 6.12-rc5 \| expand [GIT,PULL] bcachefs fixes for 6.12-rc5

Kent Overstreet Oct. 22, 2024, 5:39 p.m. UTC

The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1:

  bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400)

are available in the Git repository at:

  https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22

for you to fetch changes up to a069f014797fdef8757f3adebc1c16416271a599:

  bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path (2024-10-20 18:09:09 -0400)

----------------------------------------------------------------
bcachefs fixes for 6.12-rc5

Lots of hotfixes:
- transaction restart injection has been shaking out a few things

- fix a data corruption in the buffered write path on -ENOSPC, found by
  xfstests generic/299

- Some small show_options fixes

- Repair mismatches in inode hash type, seed: different snapshot
  versions of an inode must have the same hash/type seed, used for
  directory entries and xattrs. We were checking the hash seed, but not
  the type, and a user contributed a filesystem where the hash type on
  one inode had somehow been flipped; these fixes allow his filesystem
  to repair.

  Additionally, the hash type flip made some directory entries
  invisible, which were then recreated by userspace; so the hash check
  code now checks for duplicate non dangling dirents, and renames one of
  them if necessary.

- Don't use wait_event_interruptible() in recovery: this fixes some
  filesystems failing to mount with -ERESTARTSYS

- Workaround for kvmalloc not supporting > INT_MAX allocations, causing
  an -ENOMEM when allocating the sorted array of journal keys: this
  allows a 75 TB filesystem to mount

- Make sure bch_inode_unpacked.bi_snapshot is set in the old inode
  compat path: this alllows Marcin's filesystem (in use since before
  6.7) to repair and mount.

----------------------------------------------------------------
Hongbo Li (2):
      bcachefs: fix incorrect show_options results
      bcachefs: skip mount option handle for empty string.

Kent Overstreet (24):
      bcachefs: fix restart handling in bch2_rename2()
      bcachefs: fix bch2_hash_delete() error path
      bcachefs: fix restart handling in bch2_fiemap()
      bcachefs: fix missing restart handling in bch2_read_retry_nodecode()
      bcachefs: fix restart handling in bch2_do_invalidates_work()
      bcachefs: fix restart handling in bch2_alloc_write_key()
      bcachefs: fix restart handling in __bch2_resume_logged_op_finsert()
      bcachefs: handle restarts in bch2_bucket_io_time_reset()
      bcachefs: Don't use commit_do() unnecessarily
      bcachefS: ec: fix data type on stripe deletion
      bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()
      bcachefs: bch2_folio_reservation_get_partial() is now better behaved
      bcachefs: Fix data corruption on -ENOSPC in buffered write path
      bcachefs: Run in-kernel offline fsck without ratelimit errors
      bcachefs: INODE_STR_HASH() for bch_inode_unpacked
      bcachefs: Add hash seed, type to inode_to_text()
      bcachefs: Repair mismatches in inode hash seed, type
      bcachefs: bch2_hash_set_or_get_in_snapshot()
      bcachefs: fsck: Improve hash_check_key()
      bcachefs: Fix __bch2_fsck_err() warning
      bcachefs: Don't use wait_event_interruptible() in recovery
      bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
      bcachefs: Mark more errors as AUTOFIX
      bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path

 fs/bcachefs/alloc_background.c      |  37 +++--
 fs/bcachefs/alloc_foreground.c      |   2 +-
 fs/bcachefs/btree_gc.c              |  12 +-
 fs/bcachefs/btree_io.c              |   2 +-
 fs/bcachefs/btree_iter.h            |   2 +
 fs/bcachefs/btree_update.c          |   4 +-
 fs/bcachefs/btree_update.h          |   2 +-
 fs/bcachefs/btree_update_interior.c |   4 +-
 fs/bcachefs/buckets.c               |   7 +-
 fs/bcachefs/buckets.h               |  12 +-
 fs/bcachefs/chardev.c               |   1 +
 fs/bcachefs/darray.c                |  15 +-
 fs/bcachefs/dirent.c                |   7 -
 fs/bcachefs/dirent.h                |   7 +
 fs/bcachefs/disk_accounting.c       |   6 +-
 fs/bcachefs/ec.c                    |  22 +--
 fs/bcachefs/error.c                 |   5 +-
 fs/bcachefs/fs-io-buffered.c        |   6 +
 fs/bcachefs/fs-io-pagecache.c       |  70 +++++----
 fs/bcachefs/fs-io.c                 |   2 +-
 fs/bcachefs/fs.c                    |  18 +--
 fs/bcachefs/fsck.c                  | 281 +++++++++++++++++++++++++++++-------
 fs/bcachefs/inode.c                 |  27 ++--
 fs/bcachefs/inode.h                 |   1 +
 fs/bcachefs/inode_format.h          |   6 +-
 fs/bcachefs/io_misc.c               |   2 +-
 fs/bcachefs/io_read.c               |   8 +-
 fs/bcachefs/io_write.c              |   4 +-
 fs/bcachefs/journal.c               |  10 +-
 fs/bcachefs/journal.h               |   2 +-
 fs/bcachefs/opts.c                  |   6 +-
 fs/bcachefs/opts.h                  |   3 +-
 fs/bcachefs/quota.c                 |   2 +-
 fs/bcachefs/rebalance.c             |   4 +-
 fs/bcachefs/recovery.c              |   2 +-
 fs/bcachefs/sb-errors_format.h      |   4 +-
 fs/bcachefs/str_hash.h              |  60 +++++---
 fs/bcachefs/subvolume.c             |   7 +-
 fs/bcachefs/super.c                 |   2 +-
 fs/bcachefs/tests.c                 |   4 +-
 fs/bcachefs/xattr.c                 |   2 +-
 41 files changed, 475 insertions(+), 205 deletions(-)

Sasha Levin Oct. 22, 2024, 7:06 p.m. UTC | #1

On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote:
>
>The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1:
>
>  bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400)
>
>are available in the Git repository at:
>
>  https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22

Hi Linus,

There was a sub-thread on the linus-next discussion around improving
telemetry around -next/lore w.r.t soaking time and mailing list reviews
(https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/).

I've prototyped a set of scripts based on suggestions in the thread, and
wanted to see if you'd find it useful. A great way to test it out is with
a random pull request you'd review anyway :)

Is the below useful in any way? Or do you already do something like this
locally and I'm just wasting your time?

If it's useful, is bot reply to PRs the best way to share this? Any
other information that would be useful?

Here it goes:


Days in -next:
----------------------------------------
  0  | ███████████ (5)
  1  |
  2  | █████████████████████████████████████████████████ (21)
  3  |
  4  |
  5  |
  6  |
  7  |
  8  |
  9  |
10  |
11  |
12  |
13  |
14+ |

Commits that didn't spend time in -next:
--------------------
a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path
e04ee8608914d bcachefs: Mark more errors as AUTOFIX
f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery
eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning


Commits that weren't found on lore.kernel.org/all:
--------------------
e04ee8608914d bcachefs: Mark more errors as AUTOFIX
f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key()
dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot()
15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type
d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text()
78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked
b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors
4007bbb203a0c bcachefS: ec: fix data type on stripe deletion

Kees Cook Oct. 22, 2024, 7:30 p.m. UTC | #2

On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
> On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote:
> > 
> > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1:
> > 
> >  bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400)
> > 
> > are available in the Git repository at:
> > 
> >  https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22
> 
> Hi Linus,
> 
> There was a sub-thread on the linus-next discussion around improving
> telemetry around -next/lore w.r.t soaking time and mailing list reviews
> (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/).
> 
> I've prototyped a set of scripts based on suggestions in the thread, and
> wanted to see if you'd find it useful. A great way to test it out is with
> a random pull request you'd review anyway :)

This looks really nice to me! Maybe add top-level grade ("B-") based on
the stats available. You can call it the Nominal Acceptance Grade bot
(NAGbot). Can so that everyone will pay attention, declare that it is an
LLM (Large Linus Model).

> Is the below useful in any way? Or do you already do something like this
> locally and I'm just wasting your time?
> 
> If it's useful, is bot reply to PRs the best way to share this? Any
> other information that would be useful?
> 
> Here it goes:
> 
> 
> Days in -next:
> ----------------------------------------
>  0  | ███████████ (5)
>  1  |
>  2  | █████████████████████████████████████████████████ (21)
>  3  |
>  4  |
>  5  |
>  6  |
>  7  |
>  8  |
>  9  |
> 10  |
> 11  |
> 12  |
> 13  |
> 14+ |
> 
> Commits that didn't spend time in -next:

I'd include a count summary "(5 of 26: 19%)"

> --------------------
> a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path
> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
> 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery
> eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning

And then maybe limit this to 5 or 10 (imagine a huge PR like netdev or
drm).

> 
> 
> Commits that weren't found on lore.kernel.org/all:

"(9 of 26: 35%)"

> --------------------
> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key()
> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot()
> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type
> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text()
> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked
> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors
> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion

Nice work!

Mark Brown Oct. 22, 2024, 7:34 p.m. UTC | #3

On Tue, Oct 22, 2024 at 12:30:38PM -0700, Kees Cook wrote:
> On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:

> > --------------------
> > a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path
> > e04ee8608914d bcachefs: Mark more errors as AUTOFIX
> > f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
> > 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery
> > eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning

> And then maybe limit this to 5 or 10 (imagine a huge PR like netdev or
> drm).

OTOH since this is supposed to be only for commits that didn't spend
time in -next perhaps exploding badly is getting the message over
clearly?  Or at least putting something in that makes it stand out that
truncation happened.  It's a constant tradeoff with this sort of thing.

> Nice work!

Indeed.

Darrick J. Wong Oct. 22, 2024, 8:49 p.m. UTC | #4

On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
> On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote:
> > 
> > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1:
> > 
> >  bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400)
> > 
> > are available in the Git repository at:
> > 
> >  https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22
> 
> Hi Linus,
> 
> There was a sub-thread on the linus-next discussion around improving
> telemetry around -next/lore w.r.t soaking time and mailing list reviews
> (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/).
> 
> I've prototyped a set of scripts based on suggestions in the thread, and
> wanted to see if you'd find it useful. A great way to test it out is with
> a random pull request you'd review anyway :)
> 
> Is the below useful in any way? Or do you already do something like this
> locally and I'm just wasting your time?
> 
> If it's useful, is bot reply to PRs the best way to share this? Any
> other information that would be useful?

As a maintainer I probably would've found this to be annoying, but with
all my other outside observer / participant hats on, I think it's very
good to have a bot to expose maintainers not following the process.

> Here it goes:
> 
> 
> Days in -next:
> ----------------------------------------
>  0  | ███████████ (5)
>  1  |
>  2  | █████████████████████████████████████████████████ (21)
>  3  |
>  4  |
>  5  |
>  6  |
>  7  |
>  8  |
>  9  |
> 10  |
> 11  |
> 12  |
> 13  |
> 14+ |
> 
> Commits that didn't spend time in -next:
> --------------------
> a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path
> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
> 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery
> eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning
> 
> 
> Commits that weren't found on lore.kernel.org/all:
> --------------------
> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key()
> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot()
> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type
> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text()
> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked
> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors
> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion

Especially since there were already two whole roarings about this!
This was a very good demonstration!

PS: Would you be willing to share the part that searches lore?  There's
a few other git.kernel.org repos that might be interesting.

--D

> 
> -- 
> Thanks,
> Sasha
>

Sasha Levin Oct. 22, 2024, 9:20 p.m. UTC | #5

On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote:
>On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
>> other information that would be useful?
>
>As a maintainer I probably would've found this to be annoying, but with
>all my other outside observer / participant hats on, I think it's very
>good to have a bot to expose maintainers not following the process.

This was my thinking too. Maybe it makes sense for the bot to shut up if
things look good (i.e. >N days in stable, everything on the mailing
list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."?

>> Commits that weren't found on lore.kernel.org/all:
>> --------------------
>> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
>> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
>> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key()
>> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot()
>> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type
>> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text()
>> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked
>> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors
>> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion
>
>Especially since there were already two whole roarings about this!
>This was a very good demonstration!
>
>PS: Would you be willing to share the part that searches lore?  There's
>a few other git.kernel.org repos that might be interesting.

It's all at https://git.kernel.org/pub/scm/linux/kernel/git/sashal/next-analysis.git/

In particular, the query-lore.sh script.

Kent Overstreet Oct. 22, 2024, 10:58 p.m. UTC | #6

On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
> > On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote:
> > > 
> > > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1:
> > > 
> > >  bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400)
> > > 
> > > are available in the Git repository at:
> > > 
> > >  https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22
> > 
> > Hi Linus,
> > 
> > There was a sub-thread on the linus-next discussion around improving
> > telemetry around -next/lore w.r.t soaking time and mailing list reviews
> > (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/).
> > 
> > I've prototyped a set of scripts based on suggestions in the thread, and
> > wanted to see if you'd find it useful. A great way to test it out is with
> > a random pull request you'd review anyway :)
> > 
> > Is the below useful in any way? Or do you already do something like this
> > locally and I'm just wasting your time?
> > 
> > If it's useful, is bot reply to PRs the best way to share this? Any
> > other information that would be useful?
> 
> As a maintainer I probably would've found this to be annoying, but with
> all my other outside observer / participant hats on, I think it's very
> good to have a bot to expose maintainers not following the process.

That's the interesting about face...

Personally, I'm curious what people are trying to achieve here.

Michael Ellerman Oct. 23, 2024, 10:42 a.m. UTC | #7

Hi Sasha,

This is awesome.

Sasha Levin <sashal@kernel.org> writes:
> On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote:
>>On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
>>> other information that would be useful?
>>
>>As a maintainer I probably would've found this to be annoying, but with
>>all my other outside observer / participant hats on, I think it's very
>>good to have a bot to expose maintainers not following the process.
>
> This was my thinking too. Maybe it makes sense for the bot to shut up if
> things look good (i.e. >N days in stable, everything on the mailing
> list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."?

I think it has to reply with something, otherwise folks will wonder if
the bot has broken or missed their pull request.

But if all commits were in in linux-next and posted to a list, then the
only content is the "Days in linux-next" histogram, which is not that long
and is useful information IMHO.

It would be nice if you could trim the tail of the histogram below the
last populated row, that would make it more concise.

For fixes pulls it is sometimes legitimate for commits not to have been
in linux-next. But I think it's still good for the bot to highlight
those, ideally fixes that miss linux-next are either very urgent or
minor.

>>> Commits that weren't found on lore.kernel.org/all:
>>> --------------------
>>> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
>>> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
>>> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key()
>>> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot()
>>> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type
>>> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text()
>>> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked
>>> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors
>>> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion

Are you searching by message id, or subject? I sometimes edit subjects
when applying patches, so a subject search could miss some.

cheers

Sasha Levin Oct. 23, 2024, 11:39 a.m. UTC | #8

On Tue, Oct 22, 2024 at 12:30:38PM -0700, Kees Cook wrote:
>On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
>> On Tue, Oct 22, 2024 at 01:39:10PM -0400, Kent Overstreet wrote:
>> >
>> > The following changes since commit 5e3b72324d32629fa013f86657308f3dbc1115e1:
>> >
>> >  bcachefs: Fix sysfs warning in fstests generic/730,731 (2024-10-14 05:43:01 -0400)
>> >
>> > are available in the Git repository at:
>> >
>> >  https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22
>>
>> Hi Linus,
>>
>> There was a sub-thread on the linus-next discussion around improving
>> telemetry around -next/lore w.r.t soaking time and mailing list reviews
>> (https://lore.kernel.org/all/792F4759-EA33-48B8-9AD0-FA14FA69E86E@kernel.org/).
>>
>> I've prototyped a set of scripts based on suggestions in the thread, and
>> wanted to see if you'd find it useful. A great way to test it out is with
>> a random pull request you'd review anyway :)
>
>This looks really nice to me! Maybe add top-level grade ("B-") based on
>the stats available. You can call it the Nominal Acceptance Grade bot
>(NAGbot). Can so that everyone will pay attention, declare that it is an
>LLM (Large Linus Model).

This is something we need to figure out. If we do do this, then we need
to figure out how to score, and scoring should be different between
merge window and -rc cycles (heck, it should probably be different
between -rc cycles themselves).

>I'd include a count summary "(5 of 26: 19%)"

Ack

>> --------------------
>> a069f014797fd bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path
>> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
>> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
>> 3956ff8bc2f39 bcachefs: Don't use wait_event_interruptible() in recovery
>> eb5db64c45709 bcachefs: Fix __bch2_fsck_err() warning
>
>And then maybe limit this to 5 or 10 (imagine a huge PR like netdev or
>drm).

They should never have a huge number of commits here. I think it's
better to just let it explode.

>>
>>
>> Commits that weren't found on lore.kernel.org/all:
>
>"(9 of 26: 35%)"

Ack.

>Nice work!

Thanks!

Sasha Levin Oct. 23, 2024, 11:41 a.m. UTC | #9

On Wed, Oct 23, 2024 at 09:42:59PM +1100, Michael Ellerman wrote:
>Hi Sasha,
>
>This is awesome.
>
>Sasha Levin <sashal@kernel.org> writes:
>> On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote:
>>>On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
>>>> other information that would be useful?
>>>
>>>As a maintainer I probably would've found this to be annoying, but with
>>>all my other outside observer / participant hats on, I think it's very
>>>good to have a bot to expose maintainers not following the process.
>>
>> This was my thinking too. Maybe it makes sense for the bot to shut up if
>> things look good (i.e. >N days in stable, everything on the mailing
>> list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."?
>
>I think it has to reply with something, otherwise folks will wonder if
>the bot has broken or missed their pull request.
>
>But if all commits were in in linux-next and posted to a list, then the
>only content is the "Days in linux-next" histogram, which is not that long
>and is useful information IMHO.
>
>It would be nice if you could trim the tail of the histogram below the
>last populated row, that would make it more concise.

Makes sense, I'll do that.

>For fixes pulls it is sometimes legitimate for commits not to have been
>in linux-next. But I think it's still good for the bot to highlight
>those, ideally fixes that miss linux-next are either very urgent or
>minor.

Right, and Linus said he's okay with those. This is not a "shame" list
but rather "look a little closer" list.

>>>> Commits that weren't found on lore.kernel.org/all:
>>>> --------------------
>>>> e04ee8608914d bcachefs: Mark more errors as AUTOFIX
>>>> f0d3302073e60 bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
>>>> bc6d2d10418e1 bcachefs: fsck: Improve hash_check_key()
>>>> dc96656b20eb6 bcachefs: bch2_hash_set_or_get_in_snapshot()
>>>> 15a3836c8ed7b bcachefs: Repair mismatches in inode hash seed, type
>>>> d8e879377ffb3 bcachefs: Add hash seed, type to inode_to_text()
>>>> 78cf0ae636a55 bcachefs: INODE_STR_HASH() for bch_inode_unpacked
>>>> b96f8cd3870a1 bcachefs: Run in-kernel offline fsck without ratelimit errors
>>>> 4007bbb203a0c bcachefS: ec: fix data type on stripe deletion
>
>Are you searching by message id, or subject? I sometimes edit subjects
>when applying patches, so a subject search could miss some.

Both, and also by patch id, so if you just change the subject it should
still be okay.

We'll see when you send your next PR :)

pr-tracker-bot@kernel.org Oct. 24, 2024, 8:33 p.m. UTC | #10

The pull request you sent on Tue, 22 Oct 2024 13:39:10 -0400:

> https://github.com/koverstreet/bcachefs tags/bcachefs-2024-10-22

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c1e822754cc7f28b98c6897d62e8b47b4001e422

Thank you!

Kent Overstreet Oct. 24, 2024, 10:19 p.m. UTC | #11

On Wed, Oct 23, 2024 at 07:41:32AM -0400, Sasha Levin wrote:
> On Wed, Oct 23, 2024 at 09:42:59PM +1100, Michael Ellerman wrote:
> > Hi Sasha,
> > 
> > This is awesome.
> > 
> > Sasha Levin <sashal@kernel.org> writes:
> > > On Tue, Oct 22, 2024 at 01:49:31PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Oct 22, 2024 at 03:06:38PM -0400, Sasha Levin wrote:
> > > > > other information that would be useful?
> > > > 
> > > > As a maintainer I probably would've found this to be annoying, but with
> > > > all my other outside observer / participant hats on, I think it's very
> > > > good to have a bot to expose maintainers not following the process.
> > > 
> > > This was my thinking too. Maybe it makes sense for the bot to shut up if
> > > things look good (i.e. >N days in stable, everything on the mailing
> > > list). Or maybe just a simple "LGTM" or a "Reviewed-by:..."?
> > 
> > I think it has to reply with something, otherwise folks will wonder if
> > the bot has broken or missed their pull request.
> > 
> > But if all commits were in in linux-next and posted to a list, then the
> > only content is the "Days in linux-next" histogram, which is not that long
> > and is useful information IMHO.
> > 
> > It would be nice if you could trim the tail of the histogram below the
> > last populated row, that would make it more concise.
> 
> Makes sense, I'll do that.
> 
> > For fixes pulls it is sometimes legitimate for commits not to have been
> > in linux-next. But I think it's still good for the bot to highlight
> > those, ideally fixes that miss linux-next are either very urgent or
> > minor.
> 
> Right, and Linus said he's okay with those. This is not a "shame" list
> but rather "look a little closer" list.

Ok, that makes me feel better about this. I've got stuff that I hold
back for weeks (or months), and others that I'm fine with sending the
next day, once it's passed my CI.

I'm going to try to be better about talking about which patches have
risks (and why that risk is justified, else I wouldn't be sending it),
or which patches look more involved but I've got reason to be confident
about - that can get quite subtle. That's good for everyone down the
line in terms of knowing what to expect.

I wonder if also some of this is motivated by people concerned about
things in bcachefs moving too fast, and running the risk of regressions?
That's a justifiable concern, and priorities might be worth talking
about a bit.

I'm not currently seeing anything that makes me too concerned about
regressions: users in general aren't complaining about regressions
(previous pull request a user chimed in that he had been seeing less
stability past few kernel releases, and he tried switching back to btrfs
and the issues were still there) and my test dashboard is steadily
improving.

I do still have fairly critical (i.e. downtime causing) user reported
issues coming in that are taking most of my time, although they're
getting off into the weeds - one I've been working on the past few days
was reported by a user with a ~20 drive filesystem where we're
overflowing the maximum number of pointers in an extent, due to keeping
too many cached copies around, and his filesystem goes emergency
read-only. And there still seems to be something not-quite-right with
snapshots and unlinked inodes, possibly a repair issue.

Test dashboard still has a long ways to go before it's anywhere near as
clean as I want (and it needs to be, so that I can easily spot
regressions), but number of test failures have been steadily dropping
and the results are getting more consistent, and none of the test
failures there are scary ones that need to be jumped on.

https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing

(We were at 150-160 failures per full run a few weeks ago, now 100-110.
The full runs also run fstests in 8 different configurations, so lots of
duplicates).

There have been a lot of new syzbot reports recently, and some of those
do look more concerning. I don't think this indicates regressions - this
looks to be like syzbot getting better with its code-coverage-guided
testing at finding interesting codepaths, and for the concerning ones I
don't think the code changed in the right timeframe for it to be a
regressions. I think several of the recent syzbot reports are all due to
the same bug, where it looks like we're not going read-only correctly
and interior btree updates are still happening - I suspect that's been
there for ages and an assert I recently added is making it more visible.

I am still heavily in triage mode, and filesystem repair/recovery bugs
take top priority. In general, my priorities are
- critical user reported bugs first
- failures from my automated test suite second (unless they're
  regressions)
- syzbot last, because there's been basically zero overlap with syzbot
  bugs and user-affecting bugs so far, and because that's been an easy
  place for new people to jump in and help.

[GIT,PULL] bcachefs fixes for 6.12-rc5

Pull-request

Message

Comments