
[GIT,PULL] bcachefs fixes for 6.12-rc2

Message ID cphtxla2se4gavql3re5xju7mqxld4rp6q4wbqephb6by5ibfa@5myddcaxerpb (mailing list archive)
State New

Pull-request

git://evilpiepirate.org/bcachefs.git tags/bcachefs-2024-10-05

Message

Kent Overstreet Oct. 5, 2024, 6:35 p.m. UTC
Several more filesystems repaired, thank you to the users who have been
providing testing. The snapshots + unlinked fixes on top of this are
posted here:

https://lore.kernel.org/linux-bcachefs/20241005182955.1588763-1-kent.overstreet@linux.dev/T/#t

The following changes since commit 2007d28ec0095c6db0a24fd8bb8fe280c65446cd:

  bcachefs: rename version -> bversion for big endian builds (2024-09-29 23:55:52 -0400)

are available in the Git repository at:

  git://evilpiepirate.org/bcachefs.git tags/bcachefs-2024-10-05

for you to fetch changes up to 0f25eb4b60771f08fbcca878a8f7f88086d0c885:

  bcachefs: Rework logged op error handling (2024-10-04 20:25:32 -0400)

----------------------------------------------------------------
bcachefs fixes for 6.12-rc2

A lot of little fixes, bigger ones include:

- bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs
  changes, now fixed along with another lost wakeup
- fragmentation LRU fixes; fsck now repairs it successfully (this is the
  data structure copygc uses), along with some nice simplification.
- Rework logged op error handling, so that if logged op replay errors
  (due to another filesystem error) we delete the logged op instead of
  going into an infinite loop.
- Various small filesystem connectivity repair fixes

The final part of this patch series, fixing snapshots + unlinked file
handling, is now out on the list - I'm giving that part of the series
more time for user testing.

----------------------------------------------------------------
Kent Overstreet (18):
      bcachefs: Fix bad shift in bch2_read_flag_list()
      bcachefs: Fix return type of dirent_points_to_inode_nowarn()
      bcachefs: Fix bch2_inode_is_open() check
      bcachefs: Fix trans_commit disk accounting revert
      bcachefs: Add missing wakeup to bch2_inode_hash_remove()
      bcachefs: Fix reattach_inode()
      bcachefs: Create lost+found in correct snapshot
      bcachefs: bkey errors are only AUTOFIX during read
      bcachefs: Make sure we print error that causes fsck to bail out
      bcachefs: Mark more errors AUTOFIX
      bcachefs: minor lru fsck fixes
      bcachefs: Kill alloc_v4.fragmentation_lru
      bcachefs: Check for directories with no backpointers
      bcachefs: Check for unlinked inodes with dirents
      bcachefs: Check for unlinked, non-empty dirs in check_inode()
      bcachefs: Kill snapshot arg to fsck_write_inode()
      bcachefs: Add warn param to subvol_get_snapshot, peek_inode
      bcachefs: Rework logged op error handling

 fs/bcachefs/alloc_background.c        |  30 ++++--
 fs/bcachefs/alloc_background_format.h |   2 +-
 fs/bcachefs/btree_gc.c                |   3 -
 fs/bcachefs/btree_trans_commit.c      |   3 +-
 fs/bcachefs/error.c                   |  23 +++-
 fs/bcachefs/error.h                   |   9 +-
 fs/bcachefs/fs.c                      |  33 +++---
 fs/bcachefs/fsck.c                    | 194 ++++++++++++++++++++++------------
 fs/bcachefs/inode.c                   |  44 +++-----
 fs/bcachefs/inode.h                   |  28 +++--
 fs/bcachefs/io_misc.c                 |  63 +++++++----
 fs/bcachefs/logged_ops.c              |  16 +--
 fs/bcachefs/logged_ops.h              |   2 +-
 fs/bcachefs/lru.c                     |  34 +++---
 fs/bcachefs/move.c                    |   2 +-
 fs/bcachefs/movinggc.c                |  12 ++-
 fs/bcachefs/sb-errors_format.h        |  33 +++---
 fs/bcachefs/subvolume.c               |  16 ++-
 fs/bcachefs/subvolume.h               |   2 +
 fs/bcachefs/util.c                    |   2 +-
 20 files changed, 342 insertions(+), 209 deletions(-)
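
A minimal sketch of how this tag can be fetched and inspected locally, using
the URL, tag name, and base commit listed above (the exact commands are
illustrative, not prescribed by the pull request):

    git fetch git://evilpiepirate.org/bcachefs.git tag bcachefs-2024-10-05
    # list the commits being pulled, from the stated base up to the tag
    git log --oneline 2007d28ec0095c6db0a24fd8bb8fe280c65446cd..bcachefs-2024-10-05
    # merge with an explicit merge commit (or: git pull <url> <tag>)
    git merge --no-ff bcachefs-2024-10-05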

Comments

Linus Torvalds Oct. 5, 2024, 10:34 p.m. UTC | #1
On Sat, 5 Oct 2024 at 11:35, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Several more filesystems repaired, thank you to the users who have been
> providing testing. The snapshots + unlinked fixes on top of this are
> posted here:

I'm getting really fed up here Kent.

These have commit times from last night. Which makes me wonder how
much testing they got.

And before you start whining - again - about how you are fixing bugs,
let me remind you about the build failures you had on big-endian
machines because your patches had gotten ZERO testing outside your
tree.

That was just last week, and I'm getting the strong feeling that
absolutely nothing was learnt from the experience.

I have pulled this, but I searched for a couple of the commit messages
on the lists, and found *nothing* (ok, I found your pull request,
which obviously mentioned the first line of the commit messages).

I'm seriously thinking about just stopping pulling from you, because I
simply don't see you improving on your model. If you want to have an
experimental tree, you can damn well have one outside the mainline
kernel. I've told you before, and nothing seems to really make you
understand.

I was hoping and expecting that bcachefs being mainlined would
actually help development.  It has not. You're still basically the
only developer, there's no real sign that that will change, and you
seem to feel like sending me untested stuff that nobody else has ever
seen the day before the next rc release is just fine.

You're a smart person. I feel like I've given you enough hints. Why
don't you sit back and think about it, and let's make it clear: you
have exactly two choices here:

 (a) play better with others

 (b) take your toy and go home

Those are the choices.

                Linus
pr-tracker-bot@kernel.org Oct. 5, 2024, 10:36 p.m. UTC | #2
The pull request you sent on Sat, 5 Oct 2024 14:35:18 -0400:

> git://evilpiepirate.org/bcachefs.git tags/bcachefs-2024-10-05

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8f602276d3902642fdc3429b548d73c745446601

Thank you!
Kent Overstreet Oct. 5, 2024, 10:54 p.m. UTC | #3
On Sat, Oct 05, 2024 at 03:34:56PM GMT, Linus Torvalds wrote:
> On Sat, 5 Oct 2024 at 11:35, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > Several more filesystems repaired, thank you to the users who have been
> > providing testing. The snapshots + unlinked fixes on top of this are
> > posted here:
> 
> I'm getting really fed up here Kent.
> 
> These have commit times from last night. Which makes me wonder how
> much testing they got.

The /commit/ dates are from last night, because I polish up commit
messages and reorder until the last night (I always push smaller fixes
up front and fixes that are likely to need rework to the back).

The vast majority of those fixes are all ~2 weeks old.

> And before you start whining - again - about how you are fixing bugs,
> let me remind you about the build failures you had on big-endian
> machines because your patches had gotten ZERO testing outside your
> tree.

No, there simply aren't that many people running big endian. I have
users building and running my trees on a daily basis. If I push
something broken before I go to bed I have bug reports waiting for me
_the next morning_ when I wake up.

> That was just last week, and I'm getting the strong feeling that
> absolutely nothing was learnt from the experience.
> 
> I have pulled this, but I searched for a couple of the commit messages
> on the lists, and found *nothing* (ok, I found your pull request,
> which obviously mentioned the first line of the commit messages).
> 
> I'm seriously thinking about just stopping pulling from you, because I
> simply don't see you improving on your model. If you want to have an
> experimental tree, you can damn well have one outside the mainline
> kernel. I've told you before, and nothing seems to really make you
> understand.

At this point, it's honestly debatable whether the experimental label
should apply. I'm getting bug reports that talk about production use and
working on metadata dumps where the superblock indicates the filesystem
has been in continuous use for years.

And many, many people talking about how even at this relatively early
point it doesn't fall over like btrfs does.

Let that sink in.

Btrfs has been mainline for years, and it still craps out on people. I
was just in a meeting two days ago, closing funding, and a big reason it
was an easy sell was because they have to run btrfs in _read only_ mode
because otherwise it craps out.

So if the existing process, the existing way of doing things, hasn't
been able to get btrfs to a point where people can rely on it after 10
years - perhaps you and the community don't know quite as much as you
think you do about the realities of what it takes to ship a working
filesystem.

And from where I sit, on the bcachefs side of things, things are going
smoothly and quickly. Bug reports are diminishing in frequency and
severity, even as userbase is going up; distros are picking it up (just
not Debian and Fedora); the timeline I laid out at LSF is still looking
reasonable.

> I was hoping and expecting that bcachefs being mainlined would
> actually help development.  It has not. You're still basically the
> only developer, there's no real sign that that will change, and you
> seem to feel like sending me untested stuff that nobody else has ever
> seen the day before the next rc release is just fine.

I've got a team lined up, just secured funding to start paying them and
it looks like I'm about to secure more.

And the community is growing, I'm reviewing and taking patches from more
people, and regularly mentoring them on the codebase.

And on top of all that, you shouting about "process" rings pretty hollow
when I _remember_ the days when you guys were rewriting core mm code in
rc kernels.

Given where bcachefs is at in the lifecycle of a big codebase being
stabilized, you should be expecting to see stuff like that here. Stuff
is getting found and fixed, and then we ship those fixes so we can find
the next stuff.

> You're a smart person. I feel like I've given you enough hints. Why
> don't you sit back and think about it, and let's make it clear: you
> have exactly two choices here:
> 
>  (a) play better with others
> 
>  (b) take your toy and go home

You've certainly yelled a lot...
Linus Torvalds Oct. 5, 2024, 11:15 p.m. UTC | #4
On Sat, 5 Oct 2024 at 15:54, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> The vast majority of those fixes are all ~2 weeks old.

With the patches not appearing on the list, that seems entirely irrelevant.

Apparently they are 2 weeks old IN YOUR TREE.

And absolutely nowhere else.

> Let that sink in.

Seriously.

You completely dodged my actual argument, except for pointing at how
we didn't have process two decades ago.

If you can't actually even face this, what's the point any more?

               Linus
Kent Overstreet Oct. 5, 2024, 11:40 p.m. UTC | #5
On Sat, Oct 05, 2024 at 04:15:25PM GMT, Linus Torvalds wrote:
> On Sat, 5 Oct 2024 at 15:54, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > The vast majority of those fixes are all ~2 weeks old.
> 
> With the patches not appearing on the list, that seems entirely irrelevant.
> 
> Apparently they are 2 weeks old IN YOUR TREE.
> 
> And absolutely nowhere else.

If what you want is patches appearing on the list, I'm not unwilling to
make that change.

I take issue, and indeed even dig my heels in, when the only people
asking for that are _only_ yelling about that and aren't involved
otherwise.

But you will find that if you talk to me as one human being to another,
where we can share and listen to each other's concerns, I'm more than
happy to be reasonable.

But I'm not going to just toe the line when it's just yelling.

Seriously.

Because the last time you flipped out over a pull request, I spent the
rest of the cycle telling people "x y and z are fixed, but you'll have
to build my tree instead of running a released kernel". And that gets
tiresome; some of the bugs were significant - and no issues to date have
been found in the stuff you kicked back, which tells me my process is
just fine.

So let _that_ sink in. In order to support my userbase, as well as
iterate to find the _next_ set of bugs, I have to be able to ship
bugfixes in a timely manner, and if that's going to keep being an issue
perhaps I should be having those conversations with distro kernel
maintainers now, instead of later.

> > Let that sink in.
> 
> Seriously.
> 
> You completely dodged my actual argument, except for pointing at how
> we didn't have process two decades ago.
> 
> If you can't actually even face this, what's the point any more?

Face _what_ exactly? Because at this point, I can't even tell what it is
you want, what you're reacting to keeps shifting.
Kent Overstreet Oct. 5, 2024, 11:47 p.m. UTC | #6
On Sat, Oct 05, 2024 at 07:41:03PM GMT, Kent Overstreet wrote:
> Face _what_ exactly? Because at this point, I can't even tell what it is
> you want, what you're reacting to keeps shifting.

And more than that, I'm done with trying to cater, and I'm done with
these long-winded rants. Look, I quite enjoy the direct approach, but
I'm done with having to apologize for you in order to calm people down
every time this happens.

If you're so convinced you know best, I invite you to start writing your
own filesystem. Go for it.
Linus Torvalds Oct. 6, 2024, 12:14 a.m. UTC | #7
On Sat, 5 Oct 2024 at 16:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> If what you want is patches appearing on the list, I'm not unwilling to
> make that change.

I want you to WORK WITH OTHERS. Including me - which means working
with the rules and processes we have in place.

Making the argument that we didn't have those rules twenty years ago
is just stupid.  We have them NOW, because we learnt better. You don't
get to say "look, you didn't have rules 20 years ago, so why should I
have them now?"

Patches appearing on the list is not some kind of sufficient thing.
It's the absolute minimal requirement. The fact that absolutely *NONE*
of the patches in your pull request showed up when I searched just
means that you clearly didn't even attempt to have others involved
(ok, I probably only searched for half of them and then I gave up in
disgust).

We literally had a bcachefs build failure last week. It showed up
pretty much immediately after I pulled your tree. And because you sent
in the bcachefs "fixes" with the bug the day before I cut rc1, we
ended up with a broken rc1.

And hey, mistakes happen. But when the *SAME* absolute disregard for
testing happens the very next weekend, do you really expect me to be
happy about it?

It's this complete disregard for anybody else that I find problematic.
You don't even try to get other developers involved, or follow
upstream rules.

And then you don't seem to even understand why I then complain.

In fact, you in the next email say:

> If you're so convinced you know best, I invite you to start writing your
> own filesystem. Go for it.

Not at all. I'm not interested in creating another bcachefs.

I'm contemplating just removing bcachefs entirely from the mainline
tree. Because you show again and again that you have no interest in
trying to make mainline work.

You can do it out of mainline. You did it for a decade, and that
didn't cause problems. I thought it would be better if it finally got
mainlined, but by all your actions you seem to really want to just
play in your own sandbox and not involve anybody else.

So if this is just your project and nobody else is expected to
participate, and you don't care about the fact that you break the
mainline build, why the hell did you want to be in the mainline tree
in the first place?

                   Linus
Kent Overstreet Oct. 6, 2024, 12:54 a.m. UTC | #8
On Sat, Oct 05, 2024 at 05:14:31PM GMT, Linus Torvalds wrote:
> On Sat, 5 Oct 2024 at 16:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > If what you want is patches appearing on the list, I'm not unwilling to
> > make that change.
> 
> I want you to WORK WITH OTHERS. Including me - which means working
> with the rules and processes we have in place.

That has to work both ways.

Because when I explain my reasoning and processes, and it's ignored and
the same basic stuff is repeatedly yelled back, I'm just going to tune
it out.

I'm more than happy to work with people, but that's got to be a
conversation, and one based on mutual respect.

> Making the argument that we didn't have those rules twenty years ago
> is just stupid.  We have them NOW, because we learnt better. You don't
> get to say "look, you didn't have rules 20 years ago, so why should I
> have them now?"

That wasn't my argument.

My point was that a codebase at an earlier phase of development, that
hasn't had as long to stabilize, is inherently going to be more in flux.
Earlier in development fixing bugs is going to be a high prioritity,
relatively speaking, vs. avoiding regressions; sometimes the important
thing is to make forward progress, iterate, and ship and get feedback
from users.

I think the way you guys were doing development 20 years ago was
entirely appropriate at that time, and that's what I need to be doing
now; I need to be less conservative than the kernel as a whole.

That isn't to say that there aren't things we can and should be doing
to mitigate that (i.e. improving build testing, which now that I'm
finishing up with the current project I can do), or that there isn't
room for discussion on the particulars.

But seriously; bcachefs is shaping up far better than btrfs (which,
afaik _did_ try to play by all the rules), and process has _absolutely_
been a factor in that.

> Patches appearing on the list is not some kind of sufficient thing.
> It's the absolute minimal requirement. The fact that absolutely *NONE*
> of the patches in your pull request showed up when I searched just
> means that you clearly didn't even attempt to have others involved
> (ok, I probably only searched for half of them and then I gave up in
> disgust).

Those fixes were all pretty basic, and broadly speaking I know what
everyone else working on bcachefs is doing. Hongbo has been quite
helpful with a bunch of things (and
starting to help out in the bug tracker and IRC channel), Alan has been
digging around in six locks and most recently the cycle detector code,
and I've been answering questions as he learns his way around, Thomas
has been getting started on some backpointers scalability work.

Nothing will be served by having them review thoroughly a big stream of
small uninteresting fixes, it'd suck up all their time and prevent them
from doing anything productive. I have, quite literally, tried this and
had it happen on multiple occasions in the past.

I do post when I've got something more interesting going on, and I'd
been anticipating posting more as the stabilizing slows down.

> We literally had a bcachefs build failure last week. It showed up
> pretty much immediately after I pulled your tree. And because you sent
> in the bcachefs "fixes" with the bug the day before I cut rc1, we
> ended up with a broken rc1.
> 
> And hey, mistakes happen. But when the *SAME* absolute disregard for
> testing happens the very next weekend, do you really expect me to be
> happy about it?

And I do apologize for the build failure, and I will get on the
automated multi-arch build testing - that needed to happen anyways.

But I also have to remind you that I'm one of the few people who's
actually been pushing for more and better automated testing (I now have
infrastructure for the community that anyone can use, just ask me for an
account) - and that's been another solo effort because so few people are
even interested, so the fact that this even came up grates on me. This
is a problem with a technical solution, and instead we're all just
arguing.

> It's this complete disregard for anybody else that I find problematic.
> You don't even try to get other developers involved, or follow
> upstream rules.

Linus, just because you don't see it doesn't mean it doesn't exist. I
spend a significant fraction of my day on IRC and the phone with both
users and other developers.

And "upstream rules" has always been a fairly ad-hoc thing, which even
you barely seem able to spell out.

It's taken _forever_ to get to "yes, you do want patches on the list",
and you seem to have some feeling that the volume of fixes is an issue
for you, but god only knows if that's more than a hazy feeling.

> > If you're so convinced you know best, I invite you to start writing your
> > own filesystem. Go for it.
> 
> Not at all. I'm not interested in creating another bcachefs.
> 
> I'm contemplating just removing bcachefs entirely from the mainline
> tree. Because you show again and again that you have no interest in
> trying to make mainline work.

You can do that, and it won't be the end of the world for me (although a
definite inconvenience) - but it's going to suck for a lot of users.

> You can do it out of mainline. You did it for a decade, and that
> didn't cause problems. I thought it would be better if it finally got
> mainlined, but by all your actions you seem to really want to just
> play in your own sandbox and not involve anybody else.
> 
> So if this is just your project and nobody else is expected to
> participate, and you don't care about the fact that you break the
> mainline build, why the hell did you want to be in the mainline tree
> in the first place?

Honestly?

Because I want Linux to have a filesystem we can all be proud of, that
users can rely on, that has a level of robustness and polish that we can
all aspire to.
Carl E. Thompson Oct. 6, 2024, 1:20 a.m. UTC | #9
Here is a user's perspective from someone who's built a career from Linux (thanks to all of you)...

The big hardship with testing bcachefs before it was merged into the kernel was that it couldn't be built as an out-of-tree module and instead a whole other kernel tree needed to be built. That was a pain.

Now, the core kernel infrastructure changes that bcachefs relies on are in the kernel and bcachefs can very easily and quickly be built as an out-of-tree module in just a few seconds. I submit to all involved that maybe that's the best way to go **for now**. 

Switching to out of tree for now would make it much easier for Kent to have the fast-paced development model he desires for this stage in bcachefs' development. It would also make using and testing bcachefs much easier for power users like me because when an issue is detected we could get a fix or new feature much faster than having to wait for a distribution to ship the next kernel version and with less ancillary risk than building and using a less-tested kernel tree. Distributions themselves also are very familiar with packaging up out-of-tree modules and distribution tools like dkms make using them dead simple even for casual users.
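
To make the dkms point above concrete, here is a minimal sketch of what a
dkms.conf for an out-of-tree bcachefs module could look like; the package
name, version, and source layout are assumptions for illustration only, not
an existing packaging recipe:

    # Hypothetical dkms.conf - values are illustrative assumptions; the module
    # sources would live under /usr/src/bcachefs-2024.10.05/ with their own
    # Kbuild/Makefile.
    PACKAGE_NAME="bcachefs"
    PACKAGE_VERSION="2024.10.05"
    BUILT_MODULE_NAME[0]="bcachefs"
    DEST_MODULE_LOCATION[0]="/kernel/fs/bcachefs/"
    AUTOINSTALL="yes"

    # Typical workflow once the sources and dkms.conf are in place:
    #   dkms add -m bcachefs -v 2024.10.05
    #   dkms build -m bcachefs -v 2024.10.05
    #   dkms install -m bcachefs -v 2024.10.05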

The way things are now isn't great for me as a Linux power user. I often want to use the latest or even RC kernels on my systems to get some new hardware support or other feature and I'm used to being able to do that without too many problems. But recently I've had to skip cutting-edge kernel versions that I otherwise wanted to try because there have been issues in bcachefs that I didn't want to have to face or work around. Switching to an out of tree module for now would be the best of all worlds for me because I could pick and choose which combination of kernel / bcachefs to use for each system and situation.

Just my 2¢.

Carl



> On 2024-10-05 5:14 PM PDT Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>  
> On Sat, 5 Oct 2024 at 16:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > If what you want is patches appearing on the list, I'm not unwilling to
> > make that change.
> 
> I want you to WORK WITH OTHERS. Including me - which means working
> with the rules and processes we have in place.
> 
> Making the argument that we didn't have those rules twenty years ago
> is just stupid.  We have them NOW, because we learnt better. You don't
> get to say "look, you didn't have rules 20 years ago, so why should I
> have them now?"
> 
> Patches appearing on the list is not some kind of sufficient thing.
> It's the absolute minimal requirement. The fact that absolutely *NONE*
> of the patches in your pull request showed up when I searched just
> means that you clearly didn't even attempt to have others involved
> (ok, I probably only searched for half of them and then I gave up in
> disgust).
> 
> We literally had a bcachefs build failure last week. It showed up
> pretty much immediately after I pulled your tree. And because you sent
> in the bcachefs "fixes" with the bug the day before I cut rc1, we
> ended up with a broken rc1.
> 
> And hey, mistakes happen. But when the *SAME* absolute disregard for
> testing happens the very next weekend, do you really expect me to be
> happy about it?
> 
> It's this complete disregard for anybody else that I find problematic.
> You don't even try to get other developers involved, or follow
> upstream rules.
> 
> And then you don't seem to even understand why I then complain.
> 
> In fact, you in the next email say:
> 
> > If you're so convinced you know best, I invite you to start writing your
> > own filesystem. Go for it.
> 
> Not at all. I'm not interested in creating another bcachefs.
> 
> I'm contemplating just removing bcachefs entirely from the mainline
> tree. Because you show again and again that you have no interest in
> trying to make mainline work.
> 
> You can do it out of mainline. You did it for a decade, and that
> didn't cause problems. I thought it would be better if it finally got
> mainlined, but by all your actions you seem to really want to just
> play in your own sandbox and not involve anybody else.
> 
> So if this is just your project and nobody else is expected to
> participate, and you don't care about the fact that you break the
> mainline build, why the hell did you want to be in the mainline tree
> in the first place?
> 
>                    Linus
Kent Overstreet Oct. 6, 2024, 1:56 a.m. UTC | #10
On Sat, Oct 05, 2024 at 06:20:53PM GMT, Carl E. Thompson wrote:
> Here is a user's perspective from someone who's built a career from Linux (thanks to all of you)...
> 
> The big hardship with testing bcachefs before it was merged into the kernel was that it couldn't be built as an out-of-tree module and instead a whole other kernel tree needed to be built. That was a pain.
> 
> Now, the core kernel infrastructure changes that bcachefs relies on are in the kernel and bcachefs can very easily and quickly be built as an out-of-tree module in just a few seconds. I submit to all involved that maybe that's the best way to go **for now**. 
> 
> Switching to out of tree for now would make it much easier for Kent to have the fast-paced development model he desires for this stage in bcachefs' development. It would also make using and testing bcachefs much easier for power users like me because when an issue is detected we could get a fix or new feature much faster than having to wait for a distribution to ship the next kernel version and with less ancillary risk than building and using a less-tested kernel tree. Distributions themselves also are very familiar with packaging up out-of-tree modules and distribution tools like dkms make using them dead simple even for casual users.
> 
> The way things are now isn't great for me as a Linux power user. I
> often want to use the latest or even RC kernels on my systems to get
> some new hardware support or other feature and I'm used to being able
> to do that without too many problems. But recently I've had to skip
> cutting-edge kernel versions that I otherwise wanted to try because
> there have been issues in bcachefs that I didn't want to have to face
> or work around. Switching to an out of tree module for now would be
> the best of all worlds for me because I could pick and choose which
> combination of kernel / bcachefs to use for each system and situation.

Carl - thanks, I wasn't aware of this.

Can you give me details? 6.11 had the disk accounting rewrite, which was
huge and (necessarily) had some fallout, if you're seeing regressions
otherwise that are slipping through then - yes it's time to slow down
and reevaluate.

Details would be extremely helpful, so we can improve our regression
testing.
Carl E. Thompson Oct. 6, 2024, 3:06 a.m. UTC | #11
Yeah, of course there were the disk accounting issues and before that was the kernel upgrade-downgrade bug going from 6.8 back to 6.7. Currently over on Reddit at least one user is mentioning read errors and / or performance regressions on the current RC version that I'd rather avoid.

There were a number of other issues that cropped up in some earlier versions but not others such as deadlocks when using compression (particularly zstd), weirdness when using compression with 4k blocks and suspend / resume failures when using bcachefs. 

None of those things were a big deal to me as I mostly only use bcachefs on root filesystems which are of course easy to recreate. But I do currently use bcachefs for all the filesystems on my main laptop so issues there can be more of a pain.

As an example of potential issues I'd like to avoid I often upgrade my laptop and swap the old SSD in and am currently considering pulling the trigger on a Ryzen AI laptop such as the ProArt P16. However, this new processor has some cutting edge features only fully supported in 6.12 so I'd prefer to use that kernel if I can. But... because according to Reddit there are apparently issues with bcachefs in the 6.12RC kernels that means I am hesitant to buy the laptop and use the RC kernel in the carefree manner I normally would. Yeah, first world problems!

Speaking of Reddit, I don't know if you saw it but a user there quotes you as saying users who use release candidates should expect them to be "dangerous as crap." I could not find a post where you said that in the thread that user pointed to but if you **did** say something like that then I guess I have a different concept of what "release candidate" means.

So for me it would be a lot easier if bcachefs versions were decoupled from kernel versions. 

Thanks,
Carl

> On 2024-10-05 6:56 PM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
>  
> On Sat, Oct 05, 2024 at 06:20:53PM GMT, Carl E. Thompson wrote:
> > Here is a user's perspective from someone who's built a career from Linux (thanks to all of you)...
> > 
> > The big hardship with testing bcachefs before it was merged into the kernel was that it couldn't be built as an out-of-tree module and instead a whole other kernel tree needed to be built. That was a pain.
> > 
> > Now, the core kernel infrastructure changes that bcachefs relies on are in the kernel and bcachefs can very easily and quickly be built as an out-of-tree module in just a few seconds. I submit to all involved that maybe that's the best way to go **for now**. 
> > 
> > Switching to out of tree for now would make it much easier for Kent to have the fast-paced development model he desires for this stage in bcachefs' development. It would also make using and testing bcachefs much easier for power users like me because when an issue is detected we could get a fix or new feature much faster than having to wait for a distribution to ship the next kernel version and with less ancillary risk than building and using a less-tested kernel tree. Distributions themselves also are very familiar with packaging up out-of-tree modules and distribution tools like dkms make using them dead simple even for casual users.
> > 
> > The way things are now isn't great for me as a Linux power user. I
> > often want to use the latest or even RC kernels on my systems to get
> > some new hardware support or other feature and I'm used to being able
> > to do that without too many problems. But recently I've had to skip
> > cutting-edge kernel versions that I otherwise wanted to try because
> > there have been issues in bcachefs that I didn't want to have to face
> > or work around. Switching to an out of tree module for now would be
> > the best of all worlds for me because I could pick and choose which
> > combination of kernel / bcachefs to use for each system and situation.
> 
> Carl - thanks, I wasn't aware of this.
> 
> Can you give me details? 6.11 had the disk accounting rewrite, which was
> huge and (necessarily) had some fallout, if you're seeing regressions
> otherwise that are slipping through then - yes it's time to slow down
> and reevaluate.
> 
> Details would be extremely helpful, so we can improve our regression
> testing.
Kent Overstreet Oct. 6, 2024, 3:42 a.m. UTC | #12
On Sat, Oct 05, 2024 at 08:06:31PM GMT, Carl E. Thompson wrote:
> Yeah, of course there were the disk accounting issues and before that
> was the kernel upgrade-downgrade bug going from 6.8 back to 6.7.
> Currently over on Reddit at least one user is mentioning read errors and
> / or performance regressions on the current RC version that I'd rather
> avoid.

So, disk accounting rewrite: that code was basically complete, just
baking, for a full six months before merging - so, not exactly rushed,
and it saw user testing before merging. Given the size, and how invasive
it was, some regressions were inevitable and they were pretty small and
localized.

The upgrade/downgrade bug was really nasty, yeah.

> There were a number of other issues that cropped up in some earlier
> versions but not others such as deadlocks when using compression
> (particularly zstd), weirdness when using compression with 4k blocks
> and suspend / resume failures when using bcachefs. 

I don't believe any of those were bcachefs regressions, although some
are bcachefs bugs - suspend/resume, for example, still has an open
bug.

I've seen multiple compression bugs that were mostly not bcachefs bugs
(i.e. there was a zstd bug that affected bcachefs that took forever to
fix, and there's a recently reported LZ4HC bug that may or may not be
bcachefs).

> None of those things were a big deal to me as I mostly only use
> bcachefs on root filesystems which are of course easy to recreate. But
> I do currently use bcachefs for all the filesystems on my main laptop
> so issues there can be more of a pain.

Are you talking about issues you've hit, or issues that you've seen
reported? Because the main subject of discussion is regressions.

> 
> As an example of potential issues I'd like to avoid I often upgrade my
> laptop and swap the old SSD in and am currently considering pulling
> the trigger on a Ryzen AI laptop such as the ProArt P16. However, this
> new processor has some cutting edge features only fully supported in
> 6.12 so I'd prefer to use that kernel if I can. But... because
> according to Reddit there are apparently issues with bcachefs in the
> 6.12RC kernels that means I am hesitant to buy the laptop and use the
> RC kernel in the carefree manner I normally would. Yeah, first world
> problems!

The main 6.12-rc1 issue was actually caused by Christian's change to
inode state wakeups - it was a VFS change where bcachefs wasn't updated.

That should've been caught by automated testing on fs-next - so that
one's on me; fs-next is still fairly new and I still need to get that
going.

> Speaking of Reddit, I don't know if you saw it but a user there quotes
> you as saying users who use release candidates should expect them to
> be "dangerous as crap." I could not find a post where you said that in
> the thread that user pointed to but if you **did** say something like
> that then I guess I have a different concept of what "release
> candidate" means.

I don't recall saying that, but I did say something about Canonical
shipping rc kernels to the general population - that's a bit crazy.
Rc kernels should generally be run by users who know what they're
getting into and have some ability to help test and debug.

> So for me it would be a lot easier if bcachefs versions were decoupled
> from kernel versions. 

Well, this sounds more like generalized concern than anything concrete I
can act on, to be honest - but if you've got regressions that you've
been hit by, please tell me about those.

The feedback I've generally been getting has been that each release has
been getting steadily better, and more stable and usable - and lately
pretty much all I've been doing has been fixing user reported bugs, so
those I naturally want to get out quickly if the bugs are serious enough
and I'm confident that they'll be low risk - and there has been a lot of
that.

The shrinker fixes for fsck OOMing that didn't land in 6.11 were
particularly painful for a lot of users.

The key cache/rcu pending work that didn't land in 6.11, that was a
major usability issue for several users that I talked to.

The past couple of weeks I've been working on filesystem repair and
snapshot issues for several users who were inadvertently torture
testing snapshots - the fixes are turning out to be fairly involved, but
I'm also weighing "how likely are other users to be affected by
this, and do we want to wait another 3 months", and I've got multiple
reports of affected users.
Theodore Ts'o Oct. 6, 2024, 4:30 a.m. UTC | #13
On Sat, Oct 05, 2024 at 08:54:32PM -0400, Kent Overstreet wrote:
> But I also have to remind you that I'm one of the few people who's
> actually been pushing for more and better automated testing (I now have
> infrastructure for the communty that anyone can use, just ask me for an
> account) - and that's been another solo effort because so few people are
> even interested, so the fact that this even came up grates on me. This
> is a problem with a technical solution, and instead we're all just
> arguing.

Um, hello?  All of the file system developers have our own automated
testing, and my system, {kvm,gce,android}-xfstests[1][2] and Luis's
kdevops[3] are both available for others to use.  We've done quite a
lot in terms of documentation and making it easier for others to use.
(And that's not including the personal test runners used by folks like
Josef, Christoph, Dave, and Darrick.)

[1] https://thunk.org/gce-xfstest
[2] https://github.com/tytso/xfstests-bld
[3] https://github.com/linux-kdevops/kdevops

That's why we're not particularly interested in yours --- my system
has been in active use since 2011, and it's been well-tuned for me and
others to use.  (For example, Leah has been using it for XFS stable
backports, and it's also used for testing Google's Data Center
kernels, and GCE's Cloud Optimized OS.)

You may believe that yours is better than anyone else's, but with
respect, I disagree, at least for my own workflow and use case.  And
if you look at the number of contributors in both Luis and my xfstests
runners[2][3], I suspect you'll find that we have far more
contributors in our git repo than your solo effort....

					- Ted
Kent Overstreet Oct. 6, 2024, 4:33 a.m. UTC | #14
On Sun, Oct 06, 2024 at 12:30:02AM GMT, Theodore Ts'o wrote:
> On Sat, Oct 05, 2024 at 08:54:32PM -0400, Kent Overstreet wrote:
> > But I also have to remind you that I'm one of the few people who's
> > actually been pushing for more and better automated testing (I now have
> > infrastructure for the community that anyone can use, just ask me for an
> > account) - and that's been another solo effort because so few people are
> > even interested, so the fact that this even came up grates on me. This
> > is a problem with a technical solution, and instead we're all just
> > arguing.
> 
> Um, hello?  All of the file system developers have our own automated
> testing, and my system, {kvm,gce,android}-xfstests[1][2] and Luis's
> kdevops[3] are both available for others to use.  We've done quite a
> lot in terms of documentation and making it easier for others to use.
> (And that's not including the personal test runners used by folks like
> Josef, Christoph, Dave, and Darrick.)
> 
> [1] https://thunk.org/gce-xfstest
> [2] https://github.com/tytso/xfstests-bld
> [3] https://github.com/linux-kdevops/kdevops
> 
> That's why we're not particularly interested in yours --- my system
> has been in active use since 2011, and it's been well-tuned for me and
> others to use.  (For example, Leah has been using it for XFS stable
> backports, and it's also used for testing Google's Data Center
> kernels, and GCE's Cloud Optimized OS.)
> 
> You may believe that yours is better than anyone else's, but with
> respect, I disagree, at least for my own workflow and use case.  And
> if you look at the number of contributors in both Luis and my xfstests
> runners[2][3], I suspect you'll find that we have far more
> contributors in our git repo than your solo effort....

Correct me if I'm wrong, but your system isn't available to the
community, and I haven't seen a CI or dashboard for kdevops?

Believe me, I would love to not be sinking time into this as well, but
we need to standardize on something everyone can use.
Martin Steigerwald Oct. 6, 2024, 11:49 a.m. UTC | #15
Hi Kent, hi Linus.

Kent Overstreet - 06.10.24, 02:54:32 CEST:
> On Sat, Oct 05, 2024 at 05:14:31PM GMT, Linus Torvalds wrote:
> > On Sat, 5 Oct 2024 at 16:41, Kent Overstreet 
<kent.overstreet@linux.dev> wrote:
> > > If what you want is patches appearing on the list, I'm not unwilling
> > > to
> > > make that change.
> > 
> > I want you to WORK WITH OTHERS. Including me - which means working
> > with the rules and processes we have in place.
> 
> That has to work both ways.

Exactly, Kent.

And it is my impression from reading the whole thread up to now and from 
reading previous threads it is actually about: Having your way and your 
way only.

That is not exactly "work both ways".

Quite similarly regarding your stand towards distributions like Debian.

Sure you can question well established rules all the way you want and 
maybe you are even right about it. I do not feel qualified enough to judge 
on that. I am all for challenging well established rules on justified 
grounds…

But… even if that is the case it is still a negotiation process. Expecting 
that communities change well established rules on the spot just because 
you are asking for it… quite bold if you ask me. It would be a negotiation 
process, and working both ways would mean agreeing on some kind of middle 
ground. But it appears to me you do not seem to have the patience for such 
a process. So it is arguing on both sides, which costs a lot of energy for 
everyone involved.

From what I perceive you are actually actively working against well 
established rules. And you are surprised at the reaction? That is kind of 
naive if you ask me.

At least you wrote you are willing to post patches to the mailing list: so 
why not start with at least that *minimal* requirement, according to Linus, 
as a first step? Maybe even just as a sign of good will towards the 
kernel community? That has been asked of you concretely, so why not just 
do it?

Maybe this can work out by negotiating a middle ground going one little 
step at a time?


I still do have a BCacheFS on my laptop for testing, but meanwhile I 
wonder whether some of the crazy kernel regressions I have seen with the 
last few kernels were exactly related to having mounted that BCacheFS 
test filesystem. I am tempted to replace the BCacheFS with a BTRFS just to 
find out.

Lastly, the 6.10.12-1 Debian kernel crashes on a pool-spawner thread when I 
enter the command „reboot“. That is right, a reboot crashes the system – I 
have never seen anything this crazy with any Linux kernel so far! I have 
made a photo of it, but after that long series of regressions I am even too 
tired to post a bug report about it just to be told again to bisect the 
issue. And it is not the first workqueue-related issue I have found between 
the 6.8 and 6.11 kernels.

Actually I think I will just replace that BCacheFS with another BTRFS in 
order to see whether it reduces the amount of crazy regressions I got so 
fed up with recently. Especially since it's not fair to report all of this 
to the Lenovo Linux community guy Mark Pearson in case it's not even related 
to the new ThinkPad T14 AMD Gen 5 I am using. Mind you, that series of 
regressions started with a T14 AMD Gen 1 roughly at the time I started 
testing BCacheFS, and I had hoped they would go away with the new laptop. 
Additionally I have not seen a single failure with BTRFS on any of my 
systems – including quite a few laptops and several servers, even using LXC 
containers – for… I don't remember when. Since kernel 4.6, BTRFS has been 
rock stable, at least for me. And I agree, it took a very long time until it 
was stable. But whether that is due to the processes you criticize or other 
reasons or a combination thereof… do you know for sure?

I am wondering: did the mainline kernel just get so much more unstable in 
the last 3-6 months, or might there be a relationship to the test BCacheFS 
filesystem I was using that has eluded me so far? Of course, I do not know 
for now, but reading Carl's mails really made me wonder.

Maybe there is none, so don't get me wrong… but reading this thread got me 
suspicious now. I will happily be proven wrong on that suspicion and I 
commit to reporting back on it, especially if the amount of regressions 
does not decline and my suspicion of BCacheFS turns out to be unjust.

Best,
Kent Overstreet Oct. 6, 2024, 5:18 p.m. UTC | #16
On Sun, Oct 06, 2024 at 01:49:23PM GMT, Martin Steigerwald wrote:
> Hi Kent, hi Linus.
> 
> Kent Overstreet - 06.10.24, 02:54:32 CEST:
> > On Sat, Oct 05, 2024 at 05:14:31PM GMT, Linus Torvalds wrote:
> > > On Sat, 5 Oct 2024 at 16:41, Kent Overstreet 
> <kent.overstreet@linux.dev> wrote:
> > > > If what you want is patches appearing on the list, I'm not unwilling
> > > > to
> > > > make that change.
> > > 
> > > I want you to WORK WITH OTHERS. Including me - which means working
> > > with the rules and processes we have in place.
> > 
> > That has to work both ways.
> 
> Exactly, Kent.
> 
> And it is my impression from reading the whole thread up to now and from 
> reading previous threads it is actually about: Having your way and your 
> way only.
> 
> That is not exactly "work both ways".
> 
> Quite similarly regarding your stand towards distributions like Debian.

My issue wasn't with Debian as a whole; it was with one particular
packaging rule which was causing issues, and a maintainer who - despite
warnings that it would cause issues - broke the build and sat on it,
leaving a broken version up, which resulted in users unable to access
their filesystems when they couldn't mount in degraded mode.

> I still do have a BCacheFS on my laptop for testing, but meanwhile I 
> wonder whether some of the crazy kernel regressions I have seen with the 
> last few kernels were exactly related to having mounted that BCacheFS 
> test filesystem. I am tempted to replace the BCacheFS with a BTRFS just to 
> find out.

I think you should be looking elsewhere - there have been zero reports
of random crashes or anything like what you're describing. Even in
syzbot testing we've been pretty free from the kind of memory safety
issues that would cause random crashes.

The closest bugs to what you're describing would be the
__wait_on_freeing_inode() deadlock in 6.12-rc1, and the LZ4HC crash that
I've yet to triage - but you specifically have to be using lz4:15
compression to hit that path.

The worst syzbot has come up with is something strange at the boundary
with the crypto code, and I haven't seen any user reports that line up
with that one.
Linus Torvalds Oct. 6, 2024, 7:04 p.m. UTC | #17
On Sat, 5 Oct 2024 at 21:33, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Sun, Oct 06, 2024 at 12:30:02AM GMT, Theodore Ts'o wrote:
> >
> > You may believe that yours is better than anyone else's, but with
> > respect, I disagree, at least for my own workflow and use case.  And
> > if you look at the number of contributors in both Luis and my xfstests
> > runners[2][3], I suspect you'll find that we have far more
> > contributors in our git repo than your solo effort....
>
> Correct me if I'm wrong, but your system isn't available to the
> community, and I haven't seen a CI or dashboard for kdevops?
>
> Believe me, I would love to not be sinking time into this as well, but
> we need to standardize on something everyone can use.

I really don't think we necessarily need to standardize. Certainly not
across completely different subsystems.

Maybe filesystem people have something in common, but honestly, even
that is rather questionable. Different filesystems have enough
different features that you will have different testing needs.

And a filesystem tree and an architecture tree (or the networking
tree, or whatever) have basically almost _zero_ overlap in testing -
apart from the obvious side of just basic build and boot testing.

And don't even get me started on drivers, which have a whole different
thing and can generally not be tested in some random VM at all.

So no. People should *not* try to standardize on something everyone can use.

But _everybody_ should participate in the basic build testing (and the
basic boot testing we have, even if it probably doesn't exercise much
of most subsystems).  That covers a *lot* of stuff that various
domain-specific testing does not (and generally should not).

For example, when you do filesystem-specific testing, you very seldom
have much issues with different compilers or architectures. Sure,
there can be compiler version issues that affect behavior, but let's
be honest: it's very very rare. And yes, there are big-endian machines
and the whole 32-bit vs 64-bit thing, and that can certainly affect
your filesystem testing, but I would expect it to be a fairly rare and
secondary thing for you to worry about when you try to stress your
filesystem for correctness.

But build and boot testing? All those random configs, all those odd
architectures, and all those odd compilers *do* affect build testing.
So you as a filesystem maintainer should *not* generally strive to do
your own basic build test, but very much participate in the generic
build test that is being done by various bots (not just on linux-next,
but things like the 0day bot on various patch series posted to the
list etc).
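
As a rough sketch of the absolute-minimum cross-build coverage being
described here, something along these lines already catches big-endian and
odd-config build breakage (the cross-compiler prefix is an assumption and
varies by distro; only the bcachefs directory is built to keep it quick):

    make ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- allmodconfig
    make ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- -j"$(nproc)" fs/bcachefs/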

End result: one size does not fit all. But I get unhappy when I see
some subsystem that doesn't seem to participate in what I consider the
absolute bare minimum.

Btw, there are other ways to make me less unhappy. For example, a
couple of years ago, we had a string of issues with the networking
tree. Not because there was any particular maintenance issue, but
because the networking tree is basically one of the biggest subsystems
there are, and so bugs just happen more for that simple reason. Random
driver issues that got found resolved quickly, but that kept happening
in rc releases (or even final releases).

And that was *despite* the networking fixes generally having been in linux-next.

Now, the reason I mention the networking tree is that the one simple
thing that made it a lot less stressful was that I asked whether the
networking fixes pulls could just come in on Thursday instead of late
on Friday or Saturday. That meant that any silly things that the bots
picked up on (or good testers picked up on quickly) now had an extra
day or two to get resolved.

Now, it may be that the string of unfortunate networking issues that
caused this policy were entirely just bad luck, and we just haven't
had that. But the networking pull still comes in on Thursdays, and
we've been doing it that way for four years, and it seems to have
worked out well for both sides. I certainly feel a lot better about
being able to do the (sometimes fairly sizeable) pull on a Thursday,
knowing that if there is some last-minute issue, we can still fix just
*that* before the rc or final release.

And hey, that's literally just a "this was how we dealt with one
particular situation". Not everybody needs to have the same rules,
because the exact details will be different. I like doing releases on
Sundays, because that way the people who do a fairly normal Mon-Fri
week come in to a fresh release (whether rc or not). And people tend
to like sending in their "work of the week" to me on Fridays, so I get
a lot of pull requests on Friday, and most of the time that works just
fine.

So the networking tree timing policy ended up working quite well for
that, but there's no reason it should be "The Rule" and that everybody
should do it. But maybe it would lessen the stress on both sides for
bcachefs too if we aimed for that kind of thing?

             Linus
Kent Overstreet Oct. 6, 2024, 7:29 p.m. UTC | #18
On Sun, Oct 06, 2024 at 12:04:45PM GMT, Linus Torvalds wrote:
> On Sat, 5 Oct 2024 at 21:33, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > On Sun, Oct 06, 2024 at 12:30:02AM GMT, Theodore Ts'o wrote:
> > >
> > > You may believe that yours is better than anyone else's, but with
> > > respect, I disagree, at least for my own workflow and use case.  And
> > > if you look at the number of contributors in both Luis and my xfstests
> > > runners[2][3], I suspect you'll find that we have far more
> > > contributors in our git repo than your solo effort....
> >
> > Correct me if I'm wrong, but your system isn't available to the
> > community, and I haven't seen a CI or dashboard for kdevops?
> >
> > Believe me, I would love to not be sinking time into this as well, but
> > we need to standardize on something everyone can use.
> 
> I really don't think we necessarily need to standardize. Certainly not
> across completely different subsystems.
> 
> Maybe filesystem people have something in common, but honestly, even
> that is rather questionable. Different filesystems have enough
> different features that you will have different testing needs.
> 
> And a filesystem tree and an architecture tree (or the networking
> tree, or whatever) have basically almost _zero_ overlap in testing -
> apart from the obvious side of just basic build and boot testing.
> 
> And don't even get me started on drivers, which have a whole different
> thing and can generally not be tested in some random VM at all.

Drivers are obviously a whole different ballgame, but what I'm after is
more:
- tooling the community can use
- some level of common infrastructure, so we're not all rolling our own.

"Test infrastructure the community can use" is a big one, because
enabling the community and making it easier for people to participate
and do real development is where our pipeline of new engineers comes
from.

Over the past 15 years, I've seen the filesystem community get smaller
and older, and that's not a good thing. I've had some good success with
giving ktest access to people in the community, who then start using it
actively and contributing (small, so far) patches (and interestingly, a
lot of the new activity is from China) - this means they can do
development at a reasonable pace and I don't have to look at their code
until it's actually passing all the tests, which is _huge_.

And filesystem tests take overnight to run on a single machine, so
having something that gets them results back in 20 minutes is also huge.

The other thing I'd really like is to take the best of what we've got
for testrunner/CI dashboard (and opinions will vary, but of course I
like ktest the best) and make it available to other subsystems (mm,
block, kselftests) because not everyone has time to roll their own.

That takes a lot of facetime - getting to know people's workflows,
porting tests - so it hasn't happened as much as I'd like, but it's
still an active interest of mine.

> So no. People should *not* try to standardize on something everyone can use.
> 
> But _everybody_ should participate in the basic build testing (and the
> basic boot testing we have, even if it probably doesn't exercise much
> of most subsystems).  That covers a *lot* of stuff that various
> domain-specific testing does not (and generally should not).
> 
> For example, when you do filesystem-specific testing, you very seldom
> have much issues with different compilers or architectures. Sure,
> there can be compiler version issues that affect behavior, but let's
> be honest: it's very very rare. And yes, there are big-endian machines
> and the whole 32-bit vs 64-bit thing, and that can certainly affect
> your filesystem testing, but I would expect it to be a fairly rare and
> secondary thing for you to worry about when you try to stress your
> filesystem for correctness.

But - a big gap right now is endian /portability/, and that one is a
pain to cover with automated tests because you either need access to
both big and little endian hardware (at a minimum for creating test
images), or you need to run qemu in full-emulation mode, which is pretty
unbearably slow.

> But build and boot testing? All those random configs, all those odd
> architectures, and all those odd compilers *do* affect build testing.
> So you as a filesystem maintainer should *not* generally strive to do
> your own basic build test, but very much participate in the generic
> build test that is being done by various bots (not just on linux-next,
> but things like the 0day bot on various patch series posted to the
> list etc).
> 
> End result: one size does not fit all. But I get unhappy when I see
> some subsystem that doesn't seem to participate in what I consider the
> absolute bare minimum.

So the big issue for me has been that with the -next/0day pipeline, I
have no visibility into when it finishes; which means it has to go onto
my mental stack of things to watch for and becomes yet another thing to
pipeline, and the more I have to pipeline the more I lose track of
things.

(Seriously: when I am constantly tracking 5 different bug reports and
talking to 5 different users, every additional bit of mental state I
have to remember is death by a thousand cuts).

Which would all be solved with a dashboard - which is why adding the
build testing to ktest (or ideally, stealing _all_ the 0day tests for
ktest) is becoming a bigger and bigger priority.

> Btw, there are other ways to make me less unhappy. For example, a
> couple of years ago, we had a string of issues with the networking
> tree. Not because there was any particular maintenance issue, but
> because the networking tree is basically one of the biggest subsystems
> there are, and so bugs just happen more for that simple reason. Random
> driver issues that got found resolved quickly, but that kept happening
> in rc releases (or even final releases).
> 
> And that was *despite* the networking fixes generally having been in linux-next.

Yeah, same thing has been going on in filesystem land, which is why we now
have fs-next that we're supposed to be targeting our testing automation
at.

That one will likely come slower for me, because I need to clear out a
bunch of CI failing tests before I'll want to look at that, but it's on
my radar.

> Now, the reason I mention the networking tree is that the one simple
> thing that made it a lot less stressful was that I asked whether the
> networking fixes pulls could just come in on Thursday instead of late
> on Friday or Saturday. That meant that any silly things that the bots
> picked up on (or good testers picked up on quickly) now had an extra
> day or two to get resolved.

Ok, if fixes coming in on Saturday is an issue for you that's something
I can absolutely change. The only _critical_ one for rc2 was the
__wait_on_freeing_inode() fix (which did come in late), the rest
could've waited until Monday.

> Now, it may be that the string of unfortunate networking issues that
> caused this policy were entirely just bad luck, and we just haven't
> had that. But the networking pull still comes in on Thursdays, and
> we've been doing it that way for four years, and it seems to have
> worked out well for both sides. I certainly feel a lot better about
> being able to do the (sometimes fairly sizeable) pull on a Thursday,
> knowing that if there is some last-minute issue, we can still fix just
> *that* before the rc or final release.
> 
> And hey, that's literally just a "this was how we dealt with one
> particular situation". Not everybody needs to have the same rules,
> because the exact details will be different. I like doing releases on
> Sundays, because that way the people who do a fairly normal Mon-Fri
> week come in to a fresh release (whether rc or not). And people tend
> to like sending in their "work of the week" to me on Fridays, so I get
> a lot of pull requests on Friday, and most of the time that works just
> fine.
> 
> So the networking tree timing policy ended up working quite well for
> that, but there's no reason it should be "The Rule" and that everybody
> should do it. But maybe it would lessen the stress on both sides for
> bcachefs too if we aimed for that kind of thing?

Yeah, that sounds like the plan then.
Alan Huang Oct. 6, 2024, 9:31 p.m. UTC | #19
On Oct 7, 2024, at 03:29, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> On Sun, Oct 06, 2024 at 12:04:45PM GMT, Linus Torvalds wrote:
>> On Sat, 5 Oct 2024 at 21:33, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>> 
>>> On Sun, Oct 06, 2024 at 12:30:02AM GMT, Theodore Ts'o wrote:
>>>> 
>>>> You may believe that yours is better than anyone else's, but with
>>>> respect, I disagree, at least for my own workflow and use case.  And
>>>> if you look at the number of contributors in both Luis and my xfstests
>>>> runners[2][3], I suspect you'll find that we have far more
>>>> contributors in our git repo than your solo effort....
>>> 
>>> Correct me if I'm wrong, but your system isn't available to the
>>> community, and I haven't seen a CI or dashboard for kdevops?
>>> 
>>> Believe me, I would love to not be sinking time into this as well, but
>>> we need to standardize on something everyone can use.
>> 
>> I really don't think we necessarily need to standardize. Certainly not
>> across completely different subsystems.
>> 
>> Maybe filesystem people have something in common, but honestly, even
>> that is rather questionable. Different filesystems have enough
>> different features that you will have different testing needs.
>> 
>> And a filesystem tree and an architecture tree (or the networking
>> tree, or whatever) have basically almost _zero_ overlap in testing -
>> apart from the obvious side of just basic build and boot testing.
>> 
>> And don't even get me started on drivers, which have a whole different
>> thing and can generally not be tested in some random VM at all.
> 
> Drivers are obviously a whole different ballgame, but what I'm after is
> more
> - tooling the community can use
> - some level of common infrastructure, so we're not all rolling our own.
> 
> "Test infrastructure the community can use" is a big one, because
> enabling the community and making it easier for people to participate
> and do real development is where our pipeline of new engineers comes
> from.

Yeah, the CI is really helpful, at least for those who want to get involved in
the development of bcachefs. As a newcomer, I’m not at all interested in setting up
a separate testing environment at the very beginning, which might be time-consuming
and costly.

> 
> Over the past 15 years, I've seen the filesystem community get smaller
> and older, and that's not a good thing. I've had some good success with
> giving ktest access to people in the community, who then start using it
> actively and contributing (small, so far) patches (and interestingly, a
> lot of the new activity is from China) - this means they can do
> development at a reasonable pace and I don't have to look at their code
> until it's actually passing all the tests, which is _huge_.
> 
> And filesystem tests take overnight to run on a single machine, so
> having something that gets them results back in 20 minutes is also huge.

Exactly, I can verify some ideas very quickly with the help of the CI.

So, a big thank you for all the effort you've put into it!

> 
> The other thing I'd really like is to take the best of what we've got
> for testrunner/CI dashboard (and opinions will vary, but of course I
> like ktest the best) and make it available to other subsystems (mm,
> block, kselftests) because not everyone has time to roll their own.
> 
> That takes a lot of facetime - getting to know people's workflows,
> porting tests - so it hasn't happened as much as I'd like, but it's
> still an active interest of mine.
> 
>> So no. People should *not* try to standardize on something everyone can use.
>> 
>> But _everybody_ should participate in the basic build testing (and the
>> basic boot testing we have, even if it probably doesn't exercise much
>> of most subsystems).  That covers a *lot* of stuff that various
>> domain-specific testing does not (and generally should not).
>> 
>> For example, when you do filesystem-specific testing, you very seldom
>> have much issues with different compilers or architectures. Sure,
>> there can be compiler version issues that affect behavior, but let's
>> be honest: it's very very rare. And yes, there are big-endian machines
>> and the whole 32-bit vs 64-bit thing, and that can certainly affect
>> your filesystem testing, but I would expect it to be a fairly rare and
>> secondary thing for you to worry about when you try to stress your
>> filesystem for correctness.
> 
> But - a big gap right now is endian /portability/, and that one is a
> pain to cover with automated tests because you either need access to
> both big and little endian hardware (at a minimum for creating test
> images), or you need to run qemu in full-emulation mode, which is pretty
> unbearably slow.
> 
>> But build and boot testing? All those random configs, all those odd
>> architectures, and all those odd compilers *do* affect build testing.
>> So you as a filesystem maintainer should *not* generally strive to do
>> your own basic build test, but very much participate in the generic
>> build test that is being done by various bots (not just on linux-next,
>> but things like the 0day bot on various patch series posted to the
>> list etc).
>> 
>> End result: one size does not fit all. But I get unhappy when I see
>> some subsystem that doesn't seem to participate in what I consider the
>> absolute bare minimum.
> 
> So the big issue for me has been that with the -next/0day pipeline, I
> have no visibility into when it finishes; which means it has to go onto
> my mental stack of things to watch for and becomes yet another thing to
> pipeline, and the more I have to pipeline the more I lose track of
> things.
> 
> (Seriously: when I am constantly tracking 5 different bug reports and
> talking to 5 different users, every additional bit of mental state I
> have to remember is death by a thousand cuts).
> 
> Which would all be solved with a dashboard - which is why adding the
> build testing to ktest (or ideally, stealing _all_ the 0day tests for
> ktest) is becoming a bigger and bigger priority.
> 
>> Btw, there are other ways to make me less unhappy. For example, a
>> couple of years ago, we had a string of issues with the networking
>> tree. Not because there was any particular maintenance issue, but
>> because the networking tree is basically one of the biggest subsystems
>> there are, and so bugs just happen more for that simple reason. Random
>> driver issues that got found resolved quickly, but that kept happening
>> in rc releases (or even final releases).
>> 
>> And that was *despite* the networking fixes generally having been in linux-next.
> 
> Yeah, same thing has been going on in filesystem land, which is why we now
> have fs-next that we're supposed to be targeting our testing automation
> at.
> 
> That one will likely come slower for me, because I need to clear out a
> bunch of CI failing tests before I'll want to look at that, but it's on
> my radar.
> 
>> Now, the reason I mention the networking tree is that the one simple
>> thing that made it a lot less stressful was that I asked whether the
>> networking fixes pulls could just come in on Thursday instead of late
>> on Friday or Saturday. That meant that any silly things that the bots
>> picked up on (or good testers picked up on quickly) now had an extra
>> day or two to get resolved.
> 
> Ok, if fixes coming in on Saturday is an issue for you that's something
> I can absolutely change. The only _critical_ one for rc2 was the
> __wait_on_freeing_inode() fix (which did come in late), the rest
> could've waited until Monday.
> 
>> Now, it may be that the string of unfortunate networking issues that
>> caused this policy were entirely just bad luck, and we just haven't
>> had that. But the networking pull still comes in on Thursdays, and
>> we've been doing it that way for four years, and it seems to have
>> worked out well for both sides. I certainly feel a lot better about
>> being able to do the (sometimes fairly sizeable) pull on a Thursday,
>> knowing that if there is some last-minute issue, we can still fix just
>> *that* before the rc or final release.
>> 
>> And hey, that's literally just a "this was how we dealt with one
>> particular situation". Not everybody needs to have the same rules,
>> because the exact details will be different. I like doing releases on
>> Sundays, because that way the people who do a fairly normal Mon-Fri
>> week come in to a fresh release (whether rc or not). And people tend
>> to like sending in their "work of the week" to me on Fridays, so I get
>> a lot of pull requests on Friday, and most of the time that works just
>> fine.
>> 
>> So the networking tree timing policy ended up working quite well for
>> that, but there's no reason it should be "The Rule" and that everybody
>> should do it. But maybe it would lessen the stress on both sides for
>> bcachefs too if we aimed for that kind of thing?
> 
> Yeah, that sounds like the plan then.
Josef Bacik Oct. 7, 2024, 2:58 p.m. UTC | #20
On Sat, Oct 05, 2024 at 06:54:19PM -0400, Kent Overstreet wrote:
> On Sat, Oct 05, 2024 at 03:34:56PM GMT, Linus Torvalds wrote:
> > On Sat, 5 Oct 2024 at 11:35, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > >
> > > Several more filesystems repaired, thank you to the users who have been
> > > providing testing. The snapshots + unlinked fixes on top of this are
> > > posted here:
> > 
> > I'm getting really fed up here Kent.
> > 
> > These have commit times from last night. Which makes me wonder how
> > much testing they got.
> 
> The /commit/ dates are from last night, because I polish up commit
> messages and reorder until the last minute (I always push smaller fixes
> up front and fixes that are likely to need rework to the back).
> 
> The vast majority of those fixes are all ~2 weeks old.
> 
> > And before you start whining - again - about how you are fixing bugs,
> > let me remind you about the build failures you had on big-endian
> > machines because your patches had gotten ZERO testing outside your
> > tree.
> 
> No, there simply aren't that many people running big endian. I have
> users building and running my trees on a daily basis. If I push
> something broken before I go to bed I have bug reports waiting for me
> _the next morning_ when I wake up.
> 
> > That was just last week, and I'm getting the strong feeling that
> > absolutely nothing was learnt from the experience.
> > 
> > I have pulled this, but I searched for a couple of the commit messages
> > on the lists, and found *nothing* (ok, I found your pull request,
> > which obviously mentioned the first line of the commit messages).
> > 
> > I'm seriously thinking about just stopping pulling from you, because I
> > simply don't see you improving on your model. If you want to have an
> > experimental tree, you can damn well have one outside the mainline
> > kernel. I've told you before, and nothing seems to really make you
> > understand.
> 
> At this point, it's honestly debatable whether the experimental label
> should apply. I'm getting bug reports that talk about production use and
> working on metadata dumps where the superblock indicates the filesystem
> has been in continuous use for years.
> 
> And many, many people talking about how even at this relatively early
> point it doesn't fall over like btrfs does.
> 

I tend to ignore these kinds of emails; it's been a decade, and weirdly the file
system development community likes to use btrfs as a punching bag.  I honestly
don't care what anybody else thinks, but I've gotten feedback from others in the
community that they wish I'd say something when somebody says things so patently
false.  So I'm going to respond exactly once to this, and it'll be me satisfying
my quota for this kind of thing for the rest of the year.

Btrfs is used by default in the desktop spin of Fedora, openSuse, and maybe some
others.  Our development community is actively plugged into those places, we
drop everything to help when issues arise there.  Btrfs is the foundation of the
Meta fleet.  We rely on its capabilities and, most importantly of all, its
stability for our infrastructure.

Is it perfect?  Absolutely not.  You will never hear me say that.  I have often,
and publicly, said that Meta also uses XFS in our database workloads, because it
simply is just better than Btrfs at that.

Yes, XFS is better than Btrfs at some things.  I'm not afraid to admit that,
because my personal worth is not tied to the software projects I'm involved in.
Dave Chinner, Darrick Wong, Christoph Hellwig, Eric Sandeen, and many others have
done a fantastic job with XFS.  I have a lot of respect for them and the work
they've done.  I've learned a lot from them.

Ext4 is better than Btrfs in a lot of those same things.  Ted Ts'o, Andreas
Dilger, Jan Kara, and many others have done a fantastic job with ext4.

I have learned a lot from all of these developers, all of these file systems,
and many others in this community.

Bcachefs is doing new and interesting things.  There are many things that I see
you do that I wish we had the foresight to know were going to be a problem with
Btrfs and done it differently.  You, along with the wider file system community,
have a lot of the same ideals, same practices, and same desire to do your
absolute best work.  That is an admirable trait, one that we all share.

But dragging other people and their projects down is not the sort of behavior
that I think should have a place in this community.  This is not the kind of
community I want to exist in.  You are not the only person who does this, but
you are the most vocal and constant example of it.  Just like I tell my kids,
just because somebody else is doing something wrong doesn't mean you get to do
it too.

We can improve our own projects, we can collaborate, and we can support
each other's work.  Christian and I tag-teamed the mount namespace work.  Amir
and I tag-teamed the Fanotify HSM work.  Those two projects are the most fun and
rewarding experiences I've had in the last few years.  This work is way more fun
when we can work together, and the relationships I've built in this community
through this collaboration around solving problems are my most cherished
professional relationships.

Or we can keep doing this, randomly throwing mud at each other, pissing each
other off, making ourselves into unhireable pariahs.  I've made my decision, and
honestly I think it's better.

But what the fuck do I know, I work on btrfs.  Thanks,

Josef
Jason A. Donenfeld Oct. 7, 2024, 3:01 p.m. UTC | #21
On Sun, Oct 06, 2024 at 03:29:51PM -0400, Kent Overstreet wrote:
> But - a big gap right now is endian /portability/, and that one is a
> pain to cover with automated tests because you either need access to
> both big and little endian hardware (at a minumm for creating test
> images), or you need to run qemu in full-emulation mode, which is pretty
> unbearably slow.

It's really not that bad, at least for my use cases:

    https://www.wireguard.com/build-status/

This thing sends pings to my cellphone too. You can poke around in
tools/testing/selftests/wireguard/qemu/ if you're curious. It's kinda
gnarly but has proven very very flexible to hack up for whatever
additional testing I need. For example, I've been using it for some of
my recent non-wireguard work here: https://git.zx2c4.com/linux-rng/commit/?h=jd/vdso-test-harness

Taking this straight-up probably won't fit for your filesystem work, but
maybe it can act as a bit of motivation that automated qemu'ing can
generally work. It has definitely caught a lot of silly bugs during
development time.

If, for your cases, this winds up taking 3 days to run instead of the
minutes mine needs, so be it, that's a small workflow adjustment thing.
You might not get the same dopamine feedback loop of seeing your changes
in action and deployed to users _now_, but maybe delaying the
gratification a bit will be good anyway.

Jason
Martin Steigerwald Oct. 7, 2024, 3:13 p.m. UTC | #22
Kent Overstreet - 06.10.24, 19:18:00 MESZ:
> > I still do have a BCacheFS on my laptop for testing, but meanwhile I
> > wonder whether some of the crazy kernel regressions I have seen with
> > the last few kernels where exactly related to having mounted that
> > BCacheFS test filesystem. I am tempted to replace the BCacheFS with a
> > BTRFS just to find out.
> 
> I think you should be looking elsewhere - there have been zero reports
> of random crashes or anything like what you're describing. Even in
> syzbot testing we've been pretty free from the kind of memory safety
> issues that would cause random crashes

Okay.

From what I saw of the backtrace I am not sure it is a memory safety bug. 
It could be a deadlock thing with work queues. Anyway… as you can read 
below it is not BCacheFS related. But I understand too little about all of 
this to say for sure.

> The closest bugs to what you're describing would be the
> __wait_on_freeing_inode() deadlock in 6.12-rc1, and the LZ4HC crash that
> I've yet to triage - but you specifically have to be using lz4:15
> compression to hit that path.

Well, a crash on reboot happened again, without BCacheFS. I wrote that I'd
report back either way.

I think I will wait and see whether this goes away with a newer kernel, as
some of the other regressions I saw before did. It was not in all of the 6.11
series of Debian kernels, just in the most recent one. In case it doesn't, I
may open a kernel bug report with Debian directly.

For extra safety I did a memory test with memtest86+ 7.00. Zero errors.

As for the other regressions, I cannot tell yet whether they have
gone away. So far they have not occurred again.

But so far it looks like replacing BCacheFS with BTRFS does not make a
difference. And I wanted to report that back.

Best,
Kent Overstreet Oct. 7, 2024, 7:59 p.m. UTC | #23
On Mon, Oct 07, 2024 at 05:01:55PM GMT, Jason A. Donenfeld wrote:
> On Sun, Oct 06, 2024 at 03:29:51PM -0400, Kent Overstreet wrote:
> > But - a big gap right now is endian /portability/, and that one is a
> > pain to cover with automated tests because you either need access to
> > both big and little endian hardware (at a minimum for creating test
> > images), or you need to run qemu in full-emulation mode, which is pretty
> > unbearably slow.
> 
> It's really not that bad, at least for my use cases:
> 
>     https://www.wireguard.com/build-status/
> 
> This thing sends pings to my cellphone too. You can poke around in
> tools/testing/selftests/wireguard/qemu/ if you're curious. It's kinda
> gnarly but has proven very very flexible to hack up for whatever
> additional testing I need. For example, I've been using it for some of
> my recent non-wireguard work here: https://git.zx2c4.com/linux-rng/commit/?h=jd/vdso-test-harness
> 
> Taking this straight-up probably won't fit for your filesystem work, but
> maybe it can act as a bit of motivation that automated qemu'ing can
> generally work. It has definitely caught a lot of silly bugs during
> development time.

I have all the qemu automation:
https://evilpiepirate.org/git/ktest.git/

That's what I use for normal interactive development, i.e. I run
something like

build-test-kernel -I ~/ktest/tests/fs/bcachefs/replication.ktest rereplicate

which builds a kernel, launches a VM and starts running a test; test
output on stdout, I can ssh in, ctrl-c kills it like any other test.
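
Spelled out, that interactive flow is roughly the following (a sketch; it
assumes the scripts from the ktest checkout are on your PATH and that
~/linux is the kernel tree being tested):

```shell
# grab ktest (the repo linked above)
git clone https://evilpiepirate.org/git/ktest.git ~/ktest

# from the kernel tree: build a kernel, boot it in a VM, run one test;
# output streams to stdout, and ctrl-c tears the VM down like any process
cd ~/linux
build-test-kernel -I ~/ktest/tests/fs/bcachefs/replication.ktest rereplicate
```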

And those same tests are run automatically by my CI, which watches
various git branches and produces results here:
https://evilpiepirate.org/~testdashboard/ci?user=kmo&branch=bcachefs-testing

(Why yes, that is a lot of failing tests still.)

I'm giving out accounts on this to anyone in the community doing kernel
development, we've got fstests wrappers for every local filesystem, plus
nfs, plus assorted other tests. Can always use more hardware if anyone
wants to provide more machines.
Kent Overstreet Oct. 7, 2024, 8:21 p.m. UTC | #24
On Mon, Oct 07, 2024 at 10:58:47AM GMT, Josef Bacik wrote:
> I tend to ignore these kinds of emails; it's been a decade, and weirdly the file
> system development community likes to use btrfs as a punching bag.  I honestly
> don't care what anybody else thinks, but I've gotten feedback from others in the
> community that they wish I'd say something when somebody says things so patently
> false.  So I'm going to respond exactly once to this, and it'll be me satisfying
> my quota for this kind of thing for the rest of the year.
> 
> Btrfs is used by default in the desktop spin of Fedora, openSuse, and maybe some
> others.  Our development community is actively plugged into those places, we
> drop everything to help when issues arise there.  Btrfs is the foundation of the
> Meta fleet.  We rely on its capabilities and, most importantly of all, its
> stability for our infrastructure.
> 
> Is it perfect?  Absolutely not.  You will never hear me say that.  I have often,
> and publicly, said that Meta also uses XFS in our database workloads, because it
> simply is just better than Btrfs at that.
> 
> Yes, XFS is better than Btrfs at some things.  I'm not afraid to admit that,
> because my personal worth is not tied to the software projects I'm involved in.
> Dave Chinner, Darrick Wong, Christoph Hellwig, Eric Sandeen, and many others have
> done a fantastic job with XFS.  I have a lot of respect for them and the work
> they've done.  I've learned a lot from them.
> 
> Ext4 is better than Btrfs in a lot of those same things.  Ted Ts'o, Andreas
> Dilger, Jan Kara, and many others have done a fantastic job with ext4.
> 
> I have learned a lot from all of these developers, all of these file systems,
> and many others in this community.
> 
> Bcachefs is doing new and interesting things.  There are many things that I see
> you do that I wish we had the foresight to know were going to be a problem with
> Btrfs and done it differently.  You, along with the wider file system community,
> have a lot of the same ideals, same practices, and same desire to do your
> absolute best work.  That is an admirable trait, one that we all share.
> 
> But dragging other people and their projects down is not the sort of behavior
> that I think should have a place in this community.  This is not the kind of
> community I want to exist in.  You are not the only person who does this, but
> you are the most vocal and constant example of it.  Just like I tell my kids,
> just because somebody else is doing something wrong doesn't mean you get to do
> it too.

Josef, I've got to be honest with you: if, 10 years in, a filesystem
still has a lot of user reports that clearly aren't being addressed
where the filesystem is wedging itself, that's a pretty epic fail, and
that really is the main reason why I'm here.

#1 priority in filesystem land has to be robustness. Not features, not
performance; it has to simply work.

The bar for "acceptably good" is really, really high when you're
responsible for users' data. In the rest of the kernel, if you screw up,
generally the worst that happens is you crash the machine - users are
annoyed, whatever they were doing gets interrupted, but nothing
drastically bad happens.

In filesystem land, fairly minor screwups can lead to the entire machine
being down for extended periods of time if the filesystem has wedged
itself and really involved repair procedures that users _should not_
have to do, or worse, real data loss. And you need to be thinking about
the trust that users are placing in you; that's people's _lives_ they're
storing on their machines.

So no, based on the feedback I still _regularly_ get, I don't think btrfs has
hit an acceptable level of reliability, and if it's taking this long I
doubt it will.

"Mostly works" is just not good enough.

To be fair, bcachefs isn't "good enough" yet either, I'm still getting
bug reports where bcachefs wedges itself too.

But I've also been pretty explicit about that, and I'm not taking the
experimental label off until those reports have stopped and we've
addressed _every_ known way it can wedge itself and we've torture tested
the absolute crap out of repair.

And I think you've set the bar too low, by just accepting that btrfs
isn't going to be as good as xfs in some situations.

I don't think there's any reason a modern COW filesystem has to be
crappier in _any_ respect than ext4/xfs. It's just a matter of
prioritizing the essentials and working at it until it's done.
Jason A. Donenfeld Oct. 7, 2024, 9:21 p.m. UTC | #25
On Mon, Oct 07, 2024 at 03:59:17PM -0400, Kent Overstreet wrote:
> On Mon, Oct 07, 2024 at 05:01:55PM GMT, Jason A. Donenfeld wrote:
> > On Sun, Oct 06, 2024 at 03:29:51PM -0400, Kent Overstreet wrote:
> > > But - a big gap right now is endian /portability/, and that one is a
> > > pain to cover with automated tests because you either need access to
> > > both big and little endian hardware (at a minimum for creating test
> > > images), or you need to run qemu in full-emulation mode, which is pretty
> > > unbearably slow.
> > 
> > It's really not that bad, at least for my use cases:
> > 
> >     https://www.wireguard.com/build-status/
> > 
> > This thing sends pings to my cellphone too. You can poke around in
> > tools/testing/selftests/wireguard/qemu/ if you're curious. It's kinda
> > gnarly but has proven very very flexible to hack up for whatever
> > additional testing I need. For example, I've been using it for some of
> > my recent non-wireguard work here: https://git.zx2c4.com/linux-rng/commit/?h=jd/vdso-test-harness
> > 
> > Taking this straight-up probably won't fit for your filesystem work, but
> > maybe it can act as a bit of motivation that automated qemu'ing can
> > generally work. It has definitely caught a lot of silly bugs during
> > development time.
> 
> I have all the qemu automation:
> https://evilpiepirate.org/git/ktest.git/

Neat. I suppose you can try to hook up all the other archs to run in TCG
there, and then you'll be able to test big endian and whatever other
weird issues crop up.
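
For what it's worth, a big-endian TCG run is just a cross-build plus a
qemu-system-s390x invocation, something along these lines (an untested
sketch; the cross toolchain prefix, memory/cpu options, and disk image are
assumptions):

```shell
# untested sketch: cross-build a big-endian (s390x) kernel and boot it under
# qemu's TCG full emulation
make ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- defconfig
make ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- -j"$(nproc)" bzImage

qemu-system-s390x -nographic -m 2G -smp 4 \
    -kernel arch/s390/boot/bzImage \
    -append "console=ttysclp0 root=/dev/vda" \
    -drive file=test.img,format=raw,if=virtio
```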
Jann Horn Oct. 7, 2024, 11:33 p.m. UTC | #26
On Sun, Oct 6, 2024 at 9:29 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
> On Sun, Oct 06, 2024 at 12:04:45PM GMT, Linus Torvalds wrote:
> > But build and boot testing? All those random configs, all those odd
> > architectures, and all those odd compilers *do* affect build testing.
> > So you as a filesystem maintainer should *not* generally strive to do
> > your own basic build test, but very much participate in the generic
> > build test that is being done by various bots (not just on linux-next,
> > but things like the 0day bot on various patch series posted to the
> > list etc).
> >
> > End result: one size does not fit all. But I get unhappy when I see
> > some subsystem that doesn't seem to participate in what I consider the
> > absolute bare minimum.
>
> So the big issue for me has been that with the -next/0day pipeline, I
> have no visibility into when it finishes; which means it has to go onto
> my mental stack of things to watch for and becomes yet another thing to
> pipeline, and the more I have to pipeline the more I lose track of
> things.

FWIW, my understanding is that linux-next is not just infrastructure
for CI bots. For example, there is also tooling based on -next that
doesn't have such a thing as "done with processing" - my understanding
is that syzkaller (https://syzkaller.appspot.com/upstream) has
instances that fuzz linux-next
("ci-upstream-linux-next-kasan-gce-root").
Theodore Ts'o Oct. 9, 2024, 3:51 a.m. UTC | #27
On Sun, Oct 06, 2024 at 12:33:51AM -0400, Kent Overstreet wrote:
> 
> Correct me if I'm wrong, but your system isn't available to the
> community, and I haven't seen a CI or dashboard for kdevops?

It's up on github for anyone to download, and I've provided pre-built
test appliances so people don't have to download xfstests and
all of its dependencies and build them from scratch.  (That's been
automated, of course, but the build infrastructure is setup to use a
Debian build chroot, and with the precompiled test appliances, you can
use my test runner on pretty much any Linux distribution; it will even
work on MacOS if you have qemu built from macports, although for now
you have to build the kernel on a Linux distro using a Parallels VM[1].)

I'll note that IMHO making testing resources available to the
community isn't really the bottleneck.  Using cloud resources,
especially if you spin up the VM's only when you need to run the
tests, and shut them down once the test is complete, which
gce-xfstests does, is actually quite cheap.  At retail prices, running
a dozen ext4 file system configurations against xfstests's "auto"
group will take about 24 hours of VM time, and including the cost of
the block devices, costs just under two dollars USD.  Because the
tests are run in parallel, the total wall clock time to run all of the
tests is about two and a half hours.  Running the "quick" group on a
single file system configuration costs pennies.  So the $300 of free
GCE credits will actually get someone pretty far!

No, the bottleneck is having someone knowledgeable enough to interpret
the test results and then finding the root cause of the failures.
This is one of the reasons why I haven't stressed all that much about
dashboards.  Dashboards are only useful if the right person(s) is
looking at them.  That's why I've been much more interested in making
it stupidly easy to run tests on someone's local resources, e.g.:

     https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

In fact, for most people, the entry point that I envision as being
most interesting is that they download the kvm-xfstests, and following
the instructions in the quickstart, so they can run "kvm-xfstests
smoke" before sending me an ext4 patch.  Running the smoke test only
takes 15 minutes using qemu, and it's much more convenient for them to
run that on their local machine than to trigger the test on some
remote machine, whether it's in the cloud or someone's remote test
server.

In any case, that's why I haven't been interesting in working with
your test infrastructure; I have my own, and in my opinion, my
approach is the better one to make available to the community, and so
when I have time to improve it, I'd much rather work on
{kvm,gce,android}-xfstests.

Cheers,

						- Ted


[1] Figuring out how to coerce the MacOS toolchain to build the Linux
kernel would be cool if anyone ever figures it out.  However, I *have*
done kernel development using a Macbook Air M2 while on a cruise ship
with limited internet access, building the kernel using a Parallels VM
running Debian testing, and then using qemu from MacPorts to avoid the
double virtualization performance penalty to run xfstests to test the
freshly-built arm64 kernel, using my xfstests runner -- and all of
this is available on github for anyone to use.
Kent Overstreet Oct. 9, 2024, 4:17 a.m. UTC | #28
On Tue, Oct 08, 2024 at 10:51:39PM GMT, Theodore Ts'o wrote:
> On Sun, Oct 06, 2024 at 12:33:51AM -0400, Kent Overstreet wrote:
> > 
> > Correct me if I'm wrong, but your system isn't available to the
> > community, and I haven't seen a CI or dashboard for kdevops?
> 
> It's up on github for anyone to download, and I've provided pre-built
> test appliance so people don't have to have downloaded xfstests and
> all of its dependencies and build it from scratch.  (That's been
> automated, of course, but the build infrastructure is setup to use a
> Debian build chroot, and with the precompiled test appliances, you can
> use my test runner on pretty much any Linux distribution; it will even
> work on MacOS if you have qemu built from macports, although for now
> you have to build the kernel on Linux distro using Parallels VM[1].)

How many steps are required, start to finish, to test a git branch and
get the results?

Compare that to my setup, where I give you an account, we set up the
config file that lists tests to run and git branches to test, and then
results show up in the dashboard.

> I'll note that IMHO making testing resources available to the
> community isn't really the bottleneck.  Using cloud resources,
> especially if you spin up the VM's only when you need to run the
> tests, and shut them down once the test is complete, which
> gce-xfstests does, is actually quite cheap.  At retail prices, running
> a dozen ext4 file system configurations against xfstests's "auto"
> group will take about 24 hours of VM time, and including the cost of
> the block devices, costs just under two dollars USD.  Because the
> tests are run in parallel, the total wall clock time to run all of the
> tests is about two and a half hours.  Running the "quick" group on a
> single file system configuration costs pennies.  So the $300 of free
> GCE credits will actually get someone pretty far!

That's the same argument that I've been making - machine resources are
cheap these days.

And using bare metal machines significantly simplifies the backend
(watchdogs, catching full kernel and test output, etc.).

> No, the bottleneck is having someone knowledgeable enough to interpret
> the test results and then finding the root cause of the failures.
> This is one of the reasons why I haven't stressed all that much about
> dashboards.  Dashboards are only useful if the right person(s) is
> looking at them.  That's why I've been much more interested in making
> it stupidly easy to run tests on someone's local resources, e.g.:
> 
>      https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

Yes, it needs to be trivial to run the same test locally that gets run
by the automated infrastructure, I've got that as well.

But dashboards are important, as well. And the git log based dashboard
I've got drastically reduces time spent manually bisecting.

> In fact, for most people, the entry point that I envision as being
> most interesting is that they download the kvm-xfstests, and following
> the instructions in the quickstart, so they can run "kvm-xfstests
> smoke" before sending me an ext4 patch.  Running the smoke test only
> takes 15 minutes using qemu, and it's much more convenient for them to
> run that on their local machine than to trigger the test on some
> remote machine, whether it's in the cloud or someone's remote test
> server.
> 
> In any case, that's why I haven't been interesting in working with
> your test infrastructure; I have my own, and in my opinion, my
> approach is the better one to make available to the community, and so
> when I have time to improve it, I'd much rather work on
> {kvm,gce,android}-xfstests.

Well, my setup also isn't tied to xfstests, and it's fairly trivial to
wrap all of our other (mm, block) tests.

But like I said before, I don't particularly care which one wins, as
long as we're pushing forward with something.
Theodore Ts'o Oct. 9, 2024, 5:54 p.m. UTC | #29
On Wed, Oct 09, 2024 at 12:17:35AM -0400, Kent Overstreet wrote:
> How many steps are required, start to finish, to test a git branch and
> get the results?

See the quickstart doc.  The TL;DR is (1) do the git clone, (2) "make
; make install" (this is just to set up the paths in the shell scripts
and then copying them to your ~/bin directory, so this takes a second or
so), and then (3) "install-kconfig ; kbuild ; kvm-xfstests smoke" in
your kernel tree.
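
Or, as a concrete sketch of those three steps (the repository URL is the one
linked earlier in the thread; the kernel tree path is an assumption):

```shell
# (1) clone the test runner
git clone https://github.com/tytso/xfstests-bld.git
cd xfstests-bld

# (2) set up paths in the helper scripts and copy them into ~/bin
make
make install

# (3) from your kernel tree: configure, build, and run the smoke group
cd ~/linux
install-kconfig
kbuild
kvm-xfstests smoke
```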

> But dashboards are important, as well. And the git log based dashboard
> I've got drastically reduces time spent manually bisecting.

gce-xfstests ltm -c ext4/1k generic/750 --repo ext4.git \
	     --bisect-bad dev --bisect-good origin

With automated bisecting, I don't have to spend any of my personal
time; I just wait for the results to show up in my inbox, without
needing to refer to any dashboards.  :-)


> > In any case, that's why I haven't been interesting in working with
> > your test infrastructure; I have my own, and in my opinion, my
> > approach is the better one to make available to the community, and so
> > when I have time to improve it, I'd much rather work on
> > {kvm,gce,android}-xfstests.
> 
> Well, my setup also isn't tied to xfstests, and it's fairly trivial to
> wrap all of our other (mm, block) tests.

Neither is mine; the name {kvm,gce,qemu,android}-xfstests is the same
for historical reasons.  I have blktests, ltp, stress-ng and the
Phoronix Test Suites wired up (although comparing against
historical baselines with PTS is a bit manual at the moment).

> But like I said before, I don't particularly care which one wins, as
> long as we're pushing forward with something.

I'd say that in the file system development community there has been a
huge amount of interest in testing, because we all have a general
consensus that testing is super important[1].  Most of us decided
that the "There Can Be Only One" from the Highlander Movie is just not
happening, because everyone's test infrastructure is optimized for
their particular workflow, just as there's a really good reason why
there are 75+ file systems in Linux, and a half-dozen or so very popular
general-purpose file systems.

And that's a good thing.

Cheers,

						- Ted

[1] https://docs.google.com/presentation/d/14MKWxzEDZ-JwNh0zNUvMbQa5ZyArZFdblTcF5fUa7Ss/edit#slide=id.g1635d98056_0_45
Daniel Gomez Oct. 10, 2024, 8:51 a.m. UTC | #30
On Wed Oct 9, 2024 at 5:51 AM CEST, Theodore Ts'o wrote:
> On Sun, Oct 06, 2024 at 12:33:51AM -0400, Kent Overstreet wrote:
>> 
>> Correct me if I'm wrong, but your system isn't available to the
>> community, and I haven't seen a CI or dashboard for kdevops?
>
> It's up on github for anyone to download, and I've provided pre-built
> test appliances so people don't have to download xfstests and
> all of its dependencies and build them from scratch.  (That's been
> automated, of course, but the build infrastructure is setup to use a
> Debian build chroot, and with the precompiled test appliances, you can
> use my test runner on pretty much any Linux distribution; it will even
> work on MacOS if you have qemu built from macports, although for now
> you have to build the kernel on a Linux distro using a Parallels VM[1].)
>
> I'll note that IMHO making testing resources available to the
> community isn't really the bottleneck.  Using cloud resources,
> especially if you spin up the VM's only when you need to run the
> tests, and shut them down once the test is complete, which
> gce-xfstests does, is actually quite cheap.  At retail prices, running
> a dozen ext4 file system configurations against xfstests's "auto"
> group will take about 24 hours of VM time, and including the cost of
> the block devices, costs just under two dollars USD.  Because the
> tests are run in parallel, the total wall clock time to run all of the
> tests is about two and a half hours.  Running the "quick" group on a
> single file system configuration costs pennies.  So the $300 of free
> GCE credits will actually get someone pretty far!
>
> No, the bottleneck is having someone knowledgeable enough to interpret
> the test results and then finding the root cause of the failures.
> This is one of the reasons why I haven't stressed all that much about
> dashboards.  Dashboards are only useful if the right person(s) is
> looking at them.  That's why I've been much more interested in making
> it stupidly easy to run tests on someone's local resources, e.g.:
>
>      https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md
>
> In fact, for most people, the entry point that I envision as being
> most interesting is that they download the kvm-xfstests, and following
> the instructions in the quickstart, so they can run "kvm-xfstests
> smoke" before sending me an ext4 patch.  Running the smoke test only
> takes 15 minutes using qemu, and it's much more convenient for them to
> run that on their local machine than to trigger the test on some
> remote machine, whether it's in the cloud or someone's remote test
> server.
>
> In any case, that's why I haven't been interesting in working with
> your test infrastructure; I have my own, and in my opinion, my
> approach is the better one to make available to the community, and so
> when I have time to improve it, I'd much rather work on
> {kvm,gce,android}-xfstests.
>
> Cheers,
>
> 						- Ted
>
>
> [1] Figuring out how to coerce the MacOS toolchain to build the Linux
> kernel would be cool if anyone ever figures it out.  However, I *have*

Building Linux for arm64 is now supported on macOS. You can find all patch
series discussions here [1]. In case you want to give this a try, here are the
steps:

	```shell
	diskutil apfs addVolume /dev/disk<N> "Case-sensitive APFS" linux
	```
	
	```shell
	brew install coreutils findutils gnu-sed gnu-tar grep llvm make pkg-config
	```
	
	```shell
	brew tap bee-headers/bee-headers
	brew install bee-headers/bee-headers/bee-headers
	```
	
	Initialize the environment with `bee-init`. Repeat with every new shell:
	
	```shell
	source bee-init
	```
	
	```shell
	make LLVM=1 defconfig
	make LLVM=1 -j$(nproc)
	```
	
More details about the setup required can be found here [2].

This allows you to build the kernel and boot it with the QEMU -kernel
argument, and debug it with lldb.
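
For reference, the boot-and-debug step could look something like this (a
sketch only; the machine options, console, and debug port are assumptions,
not part of the series above):

```shell
# sketch: boot the freshly built arm64 Image under qemu and expose a debug stub
qemu-system-aarch64 -M virt -cpu max -m 2G -nographic \
    -kernel arch/arm64/boot/Image \
    -append "console=ttyAMA0" \
    -s   # gdb-protocol stub on localhost:1234, usable from lldb

# in another shell:
#   lldb vmlinux
#   (lldb) gdb-remote 1234
```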

[1]
v3: https://lore.kernel.org/all/20240925-macos-build-support-v3-1-233dda880e60@samsung.com/
v2: https://lore.kernel.org/all/20240906-macos-build-support-v2-0-06beff418848@samsung.com/
v1: https://lore.kernel.org/all/20240807-macos-build-support-v1-0-4cd1ded85694@samsung.com/

[2] https://github.com/bee-headers/homebrew-bee-headers/blob/main/README.md

Daniel

> done kernel development using a Macbook Air M2 while on a cruise ship
> with limited internet access, building the kernel using a Parallels VM
> running Debian testing, and then using qemu from MacPorts to avoid the
> double virtualization performance penalty to run xfstests to test the
> freshly-built arm64 kernel, using my xfstests runner -- and all of
> this is available on github for anyone to use.