Message ID | sctzes5z3s2zoadzldrpw3yfycauc4kpcsbpidjkrew5hkz7yf@eejp6nunfpin (mailing list archive) |
---|---|
State | New |
Series | [GIT,PULL] bcachefs fixes for 6.11-rc5 |
On Sat, 24 Aug 2024 at 02:54, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Hi Linus, big one this time...

Yeah, no, enough is enough. The last pull was already big.

This is too big, it touches non-bcachefs stuff, and it's not even remotely some kind of regression.

At some point "fix something" just turns into development, and this is that point.

Nobody sane uses bcachefs and expects it to be stable, so every single user is an experimental site.

The bcachefs patches have become these kinds of "lots of development during the release cycles rather than before it", to the point where I'm starting to regret merging bcachefs.

If bcachefs can't work sanely within the normal upstream kernel release schedule, maybe it shouldn't *be* in the normal upstream kernel.

This is getting beyond ridiculous.

Linus
On Sat, Aug 24, 2024 at 09:23:00AM GMT, Linus Torvalds wrote:
> On Sat, 24 Aug 2024 at 02:54, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > Hi Linus, big one this time...
>
> Yeah, no, enough is enough. The last pull was already big.
>
> This is too big, it touches non-bcachefs stuff, and it's not even
> remotely some kind of regression.
>
> At some point "fix something" just turns into development, and this is
> that point.
>
> Nobody sane uses bcachefs and expects it to be stable, so every single
> user is an experimental site.

Eh?

Universal consensus has been that bcachefs is _definitely_ more trustworthy than btrfs, in terms of "will this filesystem ever go unrecoverable or lose my data" - I've seen many reports of people who've put it through the same situations where btrfs falls.

I've even seen people compare bcachefs's robustness in positive terms vs. /xfs/; and that's the result of a *hell* of a lot of work with the #1 goal of having a robust filesystem that _never_ loses data.

Syzbot dashboard bears this out as well, bcachefs is starting to look better than btrfs there as well...

(Peanut gallery: Please don't rush out and switch to bcachefs just yet. I still have a backlog of bugs and issues - some of them serious, as in your filesystem will go emergency read only - and I don't want people getting bit. There's still a ton to do; I'm not taking EXPERIMENTAL off until at least the fuzz testing for on disk corruption is in play).

Look, I've been doing this for a long time, I've had people running my code in production for a long time, and I'm working with my users on a daily basis to address issues. I don't throw code over the wall; I do everything I can to support it and make sure it's working well.

And - the "srcu held for 10+s warnings" really were bad, there are going to be a long tail of those that need to be fixed - to get to the rest, we need the primary causes fixed first.
And when I ship code, I'm _always_ weighing "how much do we want this" vs. "risk of regression/risk in general" - I'm not just throwing out whatever I feel like.

Look, this is the filesystem you're all going to want to be running in - knock on wood - just a year or two, because I'm working to make it more robust and reliable than xfs and ext4 (and yes, it will be) with _end to end data integrity_.

We need this. There's still tons of people with "btrfs just crapped itself and now I'm fucked" horror stories, and running a non checksumming filesystem is like buying non ECC ram.

I've got users with 100+ TB filesystems who trust my code, and I haven't lost anyone's filesystem who was patient and willing to work with me.

But I've got to get this done, and right now that does mean moving fast and grinding through a lot of issues.

(again for the peanut gallery: _please_ do not rush to install it yet unless you are willing and able to report issues, I'll say when the bugs have been worked through and the hardening is done).
On Sat, 24 Aug 2024 at 10:14, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Sat, Aug 24, 2024 at 09:23:00AM GMT, Linus Torvalds wrote:
> > On Sat, 24 Aug 2024 at 02:54, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > >
> > > Hi Linus, big one this time...
> >
> > Yeah, no, enough is enough. The last pull was already big.
> >
> > This is too big, it touches non-bcachefs stuff, and it's not even
> > remotely some kind of regression.
> >
> > At some point "fix something" just turns into development, and this is
> > that point.
> >
> > Nobody sane uses bcachefs and expects it to be stable, so every single
> > user is an experimental site.
>
> Eh?
>
> Universal consensus has been that bcachefs is _definitely_ more
> trustworthy than btrfs,

I'll believe that when there are major distros that use it and you have lots of varied use.

But it doesn't even change the issue: you aren't fixing a regression, you are doing new development to fix some old problem, and now you are literally editing non-bcachefs files too.

Enough is enough.

Linus
On Sat, Aug 24, 2024 at 10:25:02AM GMT, Linus Torvalds wrote:
> On Sat, 24 Aug 2024 at 10:14, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > On Sat, Aug 24, 2024 at 09:23:00AM GMT, Linus Torvalds wrote:
> > > On Sat, 24 Aug 2024 at 02:54, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > > >
> > > > Hi Linus, big one this time...
> > >
> > > Yeah, no, enough is enough. The last pull was already big.
> > >
> > > This is too big, it touches non-bcachefs stuff, and it's not even
> > > remotely some kind of regression.
> > >
> > > At some point "fix something" just turns into development, and this is
> > > that point.
> > >
> > > Nobody sane uses bcachefs and expects it to be stable, so every single
> > > user is an experimental site.
> >
> > Eh?
> >
> > Universal consensus has been that bcachefs is _definitely_ more
> > trustworthy than btrfs,
>
> I'll believe that when there are major distros that use it and you
> have lots of varied use.

Oh, I'm waiting for that hammer to drop too.

But: all the data we've got so far is that it really is shaping up to be that solid, there's clearly been big upticks in users as it went upstream, as distros have been rolling it out, and the uptick in bug reports hasn't been there.

> But it doesn't even change the issue: you aren't fixing a regression,
> you are doing new development to fix some old problem, and now you
> are literally editing non-bcachefs files too.

What is to be gained by holding back fixes, if we've got every reason to believe that the fixes are solid?

And yes, these _are_ solid, the rhashtable stuff was done months ago (minus the deadlock fix, that's more recent), and the rcu_pending stuff was mostly done months ago as well, and _heavily_ tested (including using it as replacement backend for kvfree_rcu, which is the eventual goal there).

And the genradix code is code that I also wrote and maintain, and those are simple patches.
On Sat, 24 Aug 2024 at 10:33, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> What is to be gained by holding back fixes, if we've got every reason to
> believe that the fixes are solid?

What is to be gained by having release rules and a stable development environment? I wonder.

Linus
On Sat, 24 Aug 2024 at 10:35, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> What is to be gained by having release rules and a stable development
> environment? I wonder.

But seriously - thinking that "I changed a thousand lines, there's no way that introduces new bugs" is the kind of thinking that I DO NOT WANT TO HEAR from a maintainer.

What planet ARE you from? Stop being obtuse.

Linus
On Sat, Aug 24, 2024 at 10:35:38AM GMT, Linus Torvalds wrote:
> On Sat, 24 Aug 2024 at 10:33, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > What is to be gained by holding back fixes, if we've got every reason to
> > believe that the fixes are solid?
>
> What is to be gained by having release rules and a stable development
> environment? I wonder.

Sure, which is why I'm not sending you anything here that isn't a fix for a real issue.

(Ok, technically a few of those, the "missing trans_relock()" fixes are theoretical, but if they are real then they're bad).
On Sat, 24 Aug 2024 at 10:48, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Sure, which is why I'm not sending you anything here that isn't a fix
> for a real issue.

Kent, bugs happen.

The number of bugs that happen in "bug fixes" is in fact quite high. You should see the stable tree discussions when people get heated about the regressions introduced by fixes.

This is, for example, why stable has the rule of fixes being small (which does get violated, but it is at least a goal: "It cannot be bigger than 100 lines, with context"), because small fixes are easier to think about and hopefully they have fewer problems of their own.

It's also why my "development happens before the merge window" rule exists. If you have to do development to fix an old problem, it's for the next merge window. Exactly because new bugs happen. We want _stability_.

The fixes after the merge window are supposed to be fixes for regressions, not "oh, I noticed a long-standing problem, and now I'm fixing that".

But obviously the same kind of logic as for stable trees apply: if it's a small obvious fix that would be stable material *anyway*, then there is no reason to wait for the next release and then just put it in the stable pile.

So I do end up taking small fixes, because at that point it is indeed a "it wouldn't help to wait" situation.

But your pull requests haven't been "small fixes".

And I admit, I've let it slide. You never saw the last pull request, when I sighed, did a "git fetch", and went through every commit just to see. And then did the pull for real.

This time I did the same. And came to the conclusion that no, this was not a series of small fixes any more.

Linus
On Sat, Aug 24, 2024 at 10:40:33AM GMT, Linus Torvalds wrote:
> On Sat, 24 Aug 2024 at 10:35, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > What is to be gained by having release rules and a stable development
> > environment? I wonder.
>
> But seriously - thinking that "I changed a thousand lines, there's no
> way that introduces new bugs" is the kind of thinking that I DO NOT
> WANT TO HEAR from a maintainer.
>
> What planet ARE you from? Stop being obtuse.

Heh.

No, I can't write 1000 lines of bug free code (I think when I was younger I pulled it off a few times...).

But I do have really good automated testing (I put everything through lockdep, kasan, ubsan, and other variants now), and a bunch of testers willing to run my git branches on their crazy (and huge) filesystems.

And enough experience to know when code is likely to be solid and when I should hold back on it.

Are you seeing a ton of crazy last minute fixes for regressions in my pull requests? No, there's a few fixes for recent regressions here and there, but nothing that would cause major regrets. The worst in terms of needing last minute fixes was the member info btree bitmap stuff, and the superblock downgrade section... but those we did legitimately need.
On Sat, Aug 24, 2024 at 10:57:55AM GMT, Linus Torvalds wrote:
> On Sat, 24 Aug 2024 at 10:48, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > Sure, which is why I'm not sending you anything here that isn't a fix
> > for a real issue.
>
> Kent, bugs happen.

I _know_.

Look, filesystem development is as high stakes as it gets. Normal kernel development, you fuck up - you crash the machine, you lose some work, you reboot, people are annoyed but generally it's ok.

In filesystem land, you can corrupt data and not find out about it until weeks later, or _worse_. I've got stories to give people literal nightmares. Hell, that stuff has fueled my own nightmares for years. You know how much grey my beard has now?

Which is why I have spent many years of my life building a codebase and development process where I can work productively where I can not just catch but recover from pretty much any fuckup imaginable.

Because peace of mind is priceless...
Kent, I'm not a kernel developer I'm just a user that is impressed with bcachefs, uses it on his personal systems, and eagerly waits for new features. I am one of the users who's been using bcachefs for years and has never lost any data using it.

However I am going to be blunt: as someone who designs and builds Linux-based storage servers (well, I used to) as part of their job I would never, ever consider using bcachefs professionally as it is now and the way it appears to be developed currently. It is simply too much changed too fast without any separation between what is currently stable and working for customers and new development. Your work is excellent but **process** is equally and sometimes even more important. Some of the other hats I've worn professionally include as a lead C/C++ developer and as a product release manager so I've learned from very painful experience that large projects absolutely **must** have strict rules for process. I'm sure you realize that. Linus is not being a jerk about this. Just a couple of months ago Linus had to tell you the exact same thing he's telling you again here. And that wasn't the first time. Is your plan to just continue to break the rules and do whatever the heck you want until Linus stops bothering you? I don't think that's a good plan.

Since I'm already being blunt I'm going to be even more blunt: you have a serious problem working with others. In the past and in this thread I've read where you seem to imply that other kernel developers are gatekeeping and resist some of your ideas because you've created something that (in your opinion) is already better in some ways than some of things they've created. But from where I'm sitting the problems you've experienced are 90% because of **you**. You're an adult and you need to understand that about yourself so you can do something about it.

I get that I've way overstepped my bounds here. If the kernel developers wish to ban me from the kernel lists I understand.
Carl

> On 2024-08-23 7:59 PM PDT Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
>
> On Sat, Aug 24, 2024 at 10:40:33AM GMT, Linus Torvalds wrote:
> > On Sat, 24 Aug 2024 at 10:35, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > What is to be gained by having release rules and a stable development
> > > environment? I wonder.
> >
> > But seriously - thinking that "I changed a thousand lines, there's no
> > way that introduces new bugs" is the kind of thinking that I DO NOT
> > WANT TO HEAR from a maintainer.
> >
> > What planet ARE you from? Stop being obtuse.
>
> Heh.
>
> No, I can't write 1000 lines of bug free code (I think when I was
> younger I pulled it off a few times...).
>
> But I do have really good automated testing (I put everything through
> lockdep, kasan, ubsan, and other variants now), and a bunch of testers
> willing to run my git branches on their crazy (and huge) filesystems.
>
> And enough experience to know when code is likely to be solid and when I
> should hold back on it.
>
> Are you seeing a ton of crazy last minute fixes for regressions in my
> pull requests? No, there's a few fixes for recent regressions here and
> there, but nothing that would cause major regrets. The worst in terms of
> needing last minute fixes was the member info btree bitmap stuff, and
> the superblock downgrade section... but those we did legitimately need.
On Fri, Aug 23, 2024 at 09:22:55PM GMT, Carl E. Thompson wrote:
> Kent, I'm not a kernel developer I'm just a user that is impressed with bcachefs, uses it on his personal systems, and eagerly waits for new features. I am one of the users who's been using bcachefs for years and has never lost any data using it.
>
> However I am going to be blunt: as someone who designs and builds Linux-based storage servers (well, I used to) as part of their job I would never, ever consider using bcachefs professionally as it is now and the way it appears to be developed currently. It is simply too much changed too fast without any separation between what is currently stable and working for customers and new development. Your work is excellent but **process** is equally and sometimes even more important. Some of the other hats I've worn professionally include as a lead C/C++ developer and as a product release manager so I've learned from very painful experience that large projects absolutely **must** have strict rules for process. I'm sure you realize that. Linus is not being a jerk about this. Just a couple of months ago Linus had to tell you the exact same thing he's telling you again here. And that wasn't the first time. Is your plan to just continue to break the rules and do whatever the heck you want until

You guys are freaked out because I'm moving quickly and you don't have visibility into my own internal process, that's all.

I've got a test cluster, a community testing my code before I send it to Linus, and a codebase that I own and know like the back of my hand that's stuffed with assertions.

And, the changes in question are algorithmically fairly simple and things that I have excellent test coverage for. These are all factors that let me say, with confidence, that there really aren't any bugs in this pull request.

Look, there will always be a natural tension between "strict rules and processes" vs. "weighing the situations and using your judgement".
There isn't a right or wrong answer as to where on the spectrum we should be, we just all have to use our brains.

No one is being jerks here, Linus and I are just sitting in different places with different perspectives. He has a responsibility as someone managing a huge project to enforce rules as he sees best, while I have a responsibility to support users with working code, and to do that to the best of my abilities.