Message ID | 20240827061543.1235703-1-mhocko@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/2,v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM | expand |
Thanks for fixing this up!
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new > inode to achieve GFP_NOWAIT semantic while holding locks. If this > allocation fails it will drop locks and use GFP_NOFS allocation context. > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really > dangerous to use if the caller doesn't control the full call chain with > this flag set. E.g. if any of the function down the chain needed > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and > cause unexpected failure. > > While this is not the case in this particular case using the scoped gfp > semantic is not really needed bacause we can easily pus the allocation > context down the chain without too much clutter. > > Acked-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Michal Hocko <mhocko@suse.com> Looks good to me. Reviewed-by: Dave Chinner <dchinner@redhat.com>
On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote: > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new > > inode to achieve GFP_NOWAIT semantic while holding locks. If this > > allocation fails it will drop locks and use GFP_NOFS allocation context. > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really > > dangerous to use if the caller doesn't control the full call chain with > > this flag set. E.g. if any of the function down the chain needed > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and > > cause unexpected failure. > > > > While this is not the case in this particular case using the scoped gfp > > semantic is not really needed bacause we can easily pus the allocation > > context down the chain without too much clutter. > > > > Acked-by: Christoph Hellwig <hch@lst.de> > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > Looks good to me. > > Reviewed-by: Dave Chinner <dchinner@redhat.com> Reposting what I wrote in the other thread: This series is based on a fundamental misunderstanding of what a safe API is: a safe API is not one that doesn't return errors, it's one that never invokes undefined behaviour. It was decided years ago that the scoped APIs were the better replacement for the gfp APIs, and they need to exist. This "GFP_NOFAIL exists therefore we can't tell the memory allocator about situations wehre it would have to fail" is utter insanity - it's the exact opposite of defining a safe API. A safe API would be one where we /did/ always tell the memory allocator when we're in non-sleepable context, and the allocator always returned failure instead of context switching. This is utter brain damage; rule #1 of kernel programming is that _you check for errors_. If you don't know that your GFP_NOFAIL usage is in a safe context (and that's not just memory reclaim context, it's also the size of your alloction) then you have to check for errors.
On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote: > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote: > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote: > > > From: Michal Hocko <mhocko@suse.com> > > > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this > > > allocation fails it will drop locks and use GFP_NOFS allocation context. > > > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really > > > dangerous to use if the caller doesn't control the full call chain with > > > this flag set. E.g. if any of the function down the chain needed > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and > > > cause unexpected failure. > > > > > > While this is not the case in this particular case using the scoped gfp > > > semantic is not really needed bacause we can easily pus the allocation > > > context down the chain without too much clutter. > > > > > > Acked-by: Christoph Hellwig <hch@lst.de> > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > Looks good to me. > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > Reposting what I wrote in the other thread: I've read the thread. I've heard what you have had to say. Like several other people, I think your position is just not practical or reasonable. I don't care about the purity or the safety of the API - the practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL allocation can now fail and that will cause unexpected kernel crashes. Keeping existing code and API semantics working correctly (i.e. regression free) takes precedence over new functionality or API features that people want to introduce. That's all there is to it. This is not a hill you need to die on. -Dave.
On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote: > On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote: > > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote: > > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote: > > > > From: Michal Hocko <mhocko@suse.com> > > > > > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new > > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this > > > > allocation fails it will drop locks and use GFP_NOFS allocation context. > > > > > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really > > > > dangerous to use if the caller doesn't control the full call chain with > > > > this flag set. E.g. if any of the function down the chain needed > > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and > > > > cause unexpected failure. > > > > > > > > While this is not the case in this particular case using the scoped gfp > > > > semantic is not really needed bacause we can easily pus the allocation > > > > context down the chain without too much clutter. > > > > > > > > Acked-by: Christoph Hellwig <hch@lst.de> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > > > Looks good to me. > > > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > Reposting what I wrote in the other thread: > > I've read the thread. I've heard what you have had to say. Like > several other people, I think your position is just not practical or > reasonable. > > I don't care about the purity or the safety of the API - the > practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL > allocation can now fail and that will cause unexpected kernel > crashes. Keeping existing code and API semantics working correctly > (i.e. regression free) takes precedence over new functionality or > API features that people want to introduce. > > That's all there is to it. This is not a hill you need to die on. If you use GFP_NOFAIL in a context where you're not allowed to sleep, that's a bug, same as any other bug where you get the gfp flags wrong (e.g. GFP_KERNEL in non sleepable context). This isn't going to affect you unless you start going around inserting PF_MEMALLOC_NORECLAIM where it doesn't need to be. Why would you do that? But the lack of gfp flags for pte allocation means that this actually is a serious gap we need to be fixing.
On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote: > On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote: > > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote: > > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote: > > > > From: Michal Hocko <mhocko@suse.com> > > > > > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new > > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this > > > > allocation fails it will drop locks and use GFP_NOFS allocation context. > > > > > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really > > > > dangerous to use if the caller doesn't control the full call chain with > > > > this flag set. E.g. if any of the function down the chain needed > > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and > > > > cause unexpected failure. > > > > > > > > While this is not the case in this particular case using the scoped gfp > > > > semantic is not really needed bacause we can easily pus the allocation > > > > context down the chain without too much clutter. > > > > > > > > Acked-by: Christoph Hellwig <hch@lst.de> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > > > Looks good to me. > > > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > Reposting what I wrote in the other thread: > > I've read the thread. I've heard what you have had to say. Like > several other people, I think your position is just not practical or > reasonable. > > I don't care about the purity or the safety of the API - the > practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL > allocation can now fail and that will cause unexpected kernel > crashes. Keeping existing code and API semantics working correctly > (i.e. regression free) takes precedence over new functionality or > API features that people want to introduce. > > That's all there is to it. This is not a hill you need to die on. And more than that, this is coming from you saying "We didn't have to handle memory allocation failures in IRIX, why can't we be like IRIX? All those error paths are a pain to test, why can't we get rid of them?" Except that's bullshit; at the very least any dynamically sized allocation _definitely_ has to have an error path that's tested, and if there's questions about the context a code path might run in, that that's another reason. GFP_NOFAIL is the problem here, and if it's encouraging this brain damaged "why can't we just get rid of error paths?" thinking, then it should be removed. Error paths have to exist, and they have to be tested.
On Thu, Aug 29, 2024 at 9:32 PM Kent Overstreet <kent.overstreet@linux.dev> wrote: > > On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote: > > On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote: > > > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote: > > > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote: > > > > > From: Michal Hocko <mhocko@suse.com> > > > > > > > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new > > > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this > > > > > allocation fails it will drop locks and use GFP_NOFS allocation context. > > > > > > > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really > > > > > dangerous to use if the caller doesn't control the full call chain with > > > > > this flag set. E.g. if any of the function down the chain needed > > > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and > > > > > cause unexpected failure. > > > > > > > > > > While this is not the case in this particular case using the scoped gfp > > > > > semantic is not really needed bacause we can easily pus the allocation > > > > > context down the chain without too much clutter. > > > > > > > > > > Acked-by: Christoph Hellwig <hch@lst.de> > > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > > > > > Looks good to me. > > > > > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > > > Reposting what I wrote in the other thread: > > > > I've read the thread. I've heard what you have had to say. Like > > several other people, I think your position is just not practical or > > reasonable. > > > > I don't care about the purity or the safety of the API - the > > practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL > > allocation can now fail and that will cause unexpected kernel > > crashes. Keeping existing code and API semantics working correctly > > (i.e. regression free) takes precedence over new functionality or > > API features that people want to introduce. > > > > That's all there is to it. This is not a hill you need to die on. > > And more than that, this is coming from you saying "We didn't have to > handle memory allocation failures in IRIX, why can't we be like IRIX? > All those error paths are a pain to test, why can't we get rid of them?" > > Except that's bullshit; at the very least any dynamically sized > allocation _definitely_ has to have an error path that's tested, and if > there's questions about the context a code path might run in, that > that's another reason. > > GFP_NOFAIL is the problem here, and if it's encouraging this brain > damaged "why can't we just get rid of error paths?" thinking, then it > should be removed. > > Error paths have to exist, and they have to be tested. I completely agree. Adding a dead loop in the core of the page allocator just to bypass error handling is a reckless idea.
On Thu, Aug 29, 2024 at 09:32:52AM -0400, Kent Overstreet wrote: > On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote: > > On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote: > > > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote: > > > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko > > > > wrote: > > > > > From: Michal Hocko <mhocko@suse.com> > > > > > > > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to > > > > > allocate a new inode to achieve GFP_NOWAIT semantic while > > > > > holding locks. If this allocation fails it will drop locks > > > > > and use GFP_NOFS allocation context. > > > > > > > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is > > > > > really dangerous to use if the caller doesn't control the > > > > > full call chain with this flag set. E.g. if any of the > > > > > function down the chain needed GFP_NOFAIL request the > > > > > PF_MEMALLOC_NORECLAIM would override this and cause > > > > > unexpected failure. > > > > > > > > > > While this is not the case in this particular case using > > > > > the scoped gfp semantic is not really needed bacause we > > > > > can easily pus the allocation context down the chain > > > > > without too much clutter. > > > > > > > > > > Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: > > > > > Michal Hocko <mhocko@suse.com> > > > > > > > > Looks good to me. > > > > > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com> > > > > > > Reposting what I wrote in the other thread: > > > > I've read the thread. I've heard what you have had to say. Like > > several other people, I think your position is just not > > practical or reasonable. > > > > I don't care about the purity or the safety of the API - the > > practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL > > allocation can now fail and that will cause unexpected kernel > > crashes. Keeping existing code and API semantics working > > correctly (i.e. regression free) takes precedence over new > > functionality or API features that people want to introduce. > > > > That's all there is to it. This is not a hill you need to die > > on. > > And more than that, this is coming from you saying "We didn't have > to handle memory allocation failures in IRIX, why can't we be like > IRIX? All those error paths are a pain to test, why can't we get > rid of them?" > You're not listening, Kent. We are not eliding error paths because they aren't (or cannot be) tested. It's a choice between whether a transient error (e.g. ENOMEM) should be considered a fatal error or not. The architectural choice that was made for XFS back in the 1990s was that the filesystem should never fail when transient errors occur. The choice was to wait for the transient error to go away and then continue on. The rest of the filesystem was build around these fundamental behavioural choices. This goes beyond memory allocation - we do it for IO errors, too. e.g. metadata writeback keeps trying to write back the metadata repeatedly on -EIO. On every EIO from a metadata write, we will immediately attempt a rewrite without a backoff. If that rewrite then fails, wei requeue the write for later resubmission. That means we back off and wait for up to 30s before attempting the next rewrite. Hence -EIO on async metadata writeback won't fail/shutdown the filesystem until a (configurable) number of repeated failures occurs or the filesystem unmounts before the metadata could be written back successfully. There's good reason for this "wait for transients to resolve" method of error handling - go back to the late 1990s and early 2000s and high-end multi-path FC SAN based storage was well known to have transient path failures that can take minutes to resolve before a secondary path takes over. That was the sort of storage environment XFS was designed to operate in, and those users expected the filesystem to be extremely tolerant of transient failure conditions. Hence failing an IO and shutting down the filesystem because there are transient errors occuring in either the storage or the OS was absolutely the wrong thing to be doing. It still is the wrong thing to be doing - we want to wait until the transient error has progressed to being classified as a permanent error before we take drastic action like denying service to the filesystem. Memory allocation failure has always been considered a transient error by XFS that the OS will resolve in one way or another in a realtively short period of time. If we're prepared to wait minutes for IO path failures to resolve, waiting a couple of seconds for transient low memory situations to resolve isn't a big deal. Ranting about how we handle errors without understanding the error model we are working within is not productive. bcachefs has a different error handling model to almost every other filesystem out there, but that doesn't mean every other filesystem must work the same way that bcachefs does. If changing this transient error handling model was as simple as detecting an allocation failure and returning -ENOMEM, we would have done that 20 years ago. But it isn't - the error handling model is "block until transients resolve" so that the error handling paths only need to handle fatal errors. Therein lies the problem - those error handling paths need to be substantially changed to be able to handle transient errors such as ENOMEM. We'd need to either be able to back out of a dirty transaction or restart the transaction in some way rather than shutting down the filesystem. Put simply: reclassifying ENOMEM from a "wait for transient to resolve" handler to a "back out and restart" mechanism like bcachefs uses requires re-architecting the entire log item architecture for metadata modification tracking and journal space management. Even if I knew how to implement this right now, it would require years worth of engineering effort and resources before it would be completed and ready for merge. Then it will take years more for all the existing kernels to cycle out of production. Further: this "ENOMEM is transient so retry" model has been used without any significant issues in production systems for mission critical infrastructure for the past 25+ years. There's a very strong "if it ain't broke, don't fix it" argument to be made here. The cost-benefit analysis comes out very strongly on the side of keeping __GFP_NOFAIL semantics as they currently stand. > Except that's bullshit; at the very least any dynamically sized > allocation _definitely_ has to have an error path that's tested, and if > there's questions about the context a code path might run in, that > that's another reason. We know the context we run __GFP_NOFAIL allocations in - transaction context absolutely requires a task context because we take sleeping locks, submit and wait on IO, do blocking memory allocation, etc. We also know the size of the allocations because we've bounds checked everything before we do an allocation. Hence this argument of "__GFP_NOFAIL aboslutely requires error checking because an invalid size or wonrg context might be used" is completely irrelevant to XFS. If you call into the filesytsem from an atomic context, you've lost long before we get to memory allocation because filesystems take sleeping locks.... > GFP_NOFAIL is the problem here, and if it's encouraging this brain > damaged "why can't we just get rid of error paths?" thinking, then it > should be removed. > > Error paths have to exist, and they have to be tested. __GFP_NOFAIL is *not the problem*, and we are not "avoiding error handling". Backing off, looping and trying again is a valid mechanism for handling transient failure conditions. Having a flag that tells the allocator to "backoff, loop and try again" is a perfectly good way of providing a generic error handling mechanism. IOWs, Using __GFP_NOFAIL doesn't mean we are "not handling errors"; it simply means we have moved the error handling we were doing inside the allocator. And yes, we test the hell out of this error handling path.... -Dave.
On Mon, Sep 02, 2024 at 10:23:06AM GMT, Dave Chinner wrote: > On Thu, Aug 29, 2024 at 09:32:52AM -0400, Kent Overstreet wrote: > > And more than that, this is coming from you saying "We didn't have > > to handle memory allocation failures in IRIX, why can't we be like > > IRIX? All those error paths are a pain to test, why can't we get > > rid of them?" > > > You're not listening, Kent. We are not eliding error paths because > they aren't (or cannot be) tested. > > It's a choice between whether a transient error (e.g. ENOMEM) should > be considered a fatal error or not. The architectural choice that > was made for XFS back in the 1990s was that the filesystem should > never fail when transient errors occur. The choice was to wait for > the transient error to go away and then continue on. The rest of the > filesystem was build around these fundamental behavioural choices. <snipped> > Hence failing an IO and shutting down the filesystem because there > are transient errors occuring in either the storage or the OS was > absolutely the wrong thing to be doing. It still is the wrong thing > to be doing - we want to wait until the transient error has > progressed to being classified as a permanent error before we take > drastic action like denying service to the filesystem. Two different things are getting conflated here. I'm saying forcefully that __GFP_NOFAIL isn't a good design decision, and it's bad for system behaviour when we're thrashing, because you, willy and others have been talking openly about making something like that the norm, and that concerns me. I'm _not_ saying that you should have to rearchitect your codebase to deal with transient -ENOMEMs... that's clearly not going to happen. But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL in the case of bugs, because that's going to be an improvement w.r.t. system robustness, in exactly the same way we don't use BUG_ON() if it's something that we can't guarantee won't happen in the wild - we WARN() and try to handle the error as best we can. If we could someday get PF_MEMALLOC_NORECLAIM annotated everywhere we're not in a sleepable context (or more likely, teach kmalloc() about preempt count) - that turns a whole class of "scheduling while atomic" bugs into recoverable errors. That'd be a good thing. In XFS's case, the only thing you'd ever want to do on failure of a __GFP_NOFAIL allocation is do an emergency shutdown - the code is buggy, and those bugs tend to be quickly found and fixed (even quicker when the system stays usable and the user can collect a backtrace). > > Except that's bullshit; at the very least any dynamically sized > > allocation _definitely_ has to have an error path that's tested, and if > > there's questions about the context a code path might run in, that > > that's another reason. > > We know the context we run __GFP_NOFAIL allocations in - transaction > context absolutely requires a task context because we take sleeping > locks, submit and wait on IO, do blocking memory allocation, etc. We > also know the size of the allocations because we've bounds checked > everything before we do an allocation. *nod* > Hence this argument of "__GFP_NOFAIL aboslutely requires error > checking because an invalid size or wonrg context might be used" > is completely irrelevant to XFS. If you call into the filesytsem > from an atomic context, you've lost long before we get to memory > allocation because filesystems take sleeping locks.... Yeah, if you know the context of your allocation and you have bounds on the allocation size that's all fine - that's your call. _As a general policy_, I will say that even __GFP_NOFAIL allocations should have error paths - becuase lots of allocations have sizes that userspace can control, so it becomes a security issue and we need to be careful what we tell people. > > GFP_NOFAIL is the problem here, and if it's encouraging this brain > > damaged "why can't we just get rid of error paths?" thinking, then it > > should be removed. > > > > Error paths have to exist, and they have to be tested. > > __GFP_NOFAIL is *not the problem*, and we are not "avoiding error > handling". Backing off, looping and trying again is a valid > mechanism for handling transient failure conditions. Having a flag > that tells the allocator to "backoff, loop and try again" is a > perfectly good way of providing a generic error handling mechanism. But we do have to differentiate between transient allocation failures and failures that are a result of bugs. Because just looping means a buggy allocation call becomes, at best, a hung task you can't kill.
On Sun 01-09-24 21:35:30, Kent Overstreet wrote: [...] > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > in the case of bugs, because that's going to be an improvement w.r.t. > system robustness, in exactly the same way we don't use BUG_ON() if it's > something that we can't guarantee won't happen in the wild - we WARN() > and try to handle the error as best we can. We have discussed that in a different email thread. And I have to say that I am not convinced that returning NULL makes a broken code much better. Why? Because we can expect that broken NOFAIL users will not have a error checking path. Even valid NOFAIL users will not have one because they _know_ they do not have a different than retry for ever recovery path. That means that an unexpected NULL return either means OOPS or a subtle silent error - e.g. memory corruption. The former is a actually a saner recovery model because the execution is stopped before more harm can be done. I suspect most of those buggy users will simply OOPS but systematically checking for latter is a lot of work and needs to be constantly because code evolves... I have tried to argue that if allocator cannot or refuse to satisfy GFP_NOFAIL request because it is trying to use unsupported allocation mode or size then we should terminate the allocation context. That would make the API more predictable and therefore safer to use. This is not what the allocator does today though. Atomic NOFAIL allocations fail same as kvmalloc requests which are clearly overflows. Especially the later could become a risk if they are reachable from the userspace with controlable allocation size.
On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > [...] > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > in the case of bugs, because that's going to be an improvement w.r.t. > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > something that we can't guarantee won't happen in the wild - we WARN() > > and try to handle the error as best we can. > > We have discussed that in a different email thread. And I have to say > that I am not convinced that returning NULL makes a broken code much > better. Why? Because we can expect that broken NOFAIL users will not have a > error checking path. Even valid NOFAIL users will not have one because > they _know_ they do not have a different than retry for ever recovery > path. You mean where I asked you for a link to the discussion and rationale you claimed had happened? Still waiting on that > That means that an unexpected NULL return either means OOPS or a subtle > silent error - e.g. memory corruption. The former is a actually a saner > recovery model because the execution is stopped before more harm can be > done. I suspect most of those buggy users will simply OOPS but > systematically checking for latter is a lot of work and needs to be > constantly because code evolves... > > I have tried to argue that if allocator cannot or refuse to satisfy > GFP_NOFAIL request because it is trying to use unsupported allocation > mode or size then we should terminate the allocation context. That would > make the API more predictable and therefore safer to use. You're arguing that we treat it like an oops? Yeah, enough of this insanity.
On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > [...] > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > something that we can't guarantee won't happen in the wild - we WARN() > > > and try to handle the error as best we can. > > > > We have discussed that in a different email thread. And I have to say > > that I am not convinced that returning NULL makes a broken code much > > better. Why? Because we can expect that broken NOFAIL users will not have a > > error checking path. Even valid NOFAIL users will not have one because > > they _know_ they do not have a different than retry for ever recovery > > path. > > You mean where I asked you for a link to the discussion and rationale > you claimed had happened? Still waiting on that I am not your assistent to be tasked and search through lore archives. Find one if you need that. Anyway, if you read the email and even tried to understand what is written there rather than immediately started shouting a response then you would have noticed I have put actual arguments here. You are free to disagree with them and lay down your arguments. You have decided to [...] > Yeah, enough of this insanity. so I do not think you are able to do that. Again...
On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > > [...] > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > > something that we can't guarantee won't happen in the wild - we WARN() > > > > and try to handle the error as best we can. > > > > > > We have discussed that in a different email thread. And I have to say > > > that I am not convinced that returning NULL makes a broken code much > > > better. Why? Because we can expect that broken NOFAIL users will not have a > > > error checking path. Even valid NOFAIL users will not have one because > > > they _know_ they do not have a different than retry for ever recovery > > > path. > > > > You mean where I asked you for a link to the discussion and rationale > > you claimed had happened? Still waiting on that > > I am not your assistent to be tasked and search through lore archives. > Find one if you need that. > > Anyway, if you read the email and even tried to understand what is > written there rather than immediately started shouting a response then > you would have noticed I have put actual arguments here. You are free to > disagree with them and lay down your arguments. You have decided to > > [...] > > > Yeah, enough of this insanity. > > so I do not think you are able to do that. Again... Michal, if you think crashing processes is an acceptable alternative to error handling _you have no business writing kernel code_. You have been stridently arguing for one bad idea after another, and it's an insult to those of us who do give a shit about writing reliable software. You're arguing against basic precepts of kernel programming. Get your head examined. And get the fuck out of here with this shit.
Kent Overstreet <kent.overstreet@linux.dev> writes: > You're arguing against basic precepts of kernel programming. > > Get your head examined. And get the fuck out of here with this shit. Kent, this is not the way to deal with your peers in this community, no matter how strongly you disagree with them. Might I suggest calming down a bit, followed by an apology? Thanks, jon
On 9/2/24 03:51, Kent Overstreet wrote: > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: >> On Mon 02-09-24 04:52:49, Kent Overstreet wrote: >>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: >>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote: >>>> [...] >>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL >>>>> in the case of bugs, because that's going to be an improvement w.r.t. >>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's >>>>> something that we can't guarantee won't happen in the wild - we WARN() >>>>> and try to handle the error as best we can. >>>> >>>> We have discussed that in a different email thread. And I have to say >>>> that I am not convinced that returning NULL makes a broken code much >>>> better. Why? Because we can expect that broken NOFAIL users will not have a >>>> error checking path. Even valid NOFAIL users will not have one because >>>> they _know_ they do not have a different than retry for ever recovery >>>> path. >>> >>> You mean where I asked you for a link to the discussion and rationale >>> you claimed had happened? Still waiting on that >> >> I am not your assistent to be tasked and search through lore archives. >> Find one if you need that. >> >> Anyway, if you read the email and even tried to understand what is >> written there rather than immediately started shouting a response then >> you would have noticed I have put actual arguments here. You are free to >> disagree with them and lay down your arguments. You have decided to >> >> [...] >> >>> Yeah, enough of this insanity. >> >> so I do not think you are able to do that. Again... > > Michal, if you think crashing processes is an acceptable alternative to > error handling _you have no business writing kernel code_. > > You have been stridently arguing for one bad idea after another, and > it's an insult to those of us who do give a shit about writing reliable > software. > > You're arguing against basic precepts of kernel programming. > > Get your head examined. And get the fuck out of here with this shit. > Kent, Using language like this is clearly unacceptable and violates the Code of Conduct. This type of language doesn't promote respectful and productive discussions and is detrimental to the health of the community. You should be well aware that this type of language and personal attack is a clear violation of the Linux kernel Contributor Covenant Code of Conduct as outlined in the following: https://www.kernel.org/doc/html/latest/process/code-of-conduct.html Refer to the Code of Conduct and refrain from violating the Code of Conduct in the future. thanks, -- Shuah (On behalf of the Code of Conduct Committee)
On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: > On 9/2/24 03:51, Kent Overstreet wrote: > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > > > > [...] > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > > > > something that we can't guarantee won't happen in the wild - we WARN() > > > > > > and try to handle the error as best we can. > > > > > > > > > > We have discussed that in a different email thread. And I have to say > > > > > that I am not convinced that returning NULL makes a broken code much > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a > > > > > error checking path. Even valid NOFAIL users will not have one because > > > > > they _know_ they do not have a different than retry for ever recovery > > > > > path. > > > > > > > > You mean where I asked you for a link to the discussion and rationale > > > > you claimed had happened? Still waiting on that > > > > > > I am not your assistent to be tasked and search through lore archives. > > > Find one if you need that. > > > > > > Anyway, if you read the email and even tried to understand what is > > > written there rather than immediately started shouting a response then > > > you would have noticed I have put actual arguments here. You are free to > > > disagree with them and lay down your arguments. You have decided to > > > > > > [...] > > > > > > > Yeah, enough of this insanity. > > > > > > so I do not think you are able to do that. Again... > > > > Michal, if you think crashing processes is an acceptable alternative to > > error handling _you have no business writing kernel code_. > > > > You have been stridently arguing for one bad idea after another, and > > it's an insult to those of us who do give a shit about writing reliable > > software. > > > > You're arguing against basic precepts of kernel programming. > > > > Get your head examined. And get the fuck out of here with this shit. > > > > Kent, > > Using language like this is clearly unacceptable and violates the > Code of Conduct. This type of language doesn't promote respectful > and productive discussions and is detrimental to the health of the > community. > > You should be well aware that this type of language and personal > attack is a clear violation of the Linux kernel Contributor Covenant > Code of Conduct as outlined in the following: > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html > > Refer to the Code of Conduct and refrain from violating the Code of > Conduct in the future. I believe Michal and I have more or less worked this out privately (and you guys have been copied on that as well).
On 11/20/24 13:34, Kent Overstreet wrote: > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: >> On 9/2/24 03:51, Kent Overstreet wrote: >>> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: >>>> On Mon 02-09-24 04:52:49, Kent Overstreet wrote: >>>>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: >>>>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote: >>>>>> [...] >>>>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL >>>>>>> in the case of bugs, because that's going to be an improvement w.r.t. >>>>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's >>>>>>> something that we can't guarantee won't happen in the wild - we WARN() >>>>>>> and try to handle the error as best we can. >>>>>> >>>>>> We have discussed that in a different email thread. And I have to say >>>>>> that I am not convinced that returning NULL makes a broken code much >>>>>> better. Why? Because we can expect that broken NOFAIL users will not have a >>>>>> error checking path. Even valid NOFAIL users will not have one because >>>>>> they _know_ they do not have a different than retry for ever recovery >>>>>> path. >>>>> >>>>> You mean where I asked you for a link to the discussion and rationale >>>>> you claimed had happened? Still waiting on that >>>> >>>> I am not your assistent to be tasked and search through lore archives. >>>> Find one if you need that. >>>> >>>> Anyway, if you read the email and even tried to understand what is >>>> written there rather than immediately started shouting a response then >>>> you would have noticed I have put actual arguments here. You are free to >>>> disagree with them and lay down your arguments. You have decided to >>>> >>>> [...] >>>> >>>>> Yeah, enough of this insanity. >>>> >>>> so I do not think you are able to do that. Again... >>> >>> Michal, if you think crashing processes is an acceptable alternative to >>> error handling _you have no business writing kernel code_. >>> >>> You have been stridently arguing for one bad idea after another, and >>> it's an insult to those of us who do give a shit about writing reliable >>> software. >>> >>> You're arguing against basic precepts of kernel programming. >>> >>> Get your head examined. And get the fuck out of here with this shit. >>> >> >> Kent, >> >> Using language like this is clearly unacceptable and violates the >> Code of Conduct. This type of language doesn't promote respectful >> and productive discussions and is detrimental to the health of the >> community. >> >> You should be well aware that this type of language and personal >> attack is a clear violation of the Linux kernel Contributor Covenant >> Code of Conduct as outlined in the following: >> >> https://www.kernel.org/doc/html/latest/process/code-of-conduct.html >> >> Refer to the Code of Conduct and refrain from violating the Code of >> Conduct in the future. > > I believe Michal and I have more or less worked this out privately (and > you guys have been copied on that as well). Thank you for updating us on the behind the scenes work between you and Michal. I will make one correction to your statement, "you guys have been copied on that as well" - which is inaccurate. You have shared your email exchanges with Michal with us to let us know that the issue has been sorted out. You might have your reasons and concerns about the direction of the code and design that pertains to the discussion in this email thread. You might have your reasons for expressing your frustration. However, those need to be worked out as separate from this Code of Conduct violation. In the case of unacceptable behaviors as defined in the Code of Conduct document, the process is to work towards restoring productive and respectful discussions. It is reasonable to ask for an apology to help us get to the goal as soon as possible. I urge you once again to apologize for using language that negatively impacts productive discussions. thanks, -- Shuah (On behalf of the Code of Conduct Committee)
On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote: > On 11/20/24 13:34, Kent Overstreet wrote: > > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: > > > On 9/2/24 03:51, Kent Overstreet wrote: > > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > > > > > > [...] > > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > > > > > > something that we can't guarantee won't happen in the wild - we WARN() > > > > > > > > and try to handle the error as best we can. > > > > > > > > > > > > > > We have discussed that in a different email thread. And I have to say > > > > > > > that I am not convinced that returning NULL makes a broken code much > > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a > > > > > > > error checking path. Even valid NOFAIL users will not have one because > > > > > > > they _know_ they do not have a different than retry for ever recovery > > > > > > > path. > > > > > > > > > > > > You mean where I asked you for a link to the discussion and rationale > > > > > > you claimed had happened? Still waiting on that > > > > > > > > > > I am not your assistent to be tasked and search through lore archives. > > > > > Find one if you need that. > > > > > > > > > > Anyway, if you read the email and even tried to understand what is > > > > > written there rather than immediately started shouting a response then > > > > > you would have noticed I have put actual arguments here. You are free to > > > > > disagree with them and lay down your arguments. You have decided to > > > > > > > > > > [...] > > > > > > > > > > > Yeah, enough of this insanity. > > > > > > > > > > so I do not think you are able to do that. Again... > > > > > > > > Michal, if you think crashing processes is an acceptable alternative to > > > > error handling _you have no business writing kernel code_. > > > > > > > > You have been stridently arguing for one bad idea after another, and > > > > it's an insult to those of us who do give a shit about writing reliable > > > > software. > > > > > > > > You're arguing against basic precepts of kernel programming. > > > > > > > > Get your head examined. And get the fuck out of here with this shit. > > > > > > > > > > Kent, > > > > > > Using language like this is clearly unacceptable and violates the > > > Code of Conduct. This type of language doesn't promote respectful > > > and productive discussions and is detrimental to the health of the > > > community. > > > > > > You should be well aware that this type of language and personal > > > attack is a clear violation of the Linux kernel Contributor Covenant > > > Code of Conduct as outlined in the following: > > > > > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html > > > > > > Refer to the Code of Conduct and refrain from violating the Code of > > > Conduct in the future. > > > > I believe Michal and I have more or less worked this out privately (and > > you guys have been copied on that as well). > > Thank you for updating us on the behind the scenes work between you > and Michal. > > I will make one correction to your statement, "you guys have been copied on > that as well" - which is inaccurate. You have shared your email exchanges > with Michal with us to let us know that the issue has been sorted out. That seems to be what I just said. > You might have your reasons and concerns about the direction of the code > and design that pertains to the discussion in this email thread. You might > have your reasons for expressing your frustration. However, those need to be > worked out as separate from this Code of Conduct violation. > > In the case of unacceptable behaviors as defined in the Code of Conduct > document, the process is to work towards restoring productive and > respectful discussions. It is reasonable to ask for an apology to help > us get to the goal as soon as possible. > > I urge you once again to apologize for using language that negatively impacts > productive discussions. Shuah, I'd be happy to give you that after the discussion I suggested. Failing that, I urge you to stick to what we agreed to last night.
On 11/20/24 14:20, Kent Overstreet wrote: > On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote: >> On 11/20/24 13:34, Kent Overstreet wrote: >>> On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: >>>> On 9/2/24 03:51, Kent Overstreet wrote: >>>>> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: >>>>>> On Mon 02-09-24 04:52:49, Kent Overstreet wrote: >>>>>>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: >>>>>>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote: >>>>>>>> [...] >>>>>>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL >>>>>>>>> in the case of bugs, because that's going to be an improvement w.r.t. >>>>>>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's >>>>>>>>> something that we can't guarantee won't happen in the wild - we WARN() >>>>>>>>> and try to handle the error as best we can. >>>>>>>> >>>>>>>> We have discussed that in a different email thread. And I have to say >>>>>>>> that I am not convinced that returning NULL makes a broken code much >>>>>>>> better. Why? Because we can expect that broken NOFAIL users will not have a >>>>>>>> error checking path. Even valid NOFAIL users will not have one because >>>>>>>> they _know_ they do not have a different than retry for ever recovery >>>>>>>> path. >>>>>>> >>>>>>> You mean where I asked you for a link to the discussion and rationale >>>>>>> you claimed had happened? Still waiting on that >>>>>> >>>>>> I am not your assistent to be tasked and search through lore archives. >>>>>> Find one if you need that. >>>>>> >>>>>> Anyway, if you read the email and even tried to understand what is >>>>>> written there rather than immediately started shouting a response then >>>>>> you would have noticed I have put actual arguments here. You are free to >>>>>> disagree with them and lay down your arguments. You have decided to >>>>>> >>>>>> [...] >>>>>> >>>>>>> Yeah, enough of this insanity. >>>>>> >>>>>> so I do not think you are able to do that. Again... >>>>> >>>>> Michal, if you think crashing processes is an acceptable alternative to >>>>> error handling _you have no business writing kernel code_. >>>>> >>>>> You have been stridently arguing for one bad idea after another, and >>>>> it's an insult to those of us who do give a shit about writing reliable >>>>> software. >>>>> >>>>> You're arguing against basic precepts of kernel programming. >>>>> >>>>> Get your head examined. And get the fuck out of here with this shit. >>>>> >>>> >>>> Kent, >>>> >>>> Using language like this is clearly unacceptable and violates the >>>> Code of Conduct. This type of language doesn't promote respectful >>>> and productive discussions and is detrimental to the health of the >>>> community. >>>> >>>> You should be well aware that this type of language and personal >>>> attack is a clear violation of the Linux kernel Contributor Covenant >>>> Code of Conduct as outlined in the following: >>>> >>>> https://www.kernel.org/doc/html/latest/process/code-of-conduct.html >>>> >>>> Refer to the Code of Conduct and refrain from violating the Code of >>>> Conduct in the future. >>> >>> I believe Michal and I have more or less worked this out privately (and >>> you guys have been copied on that as well). >> >> Thank you for updating us on the behind the scenes work between you >> and Michal. >> >> I will make one correction to your statement, "you guys have been copied on >> that as well" - which is inaccurate. You have shared your email exchanges >> with Michal with us to let us know that the issue has been sorted out. > > That seems to be what I just said. > >> You might have your reasons and concerns about the direction of the code >> and design that pertains to the discussion in this email thread. You might >> have your reasons for expressing your frustration. However, those need to be >> worked out as separate from this Code of Conduct violation. >> >> In the case of unacceptable behaviors as defined in the Code of Conduct >> document, the process is to work towards restoring productive and >> respectful discussions. It is reasonable to ask for an apology to help >> us get to the goal as soon as possible. >> >> I urge you once again to apologize for using language that negatively impacts >> productive discussions. > > Shuah, I'd be happy to give you that after the discussion I suggested. > Failing that, I urge you to stick to what we agreed to last night. You have the right to the discussion and debate and engage the community in the discussion. Discuss and debate away. thanks, -- Shuah (On behalf of the Code of Conduct Committee)
On 11/20/24 14:37, Shuah Khan wrote: > On 11/20/24 14:20, Kent Overstreet wrote: >> On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote: >>> On 11/20/24 13:34, Kent Overstreet wrote: >>>> On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: >>>>> On 9/2/24 03:51, Kent Overstreet wrote: >>>>>> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: >>>>>>> On Mon 02-09-24 04:52:49, Kent Overstreet wrote: >>>>>>>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: >>>>>>>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote: >>>>>>>>> [...] >>>>>>>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL >>>>>>>>>> in the case of bugs, because that's going to be an improvement w.r.t. >>>>>>>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's >>>>>>>>>> something that we can't guarantee won't happen in the wild - we WARN() >>>>>>>>>> and try to handle the error as best we can. >>>>>>>>> >>>>>>>>> We have discussed that in a different email thread. And I have to say >>>>>>>>> that I am not convinced that returning NULL makes a broken code much >>>>>>>>> better. Why? Because we can expect that broken NOFAIL users will not have a >>>>>>>>> error checking path. Even valid NOFAIL users will not have one because >>>>>>>>> they _know_ they do not have a different than retry for ever recovery >>>>>>>>> path. >>>>>>>> >>>>>>>> You mean where I asked you for a link to the discussion and rationale >>>>>>>> you claimed had happened? Still waiting on that >>>>>>> >>>>>>> I am not your assistent to be tasked and search through lore archives. >>>>>>> Find one if you need that. >>>>>>> >>>>>>> Anyway, if you read the email and even tried to understand what is >>>>>>> written there rather than immediately started shouting a response then >>>>>>> you would have noticed I have put actual arguments here. You are free to >>>>>>> disagree with them and lay down your arguments. You have decided to >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>>> Yeah, enough of this insanity. >>>>>>> >>>>>>> so I do not think you are able to do that. Again... >>>>>> >>>>>> Michal, if you think crashing processes is an acceptable alternative to >>>>>> error handling _you have no business writing kernel code_. >>>>>> >>>>>> You have been stridently arguing for one bad idea after another, and >>>>>> it's an insult to those of us who do give a shit about writing reliable >>>>>> software. >>>>>> >>>>>> You're arguing against basic precepts of kernel programming. >>>>>> >>>>>> Get your head examined. And get the fuck out of here with this shit. >>>>>> >>>>> >>>>> Kent, >>>>> >>>>> Using language like this is clearly unacceptable and violates the >>>>> Code of Conduct. This type of language doesn't promote respectful >>>>> and productive discussions and is detrimental to the health of the >>>>> community. >>>>> >>>>> You should be well aware that this type of language and personal >>>>> attack is a clear violation of the Linux kernel Contributor Covenant >>>>> Code of Conduct as outlined in the following: >>>>> >>>>> https://www.kernel.org/doc/html/latest/process/code-of-conduct.html >>>>> >>>>> Refer to the Code of Conduct and refrain from violating the Code of >>>>> Conduct in the future. >>>> >>>> I believe Michal and I have more or less worked this out privately (and >>>> you guys have been copied on that as well). >>> >>> Thank you for updating us on the behind the scenes work between you >>> and Michal. >>> >>> I will make one correction to your statement, "you guys have been copied on >>> that as well" - which is inaccurate. You have shared your email exchanges >>> with Michal with us to let us know that the issue has been sorted out. >> >> That seems to be what I just said. >> >>> You might have your reasons and concerns about the direction of the code >>> and design that pertains to the discussion in this email thread. You might >>> have your reasons for expressing your frustration. However, those need to be >>> worked out as separate from this Code of Conduct violation. >>> >>> In the case of unacceptable behaviors as defined in the Code of Conduct >>> document, the process is to work towards restoring productive and >>> respectful discussions. It is reasonable to ask for an apology to help >>> us get to the goal as soon as possible. >>> >>> I urge you once again to apologize for using language that negatively impacts >>> productive discussions. >> >> Shuah, I'd be happy to give you that after the discussion I suggested. >> Failing that, I urge you to stick to what we agreed to last night. The only thing we agreed upon is that you would respond the thread to update your sorting things out with Michal. As for the discussion, I will repeat what I said in our conversation that the discussion will be lot more productive after making amends with the community. I stand by that assessment. I will also repeat what I said that the discussion and debate is outside the scope of the current issue the Code of Conduct Committee is trying to resolve. I didn't pick up on your desire to apologize after the discussion in our conversation. Are you saying you will be happy to make amends with an apology after the discussion and debate? thanks, -- Shuah (On behalf of the Code of Conduct Committee)
On Wed, Nov 20, 2024 at 03:21:06PM -0700, Shuah Khan wrote: > On 11/20/24 14:37, Shuah Khan wrote: > > On 11/20/24 14:20, Kent Overstreet wrote: > > > On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote: > > > > On 11/20/24 13:34, Kent Overstreet wrote: > > > > > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: > > > > > > On 9/2/24 03:51, Kent Overstreet wrote: > > > > > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > > > > > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > > > > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > > > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > > > > > > > > > [...] > > > > > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > > > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > > > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > > > > > > > > > something that we can't guarantee won't happen in the wild - we WARN() > > > > > > > > > > > and try to handle the error as best we can. > > > > > > > > > > > > > > > > > > > > We have discussed that in a different email thread. And I have to say > > > > > > > > > > that I am not convinced that returning NULL makes a broken code much > > > > > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a > > > > > > > > > > error checking path. Even valid NOFAIL users will not have one because > > > > > > > > > > they _know_ they do not have a different than retry for ever recovery > > > > > > > > > > path. > > > > > > > > > > > > > > > > > > You mean where I asked you for a link to the discussion and rationale > > > > > > > > > you claimed had happened? Still waiting on that > > > > > > > > > > > > > > > > I am not your assistent to be tasked and search through lore archives. > > > > > > > > Find one if you need that. > > > > > > > > > > > > > > > > Anyway, if you read the email and even tried to understand what is > > > > > > > > written there rather than immediately started shouting a response then > > > > > > > > you would have noticed I have put actual arguments here. You are free to > > > > > > > > disagree with them and lay down your arguments. You have decided to > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > Yeah, enough of this insanity. > > > > > > > > > > > > > > > > so I do not think you are able to do that. Again... > > > > > > > > > > > > > > Michal, if you think crashing processes is an acceptable alternative to > > > > > > > error handling _you have no business writing kernel code_. > > > > > > > > > > > > > > You have been stridently arguing for one bad idea after another, and > > > > > > > it's an insult to those of us who do give a shit about writing reliable > > > > > > > software. > > > > > > > > > > > > > > You're arguing against basic precepts of kernel programming. > > > > > > > > > > > > > > Get your head examined. And get the fuck out of here with this shit. > > > > > > > > > > > > > > > > > > > Kent, > > > > > > > > > > > > Using language like this is clearly unacceptable and violates the > > > > > > Code of Conduct. This type of language doesn't promote respectful > > > > > > and productive discussions and is detrimental to the health of the > > > > > > community. > > > > > > > > > > > > You should be well aware that this type of language and personal > > > > > > attack is a clear violation of the Linux kernel Contributor Covenant > > > > > > Code of Conduct as outlined in the following: > > > > > > > > > > > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html > > > > > > > > > > > > Refer to the Code of Conduct and refrain from violating the Code of > > > > > > Conduct in the future. > > > > > > > > > > I believe Michal and I have more or less worked this out privately (and > > > > > you guys have been copied on that as well). > > > > > > > > Thank you for updating us on the behind the scenes work between you > > > > and Michal. > > > > > > > > I will make one correction to your statement, "you guys have been copied on > > > > that as well" - which is inaccurate. You have shared your email exchanges > > > > with Michal with us to let us know that the issue has been sorted out. > > > > > > That seems to be what I just said. > > > > > > > You might have your reasons and concerns about the direction of the code > > > > and design that pertains to the discussion in this email thread. You might > > > > have your reasons for expressing your frustration. However, those need to be > > > > worked out as separate from this Code of Conduct violation. > > > > > > > > In the case of unacceptable behaviors as defined in the Code of Conduct > > > > document, the process is to work towards restoring productive and > > > > respectful discussions. It is reasonable to ask for an apology to help > > > > us get to the goal as soon as possible. > > > > > > > > I urge you once again to apologize for using language that negatively impacts > > > > productive discussions. > > > > > > Shuah, I'd be happy to give you that after the discussion I suggested. > > > Failing that, I urge you to stick to what we agreed to last night. > The only thing we agreed upon is that you would respond the thread > to update your sorting things out with Michal. ...Shall I quote you? > > As for the discussion, I will repeat what I said in our conversation > that the discussion will be lot more productive after making amends > with the community. I stand by that assessment. > > I will also repeat what I said that the discussion and debate is > outside the scope of the current issue the Code of Conduct Committee > is trying to resolve. > > I didn't pick up on your desire to apologize after the discussion in > our conversation. > > Are you saying you will be happy to make amends with an apology after > the discussion and debate? Look, I just want to be done with this, so let me lay it all out as I see it, starting from the beginning of where things went off the rails between myself and Michal: Michal's (as well as Steve's) behaviour in the memory allocation profiling review process was, in my view, unacceptable (this included such things as crashing our LSF presentation with ideas they'd come up with that morning, and persistent dismissive axegrinding on the list). The project was nearly killed because of his inability to listen to the reasons for a design and being stubbornly stuck on his right to be heard as the maintainer. In my view, being a good maintainer has a lot more to do with stewardship and leadership, than stubbornly insisting for - whatever that was. In any event, that was where I came to the conclusion "I just cannot work that guy". Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I only did it because it really seemed to me that Michal was axe grinding against _anything_ I was posting, but I still shouldn't have and that was more serious infraction in my view; that sort of thing causes a real loss of trust, and no I will not do it again. The subsequent PF_MEMALLOC_NORECLAIM discussion was such a trainwreck that I don't think I will go into it. Except to say that yes, if it makes you happy, I shouldn't have used that language and I won't do it again. But I do have to call out you, the CoC board's behaviour, and I think that ony fair since you call out other people's behaviour publically. Greg's behaviour when he approached me at Plumbers was beyond unprofessional, and since it wasn't exactly public and you guys have already heard about it privately I won't repeat exactly what happened, but it is an issue. Shuah, you weren't much better. There were concerns raised in the recent CoC enforcement thread, by someone with experience in such matters, that your aproach seemed extremeely heavy handed and I find myself in 100% agreement. The approach you take is that of a bad HR department: all about image, no understanding. When tensions arise, it's important get to the bottom of things, to at least try to take the time to listen with an open mind. People have real frustrations, and it's amazing what you can learn and what you can accomplish by having real conversations. But that's not what you guys do: you say "Ok, if someone's being too much of an asshole, we'll just be an even bigger asshole!". No. Cut that out. I've done the hard work of stepping in and building bridges when relations have broken down (on quite a large scale), so I'm offended by what you guys do.
On Wed, Nov 20, 2024 at 05:39:19PM -0500, Kent Overstreet wrote: > On Wed, Nov 20, 2024 at 03:21:06PM -0700, Shuah Khan wrote: > > On 11/20/24 14:37, Shuah Khan wrote: > > > On 11/20/24 14:20, Kent Overstreet wrote: > > > > On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote: > > > > > On 11/20/24 13:34, Kent Overstreet wrote: > > > > > > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: > > > > > > > On 9/2/24 03:51, Kent Overstreet wrote: > > > > > > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > > > > > > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > > > > > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > > > > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > > > > > > > > > > [...] > > > > > > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > > > > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > > > > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > > > > > > > > > > something that we can't guarantee won't happen in the wild - we WARN() > > > > > > > > > > > > and try to handle the error as best we can. > > > > > > > > > > > > > > > > > > > > > > We have discussed that in a different email thread. And I have to say > > > > > > > > > > > that I am not convinced that returning NULL makes a broken code much > > > > > > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a > > > > > > > > > > > error checking path. Even valid NOFAIL users will not have one because > > > > > > > > > > > they _know_ they do not have a different than retry for ever recovery > > > > > > > > > > > path. > > > > > > > > > > > > > > > > > > > > You mean where I asked you for a link to the discussion and rationale > > > > > > > > > > you claimed had happened? Still waiting on that > > > > > > > > > > > > > > > > > > I am not your assistent to be tasked and search through lore archives. > > > > > > > > > Find one if you need that. > > > > > > > > > > > > > > > > > > Anyway, if you read the email and even tried to understand what is > > > > > > > > > written there rather than immediately started shouting a response then > > > > > > > > > you would have noticed I have put actual arguments here. You are free to > > > > > > > > > disagree with them and lay down your arguments. You have decided to > > > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > Yeah, enough of this insanity. > > > > > > > > > > > > > > > > > > so I do not think you are able to do that. Again... > > > > > > > > > > > > > > > > Michal, if you think crashing processes is an acceptable alternative to > > > > > > > > error handling _you have no business writing kernel code_. > > > > > > > > > > > > > > > > You have been stridently arguing for one bad idea after another, and > > > > > > > > it's an insult to those of us who do give a shit about writing reliable > > > > > > > > software. > > > > > > > > > > > > > > > > You're arguing against basic precepts of kernel programming. > > > > > > > > > > > > > > > > Get your head examined. And get the fuck out of here with this shit. > > > > > > > > > > > > > > > > > > > > > > Kent, > > > > > > > > > > > > > > Using language like this is clearly unacceptable and violates the > > > > > > > Code of Conduct. This type of language doesn't promote respectful > > > > > > > and productive discussions and is detrimental to the health of the > > > > > > > community. > > > > > > > > > > > > > > You should be well aware that this type of language and personal > > > > > > > attack is a clear violation of the Linux kernel Contributor Covenant > > > > > > > Code of Conduct as outlined in the following: > > > > > > > > > > > > > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html > > > > > > > > > > > > > > Refer to the Code of Conduct and refrain from violating the Code of > > > > > > > Conduct in the future. > > > > > > > > > > > > I believe Michal and I have more or less worked this out privately (and > > > > > > you guys have been copied on that as well). > > > > > > > > > > Thank you for updating us on the behind the scenes work between you > > > > > and Michal. > > > > > > > > > > I will make one correction to your statement, "you guys have been copied on > > > > > that as well" - which is inaccurate. You have shared your email exchanges > > > > > with Michal with us to let us know that the issue has been sorted out. > > > > > > > > That seems to be what I just said. > > > > > > > > > You might have your reasons and concerns about the direction of the code > > > > > and design that pertains to the discussion in this email thread. You might > > > > > have your reasons for expressing your frustration. However, those need to be > > > > > worked out as separate from this Code of Conduct violation. > > > > > > > > > > In the case of unacceptable behaviors as defined in the Code of Conduct > > > > > document, the process is to work towards restoring productive and > > > > > respectful discussions. It is reasonable to ask for an apology to help > > > > > us get to the goal as soon as possible. > > > > > > > > > > I urge you once again to apologize for using language that negatively impacts > > > > > productive discussions. > > > > > > > > Shuah, I'd be happy to give you that after the discussion I suggested. > > > > Failing that, I urge you to stick to what we agreed to last night. > > The only thing we agreed upon is that you would respond the thread > > to update your sorting things out with Michal. > > ...Shall I quote you? > > > > > As for the discussion, I will repeat what I said in our conversation > > that the discussion will be lot more productive after making amends > > with the community. I stand by that assessment. > > > > I will also repeat what I said that the discussion and debate is > > outside the scope of the current issue the Code of Conduct Committee > > is trying to resolve. > > > > I didn't pick up on your desire to apologize after the discussion in > > our conversation. > > > > Are you saying you will be happy to make amends with an apology after > > the discussion and debate? > > Look, I just want to be done with this, so let me lay it all out as I > see it, starting from the beginning of where things went off the rails > between myself and Michal: > > Michal's (as well as Steve's) behaviour in the memory allocation > profiling review process was, in my view, unacceptable (this included > such things as crashing our LSF presentation with ideas they'd come up > with that morning, and persistent dismissive axegrinding on the list). > The project was nearly killed because of his inability to listen to the > reasons for a design and being stubbornly stuck on his right to be heard > as the maintainer. > > In my view, being a good maintainer has a lot more to do with > stewardship and leadership, than stubbornly insisting for - whatever > that was. In any event, that was where I came to the conclusion "I just > cannot work that guy". > > Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I > only did it because it really seemed to me that Michal was axe grinding > against _anything_ I was posting, but I still shouldn't have and that > was more serious infraction in my view; that sort of thing causes a real > loss of trust, and no I will not do it again. > > The subsequent PF_MEMALLOC_NORECLAIM discussion was such a trainwreck > that I don't think I will go into it. Except to say that yes, if it > makes you happy, I shouldn't have used that language and I won't do it > again. > > But I do have to call out you, the CoC board's behaviour, and I think > that ony fair since you call out other people's behaviour publically. > > Greg's behaviour when he approached me at Plumbers was beyond > unprofessional, and since it wasn't exactly public and you guys have > already heard about it privately I won't repeat exactly what happened, > but it is an issue. > > Shuah, you weren't much better. > > There were concerns raised in the recent CoC enforcement thread, by > someone with experience in such matters, that your aproach seemed > extremeely heavy handed and I find myself in 100% agreement. > > The approach you take is that of a bad HR department: all about image, > no understanding. When tensions arise, it's important get to the bottom > of things, to at least try to take the time to listen with an open mind. > People have real frustrations, and it's amazing what you can learn and > what you can accomplish by having real conversations. > > But that's not what you guys do: you say "Ok, if someone's being too > much of an asshole, we'll just be an even bigger asshole!". > > No. Cut that out. > > I've done the hard work of stepping in and building bridges when > relations have broken down (on quite a large scale), so I'm offended by > what you guys do. Now, I've said two things I'll do differently, or not do in the future. Michal, would you be willing to consider changing your approach a bit in similar situations? Try to lead a little bit less by "I'm the mainainer, my concerns must be addressed" and a little bit more by incorporating the best of everyone's ideas, and showing respect to others who have studied their problems, as you want to be respected as maintainer? Shuah, would you be willing to entertain the notion of modifying your approach a bit as well? More in the direction of genuine conversations and understanding, less of just following a process and slapping people if they don't comply? We've got people in the community who are good at this sort of thing, and might be willing to help if they were asked - it doesn't have to just be you guys, and if we started encouraging this sort of thing it could be a real learning experience for everyone.
Lastly, the thing that motivated me to make an issue out of this was several recent complaints, by my funders, that it's gotten increasingly difficult to get work done on the lists lately without showing up at conferences and shmoozing with the right people. I've noticed that as well. That's something we do need to address, and I see a common thread between that and dismissive/authoritarian behaviour, and I think those of us at the highest level (i.e. CoC board members) should be mindful of how we set the tone for everyone else. Yes, we're all Busy Important People (TM), but doing our jobs well requires us to engage well with people. I think that should be prioritized at least as much as "language". It's not just about what words we use to communicate, it's about whether we're able to communicate effectively or at all. Couple's therapists say they can tell in a few minutes if a relationship is worth salvaging or if it's beyond repair - and it comes down to if they come in displaying anger or dismissiventess. Anger can be worked through, dismissiveness means they no longer care. I find the same is true with engineers. When people are pissed off about something, that anger is often pointing to some important issue underneath that, and getting to the bottom of it is going to hava a big payoff. But when teams stop being able to work together - when people start getting silod, afraid to stick their head up - that's really bad. Vannevar Bush said that all he did was get people to talk to each other.
On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
> Shuah, would you be willing to entertain the notion of modifying your...
Kent, I'd like to gently remind you that Shuah is not speaking in her
personal capacity, but as a representative of the Code of Conduct
Committee[1], as she has noted in her signature. The Code of Conduct
Committee is appointed by, and reports to, the TAB[2], which is an
elected body composed of kernel developers and maintainers.
[1] https://www.kernel.org/code-of-conduct.html
[2] https://www.kernel.org/doc/html/latest/process/code-of-conduct-interpretation.html
Speaking purely in a personal capacity, and not as a member of the TAB
(although I do serve as vice-chair of that body) I am extremely
grateful of the work of Shuah and her colleages (listed in [1]). I
believe that their work is important in terms of establishing guard
rails regarding the minimum standards of behavior in our community.
If you look at the git history of the kernel sources, you will see
that a large number of your fellow maintainers assented to this
approach --- for example by providing their Acked-by in commit
1279dbeed36f ("Code of Conduct Interpretation: Add document explaining
how the Code of Conduct is to be interpreted").
Best regards,
- Ted
On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote: > On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote: > > Shuah, would you be willing to entertain the notion of modifying your... > > Kent, I'd like to gently remind you that Shuah is not speaking in her > personal capacity, but as a representative of the Code of Conduct > Committee[1], as she has noted in her signature. The Code of Conduct > Committee is appointed by, and reports to, the TAB[2], which is an > elected body composed of kernel developers and maintainers. > > [1] https://www.kernel.org/code-of-conduct.html > [2] https://www.kernel.org/doc/html/latest/process/code-of-conduct-interpretation.html > > Speaking purely in a personal capacity, and not as a member of the TAB > (although I do serve as vice-chair of that body) I am extremely > grateful of the work of Shuah and her colleages (listed in [1]). I > believe that their work is important in terms of establishing guard > rails regarding the minimum standards of behavior in our community. > > If you look at the git history of the kernel sources, you will see > that a large number of your fellow maintainers assented to this > approach --- for example by providing their Acked-by in commit > 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining > how the Code of Conduct is to be interpreted"). Authored by Shuah, so I do seem to be talking to the correct person. Regardless, we talked privately about opening some sort of public discussion, this is pursuant to that.
On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote: > On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote: > > Shuah, would you be willing to entertain the notion of modifying your... > > Kent, I'd like to gently remind you that Shuah is not speaking in her > personal capacity, but as a representative of the Code of Conduct > Committee[1], as she has noted in her signature. The Code of Conduct > Committee is appointed by, and reports to, the TAB[2], which is an > elected body composed of kernel developers and maintainers. > > [1] https://www.kernel.org/code-of-conduct.html > [2] https://www.kernel.org/doc/html/latest/process/code-of-conduct-interpretation.html > > Speaking purely in a personal capacity, and not as a member of the TAB > (although I do serve as vice-chair of that body) I am extremely > grateful of the work of Shuah and her colleages (listed in [1]). I > believe that their work is important in terms of establishing guard > rails regarding the minimum standards of behavior in our community. > > If you look at the git history of the kernel sources, you will see > that a large number of your fellow maintainers assented to this > approach --- for example by providing their Acked-by in commit > 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining > how the Code of Conduct is to be interpreted"). And Ted, I don't think you realize just how at my limit I am here with what I'm willing to put up with. I'm coming off of, what, 6+ months of getting roasted and having my work quested by Linus every pull request (and he did stop that, but not before it had done real damage, both completely changing the tone of public conversations and nearly scaring off people I've been trying to hire). It's gotten harder and harder, not easier, for me to get work done in other parts of the kernel; I gave up long ago in the block layer after the two people in charge there had repeatedly introduced silent data corruption bugs into core block layer code that I'd written, without CCing me, which I then had to debug, which they then ignored or put up ridiculous fights over when reported, and now have turned petty on subsequent block layer patches. Filesystem people have been good to work with, thank god, but now getting anything done in mm is looking like more and more of what the block layer has turned into. And you guys, because the system works for you, keep saying "nah, everything is fine and this has already been decided, you don't get any say". Meanwhile, I'm seeing more and more heisenbugs in the rest of the kernel as I'm stabilizing bcachefs, and my users are reporting the same - in compaction, in the block layer, now in the scheduler or locking, I'm not sure on the last one. And I'm sitting here wondering how the hell I'm supposed to debug my own code when I don't even have a stable base to work on anymore. This is turning into an utter farce.
On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote: > On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote: > > Shuah, would you be willing to entertain the notion of modifying your... > > Kent, I'd like to gently remind you that Shuah is not speaking in her > personal capacity, but as a representative of the Code of Conduct > Committee[1], as she has noted in her signature. The Code of Conduct > Committee is appointed by, and reports to, the TAB[2], which is an > elected body composed of kernel developers and maintainers. FYI, without taking any stance on the issue under debate here, I find the language used by Shuah on behalf of the Code of Conduct committee extremely patronising and passive aggressive. This might be because I do not have an American academic class background, but I would suggest that the code of conduct committee should educate itself about communicating without projecting this implicit cultural and class bias so blatantly.
On Thu, Nov 21, 2024 at 05:25:58AM +0100, Christoph Hellwig wrote: > On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote: > > On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote: > > > Shuah, would you be willing to entertain the notion of modifying your... > > > > Kent, I'd like to gently remind you that Shuah is not speaking in her > > personal capacity, but as a representative of the Code of Conduct > > Committee[1], as she has noted in her signature. The Code of Conduct > > Committee is appointed by, and reports to, the TAB[2], which is an > > elected body composed of kernel developers and maintainers. > > FYI, without taking any stance on the issue under debate here, I find the > language used by Shuah on behalf of the Code of Conduct committee > extremely patronising and passive aggressive. This might be because I > do not have an American academic class background, but I would suggest > that the code of conduct committee should educate itself about > communicating without projecting this implicit cultural and class bias > so blatantly. I wasn't even thinking about the language issue, but I think that's a very good point. We have a very strong culture of personal responsibility and taking responsibility for our work here, and I think that's one of the main thing that enables us to function and work together even when we're constantly butting heads. Broadly speaking, what that means to me is: I will justify, explain and give the reasoning for my actions if asked, and if I can't because I made a mistake, then I will make it right or at least acknowledge my mistake. So, the very passive language from Shuah (i.e. "I'm not the one 'doing' this, I'm just implementing the policy that we all decided on, even though I totally wrote and advocated for that policy") does seem problematic to me as well.
On Wed, Nov 20, 2024 at 05:55:10PM -0500, Kent Overstreet wrote: > On Wed, Nov 20, 2024 at 05:39:19PM -0500, Kent Overstreet wrote: > > On Wed, Nov 20, 2024 at 03:21:06PM -0700, Shuah Khan wrote: > > > On 11/20/24 14:37, Shuah Khan wrote: > > > > On 11/20/24 14:20, Kent Overstreet wrote: > > > > > On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote: > > > > > > On 11/20/24 13:34, Kent Overstreet wrote: > > > > > > > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote: > > > > > > > > On 9/2/24 03:51, Kent Overstreet wrote: > > > > > > > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > > > > > > > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > > > > > > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > > > > > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > > > > > > > > > > > [...] > > > > > > > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > > > > > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > > > > > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > > > > > > > > > > > something that we can't guarantee won't happen in the wild - we WARN() > > > > > > > > > > > > > and try to handle the error as best we can. > > > > > > > > > > > > > > > > > > > > > > > > We have discussed that in a different email thread. And I have to say > > > > > > > > > > > > that I am not convinced that returning NULL makes a broken code much > > > > > > > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a > > > > > > > > > > > > error checking path. Even valid NOFAIL users will not have one because > > > > > > > > > > > > they _know_ they do not have a different than retry for ever recovery > > > > > > > > > > > > path. > > > > > > > > > > > > > > > > > > > > > > You mean where I asked you for a link to the discussion and rationale > > > > > > > > > > > you claimed had happened? Still waiting on that > > > > > > > > > > > > > > > > > > > > I am not your assistent to be tasked and search through lore archives. > > > > > > > > > > Find one if you need that. > > > > > > > > > > > > > > > > > > > > Anyway, if you read the email and even tried to understand what is > > > > > > > > > > written there rather than immediately started shouting a response then > > > > > > > > > > you would have noticed I have put actual arguments here. You are free to > > > > > > > > > > disagree with them and lay down your arguments. You have decided to > > > > > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > > > Yeah, enough of this insanity. > > > > > > > > > > > > > > > > > > > > so I do not think you are able to do that. Again... > > > > > > > > > > > > > > > > > > Michal, if you think crashing processes is an acceptable alternative to > > > > > > > > > error handling _you have no business writing kernel code_. > > > > > > > > > > > > > > > > > > You have been stridently arguing for one bad idea after another, and > > > > > > > > > it's an insult to those of us who do give a shit about writing reliable > > > > > > > > > software. > > > > > > > > > > > > > > > > > > You're arguing against basic precepts of kernel programming. > > > > > > > > > > > > > > > > > > Get your head examined. And get the fuck out of here with this shit. > > > > > > > > > > > > > > > > > > > > > > > > > Kent, > > > > > > > > > > > > > > > > Using language like this is clearly unacceptable and violates the > > > > > > > > Code of Conduct. This type of language doesn't promote respectful > > > > > > > > and productive discussions and is detrimental to the health of the > > > > > > > > community. > > > > > > > > > > > > > > > > You should be well aware that this type of language and personal > > > > > > > > attack is a clear violation of the Linux kernel Contributor Covenant > > > > > > > > Code of Conduct as outlined in the following: > > > > > > > > > > > > > > > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html > > > > > > > > > > > > > > > > Refer to the Code of Conduct and refrain from violating the Code of > > > > > > > > Conduct in the future. > > > > > > > > > > > > > > I believe Michal and I have more or less worked this out privately (and > > > > > > > you guys have been copied on that as well). > > > > > > > > > > > > Thank you for updating us on the behind the scenes work between you > > > > > > and Michal. > > > > > > > > > > > > I will make one correction to your statement, "you guys have been copied on > > > > > > that as well" - which is inaccurate. You have shared your email exchanges > > > > > > with Michal with us to let us know that the issue has been sorted out. > > > > > > > > > > That seems to be what I just said. > > > > > > > > > > > You might have your reasons and concerns about the direction of the code > > > > > > and design that pertains to the discussion in this email thread. You might > > > > > > have your reasons for expressing your frustration. However, those need to be > > > > > > worked out as separate from this Code of Conduct violation. > > > > > > > > > > > > In the case of unacceptable behaviors as defined in the Code of Conduct > > > > > > document, the process is to work towards restoring productive and > > > > > > respectful discussions. It is reasonable to ask for an apology to help > > > > > > us get to the goal as soon as possible. > > > > > > > > > > > > I urge you once again to apologize for using language that negatively impacts > > > > > > productive discussions. > > > > > > > > > > Shuah, I'd be happy to give you that after the discussion I suggested. > > > > > Failing that, I urge you to stick to what we agreed to last night. > > > The only thing we agreed upon is that you would respond the thread > > > to update your sorting things out with Michal. > > > > ...Shall I quote you? > > > > > > > > As for the discussion, I will repeat what I said in our conversation > > > that the discussion will be lot more productive after making amends > > > with the community. I stand by that assessment. > > > > > > I will also repeat what I said that the discussion and debate is > > > outside the scope of the current issue the Code of Conduct Committee > > > is trying to resolve. > > > > > > I didn't pick up on your desire to apologize after the discussion in > > > our conversation. > > > > > > Are you saying you will be happy to make amends with an apology after > > > the discussion and debate? > > > > Look, I just want to be done with this, so let me lay it all out as I > > see it, starting from the beginning of where things went off the rails > > between myself and Michal: > > > > Michal's (as well as Steve's) behaviour in the memory allocation > > profiling review process was, in my view, unacceptable (this included > > such things as crashing our LSF presentation with ideas they'd come up > > with that morning, and persistent dismissive axegrinding on the list). > > The project was nearly killed because of his inability to listen to the > > reasons for a design and being stubbornly stuck on his right to be heard > > as the maintainer. > > > > In my view, being a good maintainer has a lot more to do with > > stewardship and leadership, than stubbornly insisting for - whatever > > that was. In any event, that was where I came to the conclusion "I just > > cannot work that guy". > > > > Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I > > only did it because it really seemed to me that Michal was axe grinding > > against _anything_ I was posting, but I still shouldn't have and that > > was more serious infraction in my view; that sort of thing causes a real > > loss of trust, and no I will not do it again. > > > > The subsequent PF_MEMALLOC_NORECLAIM discussion was such a trainwreck > > that I don't think I will go into it. Except to say that yes, if it > > makes you happy, I shouldn't have used that language and I won't do it > > again. > > > > But I do have to call out you, the CoC board's behaviour, and I think > > that ony fair since you call out other people's behaviour publically. > > > > Greg's behaviour when he approached me at Plumbers was beyond > > unprofessional, and since it wasn't exactly public and you guys have > > already heard about it privately I won't repeat exactly what happened, > > but it is an issue. > > > > Shuah, you weren't much better. > > > > There were concerns raised in the recent CoC enforcement thread, by > > someone with experience in such matters, that your aproach seemed > > extremeely heavy handed and I find myself in 100% agreement. > > > > The approach you take is that of a bad HR department: all about image, > > no understanding. When tensions arise, it's important get to the bottom > > of things, to at least try to take the time to listen with an open mind. > > People have real frustrations, and it's amazing what you can learn and > > what you can accomplish by having real conversations. > > > > But that's not what you guys do: you say "Ok, if someone's being too > > much of an asshole, we'll just be an even bigger asshole!". > > > > No. Cut that out. > > > > I've done the hard work of stepping in and building bridges when > > relations have broken down (on quite a large scale), so I'm offended by > > what you guys do. > > Now, I've said two things I'll do differently, or not do in the future. > > Michal, would you be willing to consider changing your approach a bit in > similar situations? Try to lead a little bit less by "I'm the mainainer, > my concerns must be addressed" and a little bit more by incorporating > the best of everyone's ideas, and showing respect to others who have > studied their problems, as you want to be respected as maintainer? > > Shuah, would you be willing to entertain the notion of modifying your > approach a bit as well? More in the direction of genuine conversations > and understanding, less of just following a process and slapping people > if they don't comply? > > We've got people in the community who are good at this sort of thing, > and might be willing to help if they were asked - it doesn't have to > just be you guys, and if we started encouraging this sort of thing it > could be a real learning experience for everyone. Also, let's rehash a bit: - This was worked out months ago, at Linus's behest - Scanning back through the original thread, I'm reminded by how very much not one sided this was - And there hasn't been any hope of intelligent conversation with Shuah, just broken record "we need you to do to this", so... This is no longer a process I can take seriously. Linus will pull my code, or he won't; it's out of my hands... Night all.
On Wed 20-11-24 17:39:09, Kent Overstreet wrote: > Michal's (as well as Steve's) behaviour in the memory allocation > profiling review process was, in my view, unacceptable (this included > such things as crashing our LSF presentation with ideas they'd come up > with that morning, and persistent dismissive axegrinding on the list). > The project was nearly killed because of his inability to listen to the > reasons for a design and being stubbornly stuck on his right to be heard > as the maintainer. Couple of entry points that might be helful for that. https://lore.kernel.org/all/YxBc1xuGbB36f8zC@dhcp22.suse.cz/ I have expressed my concerns and set expectations to move the work forward. I've had couple of back and forth with Suren about specifics of overhead assumptions from the stack unwinding IIRC. For the first non-RFC version my feedback was https://lore.kernel.org/all/ZFIMaflxeHS3uR%2FA@dhcp22.suse.cz/#t not really "maintenance burden only" but a request to show that alternative approaches have been explored. It was not particularly helpful that you had expected tracing people would implement the feature for you. https://lore.kernel.org/all/20230503092128.1a120845@gandalf.local.home/ Other people have also expressed that this is not completely impossible https://lore.kernel.org/all/ZFKNZZwC8EUbOLMv@slm.duckdns.org/ The rest of the email thread is mostly a combat zone that I have avoided participating as much as possible. I didn't have any reaction to v2 at all. v3 was aiming to be merged and I've stepped up as there was no single review at the time https://lore.kernel.org/all/Zctfa2DvmlTYSfe8@tiehlicka/ I admit that I was really open that I do not like the solution and I've said reasons for that. Allocator APIs have always been a large mess of macros, static inlines that makes it really far from free to maintain and alternative ways should be considered before going that route. I was also clear that support by MM people was necessary to get this merged. I have explicitly _not_ NAKed the series and backed off for you guys to gain that support. So essentially there was a clear outline for you and Sure how to achieve that. I would really like to hear from other maintainers. Is tnis really unacceptable maintainer behavior? I am OK to apologize but the above is in line of my understanding of how to ack properly. [...] > Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I > only did it because it really seemed to me that Michal was axe grinding > against _anything_ I was posting, but I still shouldn't have and that > was more serious infraction in my view; that sort of thing causes a real > loss of trust, and no I will not do it again. Yes, this is simply unacceptable! Just to put full context. We are talking about eab0af905bfc ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). I have pushed back on that https://lore.kernel.org/all/Zbu_yyChbCO6b2Lj@tiehlicka/ Rather than searching for support elsewhere you have completely bypassed the whole MM community including Andrew and pushed that hidden in bcachefs PR. This is breaking trust model we are all relying on. I am cutting the rest of as something that is not really material to maintainership discussion.
On Thu, Nov 21, 2024 at 09:43:21AM +0100, Michal Hocko wrote: > On Wed 20-11-24 17:39:09, Kent Overstreet wrote: > > Michal's (as well as Steve's) behaviour in the memory allocation > > profiling review process was, in my view, unacceptable (this included > > such things as crashing our LSF presentation with ideas they'd come up > > with that morning, and persistent dismissive axegrinding on the list). > > The project was nearly killed because of his inability to listen to the > > reasons for a design and being stubbornly stuck on his right to be heard > > as the maintainer. > > Couple of entry points that might be helful for that. > https://lore.kernel.org/all/YxBc1xuGbB36f8zC@dhcp22.suse.cz/ > I have expressed my concerns and set expectations to move the > work forward. I've had couple of back and forth with Suren about > specifics of overhead assumptions from the stack unwinding IIRC. > > For the first non-RFC version my feedback was > https://lore.kernel.org/all/ZFIMaflxeHS3uR%2FA@dhcp22.suse.cz/#t > not really "maintenance burden only" but a request to show that alternative > approaches have been explored. It was not particularly helpful that you > had expected tracing people would implement the feature for you. > https://lore.kernel.org/all/20230503092128.1a120845@gandalf.local.home/ > Other people have also expressed that this is not completely impossible > https://lore.kernel.org/all/ZFKNZZwC8EUbOLMv@slm.duckdns.org/ > The rest of the email thread is mostly a combat zone that I have avoided > participating as much as possible. > > I didn't have any reaction to v2 at all. > > v3 was aiming to be merged and I've stepped up as there was no single > review at the time https://lore.kernel.org/all/Zctfa2DvmlTYSfe8@tiehlicka/ > > I admit that I was really open that I do not like the solution and I've > said reasons for that. Allocator APIs have always been a large mess of > macros, static inlines that makes it really far from free to maintain > and alternative ways should be considered before going that route. > > I was also clear that support by MM people was necessary to get this > merged. I have explicitly _not_ NAKed the series and backed off for you > guys to gain that support. > > So essentially there was a clear outline for you and Sure how to achieve > that. > > I would really like to hear from other maintainers. Is tnis really > unacceptable maintainer behavior? I am OK to apologize but the above is > in line of my understanding of how to ack properly. I wonder if I was reading too much into what you were saying in the off-list thread, when I said "argument to authority". Can you tell me if this might be closer to the mark? If I read you correctly, when you said you were "voicing your concerns", I took it as you being pushy because that was the effects: I need you to take me at my word, because you didn't see everything behind the scenes, that this did derail and nearly kill the project. But I should've been taking you at your word, that you just really were that concerned. I think the best way I can explain this issue is with an analogy to parenting: when we're parenting, our first and foremost job isn't really to make sure there's a roof, shelter, food - that part is easy in today's world. The main job really is to make sure that people feel safe, that they can go out and explore the world without being terrified. In order to do that, we have to master our own fears, we can't be projecting them all the time. Being a maintainer, or any kind of leader, is the exact same thing. My big lesson on this was watching Andrew, back during the process of merging MGLRU - I couldn't believe at the time how chill he was about it. But he understood the process, and he's a master at this. Part of mastering our fears in a group setting like this is learning to trust other people. It's not that your concerns didn't have any validity - but we were already doing as much as we could to address them, and you didn't show any trust that we, I especially, knew what we were doing. And that led to a _lot_ of wasted effort down blind alleys that I already knew weren't going to work. I think that may be what this is really about, sometimes you do have to trust that people know what they're doing; you share your knowledge if you have knowledge to share, otherwise you have to back off and let people do their work. Otherwise the whole thing breaks down and no one can get anything done. The main thing is I just want to ask yourself if you could have done anything better in the memory allocation profiling process. I don't need you to apologize for anything specific, if you can just try to work better together in the future that's all I need.
On Wed, Nov 20, 2024 at 05:39:09PM -0500, Kent Overstreet wrote: > There were concerns raised in the recent CoC enforcement thread, by > someone with experience in such matters, that your aproach seemed > extremeely heavy handed and I find myself in 100% agreement. Ehrm ... Yes, I did quite strongly criticize the new coc enforcement process. No, you would not appreciate what I'd do instead, not at all. -Sima
Hi Shuah, hi everyone. Shuah, I appreciate your effort to resolve the Code of Conduct issue. Also I make no judgment about the technical matter at hand. Basically I do not even have a clear idea on what it is about. So I am just commenting on the Code of Conduct enforcement process: Shuah Khan - 20.11.24, 23:21:06 MEZ: > I didn't pick up on your desire to apologize after the discussion in > our conversation. > > Are you saying you will be happy to make amends with an apology after > the discussion and debate? Do you really think that power-playing Kent into submission by doing a public apology is doing anything good to resolve the issue at hand? While it may not really compare to some of the wording Linus has used before having been convinced to change his behavior… I do not agree with the wording Kent has used. I certainly do not condone it. But this forced public apology approach in my point of view is very likely just to cement the division instead of heal it. While I publicly disagreed with Kent before, I also publicly disagree with this kind of Code of Conduct enforcement. I have seen similar patterns within the Debian community and in my point of view this lead to the loss of several Debian developers who contributed a lot to the project while leaving behind frustration and unresolved conflict. No amount of power play is going to resolve this. Just exercising authority is not doing any good in here. This needs mediation, not forced public humiliation. To me, honestly written, this whole interaction feels a bit like I'd imagine children may be fighting over a toy. With a majority of the children grouping together to single out someone who does not appear to fit in at first glance. I mean no offense with that. This is just the impression I got so far. The whole interaction just does not remind me of respectful communication between adult human beings. I have seen it with myself… in situations where it was challenging for me to access what I learned, for whatever reason, I had been acting similarly to a child. So really no offense meant. This is just an impression I got and wanted to mirror back to you for your consideration. I'd make three changes to the current approach regarding Kent's behavior: 1) Take it to private mediation. 2) Move it from mail to actually talking with one another. Resolving conflicts by mail exchange is hard. Maybe voice / video chat. Or meeting in person, in case it possible. In other words: *Talk to each other*! Mail is really very bad for things like that. 3) Assume good intentions! And the best first step for everyone involved may just be: Take a deep breath and let it sit for a while. Maybe there is something to learn from this for everyone involved, including myself. I have and claim no standing in kernel community. So take this for whatever it is worth for you. Best,
Hi Ted, hi everyone. Theodore Ts'o - 21.11.24, 00:47:59 MEZ: > If you look at the git history of the kernel sources, you will see > that a large number of your fellow maintainers assented to this > approach --- for example by providing their Acked-by in commit > 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining > how the Code of Conduct is to be interpreted"). A large number of people agreeing on a process like this does not automatically make it an effective idea for resolving conflict. As I outlined in my other mail, this kind of forced public apology approach in my point of view is just serving to escalate matters. And actually it seems that exactly that just happened right now. See my other mail for suggestions on what I think might work better. A large number of people agreeing on anything does not automatically make it right. I'd suggest to avoid any kind of power-play like "we are more than you" in here. What would respectful communication would look like? What does happen if *everyone* involved considers how it might feel in the shoes of the other one? I have and claim no standing in kernel community. So take this for whatever it is worth for you. I won't be offended in case you disregard it. Also I do not need any reply. And again, just for clarity: I certainly do not condone of the tone Kent has used. Best,
On 11/20/24 21:25, Christoph Hellwig wrote: > On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote: >> On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote: >>> Shuah, would you be willing to entertain the notion of modifying your... >> >> Kent, I'd like to gently remind you that Shuah is not speaking in her >> personal capacity, but as a representative of the Code of Conduct >> Committee[1], as she has noted in her signature. The Code of Conduct >> Committee is appointed by, and reports to, the TAB[2], which is an >> elected body composed of kernel developers and maintainers. > > FYI, without taking any stance on the issue under debate here, I find the > language used by Shuah on behalf of the Code of Conduct committee > extremely patronising and passive aggressive. This might be because I > do not have an American academic class background, but I would suggest > that the code of conduct committee should educate itself about > communicating without projecting this implicit cultural and class bias > so blatantly. I tend to use formal style when I speak on behalf of the Code of Conduct committee. I would label it as very formal bordering on pedantic. Would you prefer less formal style in the CoC communication? I am open to educating myself. thanks, -- Shuah
On Thu, Nov 21, 2024 at 04:53:48PM -0700, Shuah Khan wrote: > On 11/20/24 21:25, Christoph Hellwig wrote: > > On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote: > > > On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote: > > > > Shuah, would you be willing to entertain the notion of modifying your... > > > > > > Kent, I'd like to gently remind you that Shuah is not speaking in her > > > personal capacity, but as a representative of the Code of Conduct > > > Committee[1], as she has noted in her signature. The Code of Conduct > > > Committee is appointed by, and reports to, the TAB[2], which is an > > > elected body composed of kernel developers and maintainers. > > > > FYI, without taking any stance on the issue under debate here, I find the > > language used by Shuah on behalf of the Code of Conduct committee > > extremely patronising and passive aggressive. This might be because I > > do not have an American academic class background, but I would suggest > > that the code of conduct committee should educate itself about > > communicating without projecting this implicit cultural and class bias > > so blatantly. > > I tend to use formal style when I speak on behalf of the Code of Conduct > committee. I would label it as very formal bordering on pedantic. > Would you prefer less formal style in the CoC communication? Personally, it's always easier to take requests from a human, than a faceless process that I have no input into. I'll always rebell against the latter, but I'll always work to help out or make things right with people in the community.
On Thu, Nov 21, 2024 at 12:49 AM Theodore Ts'o <tytso@mit.edu> wrote: > If you look at the git history of the kernel sources, you will see > that a large number of your fellow maintainers assented to this > approach --- for example by providing their Acked-by in commit > 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining > how the Code of Conduct is to be interpreted"). 404 The correct reference is commit 79dbeed36f7335ec ("Code of Conduct Interpretation: Add document explaining how the Code of Conduct is to be interpreted"). Gr{oetje,eeting}s, Geert
On Thu, Nov 21, 2024 at 04:53:48PM -0700, Shuah Khan wrote: > I tend to use formal style when I speak on behalf of the Code of Conduct > committee. I would label it as very formal bordering on pedantic. > Would you prefer less formal style in the CoC communication? I would prefer a less passive agressive and more to the point one.
Kent Overstreet wrote: > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: [..] Kent, The Code of Conduct Committee received reports about your conduct in this email discussion. Link to email where the violation took place: https://lore.kernel.org/citv2v6f33hoidq75xd2spaqxf7nl5wbmmzma4wgmrwpoqidhj@k453tmq7vdrk Our community works on trust and respect and has agreed to abide by the Code of Conduct: Reference: https://docs.kernel.org/process/code-of-conduct.html The code of Conduct Committee has determined that your written abuse of another community member required action on your part to repair the damage to the individual and the community. You took insufficient action to restore the community's faith in having otherwise productive technical discussions without the fear of personal attacks. Following the Code of Conduct Interpretation process the TAB has approved has approved the following recommendation: -- Restrict Kent Overstreet's participation in the kernel development process during the Linux 6.13 kernel development cycle. - Scope: Decline all pull requests from Kent Overstreet during the Linux 6.13 kernel development cycle.
On Fri, Nov 22, 2024 at 01:48:42PM -0800, Dan Williams wrote: > Kent Overstreet wrote: > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > [..] > > Kent, > > The Code of Conduct Committee received reports about your conduct in > this email discussion. > > Link to email where the violation took place: > > https://lore.kernel.org/citv2v6f33hoidq75xd2spaqxf7nl5wbmmzma4wgmrwpoqidhj@k453tmq7vdrk > > Our community works on trust and respect and has agreed to abide by the > Code of Conduct: > > Reference: https://docs.kernel.org/process/code-of-conduct.html > > The code of Conduct Committee has determined that your written abuse > of another community member required action on your part to repair the > damage to the individual and the community. You took insufficient action > to restore the community's faith in having otherwise productive technical > discussions without the fear of personal attacks. > > Following the Code of Conduct Interpretation process the TAB has approved > has approved the following recommendation: > > -- Restrict Kent Overstreet's participation in the kernel development > process during the Linux 6.13 kernel development cycle. > > - Scope: Decline all pull requests from Kent Overstreet during > the Linux 6.13 kernel development cycle. Ok. Just to be clear on what this is about though, I'm going to post publically what I wrote Michal back in September. This is about a CoC board that on the one hand, doesn't wish to follow its own rules, and on the other - I can't even make sense of. From kent.overstreet@linux.dev Wed Sep 4 14:22:39 2024 Date: Wed, 4 Sep 2024 14:22:39 -0400 From: Kent Overstreet <kent.overstreet@linux.dev> To: Michal Hocko <mhocko@suse.com> Subject: Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Message-ID: <5qukfetlxkadrdjp443xlzbyjhw26l3zzgcdlmfkxgbkpkrv65@hobl73hoc5ip> References: <20240827061543.1235703-1-mhocko@kernel.org> <Zs6jFb953AR2Raec@dread.disaster.area> <ylycajqc6yx633f4sh5g3mdbco7zrjdc5bg267sox2js6ok4qb@7j7zut5drbyy> <ZtBzstXltxowPOhR@dread.disaster.area> <myb6fw5v2l2byxn4raxlaqozwfdpezdmn3mnacry3y2qxmdxtl@bxbsf4v4qbmg> <ZtUFaq3vD+zo0gfC@dread.disaster.area> <nawltogcoffous3zv4kd2eerrrwhihbulz7pi2qyfjvslp6g3f@j3qkqftra2qm> <ZtV6OwlFRu4ZEuSG@tiehlicka> <v664cj6evwv7zu3b77gf2lx6dv5sp4qp2rm7jjysddi2wc2uzl@qvnj4kmc6xhq> <ZtWH3SkiIEed4NDc@tiehlicka> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <ZtWH3SkiIEed4NDc@tiehlicka> Status: RO Content-Length: 4224 Lines: 88 On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote: > On Mon 02-09-24 04:52:49, Kent Overstreet wrote: > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote: > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote: > > > [...] > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL > > > > in the case of bugs, because that's going to be an improvement w.r.t. > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's > > > > something that we can't guarantee won't happen in the wild - we WARN() > > > > and try to handle the error as best we can. > > > > > > We have discussed that in a different email thread. And I have to say > > > that I am not convinced that returning NULL makes a broken code much > > > better. Why? Because we can expect that broken NOFAIL users will not have a > > > error checking path. Even valid NOFAIL users will not have one because > > > they _know_ they do not have a different than retry for ever recovery > > > path. > > > > You mean where I asked you for a link to the discussion and rationale > > you claimed had happened? Still waiting on that > > I am not your assistent to be tasked and search through lore archives. > Find one if you need that. > > Anyway, if you read the email and even tried to understand what is > written there rather than immediately started shouting a response then > you would have noticed I have put actual arguments here. You are free to > disagree with them and lay down your arguments. You have decided to > > [...] > > > Yeah, enough of this insanity. > > so I do not think you are able to do that. Again... BTW - (after getting emails for three different people about this, heh) I do want to apologize for things getting this heated the other day, but I need to also tell you why I reacted the way I did. Firstly, it's nothing personal: I'm not axe grinding against you (although you were a major source of frustration for myself and Suren in the memory allocation profiling discussions, and I hope you can recognize that as well). But I do take correctness issues very seriously, and I will get frosty or genuinely angry if they're being ignored or brushed aside. The reality as that experience, and to be frank standards of professionalism, do vary within the kernel community, and I have had some _outrageous_ fights over things as bad as silent data corruption bugs (introduced in code I wrote by people who did not CC me, no less; it was _bad_, and yes it has happened more than once). So - I am _not_ inclined to let things slide, even if it means being the asshole at times. Thankfully, most people aren't like that. Dave, Willy, Linus - we can be shouting at each other, but we still listen, and we know how not to take it personally and focus on the technical when there's something serious going on. Usually when one of us is shouting, you'll find there's a good reason and some history behind it, even if we also recognize the need to try to tone things down and not be _too_ much of an asshole. Linus was reminding me of that yesterday... So for the record: I'm not trying to roadblock you or anyone else, I'm just trying to make sure we all have shit that _works_. And I have been noticing you stepping up in discussions more, and I'd like to encourage that, if I may. MM has been lacking in strong technical leadership for a _long_ time - Andrew's great on the process side, he makes sure things move along, but we haven't had anyone who's trying to keep everything important in their heads, who's able to point out to people where their work fits in the larger picture, and there's been some messy things that have built up over time. And a word on technical leadership - it doesn't mean being the one who's "right" all the time, although it does involve a lot of saying "no" to people. The people who seem the smartest - it's not raw IQ that they've got (although that helps!), it's the ability to listen and quickly incorporate other people's ideas without drama or attachment, and the ability to maintain perspective; notice what people are blocked on, without getting stuck on it, and think about how it fits into the wider perspective. Cheers, Kent
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c index 15fc41e63b6c..d151a2f28d12 100644 --- a/fs/bcachefs/fs.c +++ b/fs/bcachefs/fs.c @@ -231,9 +231,9 @@ static struct inode *bch2_alloc_inode(struct super_block *sb) BUG(); } -static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c) +static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp) { - struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS); + struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp); if (!inode) return NULL; @@ -245,7 +245,7 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c) mutex_init(&inode->ei_quota_lock); memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush)); - if (unlikely(inode_init_always(c->vfs_sb, &inode->v))) { + if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v, gfp))) { kmem_cache_free(bch2_inode_cache, inode); return NULL; } @@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c) */ static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans) { - struct bch_inode_info *inode = - memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN, - __bch2_new_inode(trans->c)); + struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWAIT); if (unlikely(!inode)) { - int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c)) ? 0 : -ENOMEM); + int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM); if (ret && inode) { __destroy_inode(&inode->v); kmem_cache_free(bch2_inode_cache, inode); @@ -328,7 +326,7 @@ __bch2_create(struct mnt_idmap *idmap, if (ret) return ERR_PTR(ret); #endif - inode = __bch2_new_inode(c); + inode = __bch2_new_inode(c, GFP_NOFS); if (unlikely(!inode)) { inode = ERR_PTR(-ENOMEM); goto err; diff --git a/fs/inode.c b/fs/inode.c index 86670941884b..a2aabbcffbe4 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file) * These are initializations that need to be done on every inode * allocation as the fields are not initialised by slab allocation. */ -int inode_init_always(struct super_block *sb, struct inode *inode) +int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp) { static const struct inode_operations empty_iops; static const struct file_operations no_open_fops = {.open = no_open}; @@ -230,14 +230,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode) #endif inode->i_flctx = NULL; - if (unlikely(security_inode_alloc(inode))) + if (unlikely(security_inode_alloc(inode, gfp))) return -ENOMEM; this_cpu_inc(nr_inodes); return 0; } -EXPORT_SYMBOL(inode_init_always); +EXPORT_SYMBOL(inode_init_always_gfp); void free_inode_nonrcu(struct inode *inode) { diff --git a/include/linux/fs.h b/include/linux/fs.h index fd34b5755c0b..d46ca71a7855 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence); extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence); -extern int inode_init_always(struct super_block *, struct inode *); +extern int inode_init_always_gfp(struct super_block *, struct inode *, gfp_t); +static inline int inode_init_always(struct super_block *sb, struct inode *inode) +{ + return inode_init_always_gfp(sb, inode, GFP_NOFS); +} + extern void inode_init_once(struct inode *); extern void address_space_init_once(struct address_space *mapping); extern struct inode * igrab(struct inode *); diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index a2ade0ffe9e7..b08472d64765 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -150,6 +150,6 @@ extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[]; __used __section(".early_lsm_info.init") \ __aligned(sizeof(unsigned long)) -extern int lsm_inode_alloc(struct inode *inode); +extern int lsm_inode_alloc(struct inode *inode, gfp_t gfp); #endif /* ! __LINUX_LSM_HOOKS_H */ diff --git a/include/linux/security.h b/include/linux/security.h index 1390f1efb4f0..7c6b9b038a0d 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -336,7 +336,7 @@ int security_dentry_create_files_as(struct dentry *dentry, int mode, struct cred *new); int security_path_notify(const struct path *path, u64 mask, unsigned int obj_type); -int security_inode_alloc(struct inode *inode); +int security_inode_alloc(struct inode *inode, gfp_t gfp); void security_inode_free(struct inode *inode); int security_inode_init_security(struct inode *inode, struct inode *dir, const struct qstr *qstr, @@ -769,7 +769,7 @@ static inline int security_path_notify(const struct path *path, u64 mask, return 0; } -static inline int security_inode_alloc(struct inode *inode) +static inline int security_inode_alloc(struct inode *inode, gfp_t gfp) { return 0; } diff --git a/security/security.c b/security/security.c index 8cee5b6c6e6d..3581262da5ee 100644 --- a/security/security.c +++ b/security/security.c @@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file) * * Returns 0, or -ENOMEM if memory can't be allocated. */ -int lsm_inode_alloc(struct inode *inode) +int lsm_inode_alloc(struct inode *inode, gfp_t gfp) { if (!lsm_inode_cache) { inode->i_security = NULL; return 0; } - inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS); + inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp); if (inode->i_security == NULL) return -ENOMEM; return 0; @@ -1582,9 +1582,9 @@ int security_path_notify(const struct path *path, u64 mask, * * Return: Return 0 if operation was successful. */ -int security_inode_alloc(struct inode *inode) +int security_inode_alloc(struct inode *inode, gfp_t gfp) { - int rc = lsm_inode_alloc(inode); + int rc = lsm_inode_alloc(inode, gfp); if (unlikely(rc)) return rc;