Message ID | lfypw4vqq3rkohlh2iwhub3igjopdy26lfforfcjws2dfizk7d@32yk5dnemi4u (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] bcachefs updates for 6.9 | expand |
On Tue, 12 Mar 2024 at 18:10, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > Hi Linus, few patches for you - plus a simple merge conflict with VFS > changes: The conflicts are trivial. The "make random bcachefs code be a library function" stuff I looked at, decided is senseless, and ended up meaning that I'm not pulling this without a lot more explanation (and honestly, I don't think the explanations would hold water). That "stdio_redirect_printf()" and darray_char stuff is just horrendous interfaces with no explanations. The interfaces are disgusting. Keep it in your own code where it belongs, don't try to make it some generic library thing. And if you *do* make it a library thing, it needs to be (a) much more explained (b) have much saner naming, and fewer disgusting and completely nonsensical interfaces ("DARRAY()"). And no, finding one other filesystem to share this kind of code is not sufficient to try to claim it's a sane interface and sane naming. But the main dealbreaker is the insane math. And dammit, we talked about the idiotic "mean and variance" garbage long ago. It was wrong back then, it's *still* wrong. You didn't explain why it couldn't use the *much* simpler MAD (median absolute deviation) instead of using variance. That bad decision directly results in that pointless use of overly complex 128-bit math. I called it insanely over-engineered back then, and as far as I can tell, absolutely *NOTHING* has changed apart from some slight type name details. As long as you made it some kind of bcachefs-only thing, I don't mind. But now you're trying to push this garbage as some kind of generic library code that others would use, and that immediately means that I *do* mind insanely overengineered interfaces. The time_stats stuff otherwise looks at leask like a sane interface with names and uses, but the use of that horrendous infrastructure scuttles it. Linus
On Wed, Mar 13, 2024 at 01:47:59PM -0700, Linus Torvalds wrote: > On Tue, 12 Mar 2024 at 18:10, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > Hi Linus, few patches for you - plus a simple merge conflict with VFS > > changes: > > The conflicts are trivial. > > The "make random bcachefs code be a library function" stuff I looked > at, decided is senseless, and ended up meaning that I'm not pulling > this without a lot more explanation (and honestly, I don't think the > explanations would hold water). > > That "stdio_redirect_printf()" and darray_char stuff is just > horrendous interfaces with no explanations. The interfaces are > disgusting. It's a bidirectional pipe between a kthread and an fd. Not sure what's complicated about that? > And if you *do* make it a library thing, it needs to be > > (a) much more explained > > (b) have much saner naming, and fewer disgusting and completely > nonsensical interfaces ("DARRAY()"). DARRAY() is just a dynamic array, aka a c++ vector; we open code those so much it's _stupid_. I wouldn't be opposed to changing the name to something more standard (Rust calls it a vector too); I started out with the CCAN version and rewrote it later for hte kernel. > And no, finding one other filesystem to share this kind of code is not > sufficient to try to claim it's a sane interface and sane naming. > > But the main dealbreaker is the insane math. > > And dammit, we talked about the idiotic "mean and variance" garbage > long ago. It was wrong back then, it's *still* wrong. > > You didn't explain why it couldn't use the *much* simpler MAD (median > absolute deviation) instead of using variance. I most certainly did. I liked your MAD suggestion, but the catch was that we need an exponentially weighted version, not just the standard version, and I haven't seen an derivation of exponentially weighted MAD and doing that is a bit above my statistical pay grade. I explained all this at the time. Besides that, the existing code works fine, the u128 stuff is right out of Knuth (divide is the only even vaguely tricky one), and it's nicely self contained. It's fine. > I called it insanely over-engineered back then, and as far as I can > tell, absolutely *NOTHING* has changed apart from some slight type > name details. > > As long as you made it some kind of bcachefs-only thing, I don't mind. > > But now you're trying to push this garbage as some kind of generic > library code that others would use, and that immediately means that I > *do* mind insanely overengineered interfaces. > > The time_stats stuff otherwise looks at leask like a sane interface > with names and uses, but the use of that horrendous infrastructure > scuttles it. Well, that leaves us at a bit of an impasse then because Darrick wants this stuff for XFS (he was discovering useful stuff with it pretty much right away) and I'm just not doing a MAD conversion, sorry. I'm just being practical here, I like MAD in principle but that's too far outside my wheelhouse. Maybe we can get someone else interested? I have a feeling Peter could whip it out in about 5 minutes...
On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > I liked your MAD suggestion, but the catch was that we need an > exponentially weighted version, The code for the weighted version literally doesn't change. The variance value is different, but the difference between MAD and standard deviation is basically just a constant factor (which will be different for different distributions, but so what? Any _particular_ case will have a particular distribution). So why would a constant factor make _any_ difference for any exponential weighting? Anyway, feel free to keep your code in bcachefs. And maybe xfs even wants to copy that code. I don't care, it seems stupid, but that's a filesystem choice. But if we're making it a generic kernel library, it needs to be sane. Not making people do 64-bit square roots and 128-bit divides just for a random statistical element. Linus
On Wed, Mar 13, 2024 at 02:51:38PM -0700, Linus Torvalds wrote: > On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > I liked your MAD suggestion, but the catch was that we need an > > exponentially weighted version, > > The code for the weighted version literally doesn't change. Well, no, and there's another problem I can't believe I missed until now. MAD is defined as median of the absolute deviations, not mean, and you can't compute a median incrementally. So MAD doesn't work here at all.
On Wed, Mar 13, 2024 at 06:22:57PM -0400, Kent Overstreet wrote: > On Wed, Mar 13, 2024 at 02:51:38PM -0700, Linus Torvalds wrote: > > On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > > I liked your MAD suggestion, but the catch was that we need an > > > exponentially weighted version, > > > > The code for the weighted version literally doesn't change. > > Well, no, and there's another problem I can't believe I missed until > now. MAD is defined as median of the absolute deviations, not mean, and > you can't compute a median incrementally. > > So MAD doesn't work here at all. Sorry, you were talking about mean absolute deviation; that does work here.
On Wed, 13 Mar 2024 at 15:28, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > Sorry, you were talking about mean absolute deviation; that does work > here. Yes, I meant mean, not median. But the confusion is my fault - I wrote MAD and then to "explain" that, I put "median" in my own email - so you read it right the first time, and it was just me being sloppy and confusing things. They are both called MAD in their own contexts, and they are much too easy to confuse. My bad, Linus
diff --cc fs/bcachefs/super-io.c index 010daebf987b,bd64eb68e84a..000000000000 --- a/fs/bcachefs/super-io.c