Message ID | 163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | fscache: Replace and remove old I/O API | expand |
On Fri, Oct 29, 2021 at 7:09 AM David Howells <dhowells@redhat.com> wrote: > > (1) A simple fallback API is added that can read or write a single page > synchronously. The functions for this have "fallback" in their names > as they have to be removed at some point. David, I still don't understand WHY. I read the explanations in the commits, and that didn't help either. Why, of why, do you insist of adding this intermediate interface that is already documented to "must be removed" at the point it is even added? What's the point of adding garbage that is useless in the long run? Why is the first step not just "remove fscache"? Why is there this addition of the "deprecated" interface - that you have now renamed "fallback"? I agree that "fallback" is a less annoying name, so that renaming is an improvement, but WHY? I absolutely detested your whole "move old garbage around before removal", and I also detest this "add new garbage that will be removed". What's the point? Why isn't the fix just "remove CONFIG_FSCACHE and all the code". You already *HAVE* the "fallback" code - it's all that #else /* CONFIG_NFS_FSCACHE */ static inline int nfs_fscache_register(void) { return 0; } static inline void nfs_fscache_unregister(void) {} ... stuff in <nfs/fscache.h> and friends. So why do you need _new_ fallback code, when CONFIG_FSCACHE already exists and already has a "this disables fscache"? Maybe there is some really good reason, but that really good reason sure as hell isn't documented anywhere, and I really don't see the point. So let me say this again: - it would be much better if you could incrementally just improve the existing FSCACHE so that it just _works_ all the time, and fixes the problems in it, and a bisection works, and there is no flag-day. - but dammit, if you have to have a flag-day, then there is NO POINT in all this "move the old code around before moving it", or "add a fallback interface before removing it again". Oh, I can understand wanting to keep the header files around in case the interfaces end up being similar enough in the end that it all matters. But I don't understand why you do this kind of crud: fs/cachefiles/io.c | 28 ++++++++- fs/fscache/io.c | 137 +++++++++++++++++++++++++++++++++++++++------ when the neither of those directories will ever even be *compiled* if CONFIG_FSCACHE isn't true (because CACHEFILES has a "depends on FSCACHE"). See my argument? If FSCACHE isn't usable during the transition, then don't make these kinds of pointless code movement or creation things that are just dead. There's absolutely no point in having some "fallback" interface that allows people to test some configuration that simply isn't relevant. It doesn't help anything. It just adds more noise and more configurations, and no actual value that I can see. It doesn't help bisectability: if some bug ever bisects to the fallback code, what does that tell us? Nothing. Sure, it trivially tells us the fallback code was buggy, but since the fallback code has been removed afterwards, the _value_ of that information is nil, zilch, nada. It's not "information", it's just "worthless data". And hey, maybe there's some issue that I don't understand, and I don't see. But if there is some subtle value here, it should have been documented. So I say exactly the same thing I said last time: if the old fscache code is not usable, and you can't incrementally fix it so that it works all the time, then JUST REMOVE IT ALL. Moving it elsewhere before the removal is only pointless noise. But adding some fallback intermediate code before removal is ALSO just pointless noise. Doing a flag-day with "switch from A to B" is already painful and wrong. I don't like it. But I like it even _less_, if it's a "switch from A to B to C". If you do want t9o have a "halfway state", the only halfway state that makes sense to me is something like (a) make all the changes to the old FSCACHE - keeping it all _working_ during this phase - to make it have the same _interfaces_ as the new fscache will have. (b) then remove the old FSCACHE entirely (c) then plop in the new FSCACHE But note how there was no "fallback" stage anywhere. No code that lies around dead at any point. At each point it was either all working old or all working new (or nothing at all). Yes, in this case that "step (a)" is extra work and you're basically modifying code that you know will be removed, but the advantage now is - at least the fscache _users_ are being modified while the old and tested world is still working, and the interface change is "bisectable" in that sense. That's useful in itself. - if it turns out that people have problems with the new generation FSCACHE, they can reverse steps (b) and (c) without having to touch and revert all the other filesystems changes. IOW, if a "same interfaces" state exists, that's fine. But for it to make sense, those same interfaces have to be actually _useful_, not some fallback code that is neither the old nor the new. And maybe you can't do that "step (a)" because the interfaces are part of the fundamental problem with the old FSCACHE. But if you drop (a), then don't add some stage between (b) and (c), because it's not helpful. And again, maybe I'm missing something. But really, I don't see why this "remove old FSCACHE" stage should *ever* make any modifications to fs/fscache/* and fs/cachefiles/* when disabling the config option means that it just won't get built at all. Linus
Linus Torvalds <torvalds@linux-foundation.org> wrote: > - it would be much better if you could incrementally just improve the > existing FSCACHE so that it just _works_ all the time, and fixes the > problems in it, and a bisection works, and there is no flag-day. As I stated in the cover letters, the advent of the kiocb and tmpfile interfaces provide a way to massively simplify the way fscache and cachefiles work - and, as a result, remove over five thousand lines of code. With the changes I'm looking at making, I can get rid of the object state machine as it stands and the operation scheduling - which means all of the code in: fs/fscache/object.c fs/fscache/operation.c gets deleted along with: fs/fscache/page.c fs/cachefiles/rdwr.c when I remove the deprecated I/O code (which is what this patchset does). A large chunk of code in fs/fscache/cookie.c gets removed and replaced too Further, I've been looking at simplifying the index management with an eye to completely replacing the cache backend with something more akin to a tagged cache with fixed-size storage units. This also allows the cachefiles code to be simplified a bit too. This means bisection is of limited value and why I'm looking at a 'flag day'. There is one particular problem that *this* patchset was intended to address: I want to convert the filesystems that are going to be accessing fscache to use netfslib, but they're not all there yet so that there's one common body of code that does VM interface, cache I/O and content crypto (eg. fscrypt). Jeff and I have converted afs and ceph already and I've got a conversion for 9p in this patchset. I have a partial patch for cifs/smb and have been discussing that with Steve and Shyam. Dave Wysochanski has a set of patches for nfs, but there's pushback on it from the nfs maintainers. This particular patchset is intended to enable removal of the old I/O routines by changing nfs and cifs to use a "fallback" method to use the new kiocb-using API and thus allow me to get on with the rest of it. However, if you would rather I just removed all of fscache and (most of[*]) cachefiles, that I can do. I have the patches arrayed in a way that makes them easier to explain logically and, hopefully, easier to review. It would, however, leave any netfs that isn't converted to use netfslib unable to use caching until converted. David [*] The code that deals with the cachefilesd UAPI will be mostly unchanged.
On Fri, Oct 29, 2021 at 10:55 AM David Howells <dhowells@redhat.com> wrote: > > This means bisection is of limited value and why I'm looking at a 'flag day'. So I'm kind of resigned to that by now, I just wanted to again clarify that the rest of my comments are about "if we have to deal with a flag dat anyway, then make it as simple and straightforward as possible, rather than adding extra steps that are only noise". > [ Snip explanation of netfslib ] > This particular patchset is intended to enable removal of the old I/O routines > by changing nfs and cifs to use a "fallback" method to use the new kiocb-using > API and thus allow me to get on with the rest of it. Ok, at least that explains that part. But: > However, if you would rather I just removed all of fscache and (most of[*]) > cachefiles, that I can do. I assume and think that if you just do that part first, then the "convert to netfslib" of afs and ceph at that later stage will mean that the fallback code will never be needed? So I would much prefer that streamlined model over one that adds that temporary intermediate stage only for it to be deleted. Linus
Linus Torvalds <torvalds@linux-foundation.org> wrote: > But: > > > However, if you would rather I just removed all of fscache and (most of[*]) > > cachefiles, that I can do. > > I assume and think that if you just do that part first, then the > "convert to netfslib" of afs and ceph at that later stage will mean > that the fallback code will never be needed? The netfslib coversions for afs and ceph are already in your tree and I have a patch here to do that for 9p (if you're willing to take that in the upcoming merge window?). The issue is cifs[*] and nfs. I could leave caching in those disabled, pending approved patches for those filesystems. This would mean that I wouldn't need the fallback code. An alternative is that I could move the "fallback" code into fs/nfs/fscache.c and fs/cifs/fscache.c if that would be easier and merge it into the functions there. The problem will come when the cache wants to do I/O in larger units than page size to suit its own block size[**]. David [*] As it happens, it turns out that cifs seems to have a bug in it that causes the entire cache for a superblock to be discarded each time that superblock is mounted. [**] At some point the cache *has* to start keeping track of what data it is holding rather than relying on bmap/SEEK_DATA/SEEK_HOLE to get round the extent-bridging problem. I'm trying to take a leaf out of the book of other caching filesystems and use larger block sizes (e.g. 256K) to reduce the overhead of cache metadata.