
[v2,00/53] fscache: Rewrite index API and management system

Message ID 163492911924.1038219.13107463173777870713.stgit@warthog.procyon.org.uk (mailing list archive)

Message

David Howells Oct. 22, 2021, 6:58 p.m. UTC
Here's a set of patches that implements a rewrite of the fscache driver,
significantly simplifying the code compared to what's upstream, removing
the complex operation scheduling and object state machine in favour of
something much smaller and simpler.  It is built on top of the set of
patches that removes the old API[1].

[!] Note: I've reworked the patches at Jeff Layton's request so that the
    old fscache and cachefiles drivers are moved aside, but retained, and
    the new drivers are built up from empty directories.  I've made it so
    that the end result is practically the same and can be directly diff'd
    against the first version.

    This allows the filesystems to retain access to the old drivers for the
    moment, though you have to decide at configuration time whether you
    want the old drivers or the new.

    The git branch mentioned below also contains a patch to remove the old
    drivers (and disable ceph, as that's the only filesystem I don't have
    conversion patches for; Jeff is working on that).


The operation scheduling API was intended to handle sequencing of cache
operations, which were all required (where possible) to run asynchronously
in parallel with the operations being done by the network filesystem, while
allowing the cache to be brought online and offline and allowing service to
be interrupted by invalidation.

However, with the advent of the tmpfile capability in the VFS, an opportunity
arises to do invalidation much more easily, without having to wait for I/O
that's actually in progress: Cachefiles can simply cut over its file
pointer for the backing object attached to a cookie and abandon the
in-progress I/O, dismissing it upon completion.
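
As an illustration only (this is not code from the series), the cut-over
described above can be modelled in plain C.  All names here (model_file,
model_object, model_io and the helpers) are hypothetical; the sketch assumes
each I/O operation remembers the file it started on and compares it with the
object's current file when it completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a backing file and a cache object. */
struct model_file { int id; };
struct model_object { struct model_file *file; };

/* An in-flight I/O operation remembers the file it was started against. */
struct model_io {
	struct model_object *object;
	struct model_file *file;
};

static void model_begin_io(struct model_io *io, struct model_object *obj)
{
	assert(obj->file != NULL);
	io->object = obj;
	io->file = obj->file;
}

/*
 * Invalidate by cutting the object over to a fresh (tmpfile-like) file.
 * In-flight I/O is not waited for.  The old file is returned so that the
 * caller can release it once the I/O on it has drained.
 */
static struct model_file *model_invalidate(struct model_object *obj,
					   struct model_file *fresh)
{
	struct model_file *old = obj->file;

	obj->file = fresh;
	return old;
}

/* On completion, a result against a superseded file is simply dismissed. */
static bool model_io_result_is_current(const struct model_io *io)
{
	return io->file == io->object->file;
}
```

An operation begun before model_invalidate() reports itself as stale on
completion, whereas one begun afterwards is kept.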

Future work there would involve using Omar Sandoval's vfs_link() with
AT_LINK_REPLACE[2] to allow an extant file to be displaced by a new hard
link from a tmpfile as currently I have to unlink the old file first.

These patches can also simplify the object state handling as I/O operations
to the cache don't all have to be brought to a stop in order to invalidate
a file.  To that end, and with an eye to writing a new backing cache
model in the future, I've taken the opportunity to simplify the indexing
structure.

I've separated the index cookie concept from the file cookie concept by
type now.  The former is now called a "volume cookie" (struct
fscache_volume) and it is a container of file cookies.  There are then
just the two levels.  All the index cookieage is collapsed into a single
volume cookie, and this has a single printable string as a key.  For
instance, an AFS volume would have a key of something like
"afs,example.com,1000555", combining the filesystem name, cell name and
volume ID.  This is freeform, but must not have '/' chars in it.
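
A minimal sketch of how such a key might be composed and checked, assuming
the "filesystem,cell,volume ID" layout from the AFS example above.  The
helper names make_volume_key() and volume_key_is_valid() are hypothetical and
are not part of the fscache API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: a key must be non-empty and must not contain '/'. */
static bool volume_key_is_valid(const char *key)
{
	return key[0] != '\0' && strchr(key, '/') == NULL;
}

/*
 * Hypothetical helper: build "<fs>,<cell>,<volume id>" into buf.
 * Returns 0 on success, or -1 if the key was truncated or is invalid.
 */
static int make_volume_key(char *buf, size_t len, const char *fs,
			   const char *cell, unsigned long vid)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s,%s,%lu", fs, cell, vid);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return volume_key_is_valid(buf) ? 0 : -1;
}
```

Calling make_volume_key(buf, sizeof(buf), "afs", "example.com", 1000555)
yields the key quoted above; a cell name containing '/' is rejected.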

I've also eliminated all pointers back from fscache into the network
filesystem.  This required the duplication of a little bit of data in the
cookie (cookie key, coherency data and file size), but it's not actually
that much.  This gets rid of problems with making sure we keep netfs data
structures around so that the cache can access them.

I have changed afs throughout the patch series, but I also have patches for
9p, nfs and cifs.  Jeff Layton is handling ceph support.


BITS THAT MAY BE CONTROVERSIAL
==============================

There are some bits I've added that may be controversial:

 (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if
     a file is already being used by some other kernel service (e.g. a
     duplicate cachefiles cache in the same directory) and reject it if it
     is.  This isn't entirely necessary, but it helps prevent accidental
     data corruption.

     I don't want to use S_SWAPFILE as that has other effects, but quite
     possibly swapon() should set S_KERNEL_FILE too.

     Note that it doesn't prevent userspace from interfering, though
     perhaps it should.

 (2) Cachefiles wants to keep the backing file for a cookie open whilst we
     might need to write to it from network filesystem writeback.  The
     problem is that the network filesystem unuses its cookie when its file
     is closed, and so we have nothing pinning the cachefiles file open and
     it will get closed automatically after a short time to avoid
     EMFILE/ENFILE problems.

     Reopening the cache file, however, is a problem if this is being done
     due to writeback triggered by exit().  Some filesystems will oops if
     we try to open a file in that context because they want to access
     current->fs or suchlike.

     To get around this, I added the following:

     (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
     	 filesystem inode to indicate that we have a usage count on the
     	 cookie caching that inode.

     (B) A flag in struct writeback_control, unpinned_fscache_wb, that is
     	 set when __writeback_single_inode() clears the last dirty page
     	 from i_pages - at which point it clears I_PINNING_FSCACHE_WB and
     	 sets this flag.

	 This has to be done here so that clearing I_PINNING_FSCACHE_WB can
	 be done atomically with the check of PAGECACHE_TAG_DIRTY that
	 clears I_DIRTY_PAGES.

     (C) A function, fscache_set_page_dirty(), which, if
         I_PINNING_FSCACHE_WB is not already set, sets it and calls
         fscache_use_cookie() to pin the cache resources.

     (D) A function, fscache_unpin_writeback(), to be called by
     	 ->write_inode() to unuse the cookie.

     (E) A function, fscache_clear_inode_writeback(), to be called when the
     	 inode is evicted, before clear_inode() is called.  This cleans up
     	 any lingering I_PINNING_FSCACHE_WB.

     The network filesystem can then use these tools to make sure that
     fscache_write_to_cache() can write locally modified data to the cache
     as well as to the server.

     For the future, I'm working on write helpers for netfs lib that should
     allow this facility to be removed by keeping track of the dirty
     regions separately - but that's incomplete at the moment and is also
     going to be affected by folios, one way or another, since it deals
     with pages.
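
The interplay of (A) to (E) can be sketched as a single-threaded userspace
model.  This is an illustration under stated assumptions, not the kernel
code: every model_* name is hypothetical, the real flag and functions are
named only in the comments, and locking and atomicity are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

struct model_cookie { int n_active; };	/* cookie usage count */

struct model_inode {
	struct model_cookie *cookie;
	bool pinning_wb;		/* models I_PINNING_FSCACHE_WB (A) */
	int dirty_pages;
};

struct model_wbc { bool unpinned_fscache_wb; };	/* models the flag in (B) */

/* (C): the first dirtied page pins the cookie for writeback. */
static void model_set_page_dirty(struct model_inode *inode)
{
	inode->dirty_pages++;
	if (!inode->pinning_wb) {
		inode->pinning_wb = true;
		inode->cookie->n_active++;	/* stands in for fscache_use_cookie() */
	}
}

/* (B): cleaning the last dirty page clears the pin flag and notes it in wbc. */
static void model_writeback_one_page(struct model_inode *inode,
				     struct model_wbc *wbc)
{
	assert(inode->dirty_pages > 0);
	if (--inode->dirty_pages == 0 && inode->pinning_wb) {
		inode->pinning_wb = false;
		wbc->unpinned_fscache_wb = true;
	}
}

/* (D): called from ->write_inode() to drop the pin noted in wbc. */
static void model_unpin_writeback(struct model_wbc *wbc,
				  struct model_cookie *cookie)
{
	if (wbc->unpinned_fscache_wb)
		cookie->n_active--;		/* stands in for unusing the cookie */
}

/* (E): on eviction, drop any pin that is still lingering. */
static void model_clear_inode_writeback(struct model_inode *inode)
{
	if (inode->pinning_wb) {
		inode->pinning_wb = false;
		inode->cookie->n_active--;
	}
}
```

In this model the cookie is pinned exactly once however many pages are
dirtied, and it is released either through the writeback path, (B) then (D),
or at eviction through (E).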


Changes
=======
ver #2)
  - Fix fscache_unuse_cookie() to use atomic_dec_and_lock() to avoid a
    potential race.
  - Fix a number of oopses due to the cache not withdrawing the object by
    the correct procedure upon lookup failure.
  - Disable a debugging statement.
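
For readers unfamiliar with atomic_dec_and_lock(), its general semantics can
be sketched in userspace with C11 atomics.  This is an illustrative model
only: model_lock and model_dec_and_lock() are hypothetical names, a trivial
spinlock stands in for the kernel spinlock, and the particular race that was
fixed in fscache_unuse_cookie() is not described here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Trivial spinlock standing in for a kernel spinlock. */
struct model_lock { atomic_flag held; };

static void model_lock_acquire(struct model_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void model_lock_release(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Decrement *count.  Returns true with the lock HELD if the count reached
 * zero; otherwise returns false with the lock not held.
 */
static bool model_dec_and_lock(atomic_int *count, struct model_lock *lock)
{
	int old = atomic_load(count);

	assert(old > 0);

	/* Fast path: decrement locklessly while the result stays above zero. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
		/* A failed exchange reloaded 'old'; the loop re-checks it. */
	}

	/* Slow path: the count may hit zero, so decide that under the lock. */
	model_lock_acquire(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return true;
	model_lock_release(lock);
	return false;
}
```

The point of the pattern is that the transition to zero is only ever observed
with the lock held, so the final-release work cannot interleave with another
thread that takes the same lock to acquire a new reference.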


These patches can also be found at:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite-indexing-2

David

Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/cover.1580251857.git.osandov@fb.com/ [2]
Link: https://lore.kernel.org/r/163456861570.2614702.14754548462706508617.stgit@warthog.procyon.org.uk/ # v1

---
Dave Wysochanski (1):
      nfs: Convert to new fscache volume/cookie API

David Howells (52):
      fscache_old: Move the old fscache driver to one side
      fscache_old: Rename CONFIG_FSCACHE* to CONFIG_FSCACHE_OLD*
      cachefiles_old:  Move the old cachefiles driver to one side
      cachefiles_old: Rename CONFIG_CACHEFILES* to CONFIG_CACHEFILES_OLD*
      netfs: Display the netfs inode number in the netfs_read tracepoint
      netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space
      fscache: Introduce new driver
      fscache: Implement a hash function
      fscache: Implement cache registration
      fscache: Implement volume registration
      fscache: Implement cookie registration
      fscache: Implement cache-level access helpers
      fscache: Implement volume-level access helpers
      fscache: Implement cookie-level access helpers
      fscache: Implement functions add/remove a cache
      fscache: Provide and use cache methods to lookup/create/free a volume
      fscache: Add a function for a cache backend to note an I/O error
      fscache: Implement simple cookie state machine
      fscache: Implement cookie user counting and resource pinning
      fscache: Implement cookie invalidation
      fscache: Provide a means to begin an operation
      fscache: Provide read/write stat counters for the cache
      fscache: Provide a function to let the netfs update its coherency data
      fscache: Implement I/O interface
      fscache: Provide fallback I/O functions
      vfs, fscache: Implement pinning of cache usage for writeback
      fscache: Provide a function to note the release of a page
      fscache: Provide a function to resize a cookie
      cachefiles: Introduce new driver
      cachefiles: Add some error injection support
      cachefiles: Define structs
      cachefiles: Add a couple of tracepoints for logging errors
      cachefiles: Add I/O error reporting macros
      cachefiles: Provide a function to check how much space there is
      cachefiles: Implement a function to get/create a directory in the cache
      cachefiles: Implement daemon UAPI and cache registration
      cachefiles: Implement volume support
      cachefiles: Implement data storage object handling
      cachefiles: Implement begin and end I/O
      cachefiles: Implement the I/O routines
      afs: Handle len being extending over page end in write_begin/write_end
      afs: Fix afs_write_end() to handle len > page size
      afs: Make afs_write_begin() return the THP subpage
      afs: Convert afs to use the new fscache API
      afs: Copy local writes to the cache when writing to the server
      afs: Skip truncation on the server of data we haven't written yet
      afs: Add synchronous O_DIRECT support
      9p: Use fscache indexing rewrite and reenable caching
      9p: Copy local writes to the cache when writing to the server
      cifs: Support fscache indexing rewrite (untested)
      fscache, cachefiles: Display stats of no-space events
      fscache, cachefiles: Display stat of culling events


 fs/9p/cache.c                         |  184 +----
 fs/9p/cache.h                         |   25 +-
 fs/9p/v9fs.c                          |   14 +-
 fs/9p/v9fs.h                          |   13 +-
 fs/9p/vfs_addr.c                      |   55 +-
 fs/9p/vfs_dir.c                       |   11 +
 fs/9p/vfs_file.c                      |    7 +-
 fs/9p/vfs_inode.c                     |   24 +-
 fs/9p/vfs_inode_dotl.c                |    3 +-
 fs/9p/vfs_super.c                     |    3 +
 fs/Kconfig                            |    4 +-
 fs/Makefile                           |    4 +-
 fs/afs/Makefile                       |    3 -
 fs/afs/cache.c                        |   68 --
 fs/afs/cell.c                         |   12 -
 fs/afs/file.c                         |   83 +-
 fs/afs/fsclient.c                     |    2 +-
 fs/afs/inode.c                        |  101 ++-
 fs/afs/internal.h                     |   37 +-
 fs/afs/main.c                         |   14 -
 fs/afs/super.c                        |    1 +
 fs/afs/volume.c                       |   15 +-
 fs/afs/write.c                        |  170 ++++-
 fs/cachefiles/Kconfig                 |    7 +
 fs/cachefiles/Makefile                |    3 +
 fs/cachefiles/bind.c                  |  190 +++--
 fs/cachefiles/daemon.c                |   40 +-
 fs/cachefiles/error_inject.c          |   46 ++
 fs/cachefiles/interface.c             |  662 +++++++---------
 fs/cachefiles/internal.h              |  203 +++--
 fs/cachefiles/io.c                    |  315 +++++---
 fs/cachefiles/key.c                   |  205 +++--
 fs/cachefiles/main.c                  |   22 +-
 fs/cachefiles/namei.c                 |  983 ++++++++++--------------
 fs/cachefiles/security.c              |    2 +-
 fs/cachefiles/volume.c                |  128 ++++
 fs/cachefiles/xattr.c                 |  369 +++------
 fs/cachefiles_old/Kconfig             |   25 +
 fs/cachefiles_old/Makefile            |   17 +
 fs/cachefiles_old/bind.c              |  278 +++++++
 fs/cachefiles_old/daemon.c            |  748 ++++++++++++++++++
 fs/cachefiles_old/interface.c         |  557 ++++++++++++++
 fs/cachefiles_old/internal.h          |  312 ++++++++
 fs/cachefiles_old/io.c                |  446 +++++++++++
 fs/cachefiles_old/key.c               |  155 ++++
 fs/cachefiles_old/main.c              |   94 +++
 fs/cachefiles_old/namei.c             | 1018 +++++++++++++++++++++++++
 fs/cachefiles_old/security.c          |  112 +++
 fs/cachefiles_old/xattr.c             |  324 ++++++++
 fs/ceph/Kconfig                       |    2 +-
 fs/cifs/Makefile                      |    2 +-
 fs/cifs/cache.c                       |  105 ---
 fs/cifs/cifsfs.c                      |   11 +-
 fs/cifs/cifsglob.h                    |    5 +-
 fs/cifs/connect.c                     |    3 -
 fs/cifs/file.c                        |   37 +-
 fs/cifs/fscache.c                     |  201 ++---
 fs/cifs/fscache.h                     |   53 +-
 fs/cifs/inode.c                       |   18 +-
 fs/fs-writeback.c                     |    8 +
 fs/fscache/Kconfig                    |   40 +
 fs/fscache/Makefile                   |   16 +
 fs/fscache/cache.c                    |  353 +++++++++
 fs/fscache/cookie.c                   |  990 ++++++++++++++++++++++++
 fs/fscache/internal.h                 |  249 ++++++
 fs/fscache/io.c                       |  381 +++++++++
 fs/fscache/main.c                     |  120 +++
 fs/fscache/proc.c                     |   54 ++
 fs/fscache/stats.c                    |  106 +++
 fs/fscache/volume.c                   |  449 +++++++++++
 fs/fscache_old/Kconfig                |   16 +-
 fs/fscache_old/Makefile               |    4 +-
 fs/fscache_old/internal.h             |    4 +-
 fs/fscache_old/object.c               |    2 +-
 fs/fscache_old/proc.c                 |   12 +-
 fs/netfs/read_helper.c                |    2 +-
 fs/nfs/Makefile                       |    2 +-
 fs/nfs/client.c                       |    4 -
 fs/nfs/direct.c                       |    2 +
 fs/nfs/file.c                         |    7 +-
 fs/nfs/fscache-index.c                |  114 ---
 fs/nfs/fscache.c                      |  264 +++----
 fs/nfs/fscache.h                      |   91 +--
 fs/nfs/inode.c                        |   11 +-
 fs/nfs/super.c                        |    7 +-
 fs/nfs/write.c                        |    1 +
 include/linux/fs.h                    |    4 +
 include/linux/fscache-cache.h         |  199 +++++
 include/linux/fscache.h               |  680 +++++++++++++++++
 include/linux/fscache_old.h           |    1 +
 include/linux/netfs.h                 |    4 +-
 include/linux/nfs_fs_sb.h             |    9 +-
 include/linux/writeback.h             |    1 +
 include/trace/events/cachefiles.h     |  485 +++++++++---
 include/trace/events/cachefiles_old.h |  321 ++++++++
 include/trace/events/fscache.h        |  448 +++++++++++
 include/trace/events/netfs.h          |    5 +-
 97 files changed, 11269 insertions(+), 2748 deletions(-)
 delete mode 100644 fs/afs/cache.c
 create mode 100644 fs/cachefiles/error_inject.c
 create mode 100644 fs/cachefiles/volume.c
 create mode 100644 fs/cachefiles_old/Kconfig
 create mode 100644 fs/cachefiles_old/Makefile
 create mode 100644 fs/cachefiles_old/bind.c
 create mode 100644 fs/cachefiles_old/daemon.c
 create mode 100644 fs/cachefiles_old/interface.c
 create mode 100644 fs/cachefiles_old/internal.h
 create mode 100644 fs/cachefiles_old/io.c
 create mode 100644 fs/cachefiles_old/key.c
 create mode 100644 fs/cachefiles_old/main.c
 create mode 100644 fs/cachefiles_old/namei.c
 create mode 100644 fs/cachefiles_old/security.c
 create mode 100644 fs/cachefiles_old/xattr.c
 delete mode 100644 fs/cifs/cache.c
 create mode 100644 fs/fscache/Kconfig
 create mode 100644 fs/fscache/Makefile
 create mode 100644 fs/fscache/cache.c
 create mode 100644 fs/fscache/cookie.c
 create mode 100644 fs/fscache/internal.h
 create mode 100644 fs/fscache/io.c
 create mode 100644 fs/fscache/main.c
 create mode 100644 fs/fscache/proc.c
 create mode 100644 fs/fscache/stats.c
 create mode 100644 fs/fscache/volume.c
 delete mode 100644 fs/nfs/fscache-index.c
 create mode 100644 include/linux/fscache-cache.h
 create mode 100644 include/linux/fscache.h
 create mode 100644 include/trace/events/cachefiles_old.h
 create mode 100644 include/trace/events/fscache.h

Comments

Linus Torvalds Oct. 22, 2021, 7:21 p.m. UTC | #1
On Fri, Oct 22, 2021 at 8:58 AM David Howells <dhowells@redhat.com> wrote:
>
> David Howells (52):
>       fscache_old: Move the old fscache driver to one side
>       fscache_old: Rename CONFIG_FSCACHE* to CONFIG_FSCACHE_OLD*
>       cachefiles_old:  Move the old cachefiles driver to one side

Honestly, I don't see the point of this when it ends up just being
dead code basically immediately.

You don't actually support picking one or the other at build time,
just a hard switch-over.

That makes the old fscache driver useless. You can't say "use the old
one because I don't trust the new". You just have a legacy
implementation with no users.

              Linus

David Howells Oct. 22, 2021, 7:40 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Honestly, I don't see the point of this when it ends up just being
> dead code basically immediately.
> 
> You don't actually support picking one or the other at build time,
> just a hard switch-over.
> 
> That makes the old fscache driver useless. You can't say "use the old
> one because I don't trust the new". You just have a legacy
> implementation with no users.

What's the best way to do this?  Is it fine to disable caching in all the
network filesystems and then directly remove the fscache and cachefiles
drivers and replace them?  It won't stop the network filesystems actually
working - it'll just mean that they don't do any caching until converted and
have caching reenabled.

David

Linus Torvalds Oct. 22, 2021, 7:58 p.m. UTC | #3
On Fri, Oct 22, 2021 at 9:40 AM David Howells <dhowells@redhat.com> wrote:
>
> What's the best way to do this?  Is it fine to disable caching in all the
> network filesystems and then directly remove the fscache and cachefiles
> drivers and replace them?

So the basic issue with this whole "total rewrite" is that there's no
way to bisect anything.

And there's no way for people to say "I don't trust the rewrite, I
want to keep using the old tested model".

Which makes this all painful and generally the wrong way to do
anything like this, and there's fundamentally no "best way".

The real best way would be if the conversion could be done truly
incrementally. Flag-days simply aren't good for development, because
even if the patch to enable the new code might be some trivial
one-liner, that doesn't _help_ anything. The switch-over switches from
one code-base to another, with no help from "this is where the problem
started".

So in order of preference:

 (a) actual incremental changes where the code keeps working all the
time, and no flag days

 (b) same interfaces, so at least you can do A/B testing and people
can choose one or the other

 (c) total rewrite

and if (c) is the thing that all the network filesystem people want,
then what the heck is the point in keeping dead code around? At that
point, all the rename crap is just extra work, extra noise, and only a
distraction. There's no upside that I can see.

                   Linus

Linus Torvalds Oct. 22, 2021, 8:24 p.m. UTC | #4
On Fri, Oct 22, 2021 at 9:58 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> and if (c) is the thing that all the network filesystem people want,
> then what the heck is the point in keeping dead code around? At that
> point, all the rename crap is just extra work, extra noise, and only a
> distraction. There's no upside that I can see.

Again, I'm not a fan of (c) as an option, but if done, then the
simplest model would appear to be:

 - remove the old fscache code, obviously disabling the Kconfig for it
for each filesystem, all in one fell swoop.

 - add the new fscache code (possibly preferably in sane chunks that
explains the parts).

 - then do a "convert to new world order and enable" commit
individually for each filesystem

but as mentioned, there's no sane way to bisect things, or have a sane
development history in this kind of situation.

                Linus