[GIT,PULL] fscache: I/O API modernisation and netfs helper library

Message ID	591237.1612886997@warthog.procyon.org.uk (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-cifs-owner@kernel.org> Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells <dhowells@redhat.com> To: torvalds@linux-foundation.org cc: dhowells@redhat.com, Matthew Wilcox <willy@infradead.org>, Jeff Layton <jlayton@redhat.com>, David Wysochanski <dwysocha@redhat.com>, Anna Schumaker <anna.schumaker@netapp.com>, Trond Myklebust <trondmy@hammerspace.com>, Steve French <sfrench@samba.org>, Dominique Martinet <asmadeus@codewreck.org>, Alexander Viro <viro@zeniv.linux.org.uk>, ceph-devel@vger.kernel.org, linux-afs@lists.infradead.org, linux-cachefs@redhat.com, linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org Subject: [GIT PULL] fscache: I/O API modernisation and netfs helper library MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <591236.1612886997.1@warthog.procyon.org.uk> Content-Transfer-Encoding: quoted-printable Date: Tue, 09 Feb 2021 16:09:57 +0000 Message-ID: <591237.1612886997@warthog.procyon.org.uk> Precedence: bulk
Series	[GIT,PULL] fscache: I/O API modernisation and netfs helper library \| expand [GIT,PULL] fscache: I/O API modernisation and netfs helper library

David Howells Feb. 9, 2021, 4:09 p.m. UTC

Hi Linus,

Can you pull this during the upcoming merge window?  It provides a more
modern I/O API for fscache and moves some common pieces out of network
filesystems into a common helper library.  This request only includes
modifications for afs and ceph.

Dave Wysochanski has a patch series for nfs.  Normal nfs works fine and
passes various tests, but it turned out pnfs has a problem that wasn't
discovered until quite late - pnfs does splitting of requests itself and
sending them to various places, but it will need to cooperate more closely
with the netfs lib over this.

I've given Dominique Martinet a patch for 9p and Steve French a partial
patch for cifs, but neither of those is going to be ready for this merge
window.

The main features of this request are:

 (1) Institution of a helper library for network filesystems.  The first
     phase of this handles ->readpage(), ->readahead() and part of
     ->write_begin() on behalf of the netfs, requiring the netfs to provide
     a common vector to perform a read to some part of a file.

     This allows handling of the following to be (at least partially) moved
     out of all the network filesystems and consolidated in one place:

	- changes in VM vectors (Matthew Wilcox's work)
	  - transparent huge page support
	- shaping of reads
	  - readahead expansion
	  - fs alignment/granularity (ceph, pnfs)
	  - cache alignment/granularity
	- slicing of reads
	  - rsize
	  - keeping multiple read in flight	} Steve French would like
	  - multichannel distribution		} but for the future
	  - multiserver distribution (ceph, pnfs)
	  - stitching together reads from the cache and reads from the net
	- copying data read from the server into the cache
	- retry/reissue handling
	  - fallback after cache failure
     	- short reads
	- fscrypt data crypting (Jeff Layton is considering for the future)

 (2) Adding an alternate cache I/O API for use with the netfs lib that
     makes use of kiocbs in the cache to do direct I/O between the cache
     files and the netfs pages.

     This is intended to replace the current I/O API that calls the backing
     fs readpage op and than snooping the wait queues for completion to
     read and using vfs_write() to write.  It wasn't possible to do
     in-kernel DIO when I first wrote cachefiles - but using kiocbs makes
     it a lot simpler and more robust (and it uses a lot less memory).

 (3) Add an ITER_XARRAY iov_iter that allows I/O iteration to be done on an
     xarray of pinned pages (such as inode->i_mapping->i_pages), thereby
     avoiding the need to allocate a bvec array to represent this.

     This is used to present a set of netfs pages to the cache to do DIO on
     and is also used by afs to present netfs pages to sendmsg.  It could
     also be used by unencrypted cifs to pass the pages to the TCP socket
     it uses (if it's doing TCP) and my patch for 9p (which isn't included
     here) can make use of it too.

 (4) Make afs use the above.  It passes the same xfstests (and has the same
     failures) as the unpatched afs client.

 (5) Make ceph use the above (I've merged a branch from Jeff Layton for this).
     This also passes xfstests.

David
---
The following changes since commit 9791581c049c10929e97098374dd1716a81fefcc:

  Merge tag 'for-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux (2021-01-20 14:15:33 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/fscache-ioapi-20210203

for you to fetch changes up to 1df6bf2cc0fad1a5b2b32b7b0066b13175ad1ce4:

  netfs: Fix kerneldoc on netfs_subreq_terminated() (2021-02-03 11:17:57 +0000)

----------------------------------------------------------------
fscache I/O API rework and netfs changes

----------------------------------------------------------------
David Howells (29):
      iov_iter: Add ITER_XARRAY
      vm: Add wait/unlock functions for PG_fscache
      mm: Implement readahead_control pageset expansion
      vfs: Export rw_verify_area() for use by cachefiles
      netfs: Make a netfs helper module
      netfs: Provide readahead and readpage netfs helpers
      netfs: Add tracepoints
      netfs: Gather stats
      netfs: Add write_begin helper
      netfs: Define an interface to talk to a cache
      fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
      afs: Disable use of the fscache I/O routines
      afs: Pass page into dirty region helpers to provide THP size
      afs: Print the operation debug_id when logging an unexpected data version
      afs: Move key to afs_read struct
      afs: Don't truncate iter during data fetch
      afs: Log remote unmarshalling errors
      afs: Set up the iov_iter before calling afs_extract_data()
      afs: Use ITER_XARRAY for writing
      afs: Wait on PG_fscache before modifying/releasing a page
      afs: Extract writeback extension into its own function
      afs: Prepare for use of THPs
      afs: Use the fs operation ops to handle FetchData completion
      afs: Use new fscache read helper API
      Merge branch 'fscache-netfs-lib' into fscache-next
      Merge branch 'ceph-netfs-lib' of https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux into fscache-next
      netfs: Fix various bits of error handling
      afs: Fix error handling in afs_req_issue_op()
      netfs: Fix kerneldoc on netfs_subreq_terminated()

Jeff Layton (7):
      ceph: disable old fscache readpage handling
      ceph: rework PageFsCache handling
      ceph: fix fscache invalidation
      ceph: convert readpage to fscache read helper
      ceph: plug write_begin into read helper
      ceph: convert ceph_readpages to ceph_readahead
      ceph: fix an oops in error handling in ceph_netfs_issue_op

 fs/Kconfig                    |    1 +
 fs/Makefile                   |    1 +
 fs/afs/Kconfig                |    1 +
 fs/afs/dir.c                  |  225 +++++---
 fs/afs/file.c                 |  470 ++++-------------
 fs/afs/fs_operation.c         |    4 +-
 fs/afs/fsclient.c             |  108 ++--
 fs/afs/inode.c                |    7 +-
 fs/afs/internal.h             |   58 +-
 fs/afs/rxrpc.c                |  150 ++----
 fs/afs/write.c                |  610 ++++++++++++----------
 fs/afs/yfsclient.c            |   82 +--
 fs/cachefiles/Makefile        |    1 +
 fs/cachefiles/interface.c     |    5 +-
 fs/cachefiles/internal.h      |    9 +
 fs/cachefiles/rdwr2.c         |  412 +++++++++++++++
 fs/ceph/Kconfig               |    1 +
 fs/ceph/addr.c                |  535 ++++++++-----------
 fs/ceph/cache.c               |  125 -----
 fs/ceph/cache.h               |  101 +---
 fs/ceph/caps.c                |   10 +-
 fs/ceph/inode.c               |    1 +
 fs/ceph/super.h               |    1 +
 fs/fscache/Kconfig            |    1 +
 fs/fscache/Makefile           |    3 +-
 fs/fscache/internal.h         |    3 +
 fs/fscache/page.c             |    2 +-
 fs/fscache/page2.c            |  117 +++++
 fs/fscache/stats.c            |    1 +
 fs/internal.h                 |    5 -
 fs/netfs/Kconfig              |   23 +
 fs/netfs/Makefile             |    5 +
 fs/netfs/internal.h           |   97 ++++
 fs/netfs/read_helper.c        | 1161 +++++++++++++++++++++++++++++++++++++++++
 fs/netfs/stats.c              |   59 +++
 fs/read_write.c               |    1 +
 include/linux/fs.h            |    1 +
 include/linux/fscache-cache.h |    4 +
 include/linux/fscache.h       |   40 +-
 include/linux/netfs.h         |  167 ++++++
 include/linux/pagemap.h       |   16 +
 include/linux/uio.h           |   11 +
 include/net/af_rxrpc.h        |    2 +-
 include/trace/events/afs.h    |   74 ++-
 include/trace/events/netfs.h  |  201 +++++++
 lib/iov_iter.c                |  313 ++++++++++-
 mm/filemap.c                  |   18 +
 mm/readahead.c                |   70 +++
 net/rxrpc/recvmsg.c           |    9 +-
 49 files changed, 3749 insertions(+), 1573 deletions(-)
 create mode 100644 fs/cachefiles/rdwr2.c
 create mode 100644 fs/fscache/page2.c
 create mode 100644 fs/netfs/Kconfig
 create mode 100644 fs/netfs/Makefile
 create mode 100644 fs/netfs/internal.h
 create mode 100644 fs/netfs/read_helper.c
 create mode 100644 fs/netfs/stats.c
 create mode 100644 include/linux/netfs.h
 create mode 100644 include/trace/events/netfs.h

Linus Torvalds Feb. 9, 2021, 7:06 p.m. UTC | #1

So I'm looking at this early, because I have more time now than I will
have during the merge window, and honestly, your pull requests have
been problematic in the past.

The PG_fscache bit waiting functions are completely crazy. The comment
about "this will wake up others" is actively wrong, and the waiting
function looks insane, because you're mixing the two names for
"fscache" which makes the code look totally incomprehensible. Why
would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
why, but the code looks entirely nonsensical.

So just looking at the support infrastructure changes, I get a big "Hmm".

But the thing that makes me go "No, I won't pull this", is that it has
all the same hallmark signs of trouble that I've complained about
before: I see absolutely zero sign of "this has more developers
involved".

There's not a single ack from a VM person for the VM changes. There's
no sign that this isn't yet another "David Howells went off alone and
did something that absolutely nobody else cared about".

See my problem? I need to be convinced that this makes sense outside
of your world, and it's not yet another thing that will cause problems
down the line because nobody else really ever used it or cared about
it until we hit a snag.

                  Linus

Jeff Layton Feb. 9, 2021, 7:45 p.m. UTC | #2

On Tue, 2021-02-09 at 11:06 -0800, Linus Torvalds wrote:
> So I'm looking at this early, because I have more time now than I will
> have during the merge window, and honestly, your pull requests have
> been problematic in the past.
> 
> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.
> 
> So just looking at the support infrastructure changes, I get a big "Hmm".
> 
> But the thing that makes me go "No, I won't pull this", is that it has
> all the same hallmark signs of trouble that I've complained about
> before: I see absolutely zero sign of "this has more developers
> involved".
> 
> There's not a single ack from a VM person for the VM changes. There's
> no sign that this isn't yet another "David Howells went off alone and
> did something that absolutely nobody else cared about".
> 
> See my problem? I need to be convinced that this makes sense outside
> of your world, and it's not yet another thing that will cause problems
> down the line because nobody else really ever used it or cared about
> it until we hit a snag.
> 
>                   Linus
> 

I (and several other developers) have been working with David on this
for the last year or so. Would it help if I gave this on the netfs lib
work and the fscache patches?

    Reviewed-and-tested-by: Jeff Layton <jlayton@redhat.com>

My testing has mainly been with ceph. My main interest is that this
allows us to drop a fairly significant chunk of rather nasty code from
fs/ceph. The netfs read helper infrastructure makes a _lot_ more sense
for a networked filesystem, IMO.

The legacy fscache code has some significant bugs too, and this gives it
a path to making better use of more modern kernel features. It should
also be set up so that filesystems can be converted piecemeal.

I'd really like to see this go in.

Cheers,
Jeff

Matthew Wilcox Feb. 9, 2021, 8:21 p.m. UTC | #3

On Tue, Feb 09, 2021 at 11:06:41AM -0800, Linus Torvalds wrote:
> So I'm looking at this early, because I have more time now than I will
> have during the merge window, and honestly, your pull requests have
> been problematic in the past.

Thanks for looking at this early.

> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.

Yeah, I have trouble with the private2 vs fscache bit too.  I've been
trying to persuade David that he doesn't actually need an fscache
bit at all; he can just increment the page's refcount to prevent it
from being freed while he writes data to the cache.

> But the thing that makes me go "No, I won't pull this", is that it has
> all the same hallmark signs of trouble that I've complained about
> before: I see absolutely zero sign of "this has more developers
> involved".

I've been involved!  I really want to get rid of the address_space
readpages op.  The only remaining users are the filesystems which
use fscache and it's hard to convert them with the old infrastructure.
I'm not 100% convinced that the new infrastructure is good, but I am
convinced that it's better than the old infrastructure.

> There's not a single ack from a VM person for the VM changes. There's
> no sign that this isn't yet another "David Howells went off alone and
> did something that absolutely nobody else cared about".

I'm pretty bad about sending R-b for patches when I've only given them
a quick once-over.  I tend to only do it for patches that I've given an
appropriately deep amount of thought to (eg almost none for spelling
fixes and days for page cache related patches).  I'll see what I feel
comfortable with for this patchset.

> See my problem? I need to be convinced that this makes sense outside
> of your world, and it's not yet another thing that will cause problems
> down the line because nobody else really ever used it or cared about
> it until we hit a snag.

My major concern is that we haven't had any feedback from Trond or Anna.
I don't know if they're just too busy or if there's something else going
on, but it'd be nice to hear something.

David Howells Feb. 9, 2021, 9:10 p.m. UTC | #4

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong,

You mean this?

/**
 * unlock_page_fscache - Unlock a page pinned with PG_fscache
 * @page: The page
 *
 * Unlocks the page and wakes up sleepers in wait_on_page_fscache().  Also
 * wakes those waiting for the lock and writeback bits because the wakeup
 * mechanism is shared.  But that's OK - those sleepers will just go back to
 * sleep.
 */

Actually, you're right.  The wakeup check func is evaluated by the
waker-upper.  I can fix the comment with a patch.

> and the waiting function looks insane, because you're mixing the two names
> for "fscache" which makes the code look totally incomprehensible. Why would
> we wait for PF_fscache, when PG_private_2 was set? Yes, I know why, but the
> code looks entirely nonsensical.

IIRC someone insisted that I should make it a generic name and put the
accessor functions in the fscache headers (which means they aren't available
to core code), but I don't remember who (maybe Andrew? it was before mid-2007)
- kind of like PG_checked is an alias for PG_owner_priv_1.

I'd be quite happy to move the accessors for PG_fscache to the
linux/page-flags.h as that would simplify things.

David

Linus Torvalds Feb. 9, 2021, 9:19 p.m. UTC | #5

On Tue, Feb 9, 2021 at 12:21 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> Yeah, I have trouble with the private2 vs fscache bit too.  I've been
> trying to persuade David that he doesn't actually need an fscache
> bit at all; he can just increment the page's refcount to prevent it
> from being freed while he writes data to the cache.

Does the code not hold a refcount already?

Honestly, the fact that writeback doesn't take a refcount, and then
has magic "if writeback is set, don't free" code in other parts of the
VM layer has been a problem already, when the wakeup ended up
"leaking" from a previous page to a new allocation.

I very much hope the fscache bit does not make similar mistakes,
because the rest of the VM will _not_ have special "if fscache is set,
then we won't do X" the way we do for writeback.

So I think the fscache code needs to hold a refcount regardless, and
that the fscache bit is set the page has to have a reference.

So what are the current lifetime rules for the fscache bit?

             Linus

David Howells Feb. 9, 2021, 9:25 p.m. UTC | #6

Matthew Wilcox <willy@infradead.org> wrote:

> Yeah, I have trouble with the private2 vs fscache bit too.  I've been
> trying to persuade David that he doesn't actually need an fscache
> bit at all; he can just increment the page's refcount to prevent it
> from being freed while he writes data to the cache.

That's not what the bit is primarily being used for.  It's being used to
prevent the starting of a second write to the cache whilst the first is in
progress and also to prevent modification whilst DMA to the cache is in
progress.  This isn't so obvious in this cut-down patchset, but comes more in
to play with full caching of local writes in my fscache-iter branch.

I can't easily share PG_writeback for this because each bit covers a write to
a different place.  PG_writeback covers the write to the server and PG_fscache
the write to the cache.  These writes may get split up differently and will
most likely finish at different times.

If I have to share PG_writeback, that will mean storing both states for each
page somewhere else and then "OR'ing" them together to drive PG_writeback.

David

David Howells Feb. 9, 2021, 9:55 p.m. UTC | #7

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Yeah, I have trouble with the private2 vs fscache bit too.  I've been
> > trying to persuade David that he doesn't actually need an fscache
> > bit at all; he can just increment the page's refcount to prevent it
> > from being freed while he writes data to the cache.
> 
> Does the code not hold a refcount already?

AIUI, Willy wanted me to drop the refcount and rely on PG_locked alone during
I/O triggered by the new ->readahead() method, so when it comes to setting
PG_fscache after a successful read from the server, I don't hold any page refs
- the assumption being that the waits in releasepage and invalidatepage
suffice.  If that isn't sufficient, I can make it take page refs on the pages
to be written out - that should be easy enough to do.

> Honestly, the fact that writeback doesn't take a refcount, and then
> has magic "if writeback is set, don't free" code in other parts of the
> VM layer has been a problem already, when the wakeup ended up
> "leaking" from a previous page to a new allocation.
> 
> I very much hope the fscache bit does not make similar mistakes,
> because the rest of the VM will _not_ have special "if fscache is set,
> then we won't do X" the way we do for writeback.

The VM can't do that because PG_private_2 might not be being used for
PG_fscache.  It does, however, treat PG_private_2 like PG_private when
triggering calls to releasepage and invalidatepage.

> So I think the fscache code needs to hold a refcount regardless, and
> that the fscache bit is set the page has to have a reference.
> 
> So what are the current lifetime rules for the fscache bit?

It depends which 'current' you're referring to.

The old fscache I/O API (ie. what's upstream) - in which PG_fscache is set on
a page to note that fscache knows about the page - does not keep a separate
ref on such pages.

The new fscache I/O API simplifies things.  With that, pages are only known
about for the duration of a write to the cache.  I've tried to analogise the
way PG_writeback works[*], including waiting for it in places like
invalidation, releasepage, page_mkwrite (though in the netfs, not the core VM)
as it may represent DMA.

Note that with the new I/O API, fscache and cachefiles know nothing about the
PG_fscache bit or netfs pages; they just deal with an iov_iter and a
completion function.  Dealing with PG_fscache is done by the netfs and the new
netfs helper lib.

[*] Though I see that 073861ed77b6b made a change to end_page_writeback() for
    an issue that probably affects unlock_page_fscache() too[**].

[**] This may mean that both PG_fscache and PG_writeback need to hold a ref on
     the page.

David

David Wysochanski Feb. 9, 2021, 10:42 p.m. UTC | #8

On Tue, Feb 9, 2021 at 2:07 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I'm looking at this early, because I have more time now than I will
> have during the merge window, and honestly, your pull requests have
> been problematic in the past.
>
> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.
>
> So just looking at the support infrastructure changes, I get a big "Hmm".
>
> But the thing that makes me go "No, I won't pull this", is that it has
> all the same hallmark signs of trouble that I've complained about
> before: I see absolutely zero sign of "this has more developers
> involved".
>
> There's not a single ack from a VM person for the VM changes. There's
> no sign that this isn't yet another "David Howells went off alone and
> did something that absolutely nobody else cared about".
>

I care about it.

I cannot speak to your concerns about the infrastructure changes, nor
can I comment about a given maintainers involvement or lack thereof.
However, I can tell you what my involvement has been.  I got involved
with it because some of our customers use fscache with NFS and I've
supported it.  I saw dhowells rewriting it to greatly simplify the
code and make it easier to debug and wanted to support the effort.

I have been working on the NFS conversion as dhowells has been
evolving the fscache-iter API.  David first posted the series I think
in Dec 2019 and I started with NFS about mid-year 2020, and had my
first post of NFS patches in July:
https://marc.info/?l=linux-nfs&m=159482591232752&w=2

One thing that came out of the earlier iterations as a result of my
testing was the need for the network filesystem to be able to 'cap'
the IO size based on its parameters, hence the "clamp_length()"
function.  So the next iteration dhowells further evolved it into a
'netfs' API and a 'fscache' API, and my November post was based on
that:
https://marc.info/?l=linux-nfs&m=160596540022461&w=2

Each iteration has greatly simplified the interface to the network
filesystem until today where the API is pretty simple.  I have done
extensive tests with each iteration with all the main NFS versions,
specific unit tests, xfstests, etc.  However my test matrix did not
hit enough fscache + pNFS servers, and I found a problem too late to
include in his pull request.  This is mostly why my patches were not
included to convert NFS to the new fscache API, but I intend to work
out the remaining issues for the next merge window, and I'll have an
opportunity to do more testing last week of Feb with the NFS "remote
bakeathon".  My most recent post was at the end of Jan, and Anna is
taking the first 5 refactoring patches in the next merge window:
https://marc.info/?l=linux-nfs&m=161184595127618&w=2

I do not have the skills of a Trond or Anna NFS developers, but I have
worked in this in earnest and intend to see it through to completion
and support NFS and fscache work.  I have received some feedback on
the NFS patches though it's not been a lot, I do know I have some
things to address still.  With open source, no feedback is hard to
draw conclusions other than it's not "super popular" area, but we
always knew that about fscache - it's an "add on" that some customers
require but not everyone. I know Trond speaks up when I make a mistake
and/or something will cause a problem, so I consider the silence
mostly a positive sign.

> See my problem? I need to be convinced that this makes sense outside
> of your world, and it's not yet another thing that will cause problems
> down the line because nobody else really ever used it or cared about
> it until we hit a snag.
>
>                   Linus
>

David Howells Feb. 10, 2021, 4:29 p.m. UTC | #9

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.

How about the attached change to make it more coherent and fix the doc
comment?

David
---
commit 9a28f7e68602193ce020a41f855f71cc55f693b9
Author: David Howells <dhowells@redhat.com>
Date:   Wed Feb 10 10:53:02 2021 +0000

    netfs: Rename unlock_page_fscache() and wait_on_page_fscache()
    
    Rename unlock_page_fscache() to unlock_page_private_2() and
    wait_on_page_fscache() to wait_on_page_private_2() and change the
    references to PG_fscache to PG_private_2 also.  This makes these functions
    look more generic and doesn't mix the terminology.
    
    Fix the kdoc comment as the wake up mechanism doesn't wake up all the
    sleepers.  Note the example usage case for the functions in conjunction
    with the cache also.
    
    Alias the functions in linux/netfs.h.
    
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: David Howells <dhowells@redhat.com>

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 2ffdef1ded91..d4cb6e6f704c 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -24,6 +24,8 @@
 #define ClearPageFsCache(page)		ClearPagePrivate2((page))
 #define TestSetPageFsCache(page)	TestSetPagePrivate2((page))
 #define TestClearPageFsCache(page)	TestClearPagePrivate2((page))
+#define wait_on_page_fscache(page)	wait_on_page_private_2((page))
+#define unlock_page_fscache(page)	unlock_page_private_2((page))
 
 enum netfs_read_source {
 	NETFS_FILL_WITH_ZEROES,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4935ad6171c1..a88ccc9ab0b1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -591,7 +591,7 @@ extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 extern void unlock_page(struct page *page);
-extern void unlock_page_fscache(struct page *page);
+extern void unlock_page_private_2(struct page *page);
 
 /*
  * Return true if the page was successfully locked
@@ -683,16 +683,17 @@ static inline int wait_on_page_locked_killable(struct page *page)
 }
 
 /**
- * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * wait_on_page_private_2 - Wait for PG_private_2 to be cleared on a page
  * @page: The page
  *
- * Wait for the fscache mark to be removed from a page, usually signifying the
- * completion of a write from that page to the cache.
+ * Wait for the PG_private_2 page bit to be removed from a page.  This is, for
+ * example, used to handle a netfs page being written to a local disk cache,
+ * thereby allowing writes to the cache for the same page to be serialised.
  */
-static inline void wait_on_page_fscache(struct page *page)
+static inline void wait_on_page_private_2(struct page *page)
 {
 	if (PagePrivate2(page))
-		wait_on_page_bit(compound_head(page), PG_fscache);
+		wait_on_page_bit(compound_head(page), PG_private_2);
 }
 
 extern void put_and_wait_on_page_locked(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index 91fcae006d64..7d321152d579 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1467,22 +1467,24 @@ void unlock_page(struct page *page)
 EXPORT_SYMBOL(unlock_page);
 
 /**
- * unlock_page_fscache - Unlock a page pinned with PG_fscache
+ * unlock_page_private_2 - Unlock a page that's locked with PG_private_2
  * @page: The page
  *
- * Unlocks the page and wakes up sleepers in wait_on_page_fscache().  Also
- * wakes those waiting for the lock and writeback bits because the wakeup
- * mechanism is shared.  But that's OK - those sleepers will just go back to
- * sleep.
+ * Unlocks a page that's locked with PG_private_2 and wakes up sleepers in
+ * wait_on_page_private_2().
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
  */
-void unlock_page_fscache(struct page *page)
+void unlock_page_private_2(struct page *page)
 {
 	page = compound_head(page);
 	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
-	clear_bit_unlock(PG_fscache, &page->flags);
-	wake_up_page_bit(page, PG_fscache);
+	clear_bit_unlock(PG_private_2, &page->flags);
+	wake_up_page_bit(page, PG_private_2);
 }
-EXPORT_SYMBOL(unlock_page_fscache);
+EXPORT_SYMBOL(unlock_page_private_2);
 
 /**
  * end_page_writeback - end writeback against a page

David Howells Feb. 10, 2021, 4:33 p.m. UTC | #10

> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > The PG_fscache bit waiting functions are completely crazy. The comment
> > about "this will wake up others" is actively wrong, and the waiting
> > function looks insane, because you're mixing the two names for
> > "fscache" which makes the code look totally incomprehensible. Why
> > would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> > why, but the code looks entirely nonsensical.
> 
> How about the attached change to make it more coherent and fix the doc
> comment?

Then I could follow it up with this patch here, moving towards dropping the
PG_fscache alias for the new API.

David
---
commit b415fafb07166732933242e938626fc6cdbdbc5b
Author: David Howells <dhowells@redhat.com>
Date:   Wed Feb 10 11:22:59 2021 +0000

    netfs: Move towards dropping the PG_fscache alias for PG_private_2
    
    Move towards dropping the PG_fscache alias for PG_private_2 with the new
    fscache I/O API and netfs helper lib (this does not alter the old API).
    
    Signed-off-by: David Howells <dhowells@redhat.com>

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8f28d4f4cfd7..bb8c6d501292 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -438,7 +438,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 	if (PagePrivate(page))
 		afs_invalidate_dirty(page, offset, length);
 
-	wait_on_page_fscache(page);
+	wait_on_page_private_2(page);
 	_leave("");
 }
 
@@ -457,10 +457,10 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 	/* deny if page is being written to the cache and the caller hasn't
 	 * elected to wait */
 #ifdef CONFIG_AFS_FSCACHE
-	if (PageFsCache(page)) {
+	if (PagePrivate2(page)) {
 		if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
 			return false;
-		wait_on_page_fscache(page);
+		wait_on_page_private_2(page);
 	}
 #endif
 
diff --git a/fs/afs/write.c b/fs/afs/write.c
index e672833c99bc..9c554981f53b 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -118,7 +118,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 	}
 
 #ifdef CONFIG_AFS_FSCACHE
-	wait_on_page_fscache(page);
+	wait_on_page_private_2(page);
 #endif
 
 	index = page->index;
@@ -929,8 +929,8 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	 * be modified.  We then assume the entire page will need writing back.
 	 */
 #ifdef CONFIG_AFS_FSCACHE
-	if (PageFsCache(page) &&
-	    wait_on_page_bit_killable(page, PG_fscache) < 0)
+	if (PagePrivate2(page) &&
+	    wait_on_page_bit_killable(page, PG_private_2) < 0)
 		return VM_FAULT_RETRY;
 #endif
 
@@ -1026,6 +1026,6 @@ int afs_launder_page(struct page *page)
 
 	trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
 	detach_page_private(page);
-	wait_on_page_fscache(page);
+	wait_on_page_private_2(page);
 	return ret;
 }
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 0dd64d31eff6..fd2498567b49 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -147,7 +147,7 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
 	struct ceph_inode_info *ci;
 	struct ceph_snap_context *snapc = page_snap_context(page);
 
-	wait_on_page_fscache(page);
+	wait_on_page_private_2(page);
 
 	inode = page->mapping->host;
 	ci = ceph_inode(inode);
@@ -176,10 +176,10 @@ static int ceph_releasepage(struct page *page, gfp_t gfp_flags)
 	dout("%p releasepage %p idx %lu (%sdirty)\n", page->mapping->host,
 	     page, page->index, PageDirty(page) ? "" : "not ");
 
-	if (PageFsCache(page)) {
+	if (PagePrivate2(page)) {
 		if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
 			return 0;
-		wait_on_page_fscache(page);
+		wait_on_page_private_2(page);
 	}
 	return !PagePrivate(page);
 }
@@ -1258,7 +1258,7 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
 			      &ceph_netfs_write_begin_ops, NULL);
 out:
 	if (r == 0)
-		wait_on_page_fscache(page);
+		wait_on_page_private_2(page);
 	if (r < 0) {
 		if (page)
 			put_page(page);
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 156941e0de61..9018224693e9 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -223,7 +223,7 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq)
 
 /*
  * Deal with the completion of writing the data to the cache.  We have to clear
- * the PG_fscache bits on the pages involved and release the caller's ref.
+ * the PG_private_2 bits on the pages involved and release the caller's ref.
  *
  * May be called in softirq mode and we inherit a ref from the caller.
  */
@@ -246,7 +246,7 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
 			if (have_unlocked && page->index <= unlocked)
 				continue;
 			unlocked = page->index;
-			unlock_page_fscache(page);
+			unlock_page_private_2(page);
 			have_unlocked = true;
 		}
 	}
@@ -357,7 +357,7 @@ static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq)
 }
 
 /*
- * Unlock the pages in a read operation.  We need to set PG_fscache on any
+ * Unlock the pages in a read operation.  We need to set PG_private_2 on any
  * pages we're going to write back before we unlock them.
  */
 static void netfs_rreq_unlock(struct netfs_read_request *rreq)
@@ -404,7 +404,7 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
 				break;
 			}
 			if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags))
-				SetPageFsCache(page);
+				SetPagePrivate2(page);
 			pg_failed |= subreq_failed;
 			if (pgend < iopos + subreq->len)
 				break;
@@ -1142,7 +1142,7 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
 		goto error;
 
 have_page:
-	wait_on_page_fscache(page);
+	wait_on_page_private_2(page);
 have_page_no_wait:
 	if (netfs_priv)
 		ops->cleanup(netfs_priv, mapping);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 3f177faa0ac2..ccf533288291 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -29,6 +29,17 @@
 #define fscache_cookie_valid(cookie) (0)
 #endif
 
+#ifndef FSCACHE_USE_NEW_IO_API
+/*
+ * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
+ * a page is currently backed by a local disk cache
+ */
+#define PageFsCache(page)		PagePrivate2((page))
+#define SetPageFsCache(page)		SetPagePrivate2((page))
+#define ClearPageFsCache(page)		ClearPagePrivate2((page))
+#define TestSetPageFsCache(page)	TestSetPagePrivate2((page))
+#define TestClearPageFsCache(page)	TestClearPagePrivate2((page))
+#endif
 
 /* pattern used to fill dead space in an index entry */
 #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index d4cb6e6f704c..be03c3b6f0dc 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -15,18 +15,6 @@
 #include <linux/workqueue.h>
 #include <linux/fs.h>
 
-/*
- * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
- * a page is currently backed by a local disk cache
- */
-#define PageFsCache(page)		PagePrivate2((page))
-#define SetPageFsCache(page)		SetPagePrivate2((page))
-#define ClearPageFsCache(page)		ClearPagePrivate2((page))
-#define TestSetPageFsCache(page)	TestSetPagePrivate2((page))
-#define TestClearPageFsCache(page)	TestClearPagePrivate2((page))
-#define wait_on_page_fscache(page)	wait_on_page_private_2((page))
-#define unlock_page_fscache(page)	unlock_page_private_2((page))
-
 enum netfs_read_source {
 	NETFS_FILL_WITH_ZEROES,
 	NETFS_DOWNLOAD_FROM_SERVER,
diff --git a/mm/filemap.c b/mm/filemap.c
index 7d321152d579..e7b2a1e2c40b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3604,7 +3604,7 @@ EXPORT_SYMBOL(generic_file_write_iter);
  * The address_space is to try to release any data against the page
  * (presumably at page->private).
  *
- * This may also be called if PG_fscache is set on a page, indicating that the
+ * This may also be called if PG_private_2 is set on a page, indicating that the
  * page is known to the local caching routines.
  *
  * The @gfp_mask argument specifies whether I/O may be performed to release
diff --git a/mm/readahead.c b/mm/readahead.c
index 4446dada0bc2..01209a46e834 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -40,7 +40,7 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 /*
  * see if a page needs releasing upon read_cache_pages() failure
- * - the caller of read_cache_pages() may have set PG_private or PG_fscache
+ * - the caller of read_cache_pages() may have set PG_private or PG_private_2
  *   before calling, such as the NFS fs marking pages that are cached locally
  *   on disk, thus we need to give the fs a chance to clean up in the event of
  *   an error

David Howells Feb. 10, 2021, 4:36 p.m. UTC | #11

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Does the code not hold a refcount already?

The attached patch will do that.  Note that it's currently based on top of the
patch that drops the PG_fscache alias, so it refers to PG_private_2.

I've run all three patches through xfstests over afs, both with and without a
cache, and Jeff has tested ceph with them.

David
---
commit 803a09110b41b9f6091a517fc8f5c4b15475048c
Author: David Howells <dhowells@redhat.com>
Date:   Wed Feb 10 11:35:15 2021 +0000

    netfs: Hold a ref on a page when PG_private_2 is set
    
    Take a reference on a page when PG_private_2 is set and drop it once the
    bit is unlocked.
    
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: David Howells <dhowells@redhat.com>

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 9018224693e9..043d96ca2aad 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -10,6 +10,7 @@
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/pagevec.h>
 #include <linux/slab.h>
 #include <linux/uio.h>
 #include <linux/sched/mm.h>
@@ -230,10 +231,13 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq)
 static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
 {
 	struct netfs_read_subrequest *subreq;
+	struct pagevec pvec;
 	struct page *page;
 	pgoff_t unlocked = 0;
 	bool have_unlocked = false;
 
+	pagevec_init(&pvec);
+
 	rcu_read_lock();
 
 	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
@@ -247,6 +251,8 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
 				continue;
 			unlocked = page->index;
 			unlock_page_private_2(page);
+			if (pagevec_add(&pvec, page) == 0)
+				pagevec_release(&pvec);
 			have_unlocked = true;
 		}
 	}
@@ -403,8 +409,10 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
 				pg_failed = true;
 				break;
 			}
-			if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags))
+			if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) {
+				get_page(page);
 				SetPagePrivate2(page);
+			}
 			pg_failed |= subreq_failed;
 			if (pgend < iopos + subreq->len)
 				break;

Linus Torvalds Feb. 10, 2021, 8:43 p.m. UTC | #12

On Wed, Feb 10, 2021 at 8:33 AM David Howells <dhowells@redhat.com> wrote:
>
> Then I could follow it up with this patch here, moving towards dropping the
> PG_fscache alias for the new API.

So I don't mind the alias per se, but I did mind the odd mixing of
names for the same thing.

So I think your change to make it be named "wait_on_page_private_2()"
fixed that mixing, but I also think that it's probably then a good
idea to have aliases in place for filesystems that actually include
the fscache.h header.

Put another way: I think that it would be even better to simply just
have a function like

   static inline void wait_on_page_fscache(struct page *page)
   {
        if (PagePrivate2(page))
                wait_on_page_bit(page, PG_private_2);
  }

and make that be *not* in <linux/pagemap.h>, but simply be in
<linux/fscache.h> under that big comment about how PG_private_2 is
used for the fscache bit. You already have that comment, putting the
above kind of helper function right there would very much explain why
a "wait for fscache bit" function then uses the PagePrivate2 function
to test the bit. Agreed?

Alternatively, since that header file already has

    #define PageFsCache(page)               PagePrivate2((page))

you could also just write the above as

   static inline void wait_on_page_fscache(struct page *page)
   {
        if (PageFsCache(page))
                wait_on_page_bit(page, PG_fscache);
  }

and now it is even more obvious. And there's no odd mixing of
"fscache" and "private_2", it's all consistent.

IOW, I'm not against "wait_on_page_fscache()" as a function, but I
*am* against the odd _mixing_ of things without a big explanation,
where the code itself looks very odd and questionable.

And I think the "fscache" waiting functions should not be visible to
any core VM or filesystem code - it should be limited explicitly to
those filesystems that use fscache, and include that header file.

Wouldn't that make sense?

Also, honestly, I really *REALLY* want your commit messages to talk
about who has been cc'd, who has been part of development, and point
to the PUBLIC MAILING LISTS WHERE THAT DISCUSSION WAS TAKING PLACE, so
that I can actually see that "yes, other people were involved"

No, I don't require this in general, but exactly because of the
history we have, I really really want to see that. I want to see a

   Link: https://lore.kernel.org/r/....

and the Cc's - or better yet, the Reviewed-by's etc - so that when I
get a pull request, it really is very obvious to me when I look at it
that others really have been involved.

So if I continue to see just

    Signed-off-by: David Howells <dhowells@redhat.com>

at the end of the commit messages, I will not pull.

Yes, in this thread a couple of people have piped up and said that
they were part of the discussion and that they are interested, but if
I have to start asking around just to see that, then it's too little,
too late.

No more of this "it looks like David Howells did things in private". I
want links I can follow to see the discussion, and I really want to
see that others really have been involved.

Ok?

                  Linus

David Howells Feb. 11, 2021, 10:38 p.m. UTC | #13

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> ...
> IOW, I'm not against "wait_on_page_fscache()" as a function, but I
> *am* against the odd _mixing_ of things without a big explanation,
> where the code itself looks very odd and questionable.
> 
> And I think the "fscache" waiting functions should not be visible to
> any core VM or filesystem code - it should be limited explicitly to
> those filesystems that use fscache, and include that header file.

Okay...  How about the attached then?

I've also discarded the patch that just moves towards completely getting rid
of PG_fscache and adjusted the third patch that takes a ref on the page for
the duration to handle the change of names.

Speaking of the ref-taking patch, is the one I posted yesterday the sort of
thing you wanted for that?  I wonder if I should drop the ref in the unlock
function, though doing it afterwards does allow for the possibility of using a
pagevec to do mass-release.

> Wouldn't that make sense?

Well, that's the current principle, but I was wondering if the alias was
causing confusion.

David
---
commit c723f0232c9f8928b3b15786499637bda3121f41
Author: David Howells <dhowells@redhat.com>
Date:   Wed Feb 10 10:53:02 2021 +0000

    netfs: Rename unlock_page_fscache() and move wait_on_page_fscache()
    
    Rename unlock_page_fscache() to unlock_page_private_2() and change the
    references to PG_fscache to PG_private_2 also.  This makes it look more
    generic and doesn't mix the terminology.
    
    Fix the kdoc comment on the above as the wake up mechanism doesn't wake up
    all the sleepers.  Note the example usage case for the function in
    conjunction with the cache also.
    
    Place unlock_page_fscache() as an alias in linux/netfs.h.
    
    Move wait_on_page_fscache() to linux/netfs.h.
    
    [v2: Implement suggestion by Linus to move the wait function into netfs.h]
    
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Tested-by: Jeff Layton <jlayton@kernel.org>
    Link: https://lore.kernel.org/linux-fsdevel/1330473.1612974547@warthog.procyon.org.uk/
    Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 2ffdef1ded91..59c2623dc408 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -14,6 +14,7 @@
 
 #include <linux/workqueue.h>
 #include <linux/fs.h>
+#include <linux/pagemap.h>
 
 /*
  * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
@@ -25,6 +26,35 @@
 #define TestSetPageFsCache(page)	TestSetPagePrivate2((page))
 #define TestClearPageFsCache(page)	TestClearPagePrivate2((page))
 
+/**
+ * unlock_page_fscache - Unlock a page that's locked with PG_fscache
+ * @page: The page
+ *
+ * Unlocks a page that's locked with PG_fscache and wakes up sleepers in
+ * wait_on_page_fscache().  This page bit is used by the netfs helpers when a
+ * netfs page is being written to a local disk cache, thereby allowing writes
+ * to the cache for the same page to be serialised.
+ */
+static inline void unlock_page_fscache(struct page *page)
+{
+	unlock_page_private_2(page);
+}
+
+/**
+ * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * @page: The page
+ *
+ * Wait for the PG_fscache (PG_private_2) page bit to be removed from a page.
+ * This is, for example, used to handle a netfs page being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
+ */
+static inline void wait_on_page_fscache(struct page *page)
+{
+	if (PageFsCache(page))
+		wait_on_page_bit(compound_head(page), PG_fscache);
+}
+
 enum netfs_read_source {
 	NETFS_FILL_WITH_ZEROES,
 	NETFS_DOWNLOAD_FROM_SERVER,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4935ad6171c1..d2786607d297 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -591,7 +591,7 @@ extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 extern void unlock_page(struct page *page);
-extern void unlock_page_fscache(struct page *page);
+extern void unlock_page_private_2(struct page *page);
 
 /*
  * Return true if the page was successfully locked
@@ -682,19 +682,6 @@ static inline int wait_on_page_locked_killable(struct page *page)
 	return wait_on_page_bit_killable(compound_head(page), PG_locked);
 }
 
-/**
- * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
- * @page: The page
- *
- * Wait for the fscache mark to be removed from a page, usually signifying the
- * completion of a write from that page to the cache.
- */
-static inline void wait_on_page_fscache(struct page *page)
-{
-	if (PagePrivate2(page))
-		wait_on_page_bit(compound_head(page), PG_fscache);
-}
-
 extern void put_and_wait_on_page_locked(struct page *page);
 
 void wait_on_page_writeback(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index 91fcae006d64..7d321152d579 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1467,22 +1467,24 @@ void unlock_page(struct page *page)
 EXPORT_SYMBOL(unlock_page);
 
 /**
- * unlock_page_fscache - Unlock a page pinned with PG_fscache
+ * unlock_page_private_2 - Unlock a page that's locked with PG_private_2
  * @page: The page
  *
- * Unlocks the page and wakes up sleepers in wait_on_page_fscache().  Also
- * wakes those waiting for the lock and writeback bits because the wakeup
- * mechanism is shared.  But that's OK - those sleepers will just go back to
- * sleep.
+ * Unlocks a page that's locked with PG_private_2 and wakes up sleepers in
+ * wait_on_page_private_2().
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
  */
-void unlock_page_fscache(struct page *page)
+void unlock_page_private_2(struct page *page)
 {
 	page = compound_head(page);
 	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
-	clear_bit_unlock(PG_fscache, &page->flags);
-	wake_up_page_bit(page, PG_fscache);
+	clear_bit_unlock(PG_private_2, &page->flags);
+	wake_up_page_bit(page, PG_private_2);
 }
-EXPORT_SYMBOL(unlock_page_fscache);
+EXPORT_SYMBOL(unlock_page_private_2);
 
 /**
  * end_page_writeback - end writeback against a page

David Howells Feb. 11, 2021, 11:20 p.m. UTC | #14

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Also, honestly, I really *REALLY* want your commit messages to talk
> about who has been cc'd, who has been part of development, and point
> to the PUBLIC MAILING LISTS WHERE THAT DISCUSSION WAS TAKING PLACE, so
> that I can actually see that "yes, other people were involved"

Most of the development discussion took place on IRC and waving snippets of
code about in pastebin rather than email - the latency of email is just too
high.  There's not a great deal I can do about that now as I haven't kept IRC
logs.  I can do that in future if you want.

> No, I don't require this in general, but exactly because of the
> history we have, I really really want to see that. I want to see a
>
>    Link: https://lore.kernel.org/r/....

I can add links to where I've posted the stuff for review.  Do you want this
on a per-patch basis or just in the cover for now?

Also, do you want things like these:

 https://lore.kernel.org/linux-fsdevel/3326.1579019665@warthog.procyon.org.uk/
 https://lore.kernel.org/linux-fsdevel/4467.1579020509@warthog.procyon.org.uk/

which pertain to the overall fscache rewrite, but where the relevant changes
didn't end up included in this particular patchset?  Or this:

 https://listman.redhat.com/archives/linux-cachefs/2020-December/msg00000.html

where someone was testing the overall patchset of which this is a subset?

> and the Cc's - or better yet, the Reviewed-by's etc - so that when I
> get a pull request, it really is very obvious to me when I look at it
> that others really have been involved.
> 
> So if I continue to see just
> 
>     Signed-off-by: David Howells <dhowells@redhat.com>
> 
> at the end of the commit messages, I will not pull.
> 
> Yes, in this thread a couple of people have piped up and said that
> they were part of the discussion and that they are interested, but if
> I have to start asking around just to see that, then it's too little,
> too late.
> 
> No more of this "it looks like David Howells did things in private". I
> want links I can follow to see the discussion, and I really want to
> see that others really have been involved.
> 
> Ok?

Sure.

I can go and edit in link pointers into the existing patches if you want and
add Jeff's Review-and-tested-by into the appropriate ones.  You would be able
to compare against the existing tag, so it wouldn't entirely invalidate the
testing.

Also, do you want links inserting into all the patches of the two keyrings
pull requests I've sent you?

David

David Wysochanski Feb. 12, 2021, 4:40 p.m. UTC | #15

On Thu, Feb 11, 2021 at 6:20 PM David Howells <dhowells@redhat.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > Also, honestly, I really *REALLY* want your commit messages to talk
> > about who has been cc'd, who has been part of development, and point
> > to the PUBLIC MAILING LISTS WHERE THAT DISCUSSION WAS TAKING PLACE, so
> > that I can actually see that "yes, other people were involved"
>
> Most of the development discussion took place on IRC and waving snippets of
> code about in pastebin rather than email - the latency of email is just too
> high.  There's not a great deal I can do about that now as I haven't kept IRC
> logs.  I can do that in future if you want.
>
> > No, I don't require this in general, but exactly because of the
> > history we have, I really really want to see that. I want to see a
> >
> >    Link: https://lore.kernel.org/r/....
>
> I can add links to where I've posted the stuff for review.  Do you want this
> on a per-patch basis or just in the cover for now?
>
> Also, do you want things like these:
>
>  https://lore.kernel.org/linux-fsdevel/3326.1579019665@warthog.procyon.org.uk/
>  https://lore.kernel.org/linux-fsdevel/4467.1579020509@warthog.procyon.org.uk/
>
> which pertain to the overall fscache rewrite, but where the relevant changes
> didn't end up included in this particular patchset?  Or this:
>
>  https://listman.redhat.com/archives/linux-cachefs/2020-December/msg00000.html
>
> where someone was testing the overall patchset of which this is a subset?
>
> > and the Cc's - or better yet, the Reviewed-by's etc - so that when I
> > get a pull request, it really is very obvious to me when I look at it
> > that others really have been involved.
> >
> > So if I continue to see just
> >
> >     Signed-off-by: David Howells <dhowells@redhat.com>
> >
> > at the end of the commit messages, I will not pull.
> >
> > Yes, in this thread a couple of people have piped up and said that
> > they were part of the discussion and that they are interested, but if
> > I have to start asking around just to see that, then it's too little,
> > too late.
> >
> > No more of this "it looks like David Howells did things in private". I
> > want links I can follow to see the discussion, and I really want to
> > see that others really have been involved.
> >
> > Ok?
>
> Sure.
>
> I can go and edit in link pointers into the existing patches if you want and
> add Jeff's Review-and-tested-by into the appropriate ones.  You would be able
> to compare against the existing tag, so it wouldn't entirely invalidate the
> testing.
>
You can add my Tested-by for your fscache-next branch series ending at
commit  235299002012 netfs: Hold a ref on a page when PG_private_2 is set
This series includes your commit c723f0232c9f8928b3b15786499637bda3121f41
discussed a little earlier in this email thread.

I ran over 24 hours of NFS tests (unit, connectathon, xfstests,
various servers and all NFS versions) on your latest series
and it looks good.  Note I did not run against pNFS servers
due to known issue, and I did not do more advanced tests like
error injections.  I did get one OOM on xfstest generic/551 on
one testbed, but that same' test passed on another testbed,
so it's not clear what is happening there and it could very
well be testbed or NFS related.

In addition, I reviewed various patches in the series, especially the
API portions of the netfs patches, so for those, Reviewed-by is
appropriate as well. I have also reviewed some of the internals
of the other infrastructure patches, but my review is more limited
there.





> Also, do you want links inserting into all the patches of the two keyrings
> pull requests I've sent you?
>
> David
>

Linus Torvalds Feb. 13, 2021, 1:05 a.m. UTC | #16

On Thu, Feb 11, 2021 at 3:21 PM David Howells <dhowells@redhat.com> wrote:
>
> Most of the development discussion took place on IRC and waving snippets of
> code about in pastebin rather than email - the latency of email is just too
> high.  There's not a great deal I can do about that now as I haven't kept IRC
> logs.  I can do that in future if you want.

No, I really don't.

IRC is fine for discussing ideas about how to solve things.

But no, it's not a replacement for actual code review after the fact.

If you think email has too long latency for review, and can't use
public mailing lists and cc the people who are maintainers, then I
simply don't want your patches.

You need to fix your development model. This whole "I need to get
feedback from whoever still uses irc and is active RIGHT NOW" is not a
valid model. It's fine for brainstorming for possible approaches, and
getting ideas, sure.

               Linus

David Howells Feb. 15, 2021, 12:22 a.m. UTC | #17

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> But no, it's not a replacement for actual code review after the fact.
> 
> If you think email has too long latency for review, and can't use
> public mailing lists and cc the people who are maintainers, then I
> simply don't want your patches.

I think we were talking at cross-purposes by the term "development" here.  I
was referring to the discussion of how the implementation should be done and
working closely with colleagues - both inside and outside Red Hat - to get
things working, not specifically the public review side of things.  It's just
that I don't have a complete record of the how-to-implement-it, the
how-to-get-various-bits-working-together and the why-is-it-not-working?
discussions.

Anyway, I have posted my fscache modernisation patches multiple times for
public review, I have tried to involve the wider community in aspects of the
development on public mailing lists and I have been including the maintainers
in to/cc.

I've posted the more full patchset for public review a number of times:

4th May 2020:
https://lore.kernel.org/linux-fsdevel/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/

13th Jul (split into three subsets):
https://lore.kernel.org/linux-fsdevel/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/

20th Nov:
https://lore.kernel.org/linux-fsdevel/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/

I then cut it down and posted that publically a couple of times:

20th Jan:
https://lore.kernel.org/linux-fsdevel/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/

25th Jan:
https://lore.kernel.org/linux-fsdevel/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/

I let you know what was coming here:
https://lore.kernel.org/linux-fsdevel/447452.1596109876@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/2522190.1612544534@warthog.procyon.org.uk/

to try and find out whether you were going to have any objections to the
design in advance, rather than at the last minute.

I've apprised people of what I was up to:
https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/2758811.1610621106@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/1441311.1598547738@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/160655.1611012999@warthog.procyon.org.uk/

Asked for consultation on parts of what I wanted to do:
https://lore.kernel.org/linux-fsdevel/3326.1579019665@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/4467.1579020509@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/3577430.1579705075@warthog.procyon.org.uk/

Asked someone who is actually using fscache in production to test the rewrite:
https://listman.redhat.com/archives/linux-cachefs/2020-December/msg00000.html

I've posted partial patches to try and help 9p and cifs along:
https://lore.kernel.org/linux-fsdevel/1514086.1605697347@warthog.procyon.org.uk/
https://lore.kernel.org/linux-cifs/1794123.1605713481@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/241017.1612263863@warthog.procyon.org.uk/
https://lore.kernel.org/linux-cifs/270998.1612265397@warthog.procyon.org.uk/

(Jeff has been handling Ceph and Dave NFS).

Proposed conference topics related to this:
https://lore.kernel.org/linux-fsdevel/9608.1575900019@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/14196.1575902815@warthog.procyon.org.uk/
https://lore.kernel.org/linux-fsdevel/364531.1579265357@warthog.procyon.org.uk/

though the lockdown put paid to that:-(

Willy has discussed it too:
https://lore.kernel.org/linux-fsdevel/20200826193116.GU17456@casper.infradead.org/

David

Linus Torvalds Feb. 15, 2021, 1:01 a.m. UTC | #18

On Sun, Feb 14, 2021 at 4:23 PM David Howells <dhowells@redhat.com> wrote:
>
> Anyway, I have posted my fscache modernisation patches multiple times for
> public review, I have tried to involve the wider community in aspects of the
> development on public mailing lists and I have been including the maintainers
> in to/cc.

So then add those links and the cc's to the commit logs, so that I can
*see* them.

I'm done with this discussion.

If I see a pull request from you, I DO NOT WANT TO HAVE TO HAVE A
WEEK-LONG EMAIL THREAD ABOUT HOW I CANNOT SEE THAT IT HAS EVER SEEN
ANY REVIEW.

So if all I see is "Signed-off-by:" from you, I will promptly throw
that pull request into the garbage, because it's just not worth my
time to try to have to get you kicking and screaming to show that
others have been involved.

Can you not understand that?

When I get that pull request, I need to see that yes, this has been
reviewed, people have been involved, and yes, it's been in linux-next.

I want to see "reviewed-by" and "tested-by", I want to see "cc", and I
want to see links to submission threads with discussion showing that
others actually were involved.

I do *not* want to see just a single signed-off-by line from you, and
then have to ask for "has anybody else actually seen this and reviewed
it".

Look, here's an entirely unrelated example from a single fairly recent
trivial one-liner memory leak fix:

    Fixes: 87c715dcde63 ("scsi: scsi_debug: Add per_host_store option")
    Link: https://lore.kernel.org/r/20210208111734.34034-1-mlombard@redhat.com
    Acked-by: Douglas Gilbert <dgilbert@interlog.com>
    Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

that's from a quite a trivial commit. Yes, it's trivial, but it could
still be wrong, of course. And if somebody ever reports that it causes
problems despite how simple it was, look at what I have: I have three
people to contact, and I have a pointer to the actual original
submission of the patch.

Do we have that for all our commits? No. But it's also not at all
unusual any more, and in fact many commits have even more, with
testing etc.

And yes, sometimes the test results and acks come back later after
you've already pushed the changes out etc, and no, it's generally not
worth rebasing for that - maybe others have now started to rely on
whatever public branch you have. Which is why the "Link:" is useful,
so that if things come in later, the discussion can still be found.
But quite often, you shouldn't have pushed out some final branch
before you've gotten at least *some* positive response from people, so
I do kind of expect some "Acked-by" etc in the commit itself.

THAT is what you need to aim for.

And yes, I'm picking on you. Because we've had this problem before.
I've complained when you've sent me pull requests that don't even
build, that you in fact had been told by linux-next didn't build, and
you still sent them to me.

And as a result, I've asked for more involvement from other people before.

So now I'm clarifying that requirement - I  absolutely need to see
that it has actually seen testing, that it has seen other people being
involved, and that it isn't just you throwing spaghetti at the wall to
see what sticks.

And I'm not going to do that for every pull request. I want to see
that data *in* the pull request itself.

            Linus

[GIT,PULL] fscache: I/O API modernisation and netfs helper library

Pull-request

Message

Comments