mbox series

[00/17] smb3: Use iov_iters down to the network transport and fix DIO page pinning

Message ID 20230216214745.3985496-1-dhowells@redhat.com (mailing list archive)
Headers show
Series smb3: Use iov_iters down to the network transport and fix DIO page pinning | expand

Message

David Howells Feb. 16, 2023, 9:47 p.m. UTC
Hi Steve,

Here's an updated version of my patchset to make the cifs/smb3 driver pass
iov_iters down to the lowest layers where they can be passed directly to
the network transport rather than passing lists of pages around.

The series deals with the following issues:

 (-) By pinning pages, it fixes the race between concurrent DIO read and
     fork, whereby the pages containing the DIO read buffer may end up
     belonging to the child process and not the parent - with the result
     that the parent might not see the retrieved data.

 (-) cifs shouldn't take refs on pages extracted from non-user-backed
     iterators (eg. KVEC).  With these changes, cifs will apply the
     appropriate cleanup.  Note that there is the possibility the network
     transport might, but that's beyond the scope of this patchset.

 (-) Making it easier to transition to using folios in cifs rather than
     pages by dealing with them through BVEC and XARRAY iterators.

The first five patches add two facilities to the VM/VFS core, excerpts from
my iov-extract branch[1] that are required in order to do the cifs
iteratorisation:

 (*) Future replacements for file-splicing in the form of functions
     filemap_splice_read() and direct_splice_read().  These allow file
     splicing to be done without the use of an ITER_PIPE iterator, without
     the need to take refs on the pages extracted from KVEC/BVEC/XARRAY
     iterators.  This is necessary to use iov_iter_extract_pages().

     [!] Note that whilst these are added in core code, they are only used
     by cifs at this point.

 (*) Add iov_iter_extract_pages(), a replacement for iov_iter_get_pages*()
     that uses FOLL_PIN on user pages (IOVEC, UBUF) and doesn't pin kernel
     pages (BVEC, KVEC, XARRAY).  This allows cifs to do the page pinning
     correctly.

     [!] Note that whilst this is added in core code, it is only used by
     cifs at this point - though a corresponding change is made to the
     flags argument of iov_iter_get_pages*() so that it doesn't take FOLL_*
     flags, but rather takes iov_iter_extraction_t flags that are
     translated internally to FOLL_* flags.

Then there's a couple of patches to make cifs use the new splice functions.

The series continues with a couple of patches that add stuff to netfslib
that I want to use there as well as in cifs:

 (*) Add a netfslib function to extract and pin pages from an ITER_IOBUF or
     ITER_UBUF iterator into an ITER_BVEC iterator.

 (*) Add a netfslib function to extract pages from an iterator that's of
     type ITER_UBUF/IOVEC/BVEC/KVEC/XARRAY and add them to a scatterlist.
     The cleanup will need to be done as for iov_iter_extract_pages().

     BVEC, KVEC and XARRAY iterators can be rendered into elements that
     span multiple pages.

Added to that are some cifs helpers that work with iterators:

 (*) Add a function to walk through an ITER_BVEC/KVEC/XARRAY iterator and
     add elements to an RDMA SGE list.  Only the DMA addresses are stored,
     and an element may span multiple pages (say if an xarray contains a
     multipage folio).

 (*) Add a function to walk through an ITER_BVEC/KVEC/XARRAY iterator and
     pass the contents into a shash function.

 (*) Add functions to walk through an ITER_XARRAY iterator and perform
     various sorts of cleanup on the folios held therein, to be used on I/O
     completion.

 (*) Add a function to read from the transport TCP socket directly into an
     iterator.

Finally come the patches that actually do the work of iteratorising cifs:

 (*) The main patch.  Replace page lists with iterators.  It extracts the
     pages from ITER_UBUF and ITER_IOVEC iterators to an ITER_BVEC
     iterator, pinning or getting refs on them, before passing them down as
     the I/O may be done from a worker thread.

     The iterator is extracted into a scatterlist in order to talk to the
     crypto interface or to do RDMA.

 (*) In the cifs RDMA code, extract the iterator into an RDMA SGE[] list,
     removing the scatterlist intermediate - at least for smbd_send().
     There appear to be other ways for cifs to talk to the RDMA layer that
     don't go through that that I haven't managed to work out.

 (*) Remove a chunk of now-unused code.

 (*) Allow DIO to/from KVEC-type iterators.

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-cifs

David

Link: https://lore.kernel.org/r/20230214171330.2722188-1-dhowells@redhat.com/ [1]
Link: https://lore.kernel.org/r/166697254399.61150.1256557652599252121.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/20230131182855.4027499-1-dhowells@redhat.com/ # v1

David Howells (17):
  mm: Pass info, not iter, into filemap_get_pages()
  splice: Add a func to do a splice from a buffered file without
    ITER_PIPE
  splice: Add a func to do a splice from an O_DIRECT file without
    ITER_PIPE
  iov_iter: Define flags to qualify page extraction.
  iov_iter: Add a function to extract a page list from an iterator
  splice: Export filemap/direct_splice_read()
  cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE
  netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
  netfs: Add a function to extract an iterator into a scatterlist
  cifs: Add a function to build an RDMA SGE list from an iterator
  cifs: Add a function to Hash the contents of an iterator
  cifs: Add some helper functions
  cifs: Add a function to read into an iter from a socket
  cifs: Change the I/O paths to use an iterator rather than a page list
  cifs: Build the RDMA SGE list directly from an iterator
  cifs: Remove unused code
  cifs: DIO to/from KVEC-type iterators should now work

 block/bio.c               |    6 +-
 block/blk-map.c           |    8 +-
 fs/cifs/Kconfig           |    1 +
 fs/cifs/cifsencrypt.c     |  172 +++-
 fs/cifs/cifsfs.c          |   12 +-
 fs/cifs/cifsfs.h          |    6 +
 fs/cifs/cifsglob.h        |   66 +-
 fs/cifs/cifsproto.h       |   11 +-
 fs/cifs/cifssmb.c         |   15 +-
 fs/cifs/connect.c         |   14 +
 fs/cifs/file.c            | 1772 ++++++++++++++++---------------------
 fs/cifs/fscache.c         |   22 +-
 fs/cifs/fscache.h         |   10 +-
 fs/cifs/misc.c            |  128 +--
 fs/cifs/smb2ops.c         |  362 ++++----
 fs/cifs/smb2pdu.c         |   53 +-
 fs/cifs/smbdirect.c       |  535 ++++++-----
 fs/cifs/smbdirect.h       |    7 +-
 fs/cifs/transport.c       |   54 +-
 fs/netfs/Makefile         |    1 +
 fs/netfs/iterator.c       |  371 ++++++++
 fs/splice.c               |   93 ++
 include/linux/fs.h        |    6 +
 include/linux/netfs.h     |    8 +
 include/linux/pipe_fs_i.h |   20 +
 include/linux/uio.h       |   35 +-
 lib/iov_iter.c            |  284 +++++-
 mm/filemap.c              |  156 +++-
 mm/internal.h             |    6 +
 mm/vmalloc.c              |    1 +
 30 files changed, 2515 insertions(+), 1720 deletions(-)
 create mode 100644 fs/netfs/iterator.c

Comments

Steve French Feb. 17, 2023, 5:52 a.m. UTC | #1
tentatively merged the first 13 of this series into cifs-2.6.git
for-next (pending additional testing and any more review comments)

On Thu, Feb 16, 2023 at 3:47 PM David Howells <dhowells@redhat.com> wrote:
>
> Hi Steve,
>
> Here's an updated version of my patchset to make the cifs/smb3 driver pass
> iov_iters down to the lowest layers where they can be passed directly to
> the network transport rather than passing lists of pages around.
>
> The series deals with the following issues:
>
>  (-) By pinning pages, it fixes the race between concurrent DIO read and
>      fork, whereby the pages containing the DIO read buffer may end up
>      belonging to the child process and not the parent - with the result
>      that the parent might not see the retrieved data.
>
>  (-) cifs shouldn't take refs on pages extracted from non-user-backed
>      iterators (eg. KVEC).  With these changes, cifs will apply the
>      appropriate cleanup.  Note that there is the possibility the network
>      transport might, but that's beyond the scope of this patchset.
>
>  (-) Making it easier to transition to using folios in cifs rather than
>      pages by dealing with them through BVEC and XARRAY iterators.
>
> The first five patches add two facilities to the VM/VFS core, excerpts from
> my iov-extract branch[1] that are required in order to do the cifs
> iteratorisation:
>
>  (*) Future replacements for file-splicing in the form of functions
>      filemap_splice_read() and direct_splice_read().  These allow file
>      splicing to be done without the use of an ITER_PIPE iterator, without
>      the need to take refs on the pages extracted from KVEC/BVEC/XARRAY
>      iterators.  This is necessary to use iov_iter_extract_pages().
>
>      [!] Note that whilst these are added in core code, they are only used
>      by cifs at this point.
>
>  (*) Add iov_iter_extract_pages(), a replacement for iov_iter_get_pages*()
>      that uses FOLL_PIN on user pages (IOVEC, UBUF) and doesn't pin kernel
>      pages (BVEC, KVEC, XARRAY).  This allows cifs to do the page pinning
>      correctly.
>
>      [!] Note that whilst this is added in core code, it is only used by
>      cifs at this point - though a corresponding change is made to the
>      flags argument of iov_iter_get_pages*() so that it doesn't take FOLL_*
>      flags, but rather takes iov_iter_extraction_t flags that are
>      translated internally to FOLL_* flags.
>
> Then there's a couple of patches to make cifs use the new splice functions.
>
> The series continues with a couple of patches that add stuff to netfslib
> that I want to use there as well as in cifs:
>
>  (*) Add a netfslib function to extract and pin pages from an ITER_IOBUF or
>      ITER_UBUF iterator into an ITER_BVEC iterator.
>
>  (*) Add a netfslib function to extract pages from an iterator that's of
>      type ITER_UBUF/IOVEC/BVEC/KVEC/XARRAY and add them to a scatterlist.
>      The cleanup will need to be done as for iov_iter_extract_pages().
>
>      BVEC, KVEC and XARRAY iterators can be rendered into elements that
>      span multiple pages.
>
> Added to that are some cifs helpers that work with iterators:
>
>  (*) Add a function to walk through an ITER_BVEC/KVEC/XARRAY iterator and
>      add elements to an RDMA SGE list.  Only the DMA addresses are stored,
>      and an element may span multiple pages (say if an xarray contains a
>      multipage folio).
>
>  (*) Add a function to walk through an ITER_BVEC/KVEC/XARRAY iterator and
>      pass the contents into a shash function.
>
>  (*) Add functions to walk through an ITER_XARRAY iterator and perform
>      various sorts of cleanup on the folios held therein, to be used on I/O
>      completion.
>
>  (*) Add a function to read from the transport TCP socket directly into an
>      iterator.
>
> Finally come the patches that actually do the work of iteratorising cifs:
>
>  (*) The main patch.  Replace page lists with iterators.  It extracts the
>      pages from ITER_UBUF and ITER_IOVEC iterators to an ITER_BVEC
>      iterator, pinning or getting refs on them, before passing them down as
>      the I/O may be done from a worker thread.
>
>      The iterator is extracted into a scatterlist in order to talk to the
>      crypto interface or to do RDMA.
>
>  (*) In the cifs RDMA code, extract the iterator into an RDMA SGE[] list,
>      removing the scatterlist intermediate - at least for smbd_send().
>      There appear to be other ways for cifs to talk to the RDMA layer that
>      don't go through that that I haven't managed to work out.
>
>  (*) Remove a chunk of now-unused code.
>
>  (*) Allow DIO to/from KVEC-type iterators.
>
> I've pushed the patches here also:
>
>         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-cifs
>
> David
>
> Link: https://lore.kernel.org/r/20230214171330.2722188-1-dhowells@redhat.com/ [1]
> Link: https://lore.kernel.org/r/166697254399.61150.1256557652599252121.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/20230131182855.4027499-1-dhowells@redhat.com/ # v1
>
> David Howells (17):
>   mm: Pass info, not iter, into filemap_get_pages()
>   splice: Add a func to do a splice from a buffered file without
>     ITER_PIPE
>   splice: Add a func to do a splice from an O_DIRECT file without
>     ITER_PIPE
>   iov_iter: Define flags to qualify page extraction.
>   iov_iter: Add a function to extract a page list from an iterator
>   splice: Export filemap/direct_splice_read()
>   cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE
>   netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
>   netfs: Add a function to extract an iterator into a scatterlist
>   cifs: Add a function to build an RDMA SGE list from an iterator
>   cifs: Add a function to Hash the contents of an iterator
>   cifs: Add some helper functions
>   cifs: Add a function to read into an iter from a socket
>   cifs: Change the I/O paths to use an iterator rather than a page list
>   cifs: Build the RDMA SGE list directly from an iterator
>   cifs: Remove unused code
>   cifs: DIO to/from KVEC-type iterators should now work
>
>  block/bio.c               |    6 +-
>  block/blk-map.c           |    8 +-
>  fs/cifs/Kconfig           |    1 +
>  fs/cifs/cifsencrypt.c     |  172 +++-
>  fs/cifs/cifsfs.c          |   12 +-
>  fs/cifs/cifsfs.h          |    6 +
>  fs/cifs/cifsglob.h        |   66 +-
>  fs/cifs/cifsproto.h       |   11 +-
>  fs/cifs/cifssmb.c         |   15 +-
>  fs/cifs/connect.c         |   14 +
>  fs/cifs/file.c            | 1772 ++++++++++++++++---------------------
>  fs/cifs/fscache.c         |   22 +-
>  fs/cifs/fscache.h         |   10 +-
>  fs/cifs/misc.c            |  128 +--
>  fs/cifs/smb2ops.c         |  362 ++++----
>  fs/cifs/smb2pdu.c         |   53 +-
>  fs/cifs/smbdirect.c       |  535 ++++++-----
>  fs/cifs/smbdirect.h       |    7 +-
>  fs/cifs/transport.c       |   54 +-
>  fs/netfs/Makefile         |    1 +
>  fs/netfs/iterator.c       |  371 ++++++++
>  fs/splice.c               |   93 ++
>  include/linux/fs.h        |    6 +
>  include/linux/netfs.h     |    8 +
>  include/linux/pipe_fs_i.h |   20 +
>  include/linux/uio.h       |   35 +-
>  lib/iov_iter.c            |  284 +++++-
>  mm/filemap.c              |  156 +++-
>  mm/internal.h             |    6 +
>  mm/vmalloc.c              |    1 +
>  30 files changed, 2515 insertions(+), 1720 deletions(-)
>  create mode 100644 fs/netfs/iterator.c
>
David Howells Feb. 17, 2023, 8:22 a.m. UTC | #2
Steve French <smfrench@gmail.com> wrote:

> tentatively merged the first 13 of this series into cifs-2.6.git
> for-next (pending additional testing and any more review comments)

I've fixed the mistakes in the descriptions of patches 3 and 11 pointed out by
you and Eric and fixed up most of the checkpatch warnings in 14.  I've the the
code-to-be-removed #if'd out as it's removed in patch 16.

David