
[00/16] Consolidate iommu page table implementations

Message ID: 0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com

Jason Gunthorpe Aug. 15, 2024, 3:11 p.m. UTC
Currently each of the iommu page table formats duplicates all of the logic
to maintain the page table and perform map/unmap/etc. operations, and the
algorithms exist in several different versions across the formats. The
io-pgtable system provides an interface to help isolate the page table code
from the iommu driver, but it doesn't provide tools to implement the common
algorithms.

This makes it very hard to improve the state of the page table code under
the iommu domains, as any proposed improvement needs to alter a large
number of different driver code paths. The lack of software-based testing
compounds the problem.

iommufd wants several new page table operations:
 - More efficient map/unmap operations, using iommufd's batching approach
 - unmap that returns the physical addresses into a batch as it progresses
 - cut that allows splitting areas so large pages can have holes
   poked in them dynamically
 - More aggressive freeing of table memory to avoid waste
 - Fragmenting large pages so that dirty tracking can run efficiently
 - Reassembling large pages so that VMs can run at full IO performance
   in error flows
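
To make the second item concrete, here is a minimal userspace sketch of an unmap that hands physical addresses back in batches while it walks, instead of in a separate lookup pass. It assumes a toy single-level table of 512 entries with a valid bit in bit 0 and a batch of 8 addresses; toy_unmap(), struct toy_batch and toy_batch_flush() are hypothetical names, not iommufd or series APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy format: one table of 512 entries, 4 KiB pages, bit 0 = valid. */
#define TOY_NUM_ENTRIES 512
#define TOY_PAGE_SHIFT 12
#define TOY_VALID 1ULL
#define TOY_BATCH_MAX 8

/* Hypothetical batch that collects physical addresses as unmap progresses. */
struct toy_batch {
	uint64_t phys[TOY_BATCH_MAX];
	size_t n;       /* addresses currently held */
	size_t flushes; /* times a non-empty batch was drained */
	size_t total;   /* addresses drained so far */
};

/* Stand-in for the consumer (e.g. unpinning pages); here it only counts. */
static void toy_batch_flush(struct toy_batch *b)
{
	if (b->n)
		b->flushes++;
	b->total += b->n;
	b->n = 0;
}

static void toy_batch_add(struct toy_batch *b, uint64_t phys)
{
	if (b->n == TOY_BATCH_MAX)
		toy_batch_flush(b);
	b->phys[b->n++] = phys;
}

/*
 * Clear entries [first, first + count) and push each previously mapped
 * physical address into the batch during the same walk. Returns how many
 * entries were actually unmapped. The caller flushes any leftovers.
 */
static size_t toy_unmap(uint64_t *table, size_t first, size_t count,
			struct toy_batch *b)
{
	size_t unmapped = 0;

	assert(first <= TOY_NUM_ENTRIES && count <= TOY_NUM_ENTRIES - first);
	for (size_t i = first; i < first + count; i++) {
		if (!(table[i] & TOY_VALID))
			continue;
		toy_batch_add(b, table[i] & ~((1ULL << TOY_PAGE_SHIFT) - 1));
		table[i] = 0;
		unmapped++;
	}
	return unmapped;
}
```

With 20 mapped pages and a batch of 8, the consumer sees two full batches during the walk, and the remaining 4 addresses arrive on the caller's final toy_batch_flush().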

In addition, there are possibilities such as mapping a bvec or sg_list
directly and more efficiently, and perhaps even optimizations for the GPU
drivers that use the io-pgtable code.

Together these are algorithmically complex enough that implementing them
in every page table format we support would be a very significant task.
The "server" focused drivers alone use almost all the formats (ARMv8
S1&S2 / x86 PAE / AMDv1 / VT-D SS / RISCV).

Instead of duplicating that work, this series takes the first step toward
consolidating the algorithms in one place. In spirit it is similar to the
work Christoph did a few years back to pull the redundant get_user_pages()
implementations out of the arch code into core MM. This unlocked a great
deal of improvement in that space in the following years. I would like to
see the same benefit in iommu as well.

The approach is split into three deliberate layers:
 - The truly generic page table components. These are very application
   neutral and could conceivably be used in the MM or KVM if there were
   a reason. A DRM driver may also be interested in this layer as it
   could be more efficient than working through the iommu focused ops.

 - The per format functions. These are a set of small inline functions
   that abstract the details of how the page table is laid out in
   memory and what each bit means. Like the MM, these functions
   share the same name so the same code can be compiled against
   different formats by including the appropriate format header.

 - An iommu implementation. This is intended to create ops that can
   take over from the iommu_domain ops. There is a single set of C
   routines that compile against all the formats generically.
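
The three layers above can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration: the helper names (log2_mask, index_at, fmt_present, fmt_oa, toy_iova_to_phys), the two-level toy format with 512-entry tables, and the trick of storing a pool index in table entries in place of a physical address. It only shows the dependency direction: the iommu-level operation is written purely in terms of the per-format inline functions, which in turn use format-neutral helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Layer 1, generic and format neutral: log2-based bit helpers. */
static inline uint64_t log2_mask(unsigned int lg2)
{
	return (1ULL << lg2) - 1;
}

static inline unsigned int index_at(uint64_t va, unsigned int lsb,
				    unsigned int lg2_entries)
{
	return (unsigned int)((va >> lsb) & log2_mask(lg2_entries));
}

/*
 * Layer 2, per format: a hypothetical two-level format with 4 KiB pages and
 * 512 entries per table. Bit 0 = present, bit 1 = leaf, OA in bits 12..51.
 * A table entry's OA holds (pool index << 12) so the sketch runs in userspace.
 */
#define FMT_LG2_ENTRIES 9
#define FMT_PAGE_SHIFT 12

static inline unsigned int fmt_level_shift(unsigned int level)
{
	return FMT_PAGE_SHIFT + level * FMT_LG2_ENTRIES;
}
static inline int fmt_present(uint64_t pte) { return (int)(pte & 1); }
static inline int fmt_is_leaf(uint64_t pte) { return (int)((pte >> 1) & 1); }
static inline uint64_t fmt_oa(uint64_t pte)
{
	return pte & 0x000ffffffffff000ULL;
}

/*
 * Layer 3, iommu: iova_to_phys written only against layers 1 and 2.
 * The root table is pool[0]. Returns 0 when the IOVA is not mapped.
 */
static uint64_t toy_iova_to_phys(uint64_t (*pool)[1 << FMT_LG2_ENTRIES],
				 unsigned int top_level, uint64_t iova)
{
	const uint64_t *table = pool[0];
	unsigned int level = top_level;

	for (;;) {
		unsigned int shift = fmt_level_shift(level);
		uint64_t pte = table[index_at(iova, shift, FMT_LG2_ENTRIES)];

		if (!fmt_present(pte))
			return 0;
		if (fmt_is_leaf(pte))
			return fmt_oa(pte) | (iova & log2_mask(shift));
		assert(level > 0); /* this toy format requires leaves at level 0 */
		table = pool[fmt_oa(pte) >> FMT_PAGE_SHIFT];
		level--;
	}
}
```

With a level-1 root whose entry 0 points at a level-0 table and whose entry 1 is a 2 MiB leaf at 0x40000000, IOVA 0x201234 resolves to 0x40001234 without the walker knowing anything about the bit layout.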

On top of this are two kunit tests. The first directly exercises the iommu
implementation across all the different formats. The second does an A/B
comparison between the iommupt and io-pgtable implementations to ensure
they are identical.
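
The A/B idea can also be sketched outside the kernel. This minimal example assumes a made-up format (bit 0 present, bit 1 writable, output address in bits 12..51) and two hypothetical encoders: ref_map() plays the role of an existing hand-written implementation and new_map() the role of a rewrite built from generic field helpers. ab_compare() runs both on identical inputs and counts table entries that differ; the real test compares io-pgtable and iommupt inside kunit, not these functions.

```c
#include <assert.h>
#include <stdint.h>

#define NENT 64
#define PG_SHIFT 12

/* "A": reference encoder in the style of a hand-written driver (hypothetical). */
static void ref_map(uint64_t *tbl, unsigned int idx, unsigned int n,
		    uint64_t oa, int writable)
{
	for (unsigned int i = 0; i < n; i++) {
		uint64_t pte = (oa + ((uint64_t)i << PG_SHIFT)) &
			       0x000ffffffffff000ULL;

		pte |= 1ULL;              /* present */
		if (writable)
			pte |= 1ULL << 1; /* writable */
		tbl[idx + i] = pte;
	}
}

/* "B": the same format described as generic bit fields (hypothetical). */
struct field {
	unsigned int lsb, width;
};
static const struct field F_PRESENT = { 0, 1 };
static const struct field F_WRITE = { 1, 1 };
static const struct field F_OA = { 12, 40 };

static uint64_t field_prep(struct field f, uint64_t v)
{
	return (v & ((1ULL << f.width) - 1)) << f.lsb;
}

static void new_map(uint64_t *tbl, unsigned int idx, unsigned int n,
		    uint64_t oa, int writable)
{
	for (unsigned int i = 0; i < n; i++)
		tbl[idx + i] = field_prep(F_OA, (oa >> PG_SHIFT) + i) |
			       field_prep(F_PRESENT, 1) |
			       field_prep(F_WRITE, writable ? 1 : 0);
}

/* Run A and B on identical inputs; return the number of differing entries. */
static unsigned int ab_compare(unsigned int idx, unsigned int n, uint64_t oa,
			       int writable)
{
	uint64_t a[NENT] = { 0 }, b[NENT] = { 0 };
	unsigned int diffs = 0;

	assert(idx <= NENT && n <= NENT - idx);
	ref_map(a, idx, n, oa, writable);
	new_map(b, idx, n, oa, writable);
	for (unsigned int i = 0; i < NENT; i++)
		diffs += (a[i] != b[i]);
	return diffs;
}
```

A zero result only shows that the two encoders agree on the inputs tried; how much confidence the comparison gives depends on how many ranges, addresses and permission combinations are exercised.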

Somewhat like the MM, this uses multi-compilation, where the common code
includes format-specific headers that implement the same C API. Unlike the
MM, we need to build multiple page table formats into the same kernel, so
each combination of format/parameters/iommu implementation is compiled in
its own compilation unit and built into a module. This results in the same
C code being compiled multiple times in a single kernel build, each time
against a different combination of header files.
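
Within a single file, the multi-compilation scheme can be approximated with a macro. The diffstat lists one small .c file per configuration (iommu_amdv1.c, iommu_x86pae.c, and so on) plus a shared fmt/iommu_template.h; the sketch below does not reproduce those files and only assumes the general shape: common code written once, expanded per format, with each expansion bound to that format's helpers and given a format-specific name. Both formats and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format A: 4 KiB granule, valid = bit 0, OA in bits 12..51. */
static inline unsigned int fmta_page_shift(void) { return 12; }
static inline int fmta_entry_present(uint64_t pte) { return (int)(pte & 1); }
static inline uint64_t fmta_entry_oa(uint64_t pte)
{
	return pte & 0x000ffffffffff000ULL;
}

/* Hypothetical format B: 64 KiB granule, valid = bit 1, OA in bits 16..47. */
static inline unsigned int fmtb_page_shift(void) { return 16; }
static inline int fmtb_entry_present(uint64_t pte) { return (int)((pte >> 1) & 1); }
static inline uint64_t fmtb_entry_oa(uint64_t pte)
{
	return pte & 0x0000ffffffff0000ULL;
}

/*
 * The common code, written once: translate an IOVA through one leaf entry.
 * Each expansion pastes the format prefix onto every helper call and onto
 * the generated function name, so the same body is compiled once per
 * format, each copy with its own symbol.
 */
#define DEFINE_COMMON_OPS(fmt)                                          \
	static uint64_t fmt##_leaf_to_phys(uint64_t pte, uint64_t iova) \
	{                                                               \
		uint64_t off_mask = (1ULL << fmt##_page_shift()) - 1;   \
		assert(fmt##_entry_present(pte));                       \
		return fmt##_entry_oa(pte) | (iova & off_mask);         \
	}

DEFINE_COMMON_OPS(fmta)
DEFINE_COMMON_OPS(fmtb)
```

Here fmta_leaf_to_phys(0x12345001, 0xabc) yields 0x12345abc, while fmtb_leaf_to_phys(0x12340002, 0x1abcd) keeps a 16-bit page offset and yields 0x1234abcd, from the same source body.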

The approach is designed to provide either MM-like fully inlined
performance or, as is typical for iommu, a smaller recursive, non-inlined
.text version. As the implementation is now shared, it will be worthwhile
to do some performance work and fine-tune this as appropriate.
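
The inlined-versus-recursive trade-off can be illustrated with one per-level step used two ways. This sketch assumes a hypothetical three-level format with 16-entry tables; walk_recursive() reuses a single out-of-line body for every level (smaller .text), while walk_unrolled_3lvl() calls the static inline step with a constant level at each call site, so an optimizing compiler is free to fold the shifts and flatten the descent. Whether it actually does so depends on the compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-level format with tiny 16-entry tables. */
#define TOY_LG2_ENTRIES 4
#define TOY_NUM_ENTRIES (1u << TOY_LG2_ENTRIES)
#define TOY_PAGE_SHIFT 12
#define TOY_OA_MASK 0x000ffffffffff000ULL

typedef uint64_t table_t[TOY_NUM_ENTRIES];

/* Result of examining one entry: a final answer, or the next table to visit. */
struct step {
	uint64_t pa;       /* meaningful when done: 0 means not mapped */
	unsigned int next; /* pool index of the next table when !done */
	int done;
};

/* Bit 0 = present, bit 1 = leaf; a table entry's OA holds (pool index << 12). */
static inline struct step walk_step(table_t *pool, unsigned int idx,
				    unsigned int level, uint64_t iova)
{
	unsigned int shift = TOY_PAGE_SHIFT + level * TOY_LG2_ENTRIES;
	uint64_t pte;
	struct step s = { 0, 0, 1 };

	assert(level <= 2); /* this toy format has three levels */
	pte = pool[idx][(iova >> shift) & (TOY_NUM_ENTRIES - 1)];
	if (pte & 1) {
		if (pte & 2) {
			s.pa = (pte & TOY_OA_MASK) | (iova & ((1ULL << shift) - 1));
		} else {
			s.next = (unsigned int)((pte & TOY_OA_MASK) >> TOY_PAGE_SHIFT);
			s.done = 0;
		}
	}
	return s;
}

/* Recursive form: one out-of-line body reused for every level. */
static uint64_t walk_recursive(table_t *pool, unsigned int idx,
			       unsigned int level, uint64_t iova)
{
	struct step s = walk_step(pool, idx, level, iova);

	if (s.done)
		return s.pa;
	if (level == 0)
		return 0; /* malformed: table entry at the last level */
	return walk_recursive(pool, s.next, level - 1, iova);
}

/* Unrolled form for a fixed three-level table: each level is a constant. */
static inline uint64_t walk_unrolled_3lvl(table_t *pool, uint64_t iova)
{
	struct step s = walk_step(pool, 0, 2, iova);

	if (!s.done)
		s = walk_step(pool, s.next, 1, iova);
	if (!s.done)
		s = walk_step(pool, s.next, 0, iova);
	return s.done ? s.pa : 0;
}
```

On a three-level table both functions return the same translation; only the shape of the generated code differs, which is the kind of choice a shared implementation can tune in one place.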

I've CC'd a few people from outside iommu who may be interested in the
generic part of this, or may have ideas on how to better build the
abstraction and helpers.

For this RFC I've provided draft formats for nearly everything (S390 and
RISCV are notably not included). The formats all pass the compare test and
thus, to a significant degree, produce the same memory layouts for the
radix tree. The primary purpose of this breadth is to prove the common API
is suitable for the job. Completing these so they are fully usable in their
respective drivers is still to be done.

I expect to show perhaps one more RFC round with all the formats and then
pivot to a more focused series, likely just for AMD, that brings in the
minimum necessary. From there we can work in parallel to add the new
iommufd features and convert more of the drivers. From an iommufd
perspective I would like at least the "server" drivers (AMD / SMMUv3 /
VT-D) to be converted.

This general concept was brought up and discussed a few times at LPC
last year, and I have a formal session on the schedule for this series at
LPC Vienna.

Running the kunits requires many additional support patches; everything
is on GitHub:

https://github.com/jgunthorpe/linux/commits/iommu_pt

FIXME:
  - Improve the two kunit tests
  - Implement additional new iommufd ops
  - Implement the debugfs with the RCU safety
  - Do a performance study vs the io-pgtable versions
  - Implement the flush callbacks, iommu core hookups, etc
  - Look at possible bvec and sg optimizations
  - Link it up to the iommu drivers and test it in HW as an iommu
    implementation

Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: iommu@lists.linux.dev
Cc: kvm@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (16):
  genpt: Generic Page Table base API
  genpt: Add a specialized allocator for page table levels
  iommupt: Add the basic structure of the iommu implementation
  iommupt: Add iova_to_phys op
  iommupt: Add unmap_pages op
  iommupt: Add map_pages op
  iommupt: Add cut_mapping op
  iommupt: Add read_and_clear_dirty op
  iommupt: Add a kunit test for Generic Page Table and the IOMMU
    implementation
  iommupt: Add a kunit test to compare against iopt
  iommupt: Add the 64 bit ARMv8 page table format
  iommupt: Add the AMD IOMMU v1 page table format
  iommupt: Add the x86 PAE page table format
  iommupt: Add the DART v1/v2 page table format
  iommupt: Add the 32 bit ARMv7s page table format
  iommupt: Add the Intel VT-D second stage page table format

 .clang-format                                 |    1 +
 drivers/iommu/Kconfig                         |    2 +
 drivers/iommu/Makefile                        |    1 +
 drivers/iommu/generic_pt/.kunitconfig         |   23 +
 drivers/iommu/generic_pt/Kconfig              |  117 ++
 drivers/iommu/generic_pt/Makefile             |    7 +
 drivers/iommu/generic_pt/fmt/Makefile         |   35 +
 drivers/iommu/generic_pt/fmt/amdv1.h          |  372 ++++++
 drivers/iommu/generic_pt/fmt/armv7s.h         |  529 +++++++++
 drivers/iommu/generic_pt/fmt/armv8.h          |  621 ++++++++++
 drivers/iommu/generic_pt/fmt/dart.h           |  371 ++++++
 drivers/iommu/generic_pt/fmt/defs_amdv1.h     |   21 +
 drivers/iommu/generic_pt/fmt/defs_armv7s.h    |   23 +
 drivers/iommu/generic_pt/fmt/defs_armv8.h     |   28 +
 drivers/iommu/generic_pt/fmt/defs_dart.h      |   21 +
 drivers/iommu/generic_pt/fmt/defs_vtdss.h     |   21 +
 drivers/iommu/generic_pt/fmt/defs_x86pae.h    |   21 +
 drivers/iommu/generic_pt/fmt/iommu_amdv1.c    |    9 +
 drivers/iommu/generic_pt/fmt/iommu_armv7s.c   |   11 +
 .../iommu/generic_pt/fmt/iommu_armv8_16k.c    |   13 +
 drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c |   13 +
 .../iommu/generic_pt/fmt/iommu_armv8_64k.c    |   13 +
 drivers/iommu/generic_pt/fmt/iommu_dart.c     |    8 +
 drivers/iommu/generic_pt/fmt/iommu_template.h |   49 +
 drivers/iommu/generic_pt/fmt/iommu_vtdss.c    |    8 +
 drivers/iommu/generic_pt/fmt/iommu_x86pae.c   |    8 +
 drivers/iommu/generic_pt/fmt/vtdss.h          |  276 +++++
 drivers/iommu/generic_pt/fmt/x86pae.h         |  283 +++++
 drivers/iommu/generic_pt/iommu_pt.h           | 1030 +++++++++++++++++
 drivers/iommu/generic_pt/kunit_generic_pt.h   |  576 +++++++++
 drivers/iommu/generic_pt/kunit_iommu.h        |  105 ++
 drivers/iommu/generic_pt/kunit_iommu_cmp.h    |  272 +++++
 drivers/iommu/generic_pt/kunit_iommu_pt.h     |  352 ++++++
 drivers/iommu/generic_pt/pt_alloc.c           |  174 +++
 drivers/iommu/generic_pt/pt_alloc.h           |   98 ++
 drivers/iommu/generic_pt/pt_common.h          |  311 +++++
 drivers/iommu/generic_pt/pt_defs.h            |  276 +++++
 drivers/iommu/generic_pt/pt_fmt_defaults.h    |  109 ++
 drivers/iommu/generic_pt/pt_iter.h            |  468 ++++++++
 drivers/iommu/generic_pt/pt_log2.h            |  131 +++
 include/linux/generic_pt/common.h             |  156 +++
 include/linux/generic_pt/iommu.h              |  344 ++++++
 42 files changed, 7307 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/.kunitconfig
 create mode 100644 drivers/iommu/generic_pt/Kconfig
 create mode 100644 drivers/iommu/generic_pt/Makefile
 create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
 create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/armv7s.h
 create mode 100644 drivers/iommu/generic_pt/fmt/armv8.h
 create mode 100644 drivers/iommu/generic_pt/fmt/dart.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_armv7s.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_armv8.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_dart.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_vtdss.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86pae.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv7s.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_16k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_64k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_dart.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_vtdss.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86pae.c
 create mode 100644 drivers/iommu/generic_pt/fmt/vtdss.h
 create mode 100644 drivers/iommu/generic_pt/fmt/x86pae.h
 create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu_cmp.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.c
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.h
 create mode 100644 drivers/iommu/generic_pt/pt_common.h
 create mode 100644 drivers/iommu/generic_pt/pt_defs.h
 create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
 create mode 100644 drivers/iommu/generic_pt/pt_iter.h
 create mode 100644 drivers/iommu/generic_pt/pt_log2.h
 create mode 100644 include/linux/generic_pt/common.h
 create mode 100644 include/linux/generic_pt/iommu.h


base-commit: fdc4344ef3ee7741df149967893fb61240520ab3