Message ID | 20231130201504.2322355-11-pasha.tatashin@soleen.com (mailing list archive) |
---|---|
State | New |
Series | IOMMU memory observability |
On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> In order to be able to limit the amount of memory that is allocated
> by IOMMU subsystem, the memory must be accounted.
>
> Account IOMMU as part of the secondary pagetables as it was discussed
> at LPC.
>
> The value of SecPageTables now contains memory allocation by IOMMU
> and KVM.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 2 +-
>  Documentation/filesystems/proc.rst      | 4 ++--
>  drivers/iommu/iommu-pages.h             | 2 ++
>  include/linux/mmzone.h                  | 2 +-
>  4 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 3f85254f3cef..e004e05a7cde 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
>  	  sec_pagetables
>  		Amount of memory allocated for secondary page tables,
>  		this currently includes KVM mmu allocations on x86
> -		and arm64.
> +		and arm64 and IOMMU page tables.

Hmm, if existing users are parsing this field and alerting when it exceeds
an expected value (a cloud provider, let's say), is it safe to add in a
whole new set of page tables?

I understand the documentation allows for it, but I think potential impact
on userspace would be more interesting.

>
>  	  percpu (npn)
>  		Amount of memory used for storing per-cpu kernel
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 49ef12df631b..86f137a9b66b 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1110,8 +1110,8 @@ KernelStack
>  PageTables
>                Memory consumed by userspace page tables
>  SecPageTables
> -              Memory consumed by secondary page tables, this currently
> -              currently includes KVM mmu allocations on x86 and arm64.
> +              Memory consumed by secondary page tables, this currently includes
> +              KVM mmu and IOMMU allocations on x86 and arm64.
>  NFS_Unstable
>                Always zero. Previous counted pages which had been written to
>                the server, but has not been committed to stable storage.
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index 69895a355c0c..cdd257585284 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -27,6 +27,7 @@ static inline void __iommu_alloc_account(struct page *pages, int order)
>  	const long pgcnt = 1l << order;
>
>  	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, pgcnt);
> +	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, pgcnt);
>  }
>
>  /**
> @@ -39,6 +40,7 @@ static inline void __iommu_free_account(struct page *pages, int order)
>  	const long pgcnt = 1l << order;
>
>  	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, -pgcnt);
> +	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, -pgcnt);
>  }
>
>  /**
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 1a4d0bba3e8b..aaabb385663c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -199,7 +199,7 @@ enum node_stat_item {
>  	NR_KERNEL_SCS_KB,	/* measured in KiB */
>  #endif
>  	NR_PAGETABLE,		/* used for pagetables */
> -	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */
> +	NR_SECONDARY_PAGETABLE, /* secondary pagetables, KVM & IOMMU */
>  #ifdef CONFIG_IOMMU_SUPPORT
>  	NR_IOMMU_PAGES,		/* # of pages allocated by IOMMU */
>  #endif
> --
> 2.43.0.rc2.451.g8631bc7472-goog
>
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 3f85254f3cef..e004e05a7cde 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
> >  	  sec_pagetables
> >  		Amount of memory allocated for secondary page tables,
> >  		this currently includes KVM mmu allocations on x86
> > -		and arm64.
> > +		and arm64 and IOMMU page tables.
>
> Hmm, if existing users are parsing this field and alerting when it exceeds
> an expected value (a cloud provider, let's say), is it safe to add in a
> whole new set of page tables?
>
> I understand the documentation allows for it, but I think potential impact
> on userspace would be more interesting.

Hi David,

This is something that was discussed at LPC'23. I also was proposing a
separate counter for iommu page tables, but it was noted that we
specifically have sec_pagetables called this way to include all non
regular CPU page tables, and we should therefore account for them
together.

Please also see this discussion from the previous version of this patch series:
https://lore.kernel.org/all/CAJD7tkb1FqTqwONrp2nphBDkEamQtPCOFm0208H3tp0Gq2OLMQ@mail.gmail.com/

Pasha
On Fri, 15 Dec 2023, Pasha Tatashin wrote:

> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 3f85254f3cef..e004e05a7cde 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
> > >  	  sec_pagetables
> > >  		Amount of memory allocated for secondary page tables,
> > >  		this currently includes KVM mmu allocations on x86
> > > -		and arm64.
> > > +		and arm64 and IOMMU page tables.
> >
> > Hmm, if existing users are parsing this field and alerting when it exceeds
> > an expected value (a cloud provider, let's say), is it safe to add in a
> > whole new set of page tables?
> >
> > I understand the documentation allows for it, but I think potential impact
> > on userspace would be more interesting.
>
> Hi David,
>
> This is something that was discussed at LPC'23. I also was proposing a
> separate counter for iommu page tables, but it was noted that we
> specifically have sec_pagetables called this way to include all non
> regular CPU page tables, and we should therefore account for them
> together.
>
> Please also see this discussion from the previous version of this patch series:
> https://lore.kernel.org/all/CAJD7tkb1FqTqwONrp2nphBDkEamQtPCOFm0208H3tp0Gq2OLMQ@mail.gmail.com/
>

Gotcha, I think that makes sense. When sec_pagetables was introduced, I
can understand the need to account for non-primary pagetables separately
because of the long-standing behavior.

In that sense, sec_pagetables becomes a dumping ground for "all other page
tables" which IOMMU would naturally include. So this looks good to me.

Acked-by: David Rientjes <rientjes@google.com>
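The userspace concern raised in this thread can be made concrete with a small sketch of the kind of monitor David describes. It is only an illustration, not code from the series: the helper names `parse_stat_value` and `sec_pagetables_over_limit` are hypothetical, and the threshold is whatever the operator chose. The parser accepts both the `sec_pagetables <bytes>` form used in cgroup v2 `memory.stat` and the `SecPageTables: <n> kB` form used in `/proc/meminfo`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "key" at the start of a line in "text" and
 * parse the unsigned decimal number that follows it.
 * Accepts "key value" (memory.stat style) and "Key:   value kB"
 * (/proc/meminfo style). Returns 0 on success, -1 if the key is
 * missing or is not followed by a number.
 */
static int parse_stat_value(const char *text, const char *key,
			    unsigned long long *value)
{
	const char *line = text;
	size_t keylen;

	assert(text != NULL && key != NULL && value != NULL);
	keylen = strlen(key);

	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, keylen) == 0) {
			const char *p = line + keylen;

			if (*p == ':')
				p++;
			/* Require a separator so longer keys do not match. */
			if (*p == ' ' || *p == '\t') {
				while (*p == ' ' || *p == '\t')
					p++;
				if (!isdigit((unsigned char)*p))
					return -1;
				*value = strtoull(p, NULL, 10);
				return 0;
			}
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}

/*
 * Hypothetical alert check on the contents of a memory.stat file.
 * Returns 1 if sec_pagetables exceeds limit_bytes, 0 if it does not,
 * and -1 if the field is absent.
 */
static int sec_pagetables_over_limit(const char *memory_stat,
				     unsigned long long limit_bytes)
{
	unsigned long long bytes;

	if (parse_stat_value(memory_stat, "sec_pagetables", &bytes) != 0)
		return -1;
	return bytes > limit_bytes;
}
```

A real monitor would first read the relevant `memory.stat` (or `/proc/meminfo`) into a buffer and pass it to these helpers. With this patch applied, a limit tuned for KVM-only usage would also be compared against IOMMU page-table memory, which is the behavior change being questioned above.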
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3f85254f3cef..e004e05a7cde 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
 	  sec_pagetables
 		Amount of memory allocated for secondary page tables,
 		this currently includes KVM mmu allocations on x86
-		and arm64.
+		and arm64 and IOMMU page tables.
 
 	  percpu (npn)
 		Amount of memory used for storing per-cpu kernel
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 49ef12df631b..86f137a9b66b 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1110,8 +1110,8 @@ KernelStack
 PageTables
               Memory consumed by userspace page tables
 SecPageTables
-              Memory consumed by secondary page tables, this currently
-              currently includes KVM mmu allocations on x86 and arm64.
+              Memory consumed by secondary page tables, this currently includes
+              KVM mmu and IOMMU allocations on x86 and arm64.
 NFS_Unstable
               Always zero. Previous counted pages which had been written to
               the server, but has not been committed to stable storage.
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 69895a355c0c..cdd257585284 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -27,6 +27,7 @@ static inline void __iommu_alloc_account(struct page *pages, int order)
 	const long pgcnt = 1l << order;
 
 	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, pgcnt);
+	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, pgcnt);
 }
 
 /**
@@ -39,6 +40,7 @@ static inline void __iommu_free_account(struct page *pages, int order)
 	const long pgcnt = 1l << order;
 
 	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, -pgcnt);
+	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, -pgcnt);
 }
 
 /**
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1a4d0bba3e8b..aaabb385663c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -199,7 +199,7 @@ enum node_stat_item {
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
 	NR_PAGETABLE,		/* used for pagetables */
-	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */
+	NR_SECONDARY_PAGETABLE, /* secondary pagetables, KVM & IOMMU */
 #ifdef CONFIG_IOMMU_SUPPORT
 	NR_IOMMU_PAGES,		/* # of pages allocated by IOMMU */
 #endif
In order to be able to limit the amount of memory that is allocated
by IOMMU subsystem, the memory must be accounted.

Account IOMMU as part of the secondary pagetables as it was discussed
at LPC.

The value of SecPageTables now contains memory allocation by IOMMU
and KVM.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 2 +-
 Documentation/filesystems/proc.rst      | 4 ++--
 drivers/iommu/iommu-pages.h             | 2 ++
 include/linux/mmzone.h                  | 2 +-
 4 files changed, 6 insertions(+), 4 deletions(-)