From patchwork Mon Feb 5 12:01:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545358
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro,
 Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton,
 Alexander Graf, David Woodhouse, "Jan H . Schoenherr", Usama Arif,
 Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 12/18] pkernfs: Add IOMMU domain pgtables file
Date: Mon, 5 Feb 2024 12:01:57 +0000
Message-ID: <20240205120203.60312-13-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

Similar to the IOMMU root pgtables file which was added in a previous
commit, now support a file type for IOMMU domain pgtables in the IOMMU
directory. These domain pgtable files only need to be useable after the
system has booted up, for example by QEMU creating one of these files
and using it to back the IOMMU pgtables for a persistent VM.
As such the filesystem abstraction can be better maintained here as the
kernel code doesn't need to reach "behind" the filesystem abstraction
like it does for the root pgtables.

A new inode type is created for domain pgtable files, and the IOMMU
directory gets inode_operations callbacks to support creating and
deleting these files in it.

Note: there is a use-after-free risk here too: if the domain pgtable
file is truncated while it's in use for IOMMU pgtables then freed memory
could still be mapped into the IOMMU. To mitigate this there should be a
mechanism to "freeze" the files once they've been given to the IOMMU.
---
 fs/pkernfs/inode.c      |  9 +++++--
 fs/pkernfs/iommu.c      | 55 +++++++++++++++++++++++++++++++++++++++--
 fs/pkernfs/pkernfs.h    |  4 +++
 include/linux/pkernfs.h |  1 +
 4 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c
index 1d712e0a82a1..35842cd61002 100644
--- a/fs/pkernfs/inode.c
+++ b/fs/pkernfs/inode.c
@@ -35,7 +35,11 @@ struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino)
 		inode->i_op = &pkernfs_iommu_dir_inode_operations;
 		inode->i_fop = &pkernfs_dir_fops;
 		inode->i_mode = S_IFDIR;
-	} else if (pkernfs_inode->flags | PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) {
+	} else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) {
+		inode->i_fop = &pkernfs_file_fops;
+		inode->i_mode = S_IFREG;
+	} else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES) {
+		inode->i_fop = &pkernfs_file_fops;
 		inode->i_mode = S_IFREG;
 	}
 
@@ -175,6 +179,7 @@ const struct inode_operations pkernfs_dir_inode_operations = {
 };
 
 const struct inode_operations pkernfs_iommu_dir_inode_operations = {
+	.create = pkernfs_create_iommu_pgtables,
 	.lookup = pkernfs_lookup,
+	.unlink = pkernfs_unlink,
 };
-
diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c
index 5bce8146d7bb..f14e76013e85 100644
--- a/fs/pkernfs/iommu.c
+++ b/fs/pkernfs/iommu.c
@@ -4,6 +4,27 @@
 #include
 
+void pkernfs_alloc_iommu_domain_pgtables(struct file *ppts, struct pkernfs_region *pkernfs_region)
+{
+	struct pkernfs_inode *pkernfs_inode;
+	unsigned long *mappings_block_vaddr;
+	unsigned long inode_idx;
+
+	/*
+	 * For a pkernfs region block, the "mappings_block" field is still
+	 * just a block index, but that block doesn't actually contain mappings
+	 * it contains the pkernfs_region data
+	 */
+
+	inode_idx = ppts->f_inode->i_ino;
+	pkernfs_inode = pkernfs_get_persisted_inode(NULL, inode_idx);
+
+	mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL,
+		pkernfs_inode->mappings_block);
+	set_bit(0, mappings_block_vaddr);
+	pkernfs_region->vaddr = mappings_block_vaddr;
+	pkernfs_region->paddr = pkernfs_base + (pkernfs_inode->mappings_block * (2 << 20));
+}
 void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region)
 {
 	unsigned long *mappings_block_vaddr;
@@ -63,9 +84,8 @@ void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region)
 	 * just a block index, but that block doesn't actually contain mappings
 	 * it contains the pkernfs_region data
 	 */
-
 	mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL,
-			iommu_pgtables->mappings_block);
+		iommu_pgtables->mappings_block);
 	set_bit(0, mappings_block_vaddr);
 	pkernfs_region->vaddr = mappings_block_vaddr;
 	pkernfs_region->paddr = pkernfs_base + (iommu_pgtables->mappings_block * PMD_SIZE);
@@ -88,6 +108,29 @@ void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region)
 		(iommu_pgtables->mappings_block * PMD_SIZE);
 }
 
+int pkernfs_create_iommu_pgtables(struct mnt_idmap *id, struct inode *dir,
+		struct dentry *dentry, umode_t mode, bool excl)
+{
+	unsigned long free_inode;
+	struct pkernfs_inode *pkernfs_inode;
+	struct inode *vfs_inode;
+
+	free_inode = pkernfs_allocate_inode(dir->i_sb);
+	if (free_inode <= 0)
+		return -ENOMEM;
+
+	pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, free_inode);
+	pkernfs_inode->sibling_ino = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino;
+	pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode;
+	strscpy(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN);
+	pkernfs_inode->flags = PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES;
+	pkernfs_inode->mappings_block = pkernfs_alloc_block(dir->i_sb);
+	memset(pkernfs_addr_for_block(dir->i_sb, pkernfs_inode->mappings_block), 0, (2 << 20));
+	vfs_inode = pkernfs_inode_get(dir->i_sb, free_inode);
+	d_add(dentry, vfs_inode);
+	return 0;
+}
+
 void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr)
 {
 	if (WARN_ON(paddr >= region->paddr + region->bytes))
@@ -96,3 +139,11 @@ void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long
 		return NULL;
 	return region->vaddr + (paddr - region->paddr);
 }
+
+bool pkernfs_is_iommu_domain_pgtables(struct file *f)
+{
+	return f &&
+		pkernfs_get_persisted_inode(f->f_inode->i_sb, f->f_inode->i_ino)->flags &
+		PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES;
+}
+
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
index e1b7ae3fe7f1..9bea827f8b40 100644
--- a/fs/pkernfs/pkernfs.h
+++ b/fs/pkernfs/pkernfs.h
@@ -21,6 +21,7 @@ struct pkernfs_sb {
 #define PKERNFS_INODE_FLAG_DIR (1 << 1)
 #define PKERNFS_INODE_FLAG_IOMMU_DIR (1 << 2)
 #define PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES (1 << 3)
+#define PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES (1 << 4)
 struct pkernfs_inode {
 	int flags;
 	/*
@@ -50,8 +51,11 @@
 void *pkernfs_addr_for_block(struct super_block *sb, int block_idx);
 unsigned long pkernfs_allocate_inode(struct super_block *sb);
 struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino);
+int pkernfs_create_iommu_pgtables(struct mnt_idmap *id, struct inode *dir,
+		struct dentry *dentry, umode_t mode, bool excl);
 
 extern const struct file_operations pkernfs_dir_fops;
 extern const struct file_operations pkernfs_file_fops;
 extern const struct inode_operations pkernfs_file_inode_operations;
 extern const struct inode_operations pkernfs_iommu_dir_inode_operations;
+extern const struct inode_operations pkernfs_iommu_domain_pgtables_inode_operations;
diff --git a/include/linux/pkernfs.h b/include/linux/pkernfs.h
index 0110e4784109..4ca923ee0d82 100644
--- a/include/linux/pkernfs.h
+++ b/include/linux/pkernfs.h
@@ -33,4 +33,5 @@
 void pkernfs_alloc_page_from_region(struct pkernfs_region *pkernfs_region,
 		void **vaddr, unsigned long *paddr);
 void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr);
+bool pkernfs_is_iommu_domain_pgtables(struct file *f);
 #endif /* _LINUX_PKERNFS_H */
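
Note on the inode.c hunk: the change from "|" to "&" in the
PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES test is not explained in the
commit message, but the new domain pgtables branch appears to depend on
it. OR-ing any value with a non-zero constant is always non-zero, so the
old condition would be true for every inode that reaches it and a later
"else if" could never run. Below is a minimal userspace sketch of that
reasoning. The two flag values are taken from fs/pkernfs/pkernfs.h as
shown in this patch; the helper names (root_test_with_or,
root_test_with_and, check_flag_tests) are hypothetical and exist only
for illustration, they are not part of pkernfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in fs/pkernfs/pkernfs.h by this patch. */
#define PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES   (1 << 3)
#define PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES (1 << 4)

/*
 * Hypothetical helper modelling the old condition. OR-ing in a non-zero
 * constant always gives a non-zero result, so this is true for any input.
 */
static bool root_test_with_or(int flags)
{
	return (flags | PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) != 0;
}

/*
 * Hypothetical helper modelling the corrected condition. AND isolates
 * the single bit of interest.
 */
static bool root_test_with_and(int flags)
{
	return (flags & PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) != 0;
}

/* Hypothetical self-check contrasting the two conditions. */
static void check_flag_tests(void)
{
	int domain = PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES;
	int root = PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES;

	/* The OR form matches a domain pgtables inode, and even flags == 0. */
	assert(root_test_with_or(domain));
	assert(root_test_with_or(0));

	/* The AND form matches only when the root pgtables bit is set. */
	assert(!root_test_with_and(domain));
	assert(root_test_with_and(root));
}
```

Calling check_flag_tests() from a small test program should pass
silently. If the reading above is right, the "&" fix could be worth
splitting into its own patch, or at least a line in the commit message.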