From patchwork Fri Apr 26 03:43:21 2024
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13644104
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com, linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org, baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC v2 1/3] iommu/intel: Use page->_mapcount to count number of entries in IOMMU
Date: Fri, 26 Apr 2024 03:43:21 +0000
Message-ID: <20240426034323.417219-2-pasha.tatashin@soleen.com>
In-Reply-To: <20240426034323.417219-1-pasha.tatashin@soleen.com>
References: <20240426034323.417219-1-pasha.tatashin@soleen.com>
In order to be able to efficiently free empty page table levels, count the
number of entries in each page table by incrementing and decrementing the
mapcount every time a PTE is inserted into or removed from the page table.
For this to work correctly, add two helper functions, dma_clear_pte() and
dma_set_pte(), where the counting is performed. Also, modify the code so that
every page table entry is always updated via these two new functions. Finally,
before pages are freed, restore the mapcount to its original state by calling
page_mapcount_reset().

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/intel/iommu.c | 42 ++++++++++++++++++++++---------------
 drivers/iommu/intel/iommu.h | 39 ++++++++++++++++++++++++++++------
 drivers/iommu/iommu-pages.h | 30 +++++++++++++++++++-------
 3 files changed, 81 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7abe76f92a3c..1bfb6eccad05 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -862,7 +862,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 		if (domain->use_first_level)
 			pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US | DMA_FL_PTE_ACCESS;
 
-		if (cmpxchg64(&pte->val, 0ULL, pteval))
+		if (dma_set_pte(pte, pteval))
 			/* Someone else set it while we were thinking; use theirs. */
 			iommu_free_page(tmp_page);
 		else
@@ -934,7 +934,8 @@ static void dma_pte_clear_range(struct dmar_domain *domain,
 			continue;
 		}
 		do {
-			dma_clear_pte(pte);
+			if (dma_pte_present(pte))
+				dma_clear_pte(pte);
 			start_pfn += lvl_to_nr_pages(large_page);
 			pte++;
 		} while (start_pfn <= last_pfn && !first_pte_in_page(pte));
@@ -975,7 +976,8 @@ static void dma_pte_free_level(struct dmar_domain *domain, int level,
 		 */
 		if (level < retain_level && !(start_pfn > level_pfn ||
 		      last_pfn < level_pfn + level_size(level) - 1)) {
-			dma_clear_pte(pte);
+			if (dma_pte_present(pte))
+				dma_clear_pte(pte);
 			domain_flush_cache(domain, pte, sizeof(*pte));
 			iommu_free_page(level_pte);
 		}
@@ -1006,12 +1008,13 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
 	}
 }
 
-/* When a page at a given level is being unlinked from its parent, we don't
-   need to *modify* it at all. All we need to do is make a list of all the
-   pages which can be freed just as soon as we've flushed the IOTLB and we
-   know the hardware page-walk will no longer touch them.
-   The 'pte' argument is the *parent* PTE, pointing to the page that is to
-   be freed. */
+/*
+ * A given page at a given level is being unlinked from its parent.
+ * We need to make a list of all the pages which can be freed just as soon as
+ * we've flushed the IOTLB and we know the hardware page-walk will no longer
+ * touch them. The 'pte' argument is the *parent* PTE, pointing to the page
+ * that is to be freed.
+ */
 static void dma_pte_list_pagetables(struct dmar_domain *domain,
 				    int level, struct dma_pte *pte,
 				    struct list_head *freelist)
@@ -1019,17 +1022,21 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 	struct page *pg;
 
 	pg = pfn_to_page(dma_pte_addr(pte) >> PAGE_SHIFT);
-	list_add_tail(&pg->lru, freelist);
-
-	if (level == 1)
-		return;
-
 	pte = page_address(pg);
+
 	do {
-		if (dma_pte_present(pte) && !dma_pte_superpage(pte))
-			dma_pte_list_pagetables(domain, level - 1, pte, freelist);
+		if (dma_pte_present(pte)) {
+			if (level > 1 && !dma_pte_superpage(pte)) {
+				dma_pte_list_pagetables(domain, level - 1, pte,
+							freelist);
+			}
+			dma_clear_pte(pte);
+		}
 		pte++;
 	} while (!first_pte_in_page(pte));
+
+	page_mapcount_reset(pg);
+	list_add_tail(&pg->lru, freelist);
 }
 
 static void dma_pte_clear_level(struct dmar_domain *domain, int level,
@@ -1093,6 +1100,7 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
 		struct page *pgd_page = virt_to_page(domain->pgd);
+		page_mapcount_reset(pgd_page);
 		list_add_tail(&pgd_page->lru, freelist);
 		domain->pgd = NULL;
 	}
@@ -2113,7 +2121,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		/* We don't need lock here, nobody else
 		 * touches the iova range
 		 */
-		tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
+		tmp = dma_set_pte(pte, pteval);
 		if (tmp) {
 			static int dumps = 5;
 			pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 8d081d8c6f41..e5c1eb23897f 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -814,11 +814,6 @@ struct dma_pte {
 	u64 val;
 };
 
-static inline void dma_clear_pte(struct dma_pte *pte)
-{
-	pte->val = 0;
-}
-
 static inline u64 dma_pte_addr(struct dma_pte *pte)
 {
 #ifdef CONFIG_64BIT
@@ -830,9 +825,41 @@ static inline u64 dma_pte_addr(struct dma_pte *pte)
 #endif
 }
 
+#define DMA_PTEVAL_PRESENT(pteval)	(((pteval) & 3) != 0)
 static inline bool dma_pte_present(struct dma_pte *pte)
 {
-	return (pte->val & 3) != 0;
+	return DMA_PTEVAL_PRESENT(pte->val);
+}
+
+static inline void dma_clear_pte(struct dma_pte *pte)
+{
+	u64 old_pteval;
+
+	old_pteval = xchg(&pte->val, 0ULL);
+	if (DMA_PTEVAL_PRESENT(old_pteval)) {
+		struct page *pg = virt_to_page(pte);
+
+		atomic_dec(&pg->_mapcount);
+	} else {
+		/* Ensure that we cleared a valid entry from the page table */
+		WARN_ON_ONCE(1);
+	}
+}
+
+static inline u64 dma_set_pte(struct dma_pte *pte, u64 pteval)
+{
+	u64 old_pteval;
+
+	/* Ensure we about to set a valid entry to the page table */
+	WARN_ON_ONCE(!DMA_PTEVAL_PRESENT(pteval));
+	old_pteval = cmpxchg64(&pte->val, 0ULL, pteval);
+	if (old_pteval == 0) {
+		struct page *pg = virt_to_page(pte);
+
+		atomic_inc(&pg->_mapcount);
+	}
+
+	return old_pteval;
 }
 
 static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 82ebf0033081..b8b332951944 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -119,7 +119,8 @@ static inline void *iommu_alloc_pages(gfp_t gfp, int order)
 }
 
 /**
- * iommu_alloc_page_node - allocate a zeroed page at specific NUMA node.
+ * iommu_alloc_page_node - allocate a zeroed page at specific NUMA node, and set
+ * mapcount in its struct page to 0.
  * @nid: memory NUMA node id
  * @gfp: buddy allocator flags
  *
@@ -127,18 +128,29 @@ static inline void *iommu_alloc_pages(gfp_t gfp, int order)
  */
 static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
 {
-	return iommu_alloc_pages_node(nid, gfp, 0);
+	void *virt = iommu_alloc_pages_node(nid, gfp, 0);
+
+	if (virt)
+		atomic_set(&(virt_to_page(virt))->_mapcount, 0);
+
+	return virt;
 }
 
 /**
- * iommu_alloc_page - allocate a zeroed page
+ * iommu_alloc_page - allocate a zeroed page, and set mapcount in its struct
+ * page to 0.
  * @gfp: buddy allocator flags
  *
 * returns the virtual address of the allocated page
  */
 static inline void *iommu_alloc_page(gfp_t gfp)
 {
-	return iommu_alloc_pages(gfp, 0);
+	void *virt = iommu_alloc_pages(gfp, 0);
+
+	if (virt)
+		atomic_set(&(virt_to_page(virt))->_mapcount, 0);
+
+	return virt;
 }
 
 /**
@@ -155,16 +167,19 @@ static inline void iommu_free_pages(void *virt, int order)
 }
 
 /**
- * iommu_free_page - free page
+ * iommu_free_page - free page, and reset mapcount
  * @virt: virtual address of the page to be freed.
  */
 static inline void iommu_free_page(void *virt)
 {
-	iommu_free_pages(virt, 0);
+	if (virt) {
+		page_mapcount_reset(virt_to_page(virt));
+		iommu_free_pages(virt, 0);
+	}
 }
 
 /**
- * iommu_put_pages_list - free a list of pages.
+ * iommu_put_pages_list - free a list of pages, and reset mapcount.
  * @page: the head of the lru list to be freed.
* * There are no locking requirement for these pages, as they are going to be @@ -177,6 +192,7 @@ static inline void iommu_put_pages_list(struct list_head *page) while (!list_empty(page)) { struct page *p = list_entry(page->prev, struct page, lru); + page_mapcount_reset(p); list_del(&p->lru); __iommu_free_account(p, 0); put_page(p); From patchwork Fri Apr 26 03:43:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pasha Tatashin X-Patchwork-Id: 13644105 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48DBDC4345F for ; Fri, 26 Apr 2024 03:43:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 77A956B0098; Thu, 25 Apr 2024 23:43:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 68EA46B0099; Thu, 25 Apr 2024 23:43:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 52F416B009A; Thu, 25 Apr 2024 23:43:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 1ED2B6B0099 for ; Thu, 25 Apr 2024 23:43:30 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id C87941C0C09 for ; Fri, 26 Apr 2024 03:43:29 +0000 (UTC) X-FDA: 82050288138.13.C1DDC93 Received: from mail-qt1-f177.google.com (mail-qt1-f177.google.com [209.85.160.177]) by imf29.hostedemail.com (Postfix) with ESMTP id 436F7120003 for ; Fri, 26 Apr 2024 03:43:28 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=soleen-com.20230601.gappssmtp.com header.s=20230601 header.b=RA1Fgqa7; dmarc=pass (policy=none) header.from=soleen.com; spf=pass 
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com, linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org, baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC v2 2/3] iommu/intel: synchronize page table map and unmap operations
Date: Fri, 26 Apr 2024 03:43:22 +0000
Message-ID: <20240426034323.417219-3-pasha.tatashin@soleen.com>
In-Reply-To: <20240426034323.417219-1-pasha.tatashin@soleen.com>
References: <20240426034323.417219-1-pasha.tatashin@soleen.com>
We are going to update parent page table entries when lower-level page tables
become empty and are added to the free list, so we need a way to synchronize
these operations. Use domain->pgd_lock to protect all map and unmap
operations. This is a reader/writer lock. At the beginning everything runs in
read-only mode; later, when freeing page tables on unmap is added, a writer
section will be added as well.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/intel/iommu.c | 21 +++++++++++++++++++--
 drivers/iommu/intel/iommu.h |  3 +++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1bfb6eccad05..8c7e596728b5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -995,11 +995,13 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
 				   unsigned long last_pfn,
 				   int retain_level)
 {
+	read_lock(&domain->pgd_lock);
 	dma_pte_clear_range(domain, start_pfn, last_pfn);
 
 	/* We don't need lock here; nobody else touches the iova range */
 	dma_pte_free_level(domain, agaw_to_level(domain->agaw), retain_level,
 			   domain->pgd, 0, start_pfn, last_pfn);
+	read_unlock(&domain->pgd_lock);
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
@@ -1093,9 +1095,11 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
 	    WARN_ON(start_pfn > last_pfn))
 		return;
 
+	read_lock(&domain->pgd_lock);
 	/* we don't need lock here; nobody else touches the iova range */
 	dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
 			    domain->pgd, 0, start_pfn, last_pfn, freelist);
+	read_unlock(&domain->pgd_lock);
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
@@ -2088,6 +2092,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 
 	pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
 
+	read_lock(&domain->pgd_lock);
 	while (nr_pages > 0) {
 		uint64_t tmp;
 
@@ -2097,8 +2102,10 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 
 		pte = pfn_to_dma_pte(domain, iov_pfn, &largepage_lvl, gfp);
-		if (!pte)
+		if (!pte) {
+			read_unlock(&domain->pgd_lock);
 			return -ENOMEM;
+		}
 		first_pte = pte;
 		lvl_pages = lvl_to_nr_pages(largepage_lvl);
@@ -2158,6 +2165,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			pte = NULL;
 		}
 	}
+	read_unlock(&domain->pgd_lock);
 
 	return 0;
 }
@@ -3829,6 +3837,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	domain->pgd = iommu_alloc_page_node(domain->nid, GFP_ATOMIC);
 	if (!domain->pgd)
 		return -ENOMEM;
+	rwlock_init(&domain->pgd_lock);
 	domain_flush_cache(domain, domain->pgd, PAGE_SIZE);
 	return 0;
 }
@@ -4074,11 +4083,15 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	unsigned long start_pfn, last_pfn;
 	int level = 0;
 
+	read_lock(&dmar_domain->pgd_lock);
 	/* Cope with horrid API which requires us to unmap more than the
 	   size argument if it happens to be a large-page mapping. */
 	if (unlikely(!pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
-				     &level, GFP_ATOMIC)))
+				     &level, GFP_ATOMIC))) {
+		read_unlock(&dmar_domain->pgd_lock);
 		return 0;
+	}
+	read_unlock(&dmar_domain->pgd_lock);
 
 	if (size < VTD_PAGE_SIZE << level_to_offset_bits(level))
 		size = VTD_PAGE_SIZE << level_to_offset_bits(level);
@@ -4145,8 +4158,10 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
 	int level = 0;
 	u64 phys = 0;
 
+	read_lock(&dmar_domain->pgd_lock);
 	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level,
 			     GFP_ATOMIC);
+	read_unlock(&dmar_domain->pgd_lock);
 	if (pte && dma_pte_present(pte))
 		phys = dma_pte_addr(pte) +
 			(iova & (BIT_MASK(level_to_offset_bits(level) +
@@ -4801,8 +4816,10 @@ static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
 		struct dma_pte *pte;
 		int lvl = 0;
 
+		read_lock(&dmar_domain->pgd_lock);
 		pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl,
 				     GFP_ATOMIC);
+		read_unlock(&dmar_domain->pgd_lock);
 		pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
 		if (!pte || !dma_pte_present(pte)) {
 			iova += pgsize;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index e5c1eb23897f..2f38b087ea4f 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -615,6 +615,9 @@ struct dmar_domain {
 		struct {
 			/* virtual address */
 			struct dma_pte	*pgd;
+
+			/* Synchronizes pgd map/unmap operations */
+			rwlock_t	pgd_lock;
 			/* max guest address width */
 			int		gaw;

From patchwork Fri Apr 26 03:43:23 2024
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13644106
From: Pasha Tatashin
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com, linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org, baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC v2 3/3] iommu/intel: free empty page tables on unmaps
Date: Fri, 26 Apr 2024 03:43:23 +0000
Message-ID: <20240426034323.417219-4-pasha.tatashin@soleen.com>
In-Reply-To: <20240426034323.417219-1-pasha.tatashin@soleen.com>
References: <20240426034323.417219-1-pasha.tatashin@soleen.com>
MIME-Version: 1.0
When page tables become empty, add them to the freelist so that they
can also be freed. This means that page tables outside of the immediate
IOVA range might be freed as well; therefore, we take the writer lock
only in the case where such page tables are actually going to be freed.
Signed-off-by: Pasha Tatashin
---
 drivers/iommu/intel/iommu.c | 91 +++++++++++++++++++++++++++++++------
 1 file changed, 77 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8c7e596728b5..2dedcd4f6060 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1044,7 +1044,7 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 static void dma_pte_clear_level(struct dmar_domain *domain, int level,
 				struct dma_pte *pte, unsigned long pfn,
 				unsigned long start_pfn, unsigned long last_pfn,
-				struct list_head *freelist)
+				struct list_head *freelist, int *freed_level)
 {
 	struct dma_pte *first_pte = NULL, *last_pte = NULL;
 
@@ -1070,11 +1070,47 @@ static void dma_pte_clear_level(struct dmar_domain *domain, int level,
 			first_pte = pte;
 			last_pte = pte;
 		} else if (level > 1) {
+			struct dma_pte *npte = phys_to_virt(dma_pte_addr(pte));
+			struct page *npage = virt_to_page(npte);
+
 			/* Recurse down into a level that isn't *entirely* obsolete */
-			dma_pte_clear_level(domain, level - 1,
-					    phys_to_virt(dma_pte_addr(pte)),
+			dma_pte_clear_level(domain, level - 1, npte,
 					    level_pfn, start_pfn, last_pfn,
-					    freelist);
+					    freelist, freed_level);
+
+			/*
+			 * Free the next-level page table if it became empty.
+			 *
+			 * We are only holding the reader lock, and it is
+			 * possible that other threads are accessing the page
+			 * table as readers as well. We can free a page table
+			 * that is outside of the requested IOVA range only if
+			 * we grab the writer lock. Since we need to drop the
+			 * reader lock, we increment the mapcount in npage so
+			 * that it (and the current page table) does not
+			 * disappear due to concurrent unmapping threads.
+			 *
+			 * Store the level of the largest page table that was
+			 * freed in freed_level, so the size of the IOTLB
+			 * flush can be determined.
+			 */
+			if (freed_level && !atomic_read(&npage->_mapcount)) {
+				atomic_inc(&npage->_mapcount);
+				read_unlock(&domain->pgd_lock);
+				write_lock(&domain->pgd_lock);
+				atomic_dec(&npage->_mapcount);
+				if (!atomic_read(&npage->_mapcount)) {
+					dma_clear_pte(pte);
+					if (!first_pte)
+						first_pte = pte;
+					last_pte = pte;
+					page_mapcount_reset(npage);
+					list_add_tail(&npage->lru, freelist);
+					*freed_level = level;
+				}
+				write_unlock(&domain->pgd_lock);
+				read_lock(&domain->pgd_lock);
+			}
 		}
 next:
 		pfn = level_pfn + level_size(level);
@@ -1089,7 +1125,8 @@ static void dma_pte_clear_level(struct dmar_domain *domain, int level,
    the page tables, and may have cached the intermediate levels. The
    pages can only be freed after the IOTLB flush has been done. */
 static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
-			 unsigned long last_pfn, struct list_head *freelist)
+			 unsigned long last_pfn, struct list_head *freelist,
+			 int *level)
 {
 	if (WARN_ON(!domain_pfn_supported(domain, last_pfn)) ||
 	    WARN_ON(start_pfn > last_pfn))
@@ -1098,7 +1135,8 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
 	read_lock(&domain->pgd_lock);
 	/* we don't need lock here; nobody else touches the iova range */
 	dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
-			    domain->pgd, 0, start_pfn, last_pfn, freelist);
+			    domain->pgd, 0, start_pfn, last_pfn, freelist,
+			    level);
 	read_unlock(&domain->pgd_lock);
 
 	/* free pgd */
@@ -1479,11 +1517,11 @@ static void __iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
+				  unsigned long pfn, unsigned long pages,
 				  int ih, int map)
 {
-	unsigned int aligned_pages = __roundup_pow_of_two(pages);
-	unsigned int mask = ilog2(aligned_pages);
+	unsigned long aligned_pages = __roundup_pow_of_two(pages);
+	unsigned long mask = ilog2(aligned_pages);
 	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
 	u16 did = domain_id_iommu(domain, iommu);
@@ -1837,7 +1875,8 @@ static void domain_exit(struct dmar_domain *domain)
 	if (domain->pgd) {
 		LIST_HEAD(freelist);
 
-		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
+		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist,
+			     NULL);
 		iommu_put_pages_list(&freelist);
 	}
 
@@ -3419,7 +3458,8 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
 		struct intel_iommu *iommu;
 		LIST_HEAD(freelist);
 
-		domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
+		domain_unmap(si_domain, start_vpfn, last_vpfn,
+			     &freelist, NULL);
 
 		rcu_read_lock();
 		for_each_active_iommu(iommu, drhd)
@@ -4080,6 +4120,7 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 				struct iommu_iotlb_gather *gather)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	bool queued = iommu_iotlb_gather_queued(gather);
 	unsigned long start_pfn, last_pfn;
 	int level = 0;
@@ -4099,7 +4140,16 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	start_pfn = iova >> VTD_PAGE_SHIFT;
 	last_pfn = (iova + size - 1) >> VTD_PAGE_SHIFT;
 
-	domain_unmap(dmar_domain, start_pfn, last_pfn, &gather->freelist);
+	/*
+	 * Pass level only if !queued, which means we will do an IOTLB
+	 * flush callback before freeing pages from the freelist.
+	 *
+	 * When level is passed, domain_unmap will attempt to add empty
+	 * page tables to the freelist, and return the level number of the
+	 * highest page table that was added to the freelist.
+	 */
+	domain_unmap(dmar_domain, start_pfn, last_pfn, &gather->freelist,
+		     queued ? NULL : &level);
 
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;
@@ -4108,8 +4158,21 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	 * We do not use page-selective IOTLB invalidation in flush queue,
 	 * so there is no need to track page and sync iotlb.
 	 */
-	if (!iommu_iotlb_gather_queued(gather))
-		iommu_iotlb_gather_add_page(domain, gather, iova, size);
+	if (!queued) {
+		size_t sz = size;
+
+		/*
+		 * Increase iova and sz for flushing if a level was returned,
+		 * as it means we are also freeing some page tables.
+		 */
+		if (level) {
+			unsigned long pgsize = level_size(level) << VTD_PAGE_SHIFT;
+
+			iova = ALIGN_DOWN(iova, pgsize);
+			sz = ALIGN(size, pgsize);
+		}
+		iommu_iotlb_gather_add_page(domain, gather, iova, sz);
+	}
 
 	return size;
 }