From patchwork Thu Dec 21 03:19:15 2023
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13500958
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org, pasha.tatashin@soleen.com,
	linux-kernel@vger.kernel.org, rientjes@google.com, dwmw2@infradead.org,
	baolu.lu@linux.intel.com, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, iommu@lists.linux.dev
Subject: [RFC 3/3] iommu/intel: free empty page tables on unmaps
Date: Thu, 21 Dec 2023 03:19:15 +0000
Message-ID: <20231221031915.619337-4-pasha.tatashin@soleen.com>
In-Reply-To: <20231221031915.619337-1-pasha.tatashin@soleen.com>
References: <20231221031915.619337-1-pasha.tatashin@soleen.com>
MIME-Version: 1.0
When page tables become empty, add them to the freelist so that they
can also be freed. This means that page tables outside of the immediate
IOVA range might be freed as well; therefore, we take the writer lock
only in the case where such page tables are going to be freed.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/intel/iommu.c | 92 +++++++++++++++++++++++++++++++------
 1 file changed, 78 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 733f25b277a3..141dc106fb01 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1130,7 +1130,7 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 static void dma_pte_clear_level(struct dmar_domain *domain, int level,
 				struct dma_pte *pte, unsigned long pfn,
 				unsigned long start_pfn, unsigned long last_pfn,
-				struct list_head *freelist)
+				struct list_head *freelist, int *freed_level)
 {
 	struct dma_pte *first_pte = NULL, *last_pte = NULL;
 
@@ -1156,11 +1156,48 @@ static void dma_pte_clear_level(struct dmar_domain *domain, int level,
 			first_pte = pte;
 			last_pte = pte;
 		} else if (level > 1) {
+			struct dma_pte *npte = phys_to_virt(dma_pte_addr(pte));
+			struct page *npage = virt_to_page(npte);
+
 			/* Recurse down into a level that isn't *entirely* obsolete */
-			dma_pte_clear_level(domain, level - 1,
-					    phys_to_virt(dma_pte_addr(pte)),
+			dma_pte_clear_level(domain, level - 1, npte,
 					    level_pfn, start_pfn, last_pfn,
-					    freelist);
+					    freelist, freed_level);
+
+			/*
+			 * Free the next-level page table if it became empty.
+			 *
+			 * We are only holding the reader lock, and it is
+			 * possible that other threads are accessing the page
+			 * table as readers as well. We can free a page table
+			 * that is outside of the requested IOVA range only if
+			 * we grab the writer lock. Since we need to drop the
+			 * reader lock, we increment the refcount in npage so
+			 * that it (and the current page table) does not
+			 * disappear due to concurrent unmapping threads.
+			 *
+			 * Store the maximum size of the freed page tables in
+			 * freed_level, so the size of the IOTLB flush can be
+			 * determined.
+			 */
+			if (freed_level && page_count(npage) == 1) {
+				page_ref_inc(npage);
+				read_unlock(&domain->pgd_lock);
+				write_lock(&domain->pgd_lock);
+				if (page_count(npage) == 2) {
+					dma_clear_pte(pte);
+
+					if (!first_pte)
+						first_pte = pte;
+
+					last_pte = pte;
+					list_add_tail(&npage->lru, freelist);
+					*freed_level = level;
+				}
+				write_unlock(&domain->pgd_lock);
+				read_lock(&domain->pgd_lock);
+				page_ref_dec(npage);
+			}
 		}
 next:
 		pfn = level_pfn + level_size(level);
@@ -1175,7 +1212,8 @@ static void dma_pte_clear_level(struct dmar_domain *domain, int level,
    the page tables, and may have cached the intermediate levels. The
    pages can only be freed after the IOTLB flush has been done. */
 static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
-			 unsigned long last_pfn, struct list_head *freelist)
+			 unsigned long last_pfn, struct list_head *freelist,
+			 int *level)
 {
 	if (WARN_ON(!domain_pfn_supported(domain, last_pfn)) ||
 	    WARN_ON(start_pfn > last_pfn))
@@ -1184,7 +1222,8 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
 	read_lock(&domain->pgd_lock);
 	/* we don't need lock here; nobody else touches the iova range */
 	dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
-			    domain->pgd, 0, start_pfn, last_pfn, freelist);
+			    domain->pgd, 0, start_pfn, last_pfn, freelist,
+			    level);
 	read_unlock(&domain->pgd_lock);
 
 	/* free pgd */
@@ -1524,11 +1563,11 @@ static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
 
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 				  struct dmar_domain *domain,
-				  unsigned long pfn, unsigned int pages,
+				  unsigned long pfn, unsigned long pages,
 				  int ih, int map)
 {
-	unsigned int aligned_pages = __roundup_pow_of_two(pages);
-	unsigned int mask = ilog2(aligned_pages);
+	unsigned long aligned_pages = __roundup_pow_of_two(pages);
+	unsigned long mask = ilog2(aligned_pages);
 	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
 	u16 did = domain_id_iommu(domain, iommu);
 
@@ -1872,7 +1911,8 @@ static void domain_exit(struct dmar_domain *domain)
 	if (domain->pgd) {
 		LIST_HEAD(freelist);
 
-		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
+		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist,
+			     NULL);
 		put_pages_list(&freelist);
 	}
 
@@ -3579,7 +3619,8 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
 		struct intel_iommu *iommu;
 		LIST_HEAD(freelist);
 
-		domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
+		domain_unmap(si_domain, start_vpfn, last_vpfn,
+			     &freelist, NULL);
 
 		rcu_read_lock();
 		for_each_active_iommu(iommu, drhd)
@@ -4253,6 +4294,7 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 				struct iommu_iotlb_gather *gather)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	bool queued = iommu_iotlb_gather_queued(gather);
 	unsigned long start_pfn, last_pfn;
 	int level = 0;
 
@@ -4272,7 +4314,16 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	start_pfn = iova >> VTD_PAGE_SHIFT;
 	last_pfn = (iova + size - 1) >> VTD_PAGE_SHIFT;
 
-	domain_unmap(dmar_domain, start_pfn, last_pfn, &gather->freelist);
+	/*
+	 * Pass level only if !queued, which means we will do an IOTLB
+	 * flush callback before freeing pages from the freelist.
+	 *
+	 * When level is passed, domain_unmap will attempt to add empty
+	 * page tables to the freelist, and will return the level number
+	 * of the highest page table that was added to the freelist.
+	 */
+	domain_unmap(dmar_domain, start_pfn, last_pfn, &gather->freelist,
+		     queued ? NULL : &level);
 
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;
 
@@ -4281,8 +4332,21 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	 * We do not use page-selective IOTLB invalidation in flush queue,
 	 * so there is no need to track page and sync iotlb.
 	 */
-	if (!iommu_iotlb_gather_queued(gather))
-		iommu_iotlb_gather_add_page(domain, gather, iova, size);
+	if (!queued) {
+		size_t sz = size;
+
+		/*
+		 * Increase iova and sz for flushing if level was returned,
+		 * as it means we are also freeing some page tables.
+		 */
+		if (level) {
+			unsigned long pgsize = level_size(level) << VTD_PAGE_SHIFT;
+
+			iova = ALIGN_DOWN(iova, pgsize);
+			sz = ALIGN(size, pgsize);
+		}
+		iommu_iotlb_gather_add_page(domain, gather, iova, sz);
+	}
 
 	return size;
 }