From patchwork Thu Jul 7 12:52:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 12909533
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-arm-kernel@lists.infradead.org, x86@kernel.org,
 catalin.marinas@arm.com, will@kernel.org, linux-doc@vger.kernel.org
Cc: corbet@lwn.net, arnd@arndb.de, linux-kernel@vger.kernel.org,
 darren@os.amperecomputing.com, yangyicong@hisilicon.com,
 huzhanyuan@oppo.com, lipeifeng@oppo.com, zhangshiming@oppo.com,
 guojian@oppo.com, realmz6@gmail.com, Barry Song <21cnbao@gmail.com>
Subject: [PATCH 0/4] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH
Date: Fri, 8 Jul 2022 00:52:38 +1200
Message-Id: <20220707125242.425242-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.25.1
Though ARM64 has the hardware to do tlb shootdown, it is not free.
A simple micro benchmark shows that even on snapdragon 888, with only
8 cores, the overhead of ptep_clear_flush is huge even when paging out
a single page mapped by only one process:

    5.36%  a.out    [kernel.kallsyms]  [k] ptep_clear_flush

When pages are mapped by multiple processes, or the HW has more CPUs,
the cost should become even higher due to the bad scalability of tlb
shootdown.

This patchset leverages the existing BATCHED_UNMAP_TLB_FLUSH by
1. only sending tlbi instructions in the first stage,
   arch_tlbbatch_add_mm()
2. waiting for the completion of tlbi by dsb while doing the tlbbatch
   sync in arch_tlbbatch_flush()

My testing on snapdragon shows the overhead of ptep_clear_flush is
removed by the patchset. The micro benchmark becomes 5% faster even
for one page mapped by a single process on snapdragon 888.

While I believe the micro benchmark in 4/4 will perform even better on
arm64 servers, I don't have the hardware to test it. Thus,

Hi Yicong,
Would you like to run the same test in 4/4 on Kunpeng920?

Hi Darren,
Would you like to run the same test in 4/4 on Ampere's ARM64 server?

Remember to enable a zRAM swap device so that pageout can actually
work for the micro benchmark.

Thanks!
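To make the two stages above concrete, here is a minimal user-space model of the idea. It is only a sketch and not the code in this series: the struct and every model_* name are hypothetical, the counters stand in for tlbi broadcasts and dsb waits, and the assumption is simply that the wait is the expensive part which batching lets us pay once per batch instead of once per page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct tlb_model {
	size_t invalidates_issued; /* models tlbi broadcasts sent */
	size_t barrier_waits;      /* models dsb waits for completion */
	size_t pending;            /* issued but not yet waited on */
};

/* Stage 1 analogue: fire the invalidation, do not wait for it. */
static void model_issue_invalidate(struct tlb_model *m)
{
	m->invalidates_issued++;
	m->pending++;
}

/* Stage 2 analogue: one wait covers everything issued so far. */
static void model_wait_for_completion(struct tlb_model *m)
{
	m->barrier_waits++;
	m->pending = 0;
}

/* Unbatched unmap: every page pays for the invalidate and the wait. */
static size_t model_unmap_immediate(struct tlb_model *m, size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		model_issue_invalidate(m);
		model_wait_for_completion(m);
	}
	assert(m->pending == 0);
	return m->barrier_waits;
}

/* Batched unmap: invalidate per page, wait once at the end. */
static size_t model_unmap_batched(struct tlb_model *m, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		model_issue_invalidate(m);
	if (m->pending)
		model_wait_for_completion(m);
	assert(m->pending == 0);
	return m->barrier_waits;
}
```

In this model, unmapping 8 pages through model_unmap_immediate() records 8 waits, while model_unmap_batched() records 1 wait for the same 8 invalidations. How much that saves on real hardware is what the micro benchmark in 4/4 is meant to measure.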
Barry Song (4):
  Revert "Documentation/features: mark BATCHED_UNMAP_TLB_FLUSH doesn't
    apply to ARM64"
  mm: rmap: Allow platforms without mm_cpumask to defer TLB flush
  mm: rmap: Extend tlbbatch APIs to fit new platforms
  arm64: support batched/deferred tlb shootdown during page reclamation

 Documentation/features/arch-support.txt        |  1 -
 .../features/vm/TLB/arch-support.txt           |  2 +-
 arch/arm64/Kconfig                             |  1 +
 arch/arm64/include/asm/tlbbatch.h              | 12 +++++++++++
 arch/arm64/include/asm/tlbflush.h              | 13 ++++++++++++
 arch/x86/include/asm/tlbflush.h                |  4 +++-
 mm/rmap.c                                      | 21 +++++++++++++------
 7 files changed, 45 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/include/asm/tlbbatch.h