From patchwork Thu Feb 25 07:29:07 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12103561
From: Nadav Amit
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hugh Dickins, Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
    Ingo Molnar, Borislav Petkov, Nadav Amit, Sean Christopherson,
    Andrew Morton, x86@kernel.org
Subject: [RFC 3/6] x86/vdso: introduce page_prefetch()
Date: Wed, 24 Feb 2021 23:29:07 -0800
Message-Id: <20210225072910.2811795-4-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210225072910.2811795-1-namit@vmware.com>
References: <20210225072910.2811795-1-namit@vmware.com>

From: Nadav Amit

Introduce a new vDSO function, page_prefetch(), to be used when memory
that might be paged out is expected to be needed soon. The function
prefetches the page if needed, and returns zero if the page is
accessible after the call, or -1 otherwise.

page_prefetch() is intended to be very lightweight, both when the page
is already present and when it is prefetched.

The implementation leverages the new vDSO exception-table mechanism.
page_prefetch() performs a read access to the page and has a
corresponding vDSO exception-table entry indicating that a #PF might
occur and that, in such a case, the page should be brought in
asynchronously. If a #PF does occur, the page-fault handler sets the
FAULT_FLAG_RETRY_NOWAIT flag.
If the page fault is not resolved, the page-fault handler does not
retry; instead, it jumps to the fixup IP recorded in the exception
table, and the vDSO code returns the corresponding error value.

Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Andrew Morton
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/Kconfig                |  1 +
 arch/x86/entry/vdso/Makefile    |  1 +
 arch/x86/entry/vdso/extable.c   | 59 +++++++++++++++++++++++++--------
 arch/x86/entry/vdso/vdso.lds.S  |  1 +
 arch/x86/entry/vdso/vprefetch.S | 39 ++++++++++++++++++++++
 arch/x86/include/asm/vdso.h     | 38 +++++++++++++++++++--
 arch/x86/mm/fault.c             | 11 ++++--
 lib/vdso/Kconfig                |  5 +++
 8 files changed, 136 insertions(+), 19 deletions(-)
 create mode 100644 arch/x86/entry/vdso/vprefetch.S

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..86a4c265e8af 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -136,6 +136,7 @@ config X86
 	select GENERIC_TIME_VSYSCALL
 	select GENERIC_GETTIMEOFDAY
 	select GENERIC_VDSO_TIME_NS
+	select GENERIC_VDSO_PREFETCH
 	select GUP_GET_PTE_LOW_HIGH if X86_PAE
 	select HARDIRQS_SW_RESEND
 	select HARDLOCKUP_CHECK_TIMESTAMP if X86_64
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380b..e32ca1375b84 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -28,6 +28,7 @@ vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
 vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
 vobjs32-y += vdso32/vclock_gettime.o
 vobjs-$(CONFIG_X86_SGX)	+= vsgx.o
+vobjs-$(CONFIG_GENERIC_VDSO_PREFETCH) += vprefetch.o

 # files to link into kernel
 obj-y				+= vma.o extable.o
diff --git a/arch/x86/entry/vdso/extable.c b/arch/x86/entry/vdso/extable.c
index 93fb37bd32ad..e821887112ce 100644
--- a/arch/x86/entry/vdso/extable.c
+++ b/arch/x86/entry/vdso/extable.c
@@ -4,36 +4,67 @@
 #include
 #include
 #include
+#include "extable.h"

 struct vdso_exception_table_entry {
 	int insn, fixup;
+	unsigned int mask, flags;
 };

-bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
-			  unsigned long error_code, unsigned long fault_addr)
+static unsigned long
+get_vdso_exception_table_entry(const struct pt_regs *regs, int trapnr,
+			       unsigned int *flags)
 {
 	const struct vdso_image *image = current->mm->context.vdso_image;
 	const struct vdso_exception_table_entry *extable;
 	unsigned int nr_entries, i;
 	unsigned long base;
+	unsigned long ip = regs->ip;
+	unsigned long vdso_base = (unsigned long)current->mm->context.vdso;

-	if (!current->mm->context.vdso)
-		return false;
-
-	base = (unsigned long)current->mm->context.vdso + image->extable_base;
+	base = vdso_base + image->extable_base;
 	nr_entries = image->extable_len / (sizeof(*extable));
 	extable = image->extable;

 	for (i = 0; i < nr_entries; i++, base += sizeof(*extable)) {
-		if (regs->ip == base + extable[i].insn) {
-			regs->ip = base + extable[i].fixup;
-			regs->di = trapnr;
-			regs->si = error_code;
-			regs->dx = fault_addr;
-			return true;
-		}
+		if (ip != base + extable[i].insn)
+			continue;
+
+		if (!((1u << trapnr) & extable[i].mask))
+			continue;
+
+		/* found */
+		if (flags)
+			*flags = extable[i].flags;
+		return base + extable[i].fixup;
 	}
-	return false;
+	return 0;
+}
+
+bool __fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+			    unsigned long error_code, unsigned long fault_addr)
+{
+	unsigned long new_ip;
+
+	new_ip = get_vdso_exception_table_entry(regs, trapnr, NULL);
+	if (!new_ip)
+		return false;
+
+	instruction_pointer_set(regs, new_ip);
+	regs->di = trapnr;
+	regs->si = error_code;
+	regs->dx = fault_addr;
+	return true;
+}
+
+__attribute_const__ bool __is_async_vdso_exception(struct pt_regs *regs,
+						   int trapnr)
+{
+	unsigned long new_ip;
+	unsigned int flags;
+
+	new_ip = get_vdso_exception_table_entry(regs, trapnr, &flags);
+
+	return new_ip && (flags & ASM_VDSO_ASYNC_FLAGS);
 }
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index 4bf48462fca7..fd4ba24571c8 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -28,6 +28,7 @@ VERSION {
 		clock_getres;
 		__vdso_clock_getres;
 		__vdso_sgx_enter_enclave;
+		__vdso_prefetch_page;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vprefetch.S b/arch/x86/entry/vdso/vprefetch.S
new file mode 100644
index 000000000000..a0fcafb7d546
--- /dev/null
+++ b/arch/x86/entry/vdso/vprefetch.S
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include
+#include
+#include
+#include
+
+#include "extable.h"
+
+.code64
+.section .text, "ax"
+
+SYM_FUNC_START(__vdso_prefetch_page)
+	/* Prolog */
+	.cfi_startproc
+	push	%rbp
+	.cfi_adjust_cfa_offset 8
+	.cfi_rel_offset %rbp, 0
+	mov	%rsp, %rbp
+	.cfi_def_cfa_register %rbp
+
+	xor	%rax, %rax
+.Laccess_page:
+	movb	(%rdi), %dil
+.Lout:
+
+	/* Epilog */
+	pop	%rbp
+	.cfi_def_cfa %rsp, 8
+	ret
+
+.Lhandle_exception:
+	mov	$-1ll, %rax
+	jmp	.Lout
+	.cfi_endproc
+ASM_VDSO_EXTABLE_HANDLE	.Laccess_page, .Lhandle_exception, \
+			(1<
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ ... @@
+#include

 struct vdso_image {
 	void *data;
@@ -49,9 +50,40 @@ extern void __init init_vdso_image(const struct vdso_image *image);

 extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);

-extern bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
-				 unsigned long error_code,
-				 unsigned long fault_addr);
+extern bool __fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+				   unsigned long error_code,
+				   unsigned long fault_addr);
+
+extern __attribute_const__ bool __is_async_vdso_exception(struct pt_regs *regs,
+							  int trapnr);
+
+static inline bool is_exception_in_vdso(struct pt_regs *regs)
+{
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	unsigned long vdso_base = (unsigned long)current->mm->context.vdso;
+
+	return regs->ip >= vdso_base && regs->ip < vdso_base + image->size &&
+		vdso_base != 0;
+}
+
+static inline bool is_async_vdso_exception(struct pt_regs *regs, int trapnr)
+{
+	if (!is_exception_in_vdso(regs))
+		return false;
+
+	return __is_async_vdso_exception(regs, trapnr);
+}
+
+static inline bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+					unsigned long error_code,
+					unsigned long fault_addr)
+{
+	if (is_exception_in_vdso(regs))
+		return __fixup_vdso_exception(regs, trapnr, error_code,
+					      fault_addr);
+	return false;
+}
+
 #endif /* __ASSEMBLER__ */
 #endif /* _ASM_X86_VDSO_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..87d8ae46510c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1289,6 +1289,10 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (user_mode(regs)) {
 		local_irq_enable();
 		flags |= FAULT_FLAG_USER;
+		if (IS_ENABLED(CONFIG_GENERIC_VDSO_PREFETCH) &&
+		    is_async_vdso_exception(regs, X86_TRAP_PF))
+			flags |= FAULT_FLAG_ALLOW_RETRY |
+				 FAULT_FLAG_RETRY_NOWAIT;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
 			local_irq_enable();
@@ -1407,8 +1411,11 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 */
 	if (unlikely((fault & VM_FAULT_RETRY) &&
 		     (flags & FAULT_FLAG_ALLOW_RETRY))) {
-		flags |= FAULT_FLAG_TRIED;
-		goto retry;
+		if (!(flags & FAULT_FLAG_RETRY_NOWAIT)) {
+			flags |= FAULT_FLAG_TRIED;
+			goto retry;
+		}
+		fixup_vdso_exception(regs, X86_TRAP_PF, hw_error_code, address);
 	}

 	mmap_read_unlock(mm);
diff --git a/lib/vdso/Kconfig b/lib/vdso/Kconfig
index d883ac299508..a64d2b08b6f4 100644
--- a/lib/vdso/Kconfig
+++ b/lib/vdso/Kconfig
@@ -30,4 +30,9 @@ config GENERIC_VDSO_TIME_NS
 	  Selected by architectures which support time namespaces in the
 	  VDSO

+config GENERIC_VDSO_PREFETCH
+	bool
+	help
+	  Selected by architectures which support page prefetch VDSO
+
 endif