From patchwork Wed Sep 17 17:51:48 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andres Lagar-Cavilla
X-Patchwork-Id: 4925851
From: Andres Lagar-Cavilla
To: Gleb Natapov, Radim Krcmar, Paolo Bonzini, Rik van Riel,
 Peter Zijlstra, Mel Gorman, Andy Lutomirski, Andrew Morton,
 Andrea Arcangeli, Sasha Levin, Jianyu Zhan, Paul Cassella,
 Hugh Dickins, Peter Feiner, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andres Lagar-Cavilla
Subject: [PATCH v2] kvm: Faults which trigger IO release the mmap_sem
Date: Wed, 17 Sep 2014 10:51:48 -0700
Message-Id: <1410976308-7683-1-git-send-email-andreslc@google.com>
X-Mailer: git-send-email 2.1.0.rc2.206.gedb03e5
In-Reply-To: <1410811885-17267-1-git-send-email-andreslc@google.com>
References: <1410811885-17267-1-git-send-email-andreslc@google.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory
has been swapped out or is behind a filemap, this will trigger async
readahead and return immediately. The rationale is that KVM will kick
back the guest with an "async page fault" and allow for some other
guest process to take over.

If async PFs are enabled the fault is retried asap from an async
workqueue. If not, it's retried immediately in the same code path. In
either case the retry will not relinquish the mmap semaphore and will
block on the IO. This is a bad thing, as other mmap semaphore users
now stall as a function of swap or filemap latency.

This patch ensures both the regular and async PF path re-enter the
fault allowing for the mmap semaphore to be relinquished in the case
of IO wait.

Reviewed-by: Radim Krčmář
Signed-off-by: Andres Lagar-Cavilla
Reviewed-by: Gleb Natapov
Reviewed-by: Wanpeng Li
---

v1 -> v2

* WARN_ON_ONCE -> VM_WARN_ON_ONCE
* pagep == NULL skips the final retry
* kvm_gup_retry -> kvm_gup_io
* Comment updates throughout

---
 include/linux/kvm_host.h | 11 +++++++++++
 include/linux/mm.h       |  1 +
 mm/gup.c                 |  4 ++++
 virt/kvm/async_pf.c      |  4 +---
 virt/kvm/kvm_main.c      | 49 +++++++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3addcbc..4c1991b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -198,6 +198,17 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+/*
+ * Carry out a gup that requires IO. Allow the mm to relinquish the mmap
+ * semaphore if the filemap/swap has to wait on a page lock. pagep == NULL
+ * controls whether we retry the gup one more time to completion in that case.
+ * Typically this is called after a FAULT_FLAG_RETRY_NOWAIT in the main tdp
+ * handler.
+ */
+int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
+			 unsigned long addr, bool write_fault,
+			 struct page **pagep);
+
 enum {
 	OUTSIDE_GUEST_MODE,
 	IN_GUEST_MODE,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ebc5f90..13e585f7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2011,6 +2011,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
 #define FOLL_NUMA	0x200	/* force NUMA hinting page fault */
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
+#define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
diff --git a/mm/gup.c b/mm/gup.c
index 91d044b..af7ea3e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -281,6 +281,10 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
+	if (*flags & FOLL_TRIED) {
+		VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
+		fault_flags |= FAULT_FLAG_TRIED;
+	}
 
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index d6a3d09..5ff7f7f 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -80,9 +80,7 @@ static void async_pf_execute(struct work_struct *work)
 
 	might_sleep();
 
-	down_read(&mm->mmap_sem);
-	get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
-	up_read(&mm->mmap_sem);
+	kvm_get_user_page_io(NULL, mm, addr, 1, NULL);
 	kvm_async_page_present_sync(vcpu, apf);
 
 	spin_lock(&vcpu->async_pf.lock);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7ef6b48..fa8a565 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1115,6 +1115,43 @@ static int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
 	return __get_user_pages(tsk, mm, start, 1, flags, page, NULL, NULL);
 }
 
+int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
+			 unsigned long addr, bool write_fault,
+			 struct page **pagep)
+{
+	int npages;
+	int locked = 1;
+	int flags = FOLL_TOUCH | FOLL_HWPOISON |
+		    (pagep ? FOLL_GET : 0) |
+		    (write_fault ? FOLL_WRITE : 0);
+
+	/*
+	 * If retrying the fault, we get here *not* having allowed the filemap
+	 * to wait on the page lock. We should now allow waiting on the IO with
+	 * the mmap semaphore released.
+	 */
+	down_read(&mm->mmap_sem);
+	npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL,
+				  &locked);
+	if (!locked) {
+		VM_BUG_ON(npages != -EBUSY);
+
+		if (!pagep)
+			return 0;
+
+		/*
+		 * The previous call has now waited on the IO. Now we can
+		 * retry and complete. Pass TRIED to ensure we do not re
+		 * schedule async IO (see e.g. filemap_fault).
+		 */
+		down_read(&mm->mmap_sem);
+		npages = __get_user_pages(tsk, mm, addr, 1, flags | FOLL_TRIED,
+					  pagep, NULL, NULL);
+	}
+	up_read(&mm->mmap_sem);
+	return npages;
+}
+
 static inline int check_user_page_hwpoison(unsigned long addr)
 {
 	int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE;
@@ -1177,9 +1214,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 		npages = get_user_page_nowait(current, current->mm,
 					      addr, write_fault, page);
 		up_read(&current->mm->mmap_sem);
-	} else
-		npages = get_user_pages_fast(addr, 1, write_fault,
-					     page);
+	} else {
+		/*
+		 * By now we have tried gup_fast, and possibly async_pf, and we
+		 * are certainly not atomic. Time to retry the gup, allowing
+		 * mmap semaphore to be relinquished in the case of IO.
+		 */
+		npages = kvm_get_user_page_io(current, current->mm, addr,
+					      write_fault, page);
+	}
 	if (npages != 1)
 		return npages;