From patchwork Wed Aug 7 06:54:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081151 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AA74E912 for ; Wed, 7 Aug 2019 06:56:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9B10D288C5 for ; Wed, 7 Aug 2019 06:56:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8F9CD288D7; Wed, 7 Aug 2019 06:56:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 53FB2288DA for ; Wed, 7 Aug 2019 06:56:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727459AbfHGGzE (ORCPT ); Wed, 7 Aug 2019 02:55:04 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41430 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726883AbfHGGzE (ORCPT ); Wed, 7 Aug 2019 02:55:04 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2297C9B28F; Wed, 7 Aug 2019 06:55:04 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id CB6001001281; Wed, 7 Aug 2019 06:55:01 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca Subject: [PATCH V3 01/10] vhost: disable metadata prefetch optimization Date: Wed, 7 Aug 2019 02:54:40 -0400 Message-Id: <20190807065449.23373-2-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 07 Aug 2019 06:55:04 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Michael S. Tsirkin" This seems to cause guest and host memory corruption. Disable for now until we get a better handle on that. Signed-off-by: Michael S. Tsirkin --- drivers/vhost/vhost.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 819296332913..42a8c2a13ab1 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -96,7 +96,7 @@ struct vhost_uaddr { }; #if defined(CONFIG_MMU_NOTIFIER) && ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 0 -#define VHOST_ARCH_CAN_ACCEL_UACCESS 1 +#define VHOST_ARCH_CAN_ACCEL_UACCESS 0 #else #define VHOST_ARCH_CAN_ACCEL_UACCESS 0 #endif From patchwork Wed Aug 7 06:54:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081149 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C75A13AC for ; Wed, 7 Aug 2019 06:56:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F043528893 for ; Wed, 7 Aug 2019 06:56:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E4EA1288D7; Wed, 7 Aug 2019 06:56:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A86B128893 for ; Wed, 7 Aug 2019 06:56:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727637AbfHGGzI (ORCPT ); Wed, 7 Aug 2019 02:55:08 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46430 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726883AbfHGGzH (ORCPT ); Wed, 7 Aug 2019 02:55:07 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6FB273DE02; Wed, 7 Aug 2019 06:55:07 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8EE3D1001281; Wed, 7 Aug 2019 06:55:04 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 02/10] vhost: don't set uaddr for invalid address Date: Wed, 7 Aug 2019 02:54:41 -0400 Message-Id: <20190807065449.23373-3-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 07 Aug 2019 06:55:07 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We should not setup uaddr for the invalid address, otherwise we may try to pin or prefetch mapping of wrong pages. Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 0536f8526359..488380a581dc 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2082,7 +2082,8 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, } #if VHOST_ARCH_CAN_ACCEL_UACCESS - vhost_setup_vq_uaddr(vq); + if (r == 0) + vhost_setup_vq_uaddr(vq); if (d->mm) mmu_notifier_register(&d->mmu_notifier, d->mm); From patchwork Wed Aug 7 06:54:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081147 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E0BC13AC for ; Wed, 7 Aug 2019 06:56:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5E05B28893 for ; Wed, 7 Aug 2019 06:56:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 523D1288D7; Wed, 7 Aug 2019 06:56:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F27F128893 for ; Wed, 7 Aug 2019 06:56:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727700AbfHGGzL (ORCPT ); Wed, 7 Aug 2019 02:55:11 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34016 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726883AbfHGGzK (ORCPT ); Wed, 7 Aug 2019 02:55:10 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6EB818EA41; Wed, 7 Aug 2019 06:55:10 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id E2C7B10016F3; Wed, 7 Aug 2019 06:55:07 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 03/10] vhost: validate MMU notifier registration Date: Wed, 7 Aug 2019 02:54:42 -0400 Message-Id: <20190807065449.23373-4-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 07 Aug 2019 06:55:10 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The return value of mmu_notifier_register() is not checked in vhost_vring_set_num_addr(). This will cause an out of sync between mm and MMU notifier thus a double free. To solve this, introduce a boolean flag to track whether MMU notifier is registered and only do unregistering when it was true. Reported-and-tested-by: syzbot+e58112d71f77113ddb7b@syzkaller.appspotmail.com Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 19 +++++++++++++++---- drivers/vhost/vhost.h | 1 + 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 488380a581dc..17f6abea192e 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -629,6 +629,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; + dev->has_notifier = false; init_llist_head(&dev->work_list); init_waitqueue_head(&dev->wait); INIT_LIST_HEAD(&dev->read_list); @@ -730,6 +731,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev) if (err) goto err_mmu_notifier; #endif + dev->has_notifier = true; return 0; @@ -959,7 +961,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev) } if (dev->mm) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - mmu_notifier_unregister(&dev->mmu_notifier, dev->mm); + if (dev->has_notifier) { + mmu_notifier_unregister(&dev->mmu_notifier, + dev->mm); + dev->has_notifier = false; + } #endif mmput(dev->mm); } @@ -2064,8 +2070,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, /* Unregister MMU notifer to allow invalidation callback * can access vq->uaddrs[] without holding a lock. */ - if (d->mm) + if (d->has_notifier) { mmu_notifier_unregister(&d->mmu_notifier, d->mm); + d->has_notifier = false; + } vhost_uninit_vq_maps(vq); #endif @@ -2085,8 +2093,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, if (r == 0) vhost_setup_vq_uaddr(vq); - if (d->mm) - mmu_notifier_register(&d->mmu_notifier, d->mm); + if (d->mm) { + r = mmu_notifier_register(&d->mmu_notifier, d->mm); + if (!r) + d->has_notifier = true; + } #endif mutex_unlock(&vq->mutex); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 42a8c2a13ab1..a9a2a93857d2 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -214,6 +214,7 @@ struct vhost_dev { int iov_limit; int weight; int byte_weight; + bool has_notifier; }; bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len); From patchwork Wed Aug 7 06:54:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081121 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D9F5A13AC for ; Wed, 7 Aug 2019 06:55:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CA2232863C for ; Wed, 7 Aug 2019 06:55:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BDE11288C5; Wed, 7 Aug 2019 06:55:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 486322863C for ; Wed, 7 Aug 2019 06:55:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727923AbfHGGzQ (ORCPT ); Wed, 7 Aug 2019 02:55:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:48916 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726883AbfHGGzO (ORCPT ); Wed, 7 Aug 2019 02:55:14 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6DA583066FD9; Wed, 7 Aug 2019 06:55:14 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id E10841000324; Wed, 7 Aug 2019 06:55:10 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 04/10] vhost: fix vhost map leak Date: Wed, 7 Aug 2019 02:54:43 -0400 Message-Id: <20190807065449.23373-5-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Wed, 07 Aug 2019 06:55:14 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We don't free map during vhost_map_unprefetch(). This means it could be leaked. Fixing by free the map. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 17f6abea192e..2a3154976277 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -302,9 +302,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d) static void vhost_map_unprefetch(struct vhost_map *map) { kfree(map->pages); - map->pages = NULL; - map->npages = 0; - map->addr = NULL; + kfree(map); } static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) From patchwork Wed Aug 7 06:54:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081123 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 67D22912 for ; Wed, 7 Aug 2019 06:55:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 56AFA2863C for ; Wed, 7 Aug 2019 06:55:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4AEE1288C5; Wed, 7 Aug 2019 06:55:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E6E122863C for ; Wed, 7 Aug 2019 06:55:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728044AbfHGGzX (ORCPT ); Wed, 7 Aug 2019 02:55:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51724 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728003AbfHGGzV (ORCPT ); Wed, 7 Aug 2019 02:55:21 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B7B0230C061C; Wed, 7 Aug 2019 06:55:21 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2503F1000324; Wed, 7 Aug 2019 06:55:14 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 05/10] vhost: reset invalidate_count in vhost_set_vring_num_addr() Date: Wed, 7 Aug 2019 02:54:44 -0400 Message-Id: <20190807065449.23373-6-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Wed, 07 Aug 2019 06:55:21 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The vhost_set_vring_num_addr() could be called in the middle of invalidate_range_start() and invalidate_range_end(). If we don't reset invalidate_count after the un-registering of MMU notifier, the invalidate_cont will run out of sync (e.g never reach zero). This will in fact disable the fast accessor path. Fixing by reset the count to zero. Reported-by: Michael S. Tsirkin Reported-by: Jason Gunthorpe Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 2a3154976277..2a7217c33668 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2073,6 +2073,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, d->has_notifier = false; } + /* reset invalidate_count in case we are in the middle of + * invalidate_start() and invalidate_end(). + */ + vq->invalidate_count = 0; vhost_uninit_vq_maps(vq); #endif From patchwork Wed Aug 7 06:54:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081145 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2418E912 for ; Wed, 7 Aug 2019 06:56:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1407227FAE for ; Wed, 7 Aug 2019 06:56:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 08AC0284C8; Wed, 7 Aug 2019 06:56:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B167B27FAE for ; Wed, 7 Aug 2019 06:56:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728188AbfHGGzZ (ORCPT ); Wed, 7 Aug 2019 02:55:25 -0400 Received: from mx1.redhat.com ([209.132.183.28]:6523 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728121AbfHGGzY (ORCPT ); Wed, 7 Aug 2019 02:55:24 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B30CC30C5859; Wed, 7 Aug 2019 06:55:24 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 37BF91000324; Wed, 7 Aug 2019 06:55:21 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 06/10] vhost: mark dirty pages during map uninit Date: Wed, 7 Aug 2019 02:54:45 -0400 Message-Id: <20190807065449.23373-7-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Wed, 07 Aug 2019 06:55:24 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We don't mark dirty pages if the map was teared down outside MMU notifier. This will lead untracked dirty pages. Fixing by marking dirty pages during map uninit. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 2a7217c33668..c12cdadb0855 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -305,6 +305,18 @@ static void vhost_map_unprefetch(struct vhost_map *map) kfree(map); } +static void vhost_set_map_dirty(struct vhost_virtqueue *vq, + struct vhost_map *map, int index) +{ + struct vhost_uaddr *uaddr = &vq->uaddrs[index]; + int i; + + if (uaddr->write) { + for (i = 0; i < map->npages; i++) + set_page_dirty(map->pages[i]); + } +} + static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) { struct vhost_map *map[VHOST_NUM_ADDRS]; @@ -314,8 +326,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) for (i = 0; i < VHOST_NUM_ADDRS; i++) { map[i] = rcu_dereference_protected(vq->maps[i], lockdep_is_held(&vq->mmu_lock)); - if (map[i]) + if (map[i]) { + vhost_set_map_dirty(vq, map[i], i); rcu_assign_pointer(vq->maps[i], NULL); + } } spin_unlock(&vq->mmu_lock); @@ -353,7 +367,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, { struct vhost_uaddr *uaddr = &vq->uaddrs[index]; struct vhost_map *map; - int i; if (!vhost_map_range_overlap(uaddr, start, end)) return; @@ -364,10 +377,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, map = rcu_dereference_protected(vq->maps[index], lockdep_is_held(&vq->mmu_lock)); if (map) { - if (uaddr->write) { - for (i = 0; i < map->npages; i++) - set_page_dirty(map->pages[i]); - } + vhost_set_map_dirty(vq, map, index); rcu_assign_pointer(vq->maps[index], NULL); } spin_unlock(&vq->mmu_lock); From patchwork Wed Aug 7 06:54:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081143 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B24C4912 for ; Wed, 7 Aug 2019 06:56:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A25D127FAE for ; Wed, 7 Aug 2019 06:56:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 96F04284C8; Wed, 7 Aug 2019 06:56:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5555427FAE for ; Wed, 7 Aug 2019 06:56:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728321AbfHGGz2 (ORCPT ); Wed, 7 Aug 2019 02:55:28 -0400 Received: from mx1.redhat.com ([209.132.183.28]:13179 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727675AbfHGGz1 (ORCPT ); Wed, 7 Aug 2019 02:55:27 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AD0ED30C5859; Wed, 7 Aug 2019 06:55:27 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 33C2A1001281; Wed, 7 Aug 2019 06:55:24 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 07/10] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps() Date: Wed, 7 Aug 2019 02:54:46 -0400 Message-Id: <20190807065449.23373-8-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Wed, 07 Aug 2019 06:55:27 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There's no need for RCU synchronization in vhost_uninit_vq_maps() since we've already serialized with readers (memory accessors). This also avoid the possible userspace DOS through ioctl() because of the possible high latency caused by synchronize_rcu(). Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index c12cdadb0855..cfc11f9ed9c9 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -333,7 +333,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) } spin_unlock(&vq->mmu_lock); - synchronize_rcu(); + /* No need for synchronize_rcu() or kfree_rcu() since we are + * serialized with memory accessors (e.g vq mutex held). + */ for (i = 0; i < VHOST_NUM_ADDRS; i++) if (map[i]) From patchwork Wed Aug 7 06:54:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081141 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4874813AC for ; Wed, 7 Aug 2019 06:55:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3778927FAE for ; Wed, 7 Aug 2019 06:55:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2BEA1284C8; Wed, 7 Aug 2019 06:55:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EFF9027FAE for ; Wed, 7 Aug 2019 06:55:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728435AbfHGGzc (ORCPT ); Wed, 7 Aug 2019 02:55:32 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41462 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727675AbfHGGzb (ORCPT ); Wed, 7 Aug 2019 02:55:31 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AB692307BCC4; Wed, 7 Aug 2019 06:55:30 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2E3881000324; Wed, 7 Aug 2019 06:55:27 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 08/10] vhost: do not use RCU to synchronize MMU notifier with worker Date: Wed, 7 Aug 2019 02:54:47 -0400 Message-Id: <20190807065449.23373-9-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Wed, 07 Aug 2019 06:55:30 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We used to use RCU to synchronize MMU notifier with worker. This leads calling synchronize_rcu() in invalidate_range_start(). But on a busy system, there would be many factors that may slow down the synchronize_rcu() which makes it unsuitable to be called in MMU notifier. So this patch switches use seqlock counter to track whether or not the map was used. The counter was increased when vq try to start or finish uses the map. This means, when it was even, we're sure there's no readers and MMU notifier is synchronized. When it was odd, it means there's a reader we need to wait it to be even again then we are synchronized. Consider the read critical section is pretty small the synchronization should be done very fast. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 141 ++++++++++++++++++++++++++---------------- drivers/vhost/vhost.h | 7 ++- 2 files changed, 90 insertions(+), 58 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index cfc11f9ed9c9..57bfbb60d960 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) spin_lock(&vq->mmu_lock); for (i = 0; i < VHOST_NUM_ADDRS; i++) { - map[i] = rcu_dereference_protected(vq->maps[i], - lockdep_is_held(&vq->mmu_lock)); + map[i] = vq->maps[i]; if (map[i]) { vhost_set_map_dirty(vq, map[i], i); - rcu_assign_pointer(vq->maps[i], NULL); + vq->maps[i] = NULL; } } spin_unlock(&vq->mmu_lock); - /* No need for synchronize_rcu() or kfree_rcu() since we are - * serialized with memory accessors (e.g vq mutex held). + /* No need for synchronization since we are serialized with + * memory accessors (e.g vq mutex held). */ for (i = 0; i < VHOST_NUM_ADDRS; i++) @@ -362,6 +361,40 @@ static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr, return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size); } +static void inline vhost_vq_access_map_begin(struct vhost_virtqueue *vq) +{ + write_seqcount_begin(&vq->seq); +} + +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq) +{ + write_seqcount_end(&vq->seq); +} + +static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq) +{ + unsigned int seq; + + /* Make sure any changes to map was done before checking seq + * counter. Paired with smp_wmb() in write_seqcount_begin(). + */ + smp_mb(); + seq = raw_read_seqcount(&vq->seq); + /* Odd means the map was currently accessed by vhost worker */ + if (seq & 0x1) { + /* When seq changes, we are sure no reader can see + * previous map */ + while (raw_read_seqcount(&vq->seq) == seq) { + if (need_resched()) + schedule(); + } + } + /* Make sure seq counter was checked before map is + * freed. Paired with smp_wmb() in write_seqcount_end(). + */ + smp_mb(); +} + static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, int index, unsigned long start, @@ -376,16 +409,15 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, spin_lock(&vq->mmu_lock); ++vq->invalidate_count; - map = rcu_dereference_protected(vq->maps[index], - lockdep_is_held(&vq->mmu_lock)); + map = vq->maps[index]; if (map) { vhost_set_map_dirty(vq, map, index); - rcu_assign_pointer(vq->maps[index], NULL); + vq->maps[index] = NULL; } spin_unlock(&vq->mmu_lock); if (map) { - synchronize_rcu(); + vhost_vq_sync_access(vq); vhost_map_unprefetch(map); } } @@ -457,7 +489,7 @@ static void vhost_init_maps(struct vhost_dev *dev) for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; for (j = 0; j < VHOST_NUM_ADDRS; j++) - RCU_INIT_POINTER(vq->maps[j], NULL); + vq->maps[j] = NULL; } } #endif @@ -655,6 +687,7 @@ void vhost_dev_init(struct vhost_dev *dev, vq->indirect = NULL; vq->heads = NULL; vq->dev = dev; + seqcount_init(&vq->seq); mutex_init(&vq->mutex); spin_lock_init(&vq->mmu_lock); vhost_vq_reset(dev, vq); @@ -921,7 +954,7 @@ static int vhost_map_prefetch(struct vhost_virtqueue *vq, map->npages = npages; map->pages = pages; - rcu_assign_pointer(vq->maps[index], map); + vq->maps[index] = map; /* No need for a synchronize_rcu(). This function should be * called by dev->worker so we are serialized with all * readers. @@ -1216,18 +1249,18 @@ static inline int vhost_put_avail_event(struct vhost_virtqueue *vq) struct vring_used *used; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); + map = vq->maps[VHOST_ADDR_USED]; if (likely(map)) { used = map->addr; *((__virtio16 *)&used->ring[vq->num]) = cpu_to_vhost16(vq, vq->avail_idx); - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1245,18 +1278,18 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq, size_t size; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); + map = vq->maps[VHOST_ADDR_USED]; if (likely(map)) { used = map->addr; size = count * sizeof(*head); memcpy(used->ring + idx, head, size); - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1272,17 +1305,17 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq) struct vring_used *used; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); + map = vq->maps[VHOST_ADDR_USED]; if (likely(map)) { used = map->addr; used->flags = cpu_to_vhost16(vq, vq->used_flags); - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1298,17 +1331,17 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq) struct vring_used *used; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); + map = vq->maps[VHOST_ADDR_USED]; if (likely(map)) { used = map->addr; used->idx = cpu_to_vhost16(vq, vq->last_used_idx); - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1362,17 +1395,17 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq, struct vring_avail *avail; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); + map = vq->maps[VHOST_ADDR_AVAIL]; if (likely(map)) { avail = map->addr; *idx = avail->idx; - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1387,17 +1420,17 @@ static inline int vhost_get_avail_head(struct vhost_virtqueue *vq, struct vring_avail *avail; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); + map = vq->maps[VHOST_ADDR_AVAIL]; if (likely(map)) { avail = map->addr; *head = avail->ring[idx & (vq->num - 1)]; - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1413,17 +1446,17 @@ static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq, struct vring_avail *avail; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); + map = vq->maps[VHOST_ADDR_AVAIL]; if (likely(map)) { avail = map->addr; *flags = avail->flags; - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1438,15 +1471,15 @@ static inline int vhost_get_used_event(struct vhost_virtqueue *vq, struct vring_avail *avail; if (!vq->iotlb) { - rcu_read_lock(); - map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]); + vhost_vq_access_map_begin(vq); + map = vq->maps[VHOST_ADDR_AVAIL]; if (likely(map)) { avail = map->addr; *event = (__virtio16)avail->ring[vq->num]; - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1461,17 +1494,17 @@ static inline int vhost_get_used_idx(struct vhost_virtqueue *vq, struct vring_used *used; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_USED]); + map = vq->maps[VHOST_ADDR_USED]; if (likely(map)) { used = map->addr; *idx = used->idx; - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1486,17 +1519,17 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq, struct vring_desc *d; if (!vq->iotlb) { - rcu_read_lock(); + vhost_vq_access_map_begin(vq); - map = rcu_dereference(vq->maps[VHOST_ADDR_DESC]); + map = vq->maps[VHOST_ADDR_DESC]; if (likely(map)) { d = map->addr; *desc = *(d + idx); - rcu_read_unlock(); + vhost_vq_access_map_end(vq); return 0; } - rcu_read_unlock(); + vhost_vq_access_map_end(vq); } #endif @@ -1843,13 +1876,11 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq, #if VHOST_ARCH_CAN_ACCEL_UACCESS static void vhost_vq_map_prefetch(struct vhost_virtqueue *vq) { - struct vhost_map __rcu *map; + struct vhost_map *map; int i; for (i = 0; i < VHOST_NUM_ADDRS; i++) { - rcu_read_lock(); - map = rcu_dereference(vq->maps[i]); - rcu_read_unlock(); + map = vq->maps[i]; if (unlikely(!map)) vhost_map_prefetch(vq, i); } diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index a9a2a93857d2..12399e7c7a61 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -115,16 +115,17 @@ struct vhost_virtqueue { #if VHOST_ARCH_CAN_ACCEL_UACCESS /* Read by memory accessors, modified by meta data * prefetching, MMU notifier and vring ioctl(). - * Synchonrized through mmu_lock (writers) and RCU (writers - * and readers). + * Synchonrized through mmu_lock (writers) and seqlock + * counters, see vhost_vq_access_map_{begin|end}(). */ - struct vhost_map __rcu *maps[VHOST_NUM_ADDRS]; + struct vhost_map *maps[VHOST_NUM_ADDRS]; /* Read by MMU notifier, modified by vring ioctl(), * synchronized through MMU notifier * registering/unregistering. */ struct vhost_uaddr uaddrs[VHOST_NUM_ADDRS]; #endif + seqcount_t seq; const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS]; struct file *kick; From patchwork Wed Aug 7 06:54:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081135 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B7A3113AC for ; Wed, 7 Aug 2019 06:55:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A7BDB27FAE for ; Wed, 7 Aug 2019 06:55:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B048284C8; Wed, 7 Aug 2019 06:55:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5CE1F27FAE for ; Wed, 7 Aug 2019 06:55:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728558AbfHGGzh (ORCPT ); Wed, 7 Aug 2019 02:55:37 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53648 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728523AbfHGGzg (ORCPT ); Wed, 7 Aug 2019 02:55:36 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 10D5614AFAA; Wed, 7 Aug 2019 06:55:36 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2D84A1001938; Wed, 7 Aug 2019 06:55:30 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 09/10] vhost: correctly set dirty pages in MMU notifiers callback Date: Wed, 7 Aug 2019 02:54:48 -0400 Message-Id: <20190807065449.23373-10-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Wed, 07 Aug 2019 06:55:36 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We need make sure there's no reference on the map before trying to mark set dirty pages. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 57bfbb60d960..6650a3ff88c1 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -410,14 +410,13 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, ++vq->invalidate_count; map = vq->maps[index]; - if (map) { - vhost_set_map_dirty(vq, map, index); + if (map) vq->maps[index] = NULL; - } spin_unlock(&vq->mmu_lock); if (map) { vhost_vq_sync_access(vq); + vhost_set_map_dirty(vq, map, index); vhost_map_unprefetch(map); } } From patchwork Wed Aug 7 06:54:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 11081139 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D3D3C912 for ; Wed, 7 Aug 2019 06:55:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C3D5627FAE for ; Wed, 7 Aug 2019 06:55:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B83FE284C8; Wed, 7 Aug 2019 06:55:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6C14127FAE for ; Wed, 7 Aug 2019 06:55:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728677AbfHGGzo (ORCPT ); Wed, 7 Aug 2019 02:55:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52626 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727180AbfHGGzn (ORCPT ); Wed, 7 Aug 2019 02:55:43 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2F233C08EC28; Wed, 7 Aug 2019 06:55:43 +0000 (UTC) Received: from hp-dl380pg8-01.lab.eng.pek2.redhat.com (hp-dl380pg8-01.lab.eng.pek2.redhat.com [10.73.8.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id A54A01001958; Wed, 7 Aug 2019 06:55:36 +0000 (UTC) From: Jason Wang To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, jgg@ziepe.ca, Jason Wang Subject: [PATCH V3 10/10] vhost: do not return -EAGAIN for non blocking invalidation too early Date: Wed, 7 Aug 2019 02:54:49 -0400 Message-Id: <20190807065449.23373-11-jasowang@redhat.com> In-Reply-To: <20190807065449.23373-1-jasowang@redhat.com> References: <20190807065449.23373-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 07 Aug 2019 06:55:43 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Instead of returning -EAGAIN unconditionally, we'd better do that only we're sure the range is overlapped with the metadata area. Reported-by: Jason Gunthorpe Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 6650a3ff88c1..0271f853fa9c 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -395,16 +395,19 @@ static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq) smp_mb(); } -static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, - int index, - unsigned long start, - unsigned long end) +static int vhost_invalidate_vq_start(struct vhost_virtqueue *vq, + int index, + unsigned long start, + unsigned long end, + bool blockable) { struct vhost_uaddr *uaddr = &vq->uaddrs[index]; struct vhost_map *map; if (!vhost_map_range_overlap(uaddr, start, end)) - return; + return 0; + else if (!blockable) + return -EAGAIN; spin_lock(&vq->mmu_lock); ++vq->invalidate_count; @@ -419,6 +422,8 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, vhost_set_map_dirty(vq, map, index); vhost_map_unprefetch(map); } + + return 0; } static void vhost_invalidate_vq_end(struct vhost_virtqueue *vq, @@ -439,18 +444,19 @@ static int vhost_invalidate_range_start(struct mmu_notifier *mn, { struct vhost_dev *dev = container_of(mn, struct vhost_dev, mmu_notifier); - int i, j; - - if (!mmu_notifier_range_blockable(range)) - return -EAGAIN; + bool blockable = mmu_notifier_range_blockable(range); + int i, j, ret; for (i = 0; i < dev->nvqs; i++) { struct vhost_virtqueue *vq = dev->vqs[i]; - for (j = 0; j < VHOST_NUM_ADDRS; j++) - vhost_invalidate_vq_start(vq, j, - range->start, - range->end); + for (j = 0; j < VHOST_NUM_ADDRS; j++) { + ret = vhost_invalidate_vq_start(vq, j, + range->start, + range->end, blockable); + if (ret) + return ret; + } } return 0;