From patchwork Thu Jan 26 17:39:14 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Halil Pasic X-Patchwork-Id: 9539871 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 4C28E601D3 for ; Thu, 26 Jan 2017 17:40:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3540C27F8F for ; Thu, 26 Jan 2017 17:40:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2A04E2824F; Thu, 26 Jan 2017 17:40:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86F9127F8F for ; Thu, 26 Jan 2017 17:40:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753881AbdAZRj6 (ORCPT ); Thu, 26 Jan 2017 12:39:58 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:38907 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752843AbdAZRj4 (ORCPT ); Thu, 26 Jan 2017 12:39:56 -0500 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v0QHcbbj069267 for ; Thu, 26 Jan 2017 12:39:21 -0500 Received: from e06smtp07.uk.ibm.com (e06smtp07.uk.ibm.com [195.75.94.103]) by mx0a-001b2d01.pphosted.com with ESMTP id 287m9ybxah-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Thu, 26 Jan 2017 12:39:21 -0500 Received: from localhost by e06smtp07.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 26 Jan 2017 17:39:18 -0000 Received: from d06dlp03.portsmouth.uk.ibm.com (9.149.20.15) by e06smtp07.uk.ibm.com (192.168.101.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Thu, 26 Jan 2017 17:39:15 -0000 Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by d06dlp03.portsmouth.uk.ibm.com (Postfix) with ESMTP id 794B31B08023; Thu, 26 Jan 2017 17:42:02 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (mk.ibm.com [9.149.105.60]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id v0QHdEx39765280; Thu, 26 Jan 2017 17:39:14 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1AA0E42047; Thu, 26 Jan 2017 17:39:12 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CFAAF42045; Thu, 26 Jan 2017 17:39:11 +0000 (GMT) Received: from oc3836556865.ibm.com (unknown [9.152.224.168]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 26 Jan 2017 17:39:11 +0000 (GMT) From: Halil Pasic Cc: "Michael S. Tsirkin" , Jason Wang , kvm@vger.kernel.org, Greg Kurz , "virtualization@lists.linux-foundation.org" , netdev@vger.kernel.org, Cornelia Huck Subject: [BUG/RFC] vhost: net: big endian viring access despite virtio 1 Date: Thu, 26 Jan 2017 18:39:14 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 17012617-0028-0000-0000-0000029EC3B2 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17012617-0029-0000-0000-0000222B733D Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-01-26_12:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=1 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1612050000 definitions=main-1701260171 To: unlisted-recipients:; (no To-header on input) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi! Recently I have been investigating some strange migration problems on s390x. It turned out under certain circumstances vhost_net corrupts avail.idx by using wrong endianness. I managed to track the problem down (I'm pretty sure). It boils down to the following. When stopping vhost userspace (QEMU) calls vhost_net_set_backend with the fd argument set to -1, this leads to is_le being reset in vhost_vq_init_access. On a BE system resetting to legacy means resetting to BE. Usually this is not a problem, but in the case when oldubufs is not zero (which is not likely if no network stress applied) it is a problem. That code path needs to write avail.idx, and ends up using wrong endianness when doing that (but only on a BE system). That is the story in prose, now let's see the corresponding code annotated with some comments. from drivers/vhost/net.c: static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) { /* [..] some not too interesting stuff */ sock = get_socket(fd); /* fd == -1 --> sock == NULL */ if (IS_ERR(sock)) { r = PTR_ERR(sock); goto err_vq; } /* start polling new socket */ oldsock = vq->private_data; if (sock != oldsock) { ubufs = vhost_net_ubuf_alloc(vq, sock && vhost_sock_zcopy(sock)); if (IS_ERR(ubufs)) { r = PTR_ERR(ubufs); goto err_ubufs; } vhost_net_disable_vq(n, vq); ==> vq->private_data = sock; /* now vq->private_data is NULL */ ==> r = vhost_vq_init_access(vq); if (r) goto err_used; /* vq endianness has been reset to BE on s390 */ r = vhost_net_enable_vq(n, vq); if (r) goto err_used; ==> oldubufs = nvq->ubufs; /* here oldubufs might become != 0 */ nvq->ubufs = ubufs; n->tx_packets = 0; n->tx_zcopy_err = 0; n->tx_flush = false; } mutex_unlock(&vq->mutex); if (oldubufs) { vhost_net_ubuf_put_wait_and_free(oldubufs); mutex_lock(&vq->mutex); ==> vhost_zerocopy_signal_used(n, vq); /* tries to update virtqueue structures; endianness is BE on s390 * now (but should be LE for virtio-1) */ mutex_unlock(&vq->mutex); } /*[..] rest of the function */ } from drivers/vhost/vhost.c: int vhost_vq_init_access(struct vhost_virtqueue *vq) { __virtio16 last_used_idx; int r; bool is_le = vq->is_le; if (!vq->private_data) { ==> vhost_reset_is_le(vq); /* resets to native endianness and returns */ return 0; } ==> vhost_init_is_le(vq); /* here we init is_le */ r = vhost_update_used_flags(vq); if (r) goto err; vq->signalled_used_valid = false; if (!vq->iotlb && !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) { r = -EFAULT; goto err; } r = vhost_get_user(vq, last_used_idx, &vq->used->idx); if (r) { vq_err(vq, "Can't access used idx at %p\n", &vq->used->idx); goto err; } vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); return 0; err: vq->is_le = is_le; return r; } AFAIU this can be fixed very simply by omitting the reset. Below the patch. I'm not sure though, the endianness handling ain't simple in vhost. Am I going in the right direction? We have a test (on s390x only) running for a couple of hours now and so far so good but it's a bit early to announce it is tested for s390x. If the patch is reasonable, I'm intend to do a version with proper attribution and stuff. By the way the reset was first introduced by https://lkml.org/lkml/2015/4/10/226 (dug it up in the hope that reasons for the reset were discussed -- but no enlightenment for me). Finally I would like to credit Dave Gilbert for hinting that the unreasonable avail.idx may be due to an endianness problem and Michael A. Tebolt for reporting the bug and testing. -------------------------8<-------------- From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001 From: Halil Pasic Date: Thu, 26 Jan 2017 00:06:15 +0100 Subject: [PATCH] vhost: remove useless/dangerous reset of is_le The reset of is_le does no good, but it contributes its fair share to a bug in vhost_net, which occurs if we have some oldubufs when stopping and setting a fd = -1 as a backend. Instead of doing something convoluted in vhost_net, let's just get rid of the reset. Signed-off-by: Halil Pasic Fixes: commit 2751c9882b94 --- drivers/vhost/vhost.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d643260..08072a2 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) int r; bool is_le = vq->is_le; - if (!vq->private_data) { - vhost_reset_is_le(vq); + if (!vq->private_data) return 0; - } vhost_init_is_le(vq);