From patchwork Tue Mar 29 10:39:50 2016
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 8685701
Message-Id: <56FA779702000078000E0D0B@prv-mh.provo.novell.com>
Date: Tue, 29 Mar 2016 04:39:50 -0600
From: "Jan Beulich"
To: "xen-devel"
Cc: Andrew Cooper, Keir Fraser, Paul Durrant, Chang Jianzhong
Subject: [Xen-devel] [PATCH v2] x86/HVM: fix forwarding of internally cached requests

Forwarding entire batches to the device model when an individual
iteration of them got rejected by internal device emulation handlers
with X86EMUL_UNHANDLEABLE is wrong: The device model would then handle
all iterations, without the internal handler getting to see any past
the one it returned failure for. This causes misbehavior in at least
the MSI-X and VGA code, which want to see all such requests for
internal tracking/caching purposes. Note, however, that this does not
apply to buffered I/O requests.
This in turn means that the condition in hvm_process_io_intercept() of
when to crash the domain was wrong: Since X86EMUL_UNHANDLEABLE can
validly be returned by the individual device handlers, we mustn't
blindly crash the domain if such a failure occurs on other than the
initial iteration. Instead we need to distinguish hvm_copy_*_guest_phys()
failures from device specific ones, and the former need to always be
fatal to the domain (i.e. also on the first iteration), since otherwise
we would again end up forwarding a request to qemu which the internal
handler didn't get to see.

The adjustment should be okay even for stdvga's MMIO handling:
- if it is not caching then the accept function would have failed, so
  we won't get into hvm_process_io_intercept(),
- if it issued the buffered ioreq then we only get to the p->count
  reduction if hvm_send_ioreq() actually encountered an error (in which
  case we don't care about the request getting split up).

Also commit 4faffc41d ("x86/hvm: limit reps to avoid the need to handle
retry") went too far in removing code from hvm_process_io_intercept():
When there were successfully handled iterations, the function should
continue to return success with a clipped repeat count.

Signed-off-by: Jan Beulich
Cc: Chang Jianzhong
Reviewed-by: Paul Durrant
---
v2: Make hvm_process_io_intercept() always return a valid X86EMUL_*
    code by moving the domain_crash() invocations up. Add a stdvga
    related paragraph to the commit message.
---
I assume this also addresses the issue which
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg03189.html
attempted to deal with in a not really acceptable way.
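[Illustration only, not part of the patch.] Below is a minimal,
self-contained C sketch of the batching behaviour the intercept.c tail
change is after. The fake_ioreq structure, fake_internal_handler() and
process_batch() are invented stand-ins; only the count clipping mirrors
what the patch does.

/*
 * Standalone sketch (not Xen code): mimics the tail handling that the
 * intercept.c hunk introduces.  The ioreq layout and the fake handler
 * below are made up for illustration only.
 */
#include <stdio.h>

#define X86EMUL_OKAY          0
#define X86EMUL_UNHANDLEABLE  2

struct fake_ioreq {
    unsigned long addr;
    unsigned long count;   /* repeat count of the batch */
    unsigned int size;
};

/* Pretend internal handler: only "tracks" addresses below a limit. */
static int fake_internal_handler(unsigned long addr, unsigned long limit)
{
    return addr < limit ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
}

/* Simplified shape of hvm_process_io_intercept()'s loop and new tail. */
static int process_batch(struct fake_ioreq *p, unsigned long limit)
{
    unsigned long i;
    int rc = X86EMUL_OKAY;

    for ( i = 0; i < p->count; i++ )
    {
        rc = fake_internal_handler(p->addr + i * p->size, limit);
        if ( rc != X86EMUL_OKAY )
            break;
    }

    if ( i )
    {
        /* Some iterations succeeded: report those and return success. */
        p->count = i;
        rc = X86EMUL_OKAY;
    }
    else if ( rc == X86EMUL_UNHANDLEABLE )
    {
        /*
         * Nothing succeeded: forward only a single iteration to the
         * device model, so the internal handler still sees the rest.
         */
        p->count = 1;
    }

    return rc;
}

int main(void)
{
    struct fake_ioreq p = { .addr = 0, .count = 8, .size = 4 };

    /* Iterations at addresses 0..11 are "internal", the rest are not. */
    int rc = process_batch(&p, 12);

    printf("rc=%d, clipped count=%lu\n", rc, p.count); /* rc=0, count=3 */
    return 0;
}

With 8 iterations of which only the first 3 can be handled internally,
the clipped count is reported together with X86EMUL_OKAY; a batch whose
very first iteration is rejected would instead get forwarded as a
single-iteration request.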
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -95,7 +95,7 @@ static const struct hvm_io_handler null_
 };
 
 static int hvmemul_do_io(
-    bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
+    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
@@ -104,7 +104,7 @@ static int hvmemul_do_io(
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
         .size = size,
-        .count = reps,
+        .count = *reps,
         .dir = dir,
         .df = df,
         .data = data,
@@ -136,7 +136,7 @@ static int hvmemul_do_io(
         if ( (p.type != is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) ||
              (p.addr != addr) ||
              (p.size != size) ||
-             (p.count != reps) ||
+             (p.count != *reps) ||
              (p.dir != dir) ||
              (p.df != df) ||
              (p.data_is_ptr != data_is_addr) )
@@ -214,7 +214,7 @@ static int hvmemul_do_io_buffer(
 
     BUG_ON(buffer == NULL);
 
-    rc = hvmemul_do_io(is_mmio, addr, *reps, size, dir, df, 0,
+    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
                        (uintptr_t)buffer);
     if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
         memset(buffer, 0xff, size);
@@ -305,13 +305,13 @@ static int hvmemul_do_io_addr(
         count = 1;
     }
 
-    rc = hvmemul_do_io(is_mmio, addr, count, size, dir, df, 1,
+    rc = hvmemul_do_io(is_mmio, addr, &count, size, dir, df, 1,
                        ram_gpa);
+
     if ( rc == X86EMUL_OKAY )
-    {
         v->arch.hvm_vcpu.hvm_io.mmio_retry = (count < *reps);
-        *reps = count;
-    }
+
+    *reps = count;
 
  out:
     while ( nr_pages )
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -148,8 +148,8 @@ int hvm_process_io_intercept(const struc
                 ASSERT_UNREACHABLE();
                 /* fall through */
             default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
+                domain_crash(current->domain);
+                return X86EMUL_UNHANDLEABLE;
             }
             if ( rc != X86EMUL_OKAY )
                 break;
@@ -178,8 +178,8 @@ int hvm_process_io_intercept(const struc
                 ASSERT_UNREACHABLE();
                 /* fall through */
             default:
-                rc = X86EMUL_UNHANDLEABLE;
-                break;
+                domain_crash(current->domain);
+                return X86EMUL_UNHANDLEABLE;
             }
             if ( rc != X86EMUL_OKAY )
                 break;
@@ -196,8 +196,20 @@ int hvm_process_io_intercept(const struc
         }
     }
 
-    if ( i != 0 && rc == X86EMUL_UNHANDLEABLE )
-        domain_crash(current->domain);
+    if ( i )
+    {
+        p->count = i;
+        rc = X86EMUL_OKAY;
+    }
+    else if ( rc == X86EMUL_UNHANDLEABLE )
+    {
+        /*
+         * Don't forward entire batches to the device model: This would
+         * prevent the internal handlers to see subsequent iterations of
+         * the request.
+         */
+        p->count = 1;
+    }
 
     return rc;
 }
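[Illustration only, not part of the patch.] A similar self-contained
sketch of the emulate.c side: the repeat count is now passed by
reference so a count clipped further down propagates back to the
caller, and *reps gets updated even when the request did not complete
successfully. The stub_do_io() and do_io_addr() names below are made-up
stand-ins; only the reps/count plumbing mirrors the patch.

/*
 * Standalone sketch (not Xen code): shows why hvmemul_do_io() now takes
 * "unsigned long *reps" instead of a plain value.
 */
#include <stdio.h>
#include <stdbool.h>

#define X86EMUL_OKAY          0
#define X86EMUL_UNHANDLEABLE  2

/*
 * Stand-in for the lower layers: pretend only the first 3 iterations
 * can be handled, so the count gets clipped to 3.
 */
static int stub_do_io(unsigned long *reps)
{
    if ( *reps > 3 )
        *reps = 3;
    return X86EMUL_OKAY;
}

/* Simplified shape of the hvmemul_do_io_addr() tail after the patch. */
static int do_io_addr(unsigned long *reps, bool *retry)
{
    unsigned long count = *reps;
    int rc = stub_do_io(&count);

    if ( rc == X86EMUL_OKAY )
        *retry = (count < *reps);   /* more iterations still to emulate */

    /* Unlike before the patch, propagate the clipped count in any case. */
    *reps = count;

    return rc;
}

int main(void)
{
    unsigned long reps = 8;
    bool retry = false;
    int rc = do_io_addr(&reps, &retry);

    printf("rc=%d reps=%lu retry=%d\n", rc, reps, retry); /* rc=0 reps=3 retry=1 */
    return 0;
}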