From patchwork Fri Dec 9 16:13:53 2016
X-Patchwork-Submitter: Konrad Rzeszutek Wilk
X-Patchwork-Id: 9468539
Date: Fri, 9 Dec 2016 11:13:53 -0500
From: Konrad Rzeszutek Wilk
To: Zhang Chen, Paul.Durrant@citrix.com
Cc: Changlong Xie, Wei Liu, "eddie . dong", Andrew Cooper, Ian Jackson,
    Wen Congyang, Paul Durrant, Yang Hongyang, Xen devel
Message-ID: <20161209161353.GA11877@char.us.oracle.com>
References: <1480499272-11893-1-git-send-email-zhangchen.fnst@cn.fujitsu.com>
    <1480499272-11893-2-git-send-email-zhangchen.fnst@cn.fujitsu.com>
    <20161201131955.GA394@citrix.com>
    <09464095-e9f9-789a-5dc8-7d436d01d353@cn.fujitsu.com>
In-Reply-To: <09464095-e9f9-789a-5dc8-7d436d01d353@cn.fujitsu.com>
Subject: Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server

..snip..
> > If you can be more specific about what is broken in COLO we might be
> > able to devise a fix for you.
>
> My workmate reported this bug last year:
> https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02850.html

Paul, Andrew was asking about:

This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The
migration code needs a way of being able to query whether a default ioreq
server exists, without creating one.

Can you remember what the justification for the read side effects was?
ISTR that it was only for qemu compatibility until the ioreq server work
got in upstream.
If that was the case, can we drop the read side effects now and mandate
that all qemus explicitly create their ioreq servers (even if this
involves creating a default ioreq server for qemu-trad)?

Full thread below:

Re: [Xen-devel] question about migration
To: Wen Congyang
From: Andrew Cooper
Date: Tue, 29 Dec 2015 11:24:14 +0000
Cc: Paul Durrant, xen devel
List-id: Xen developer discussion

On 25/12/2015 01:45, Wen Congyang wrote:
> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>> On 24/12/15 02:29, Wen Congyang wrote:
>>> Hi Andrew Cooper:
>>>
>>> I rebased the COLO code onto the newest upstream xen and tested it. I
>>> found a problem in the test, and I can reproduce it via plain
>>> migration.
>>>
>>> How to reproduce:
>>> 1. xl cr -p hvm_nopv
>>> 2. xl migrate hvm_nopv 192.168.3.1
>>
>> You are the very first person to try a usecase like this. It works as
>> much as it does because of your changes to the uncooperative HVM domain
>> logic.
>>
>> I have said repeatedly during review that this is not necessarily a
>> safe change to make without an in-depth analysis of the knock-on
>> effects; it looks as if you have found the first knock-on effect.
>
> The migration succeeds, but the vm doesn't run on the target machine.
> You can get the reason from 'xl dmesg':
>
> (XEN) HVM2 restore: VMCE_VCPU 1
> (XEN) HVM2 restore: TSC_ADJUST 0
> (XEN) HVM2 restore: TSC_ADJUST 1
> (d2) HVM Loader
> (d2) Detected Xen v4.7-unstable
> (d2) Get guest memory maps[128] failed. (-38)
> (d2) *** HVMLoader bug at e820.c:39
> (d2) *** HVMLoader crashed.
>
> The reason is that we don't call xc_domain_set_memory_map() on the
> target machine. When we create a hvm domain:
>
> libxl__domain_build()
>     libxl__build_hvm()
>         libxl__arch_domain_construct_memmap()
>             xc_domain_set_memory_map()
>
> Should we migrate the guest memory map from the source machine to the
> target machine?
This bug specifically is because HVMLoader is expected to have run and
turned the hypercall information into an E820 table in the guest before a
migration occurs.

Unfortunately, the current codebase is riddled with such assumptions and
expectations (e.g. the HVM save code assumes that the FPU context is valid
when it is saving register state), which is a direct side effect of how it
was developed.

Having said all of the above, I agree that your example is a usecase which
should work. It is the ultimate test of whether the migration stream
contains enough information to faithfully reproduce the domain on the far
side. Clearly at the moment, this is not the case.

I have an upcoming project to work on the domain memory layout logic,
because it is unsuitable for a number of XenServer usecases. Part of that
will require moving it into the migration stream.

> I found another migration problem in the test:
>
> If the migration fails, we resume the guest on the source side, but the
> hvm guest doesn't respond any more.
>
> In my test environment the migration always succeeds, so I used a hack
> to reproduce the failure:
> 1. modify the target xen tools (the libxl_stream_read.c diff at the end
>    of this message, which forces the restore to fail)
> 2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
> 3. xl migrate hvm_nopv 192.168.3.1
>
> The reason is that we create a default ioreq server when we get the hvm
> param HVM_PARAM_IOREQ_PFN. It means that the problem occurs only when
> the migration fails after we get the hvm param HVM_PARAM_IOREQ_PFN.
>
> In the function hvm_select_ioreq_server():
> If the I/O will be handled by a non-default ioreq server, we return that
> non-default ioreq server. In this case, it is handled by qemu.
> If the I/O will not be handled by a non-default ioreq server, we return
> the default ioreq server. Before migration we return NULL, and after
> migration it is not NULL.
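[Editorial note: the fallback behaviour described in the quoted paragraph
can be sketched in miniature. This is a hypothetical model with invented
names, not the actual Xen hvm_select_ioreq_server() code; it only shows
why the result flips from NULL to non-NULL once a default server exists.]

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model: a non-default ioreq server claims an explicit
 * range; the default server (if any) catches everything else. */
struct ioreq_server {
    int is_default;
    unsigned long start, end;   /* range claimed by a non-default server */
};

static struct ioreq_server *
select_ioreq_server(struct ioreq_server *servers[], size_t nr,
                    struct ioreq_server *default_server, unsigned long addr)
{
    for (size_t i = 0; i < nr; i++)
        if (!servers[i]->is_default &&
            addr >= servers[i]->start && addr <= servers[i]->end)
            return servers[i];

    /* No non-default server claims the I/O: fall back to the default
     * server.  Before migration this is NULL; after the read side effect
     * of HVM_PARAM_IOREQ_PFN has run, it is not. */
    return default_server;
}
```

With a NULL default server an unclaimed access yields NULL (and is
ignored); once a default server exists, the same access is routed to it,
which is exactly the post-migration hang scenario described above.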
> See the caller, hvmemul_do_io():
>
>     case X86EMUL_UNHANDLEABLE:
>     {
>         struct hvm_ioreq_server *s =
>             hvm_select_ioreq_server(curr->domain, &p);
>
>         /* If there is no suitable backing DM, just ignore accesses */
>         if ( !s )
>         {
>             rc = hvm_process_io_intercept(&null_handler, &p);
>             vio->io_req.state = STATE_IOREQ_NONE;
>         }
>         else
>         {
>             rc = hvm_send_ioreq(s, &p, 0);
>             if ( rc != X86EMUL_RETRY ||
>                  curr->domain->is_shutting_down )
>                 vio->io_req.state = STATE_IOREQ_NONE;
>             else if ( data_is_addr )
>                 rc = X86EMUL_OKAY;
>         }
>         break;
>     }
>
> We send the I/O request to the default I/O request server, but no
> backing DM handles it. We will wait for the I/O forever...

Hmm yes. This needs fixing. CC'ing Paul who did the ioreq server work.

This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The
migration code needs a way of being able to query whether a default ioreq
server exists, without creating one.

Can you remember what the justification for the read side effects was?
ISTR that it was only for qemu compatibility until the ioreq server work
got in upstream.

If that was the case, can we drop the read side effects now and mandate
that all qemus explicitly create their ioreq servers (even if this
involves creating a default ioreq server for qemu-trad)?

> Can you give me a fix or a detailed suggestion for this bug?
>
> Thanks
> Zhang Chen
>
> > >  	default:
> > >  		a.value = d->arch.hvm_domain.params[a.index];
> > >  		break;
> > > --
> > > 2.7.4
>
> --
> Thanks
> zhangchen
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

The hack used to force the restore to fail (step 1 of the reproduction):

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto err;
     }
 
+    rc = ERROR_FAIL;
+
  err:
     check_all_finished(egc, stream, rc);

2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
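[Editorial note: the read side effect Andrew objects to can be modelled in
a few lines. This is a sketch with invented names, not the actual Xen
hvm_get_param() implementation; it only illustrates why a migration-time
probe of the parameter cannot be side-effect free.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of a domain where merely reading the IOREQ PFN
 * parameter materialises the default ioreq server. */
struct domain_model {
    bool default_ioreq_server_exists;
    unsigned long ioreq_pfn;
};

static unsigned long get_ioreq_pfn(struct domain_model *d)
{
    /* The read side effect under discussion: the first read of the
     * parameter brings the default ioreq server into existence. */
    if (!d->default_ioreq_server_exists)
        d->default_ioreq_server_exists = true;
    return d->ioreq_pfn;
}
```

A migration probe that calls get_ioreq_pfn() to check for a default server
creates one as a by-product, which is precisely the state that makes
hvm_select_ioreq_server() return non-NULL after a failed migration.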