From patchwork Wed Apr 12 14:05:16 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 9677555
From: zhanghailiang
Date: Wed, 12 Apr 2017 22:05:16 +0800
Message-ID: <1492005921-15664-2-git-send-email-zhang.zhanghailiang@huawei.com>
In-Reply-To: <1492005921-15664-1-git-send-email-zhang.zhanghailiang@huawei.com>
References: <1492005921-15664-1-git-send-email-zhang.zhanghailiang@huawei.com>
Subject: [Qemu-devel] [PATCH v4 1/6] docs/block-replication: Add description for shared-disk case
Cc: kwolf@redhat.com, xiecl.fnst@cn.fujitsu.com, zhangchen.fnst@cn.fujitsu.com, Wen Congyang, qemu-block@nongnu.org, zhanghailiang

Introduce the scenario of shared-disk block replication and how to use it.

Reviewed-by: Changlong Xie
Reviewed-by: Stefan Hajnoczi
Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Zhang Chen
---
 docs/block-replication.txt | 139 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
 effort during a vmstate checkpoint, the disk modification operations of
 the Primary disk are asynchronously forwarded to the Secondary node.
 
-== Workflow ==
+== Non-shared disk workflow ==
 The following is the image of block replication workflow:
 
         +----------------------+            +------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
     4) Secondary write requests will be buffered in the Disk buffer and it
        will overwrite the existing sector content in the buffer.
 
-== Architecture ==
+== Non-shared disk architecture ==
 We are going to implement block replication from many basic
 blocks that are already in QEMU.
 
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.
 
+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                     |
+                  |                                    (4)
+                  |                                     V
+                  |                              /-------------\
+                  | (2)Forward and write through |             |
+                  | +--------------------------> | Disk Buffer |
+                  | |                            |             |
+                  | |                            \-------------/
+                  | |(1)read                           |
+                  | |                                  |
+       (3)write   | |                                  | backing file
+                  V |                                  |
+                 +-----------------------------+       |
+                 | Shared Disk                 | <-----+
+                 +-----------------------------+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Shared disk, the
+       original sector content will be read from Shared disk and
+       forwarded and buffered in the Disk buffer on the secondary site,
+       but it will not overwrite the existing sector content (it could be
+       from either "Secondary Write Requests" or previous COW of "Primary
+       Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+         virtio-blk                     ||                               .----------
+             /                          ||                               | Secondary
+            /                           ||                               '----------
+           /                            ||                                 virtio-blk
+          /                             ||                                      |
+          |                             ||                               replication(5)
+          |                    NBD  -------->   NBD   (2)                       |
+          |                  client     ||    server ---> hidden disk <-- active disk(4)
+          |                     ^       ||                      |
+          |              replication(1) ||                      |
+          |                     |       ||                      |
+          |   +-----------------'       ||                      |
+         (3)  |drive-backup sync=none   ||                      |
+--------. |   +-----------------+       ||                      |
+Primary | |                     |       ||           backing    |
+--------' |                     |       ||                      |
+          V                     |                               |
+       +-------------------------------------------+            |
+       |               shared disk                 | <----------+
+       +-------------------------------------------+
+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) The hidden-disk buffers the original content that is modified by the
+       primary VM. It should also be an empty disk, and the driver supports
+       bdrv_make_empty() and backing file.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the active disk and it
+       will overwrite the existing sector content in the buffer.
+
 == Failure Handling ==
 There are 7 internal errors when block replication is running:
 1. I/O error on primary disk
@@ -145,7 +213,7 @@ d. replication_stop_all()
    things except failover. The caller must hold the I/O mutex lock if it is
    in migration/checkpoint thread.
 
-== Usage ==
+== Non-shared disk usage ==
 Primary:
   -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
          children.0.file.filename=1.raw,\
@@ -234,6 +302,69 @@ Secondary:
   The primary host is down, so we should do the following thing:
   { 'execute': 'nbd-server-stop' }
 
+== Shared disk usage ==
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue qmp command:
+  { 'execute': 'blockdev-add',
+    'arguments': {
+        'driver': 'replication',
+        'node-name': 'rep',
+        'mode': 'primary',
+        'shared-disk-id': 'primary_disk0',
+        'shared-disk': true,
+        'file': {
+            'driver': 'nbd',
+            'export': 'hidden_disk0',
+            'server': {
+                'type': 'inet',
+                'data': {
+                    'host': 'xxx.xxx.xxx.xxx',
+                    'port': 'yyy'
+                }
+            }
+        }
+     }
+  }
+
+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+        backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+        file.driver=qcow2,top-id=active-disk0,\
+        file.file.filename=/mnt/ramfs/active_disk.img,\
+        file.backing=hidden_disk0,shared-disk=on
+
+Issue qmp command:
+1. { 'execute': 'nbd-server-start',
+     'arguments': {
+        'addr': {
+            'type': 'inet',
+            'data': {
+                'host': '0',
+                'port': 'yyy'
+            }
+        }
+     }
+   }
+2. { 'execute': 'nbd-server-add',
+     'arguments': {
+        'device': 'hidden_disk0',
+        'writable': true
+     }
+   }
+
+After Failover:
+Primary:
+  { 'execute': 'x-blockdev-del',
+    'arguments': {
+        'node-name': 'rep'
+    }
+  }
+
+Secondary:
+  {'execute': 'nbd-server-stop' }
+
 TODO:
 1. Continuous block replication
-2. Shared disk
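
As a reading aid for steps (1) to (4) of the "Shared Disk Mode Workflow" added by this patch, here is a minimal Python sketch of the buffering rules only. It is not QEMU code. The class SharedDiskModel and its method names are hypothetical, sectors are modelled as dictionary keys, and NBD forwarding, checkpoints and failover are deliberately left out. The Secondary read path (Disk buffer first, then the shared disk as backing file) is an assumption drawn from the "backing file" arrow in the diagram.

```python
class SharedDiskModel:
    """Toy model of the shared-disk workflow (hypothetical, not QEMU code)."""

    def __init__(self, sectors):
        # Shared disk contents: sector number -> data.
        self.shared_disk = dict(sectors)
        # Disk buffer on the Secondary side, initially empty.
        self.disk_buffer = {}

    def primary_write(self, sector, data):
        # (1) Read the original content from the shared disk.
        original = self.shared_disk.get(sector)
        # (2) Forward it to the Disk buffer, but never overwrite content
        #     already there (from a Secondary write or an earlier COW).
        if sector not in self.disk_buffer:
            self.disk_buffer[sector] = original
        # (3) Only then write the new data to the shared disk.
        self.shared_disk[sector] = data

    def secondary_write(self, sector, data):
        # (4) Secondary writes are buffered and overwrite existing content.
        self.disk_buffer[sector] = data

    def secondary_read(self, sector):
        # Assumption: reads hit the Disk buffer first and fall back to the
        # shared disk, which acts as the backing file.
        if sector in self.disk_buffer:
            return self.disk_buffer[sector]
        return self.shared_disk.get(sector)


if __name__ == "__main__":
    model = SharedDiskModel({0: "A", 1: "B"})
    model.primary_write(0, "A1")       # original "A" is preserved in the buffer
    model.secondary_write(1, "S")      # Secondary's own write
    model.primary_write(1, "B1")       # must not clobber "S" in the buffer
    print(model.secondary_read(0), model.secondary_read(1))
    print(model.shared_disk)
```

In this model the Secondary keeps seeing the disk as it was before the Primary's writes, plus its own writes, while the shared disk holds only the Primary's data.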