From patchwork Thu Oct 20 13:57:34 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhanghailiang X-Patchwork-Id: 9386961 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 69E3860762 for ; Thu, 20 Oct 2016 14:26:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5BA9629CA7 for ; Thu, 20 Oct 2016 14:26:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 506A629CA9; Thu, 20 Oct 2016 14:26:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 946CA29CA7 for ; Thu, 20 Oct 2016 14:26:25 +0000 (UTC) Received: from localhost ([::1]:55151 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bxEIa-0003wX-Oy for patchwork-qemu-devel@patchwork.kernel.org; Thu, 20 Oct 2016 10:26:24 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36651) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bxDsL-0003Ww-Mh for qemu-devel@nongnu.org; Thu, 20 Oct 2016 09:59:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bxDsH-0000h4-Cu for qemu-devel@nongnu.org; Thu, 20 Oct 2016 09:59:17 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:55010) by eggs.gnu.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.71) (envelope-from ) id 1bxDsG-0000UA-8a; Thu, 20 Oct 2016 09:59:13 -0400 Received: from 172.24.1.47 (EHLO SZXEML429-HUB.china.huawei.com) ([172.24.1.47]) by szxrg02-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id DPC67909; Thu, 20 Oct 2016 21:58:27 +0800 (CST) Received: from localhost (10.177.24.212) by SZXEML429-HUB.china.huawei.com (10.82.67.184) with Microsoft SMTP Server id 14.3.235.1; Thu, 20 Oct 2016 21:58:18 +0800 From: zhanghailiang To: , Date: Thu, 20 Oct 2016 21:57:34 +0800 Message-ID: <1476971860-20860-2-git-send-email-zhang.zhanghailiang@huawei.com> X-Mailer: git-send-email 2.7.2.windows.1 In-Reply-To: <1476971860-20860-1-git-send-email-zhang.zhanghailiang@huawei.com> References: <1476971860-20860-1-git-send-email-zhang.zhanghailiang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.177.24.212] X-CFilter-Loop: Reflected X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] X-Received-From: 119.145.14.65 Subject: [Qemu-devel] [PATCH RFC 1/7] docs/block-replication: Add description for shared-disk case X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, xiecl.fnst@cn.fujitsu.com, Zhang Chen , zhanghailiang , mreitz@redhat.com, stefanha@redhat.com, pbonzini@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP Introuduce the scenario of shared-disk block replication and how to use it. Signed-off-by: zhanghailiang Signed-off-by: Wen Congyang Signed-off-by: Zhang Chen --- docs/block-replication.txt | 131 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 127 insertions(+), 4 deletions(-) diff --git a/docs/block-replication.txt b/docs/block-replication.txt index 6bde673..97fcfc1 100644 --- a/docs/block-replication.txt +++ b/docs/block-replication.txt @@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation effort during a vmstate checkpoint, the disk modification operations of the Primary disk are asynchronously forwarded to the Secondary node. -== Workflow == +== Non-shared disk workflow == The following is the image of block replication workflow: +----------------------+ +------------------------+ @@ -57,7 +57,7 @@ The following is the image of block replication workflow: 4) Secondary write requests will be buffered in the Disk buffer and it will overwrite the existing sector content in the buffer. -== Architecture == +== None-shared disk architecture == We are going to implement block replication from many basic blocks that are already in QEMU. @@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through of the NBD server into the secondary disk. So before block replication, the primary disk and secondary disk should contain the same data. +== Shared Disk Mode Workflow == +The following is the image of block replication workflow: + + +----------------------+ +------------------------+ + |Primary Write Requests| |Secondary Write Requests| + +----------------------+ +------------------------+ + | | + | (4) + | V + | /-------------\ + | (2)Forward and write through | | + | +--------------------------> | Disk Buffer | + | | | | + | | \-------------/ + | |(1)read | + | | | + (3)write | | | backing file + V | | + +-----------------------------+ | + | Shared Disk | <-----+ + +-----------------------------+ + + 1) Primary writes will read original data and forward it to Secondary + QEMU. + 2) Before Primary write requests are written to Shared disk, the + original sector content will be read from Shared disk and + forwarded and buffered in the Disk buffer on the secondary site, + but it will not overwrite the existing + sector content(it could be from either "Secondary Write Requests" or + previous COW of "Primary Write Requests") in the Disk buffer. + 3) Primary write requests will be written to Shared disk. + 4) Secondary write requests will be buffered in the Disk buffer and it + will overwrite the existing sector content in the buffer. + +== Shared Disk Mode Architecture == +We are going to implement block replication from many basic +blocks that are already in QEMU. + virtio-blk || .---------- + / || | Secondary + / || '---------- + / || virtio-blk + / || | + | || replication(5) + | NBD --------> NBD (2) | + | client || server ---> hidden disk <-- active disk(4) + | ^ || | + | replication(1) || | + | | || | + | +-----------------' || | + (3) |drive-backup sync=none || | +--------. | +-----------------+ || | +Primary | | | || backing | +--------' | | || | + V | | + +-------------------------------------------+ | + | shared disk | <----------+ + +-------------------------------------------+ + + + 1) Primary writes will read original data and forward it to Secondary + QEMU. + 2) The hidden-disk buffers the original content that is modified by the + primary VM. It should also be an empty disk, and + the driver supports bdrv_make_empty() and backing file. + 3) Primary write requests will be written to Shared disk. + 4) Secondary write requests will be buffered in the active disk and it + will overwrite the existing sector content in the buffer. + == Failure Handling == There are 7 internal errors when block replication is running: 1. I/O error on primary disk @@ -145,7 +213,7 @@ d. replication_stop_all() things except failover. The caller must hold the I/O mutex lock if it is in migration/checkpoint thread. -== Usage == +== Non-shared disk usage == Primary: -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\ children.0.file.filename=1.raw,\ @@ -234,6 +302,61 @@ Secondary: The primary host is down, so we should do the following thing: { 'execute': 'nbd-server-stop' } +== Shared disk usage == +Primary: + -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw + +Issue qmp command: + {'execute': 'human-monitor-command', + 'arguments': { + 'command-line': 'drive_add-nbuddydriver=replication, + mode=primary, + file.driver=nbd, + file.host=9.42.3.17, + file.port=9998, + file.export=hidden_disk0, + shared-disk-id=primary_disk0, + shared-disk=on, + node-name=rep' + } + } + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\ + backing.driver=raw,backing.file.filename=1.raw \ + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\ + file.driver=qcow2,top-id=active-disk0,\ + file.file.filename=/mnt/ramfs/active_disk.img,\ + file.backing=hidden_disk0,shared-disk=on + +Issue qmp command: +1. {'execute': 'nbd-server-start', + 'arguments': { + 'addr': { + 'type': 'inet', + 'data': { + 'host': '0', + 'port': '9998' + } + } + } + } +2. { + 'execute': 'nbd-server-add', + 'arguments': { + 'device': 'hidden_disk0', + 'writable': true + } + } + +After Failover: +Primary: +{'execute': 'human-monitor-command', + 'arguments': { + 'command-line': 'drive_delrep' + } +} + +Secondary: + {'execute': 'nbd-server-stop' } + TODO: 1. Continuous block replication -2. Shared disk