Message ID | 23a8a3d00523999e2b6f52074fa0f4c7f3f469ef.1570280098.git.lukasstraub2@web.de (mailing list archive) |
---|---|
State | New, archived |
Series | colo: Add support for continuous replication |
> -----Original Message----- > From: Lukas Straub <lukasstraub2@web.de> > Sent: Saturday, October 5, 2019 9:06 PM > To: qemu-devel <qemu-devel@nongnu.org> > Cc: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > <jasowang@redhat.com>; Wen Congyang <wencongyang2@huawei.com>; > Xie Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > <kwolf@redhat.com>; Max Reitz <mreitz@redhat.com>; qemu-block > <qemu-block@nongnu.org> > Subject: [PATCH v6 4/4] colo: Update Documentation for continuous > replication > > Document the qemu command-line and qmp commands for continuous > replication > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > --- > docs/COLO-FT.txt | 213 +++++++++++++++++++++++++++---------- > docs/block-replication.txt | 28 +++-- > 2 files changed, 174 insertions(+), 67 deletions(-) > > diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index > ad24680d13..bc1a0ccb99 100644 > --- a/docs/COLO-FT.txt > +++ b/docs/COLO-FT.txt > @@ -145,35 +145,65 @@ The diagram just shows the main qmp command, > you can get the detail in test procedure. > > == Test procedure == > -1. 
Startup qemu
> -Primary:
> -# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name
> primary \
> - -device piix3-usb-uhci -vnc :7 \
> - -device usb-tablet -netdev tap,id=hn0,vhost=off \
> - -device virtio-net-pci,id=net-pci0,netdev=hn0 \
> - -drive if=virtio,id=primary-disk0,driver=quorum,read-pattern=fifo,vote-
> threshold=1,\
> - children.0.file.filename=1.raw,\
> - children.0.driver=raw -S
> -Secondary:
> -# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name
> secondary \
> - -device piix3-usb-uhci -vnc :7 \
> - -device usb-tablet -netdev tap,id=hn0,vhost=off \
> - -device virtio-net-pci,id=net-pci0,netdev=hn0 \
> - -drive if=none,id=secondary-disk0,file.filename=1.raw,driver=raw,node-
> name=node0 \
> - -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
> - file.driver=qcow2,top-id=active-disk0,\
> - file.file.filename=/mnt/ramfs/active_disk.img,\
> - file.backing.driver=qcow2,\
> - file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\
> - file.backing.backing=secondary-disk0 \
> - -incoming tcp:0:8888
> -
> -2. On Secondary VM's QEMU monitor, issue command
> +Note: Here we are running both instances on the same Host for testing,
> +change the IP Addresses if you want to run it on two Hosts. Initally
> +127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
> +
> +== Startup qemu ==
> +1. Primary:
> +Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all Hosts.
> +# imagefolder="/mnt/vms/colo-test-primary"
> +
> +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp
> 1 -qmp stdio \
> + -device piix3-usb-uhci -device usb-tablet -name primary \
> + -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper
> \
> + -device rtl8139,id=e0,netdev=hn0 \
> + -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
> + -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \

We should change the host=127.0.0.1 consistent with the expression below.

> + -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \
> + -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
> + -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait
> \
> + -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
> + -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> + -object filter-
> redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
> + -object filter-
> redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
> + -object iothread,id=iothread1 \
> + -object
> +colo-compare,id=comp0,primary_in=compare0-
> 0,secondary_in=compare1,\
> +outdev=compare_out0,iothread=iothread1 \
> + -drive
> +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
> +children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=q
> +cow2 -S
> +
> +2. Secondary:
> +# imagefolder="/mnt/vms/colo-test-secondary"
> +# primary_ip=127.0.0.1
> +
> +# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
> +
> +# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
> +

The active disk and hidden disk just need create one time, we can note that here.

> +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp > 1 -qmp stdio \ > + -device piix3-usb-uhci -device usb-tablet -name secondary \ > + -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper > \ > + -device rtl8139,id=e0,netdev=hn0 \ > + -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 \ > + -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 \ > + -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \ > + -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \ > + -object filter-rewriter,id=rew0,netdev=hn0,queue=all \ > + -drive > if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow > 2 \ > + -drive > +if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2, > +\ > +top-id=childs0,file.file.filename=$imagefolder/secondary-active.qcow2,\ > +file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secon > +dary-hidden.qcow2,\ > +file.backing.backing=parent0 \ > + -drive > +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\ > +children.0=childs0 \ > + -incoming tcp:0.0.0.0:9998 > + > + > +3. On Secondary VM's QEMU monitor, issue command > {'execute':'qmp_capabilities'} > -{ 'execute': 'nbd-server-start', > - 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': > '8889'} } } -} > -{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', > 'writable': true } } > +{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', > +'data': {'host': '0.0.0.0', 'port': '9999'} } } } > +{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', > +'writable': true } } > > Note: > a. The qmp command nbd-server-start and nbd-server-add must be run > @@ -182,44 +212,113 @@ Note: > same. > c. It is better to put active disk and hidden disk in ramdisk. > > -3. On Primary VM's QEMU monitor, issue command: > +4. 
On Primary VM's QEMU monitor, issue command: > {'execute':'qmp_capabilities'} > -{ 'execute': 'human-monitor-command', > - 'arguments': {'command-line': 'drive_add -n buddy > driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.p > ort=8889,file.export=secondary-disk0,node-name=nbd_client0'}} > -{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'primary-disk0', > 'node': 'nbd_client0' } } -{ 'execute': 'migrate-set-capabilities', > - 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } > -{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' } } > +{'execute': 'human-monitor-command', 'arguments': {'command-line': > +'drive_add -n buddy > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,fil > +e.port=9999,file.export=parent0,node-name=replication0'}} > +{'execute': 'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', > +'node': 'replication0' } } > +{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ > +{'capability': 'x-colo', 'state': true } ] } } > +{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } } > > Note: > a. There should be only one NBD Client for each primary disk. > - b. xx.xx.xx.xx is the secondary physical machine's hostname or IP > - c. The qmp command line must be run after running qmp command line in > + b. The qmp command line must be run after running qmp command line in > secondary qemu. > > -4. After the above steps, you will see, whenever you make changes to PVM, > SVM will be synced. > +5. After the above steps, you will see, whenever you make changes to PVM, > SVM will be synced. > You can issue command '{ "execute": "migrate-set-parameters" , > "arguments":{ "x-checkpoint-delay": 2000 } }' > -to change the checkpoint period time > +to change the idle checkpoint period time > + > +6. 
Failover test
> +You can kill one of the VMs and Failover on the surviving VM:
> +
> +If you killed the Secondary, then follow "Primary Failover". After
> +that, if you want to resume the replication, follow "Primary resume
> replication"
> +
> +If you killed the Primary, then follow "Secondary Failover". After
> +that, if you want to resume the replication, follow "Secondary resume
> replication"
> +
> +== Primary Failover ==
> +The Secondary died, resume on the Primary
> +
> +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0',
> +'child': 'children.1'} }
> +{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
> +'drive_del replication0' } }
> +{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } }
> +{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } }
> +{'execute': 'object-del', 'arguments':{ 'id': 'm0' } }
> +{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } }
> +{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } }
> +{'execute': 'x-colo-lost-heartbeat' }
> +
> +== Secondary Failover ==
> +The Primary died, resume on the Secondary and prepare to become the
> new
> +Primary
> +
> +{'execute': 'nbd-server-stop'}
> +{'execute': 'x-colo-lost-heartbeat'}
> +
> +{'execute': 'object-del', 'arguments':{ 'id': 'f2' } }
> +{'execute': 'object-del', 'arguments':{ 'id': 'f1' } }
> +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } }
> +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } }
> +
> +{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend':
> +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host':
> +'0.0.0.0', 'port': '9003' } }, 'server': true } } } }

Same like I said before.

Others statement looks good for me.

Thanks
Zhang Chen

> +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend':
> +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host':
> +'0.0.0.0', 'port': '9004' } }, 'server': true } } } }
> +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend':
> +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host':
> +'127.0.0.1', 'port': '9001' } }, 'server': true } } } }
> +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend':
> +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host':
> +'127.0.0.1', 'port': '9001' } }, 'server': false } } } }
> +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out',
> +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet',
> +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': true } } }
> +}
> +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0',
> +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet',
> +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': false } }
> +} }
> +
> +== Primary resume replication ==
> +Resume replication after new Secondary is up.
> + > +Start the new Secondary (Steps 2 and 3 above), then on the Primary: > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > +'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': > +'existing', 'format': 'raw', 'sync': 'full'} } > + > +Wait until disk is synced, then: > +{'execute': 'stop'} > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync'} } > + > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > +'drive_add -n buddy > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,fil > +e.port=9999,file.export=parent0,node-name=replication0'}} > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', > +'node': 'replication0' } } > + > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', > +'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': > +'mirror0' } } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': > +'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', > +'queue': 'rx', 'indev': 'compare_out' } } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': > +'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', > +'queue': 'rx', 'outdev': 'compare0' } } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > +'iothread1' } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > + > +{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ > +{'capability': 'x-colo', 'state': true } ] } } > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } } > + > +Note: > +If this Primary previously was a Secondary, then we need to insert the > +filters before the filter-rewriter by using the > +"'insert': 'before', 'position': 'id=rew0'" Options. See below. 
> + > +== Secondary resume replication == > +Become Primary and resume replication after new Secondary is up. Note > +that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary. > + > +Start the new Secondary (Steps 2 and 3 above, but with > +primary_ip=127.0.0.2), then on the old Secondary: > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > +'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': > +'existing', 'format': 'raw', 'sync': 'full'} } > + > +Wait until disk is synced, then: > +{'execute': 'stop'} > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync' } } > > -5. Failover test > -You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's - > monitor at the same time, then SVM will failover and client will not detect > this -change. > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > +'drive_add -n buddy > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,fil > +e.port=9999,file.export=parent0,node-name=replication0'}} > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', > +'node': 'replication0' } } > > -Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to > -issue block related command to stop block replication. 
> -Primary: > - Remove the nbd child from the quorum: > - { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': > 'children.1'}} > - { 'execute': 'human-monitor-command','arguments': {'command-line': > 'drive_del blk-buddy0'}} > - Note: there is no qmp command to remove the blockdev now > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', > +'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', > +'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': > +'filter-redirector', 'id': 'redire0', 'props': { 'insert': 'before', > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': > +'compare_out' } } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': > +'filter-redirector', 'id': 'redire1', 'props': { 'insert': 'before', > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': > +'compare0' } } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > +'iothread1' } } > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > > -Secondary: > - The primary host is down, so we should do the following thing: > - { 'execute': 'nbd-server-stop' } > +{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ > +{'capability': 'x-colo', 'state': true } ] } } > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } } > > == TODO == > -1. Support continuous VM replication. > -2. Support shared storage. > -3. Develop the heartbeat part. > -4. Reduce checkpoint VM’s downtime while doing checkpoint. > +1. Support shared storage. > +2. Develop the heartbeat part. > +3. Reduce checkpoint VM’s downtime while doing checkpoint. 
> diff --git a/docs/block-replication.txt b/docs/block-replication.txt index > 6bde6737fb..108e9166a8 100644 > --- a/docs/block-replication.txt > +++ b/docs/block-replication.txt > @@ -65,12 +65,12 @@ blocks that are already in QEMU. > ^ || .---------- > | || | Secondary > 1 Quorum || '---------- > - / \ || > - / \ || > - Primary 2 filter > - disk ^ virtio-blk > - | ^ > - 3 NBD -------> 3 NBD | > + / \ || virtio-blk > + / \ || ^ > + Primary 2 filter | > + disk ^ 7 Quorum > + | / > + 3 NBD -------> 3 NBD / > client || server 2 filter > || ^ ^ > --------. || | | > @@ -106,6 +106,10 @@ any state that would otherwise be lost by the > speculative write-through of the NBD server into the secondary disk. So > before block replication, the primary disk and secondary disk should contain > the same data. > > +7) The secondary also has a quorum node, so after secondary failover it > +can become the new primary and continue replication. > + > + > == Failure Handling == > There are 7 internal errors when block replication is running: > 1. I/O error on primary disk > @@ -171,16 +175,18 @@ Primary: > leading whitespace. > 5. The qmp command line must be run after running qmp command line in > secondary qemu. > - 6. After failover we need remove children.1 (replication driver). > + 6. After primary failover we need remove children.1 (replication driver). 
> > Secondary: > -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \ > - -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top- > id=topxxx\ > + -drive > + if=none,id=childs1,driver=replication,mode=secondary,top-id=childs1 > file.file.filename=active_disk.qcow2,\ > file.driver=qcow2,\ > file.backing.file.filename=hidden_disk.qcow2,\ > file.backing.driver=qcow2,\ > file.backing.backing=colo1 > + -drive if=xxx,driver=quorum,read-pattern=fifo,id=top-disk1,\ > + vote-threshold=1,children.0=childs1 > > Then run qmp command in secondary qemu: > { 'execute': 'nbd-server-start', > @@ -234,6 +240,8 @@ Secondary: > The primary host is down, so we should do the following thing: > { 'execute': 'nbd-server-stop' } > > +Promote Secondary to Primary: > + see COLO-FT.txt > + > TODO: > -1. Continuous block replication > -2. Shared disk > +1. Shared disk > -- > 2.20.1
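The six chardev-add commands in the "Secondary Failover" section of the patch differ only in id, host, port and the server flag. As an illustration only (this is not part of the patch), the following Python sketch generates them from a small table. The helper names chardev_add and failover_chardev_commands are hypothetical; the ids, addresses and ports are copied from the patch. The sketch only prints standard double-quoted JSON, one command per line; how the commands are delivered to the QMP monitor is left out.

```python
import json

# (id, host, port, server) rows copied from the "Secondary Failover"
# section of the patch. Ports are strings, as in the patch.
CHARDEVS = [
    ("mirror0", "0.0.0.0", "9003", True),
    ("compare1", "0.0.0.0", "9004", True),
    ("compare0", "127.0.0.1", "9001", True),
    ("compare0-0", "127.0.0.1", "9001", False),
    ("compare_out", "127.0.0.1", "9005", True),
    ("compare_out0", "127.0.0.1", "9005", False),
]


def chardev_add(chardev_id, host, port, server):
    """Build one chardev-add command as a dict (hypothetical helper)."""
    return {
        "execute": "chardev-add",
        "arguments": {
            "id": chardev_id,
            "backend": {
                "type": "socket",
                "data": {
                    "addr": {
                        "type": "inet",
                        "data": {"host": host, "port": port},
                    },
                    "server": server,
                },
            },
        },
    }


def failover_chardev_commands():
    """Return the six commands in the order used by the patch."""
    return [chardev_add(*row) for row in CHARDEVS]


if __name__ == "__main__":
    for command in failover_chardev_commands():
        print(json.dumps(command))
```

Keeping the table separate from the command template makes it easy to see which chardevs listen on all interfaces (mirror0, compare1) and which are local to the host.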
On Wed, 9 Oct 2019 08:36:52 +0000 "Zhang, Chen" <chen.zhang@intel.com> wrote: > > -----Original Message----- > > From: Lukas Straub <lukasstraub2@web.de> > > Sent: Saturday, October 5, 2019 9:06 PM > > To: qemu-devel <qemu-devel@nongnu.org> > > Cc: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > > <jasowang@redhat.com>; Wen Congyang <wencongyang2@huawei.com>; > > Xie Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > > <kwolf@redhat.com>; Max Reitz <mreitz@redhat.com>; qemu-block > > <qemu-block@nongnu.org> > > Subject: [PATCH v6 4/4] colo: Update Documentation for continuous > > replication > > > > Document the qemu command-line and qmp commands for continuous > > replication > > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > > --- > > docs/COLO-FT.txt | 213 +++++++++++++++++++++++++++---------- > > docs/block-replication.txt | 28 +++-- > > 2 files changed, 174 insertions(+), 67 deletions(-) > > > > diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index > > ad24680d13..bc1a0ccb99 100644 > > --- a/docs/COLO-FT.txt > > +++ b/docs/COLO-FT.txt > > @@ -145,35 +145,65 @@ The diagram just shows the main qmp command, > > you can get the detail in test procedure. > > > > ... > > > > +Note: Here we are running both instances on the same Host for testing, > > +change the IP Addresses if you want to run it on two Hosts. Initally > > +127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host. > > + > > +== Startup qemu == > > +1. Primary: > > +Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all Hosts. 
> > +# imagefolder="/mnt/vms/colo-test-primary"
> > +
> > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp
> > 1 -qmp stdio \
> > + -device piix3-usb-uhci -device usb-tablet -name primary \
> > + -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper
> > \
> > + -device rtl8139,id=e0,netdev=hn0 \
> > + -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
> > + -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \
>
> We should change the host=127.0.0.1 consistent with the expression below.

Hi,
This (and the IPs below in the QMP commands) needs to be this way, because it's a listening port and with 127.0.0.1 it would only listen on the loopback IP and wouldn't be reachable from another node, for example. With 0.0.0.0 it will listen on all interfaces.

> > + -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \
> > + -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
> > + -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait
> > \
> > + -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
> > + -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> > + -object filter-
> > redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
> > + -object filter-
> > redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
> > + -object iothread,id=iothread1 \
> > + -object
> > +colo-compare,id=comp0,primary_in=compare0-
> > 0,secondary_in=compare1,\
> > +outdev=compare_out0,iothread=iothread1 \
> > + -drive
> > +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
> > +children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=q
> > +cow2 -S
> > +
> > +2.
Secondary:
> > +# imagefolder="/mnt/vms/colo-test-secondary"
> > +# primary_ip=127.0.0.1
> > +
> > +# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
> > +
> > +# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
> > +
>
> The active disk and hidden disk just need create one time, we can note that here.

Ok, I will note that. But I will wait until the block changes are reviewed before sending the next version.

Regards,
Lukas Straub

> > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp
> > 1 -qmp stdio \
> > + -device piix3-usb-uhci -device usb-tablet -name secondary \
> > + -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper
> > \
> > + -device rtl8139,id=e0,netdev=hn0 \
> > + -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 \
> > + -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 \
> > + -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
> > + -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
> > + -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
> > + -drive
> > if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow
> > 2 \
> > + -drive
> > +if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,
> > +\
> > +top-id=childs0,file.file.filename=$imagefolder/secondary-active.qcow2,\
> > +file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secon
> > +dary-hidden.qcow2,\
> > +file.backing.backing=parent0 \
> > + -drive
> > +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
> > +children.0=childs0 \
> > + -incoming tcp:0.0.0.0:9998
> > +
> > +
> > +3.
On Secondary VM's QEMU monitor, issue command > > {'execute':'qmp_capabilities'} > > -{ 'execute': 'nbd-server-start', > > - 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': > > '8889'} } } -} > > -{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', > > 'writable': true } } > > +{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', > > +'data': {'host': '0.0.0.0', 'port': '9999'} } } } > > +{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', > > +'writable': true } } > > > > Note: > > a. The qmp command nbd-server-start and nbd-server-add must be run > > @@ -182,44 +212,113 @@ Note: > > same. > > c. It is better to put active disk and hidden disk in ramdisk. > > > > -3. On Primary VM's QEMU monitor, issue command: > > +4. On Primary VM's QEMU monitor, issue command: > > {'execute':'qmp_capabilities'} > > -{ 'execute': 'human-monitor-command', > > - 'arguments': {'command-line': 'drive_add -n buddy > > driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.p > > ort=8889,file.export=secondary-disk0,node-name=nbd_client0'}} > > -{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'primary-disk0', > > 'node': 'nbd_client0' } } -{ 'execute': 'migrate-set-capabilities', > > - 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } > > -{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' } } > > +{'execute': 'human-monitor-command', 'arguments': {'command-line': > > +'drive_add -n buddy > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,fil > > +e.port=9999,file.export=parent0,node-name=replication0'}} > > +{'execute': 'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', > > +'node': 'replication0' } } > > +{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ > > +{'capability': 'x-colo', 'state': true } ] } } > > +{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } } > 
> > > Note: > > a. There should be only one NBD Client for each primary disk. > > - b. xx.xx.xx.xx is the secondary physical machine's hostname or IP > > - c. The qmp command line must be run after running qmp command line in > > + b. The qmp command line must be run after running qmp command line in > > secondary qemu. > > > > -4. After the above steps, you will see, whenever you make changes to PVM, > > SVM will be synced. > > +5. After the above steps, you will see, whenever you make changes to PVM, > > SVM will be synced. > > You can issue command '{ "execute": "migrate-set-parameters" , > > "arguments":{ "x-checkpoint-delay": 2000 } }' > > -to change the checkpoint period time > > +to change the idle checkpoint period time > > + > > +6. Failover test > > +You can kill one of the VMs and Failover on the surviving VM: > > + > > +If you killed the Secondary, then follow "Primary Failover". After > > +that, if you want to resume the replication, follow "Primary resume > > replication" > > + > > +If you killed the Primary, then follow "Secondary Failover". 
After > > +that, if you want to resume the replication, follow "Secondary resume > > replication" > > + > > +== Primary Failover == > > +The Secondary died, resume on the Primary > > + > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', > > +'child': 'children.1'} } > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > +'drive_del replication0' } } > > +{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } } > > +{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } } > > +{'execute': 'object-del', 'arguments':{ 'id': 'm0' } } > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } } > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } } > > +{'execute': 'x-colo-lost-heartbeat' } > > + > > +== Secondary Failover == > > +The Primary died, resume on the Secondary and prepare to become the > > new > > +Primary > > + > > +{'execute': 'nbd-server-stop'} > > +{'execute': 'x-colo-lost-heartbeat'} > > + > > +{'execute': 'object-del', 'arguments':{ 'id': 'f2' } } > > +{'execute': 'object-del', 'arguments':{ 'id': 'f1' } } > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } } > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } } > > + > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend': > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > +'0.0.0.0', 'port': '9003' } }, 'server': true } } } } > > Same like I said before. > > Others statement looks good for me. 
> > Thanks > Zhang Chen > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend': > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > +'0.0.0.0', 'port': '9004' } }, 'server': true } } } } > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend': > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > +'127.0.0.1', 'port': '9001' } }, 'server': true } } } } > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend': > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > +'127.0.0.1', 'port': '9001' } }, 'server': false } } } } > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out', > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': true } } } > > +} > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0', > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': false } } > > +} } > > + > > +== Primary resume replication == > > +Resume replication after new Secondary is up. 
> > + > > +Start the new Secondary (Steps 2 and 3 above), then on the Primary: > > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > > +'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': > > +'existing', 'format': 'raw', 'sync': 'full'} } > > + > > +Wait until disk is synced, then: > > +{'execute': 'stop'} > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync'} } > > + > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > +'drive_add -n buddy > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,fil > > +e.port=9999,file.export=parent0,node-name=replication0'}} > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', > > +'node': 'replication0' } } > > + > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', > > +'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': > > +'mirror0' } } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > +'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', > > +'queue': 'rx', 'indev': 'compare_out' } } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > +'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', > > +'queue': 'rx', 'outdev': 'compare0' } } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > +'iothread1' } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > > + > > +{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ > > +{'capability': 'x-colo', 'state': true } ] } } > > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } } > > + > > +Note: > > +If this Primary previously was a Secondary, then we need to insert the > > +filters before the filter-rewriter by using the > > +"'insert': 'before', 
'position': 'id=rew0'" Options. See below. > > + > > +== Secondary resume replication == > > +Become Primary and resume replication after new Secondary is up. Note > > +that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary. > > + > > +Start the new Secondary (Steps 2 and 3 above, but with > > +primary_ip=127.0.0.2), then on the old Secondary: > > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > > +'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': > > +'existing', 'format': 'raw', 'sync': 'full'} } > > + > > +Wait until disk is synced, then: > > +{'execute': 'stop'} > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync' } } > > > > -5. Failover test > > -You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's - > > monitor at the same time, then SVM will failover and client will not detect > > this -change. > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > +'drive_add -n buddy > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,fil > > +e.port=9999,file.export=parent0,node-name=replication0'}} > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', > > +'node': 'replication0' } } > > > > -Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to > > -issue block related command to stop block replication. 
> > -Primary: > > - Remove the nbd child from the quorum: > > - { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': > > 'children.1'}} > > - { 'execute': 'human-monitor-command','arguments': {'command-line': > > 'drive_del blk-buddy0'}} > > - Note: there is no qmp command to remove the blockdev now > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', > > +'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', > > +'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > +'filter-redirector', 'id': 'redire0', 'props': { 'insert': 'before', > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': > > +'compare_out' } } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > +'filter-redirector', 'id': 'redire1', 'props': { 'insert': 'before', > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': > > +'compare0' } } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > +'iothread1' } } > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > > > > -Secondary: > > - The primary host is down, so we should do the following thing: > > - { 'execute': 'nbd-server-stop' } > > +{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ > > +{'capability': 'x-colo', 'state': true } ] } } > > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } } > > > > == TODO == > > -1. Support continuous VM replication. > > -2. Support shared storage. > > -3. Develop the heartbeat part. > > -4. Reduce checkpoint VM’s downtime while doing checkpoint. > > +1. Support shared storage. > > +2. Develop the heartbeat part. > > +3. Reduce checkpoint VM’s downtime while doing checkpoint.
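The resume procedures quoted above say "Wait until disk is synced" without showing how to check. One possible way, offered as a minimal sketch and not as part of the patch, is to issue the QMP command query-block-jobs and inspect the ready flag of the mirror job. The helper name mirror_job_ready and the sample reply below are illustrative assumptions; the code relies only on the device and ready fields of the reply.

```python
import json


def mirror_job_ready(reply_text, job_id="resync"):
    """Return True if block job `job_id` is reported as ready.

    `reply_text` is the JSON reply to {'execute': 'query-block-jobs'}.
    The job ID is reported in the 'device' field of each job entry.
    """
    reply = json.loads(reply_text)
    for job in reply.get("return", []):
        if job.get("device") == job_id:
            return bool(job.get("ready", False))
    return False  # job not listed, so it cannot be ready


# Hypothetical reply for a 10G disk whose mirror has caught up.
sample = (
    '{"return": [{"type": "mirror", "device": "resync", '
    '"len": 10737418240, "offset": 10737418240, "ready": true}]}'
)
print(mirror_job_ready(sample))
```

Once the helper returns True, the 'stop' and 'block-job-cancel' commands from the procedure can be issued.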
> -----Original Message----- > From: Lukas Straub <lukasstraub2@web.de> > Sent: Wednesday, October 9, 2019 11:17 PM > To: Zhang, Chen <chen.zhang@intel.com> > Cc: qemu-devel <qemu-devel@nongnu.org>; Jason Wang > <jasowang@redhat.com>; Wen Congyang <wencongyang2@huawei.com>; > Xie Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > <kwolf@redhat.com>; Max Reitz <mreitz@redhat.com>; qemu-block > <qemu-block@nongnu.org> > Subject: Re: [PATCH v6 4/4] colo: Update Documentation for continuous > replication > > On Wed, 9 Oct 2019 08:36:52 +0000 > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > -----Original Message----- > > > From: Lukas Straub <lukasstraub2@web.de> > > > Sent: Saturday, October 5, 2019 9:06 PM > > > To: qemu-devel <qemu-devel@nongnu.org> > > > Cc: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > > > <jasowang@redhat.com>; Wen Congyang > <wencongyang2@huawei.com>; Xie > > > Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > <kwolf@redhat.com>; > > > Max Reitz <mreitz@redhat.com>; qemu-block <qemu- > block@nongnu.org> > > > Subject: [PATCH v6 4/4] colo: Update Documentation for continuous > > > replication > > > > > > Document the qemu command-line and qmp commands for continuous > > > replication > > > > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > > > --- > > > docs/COLO-FT.txt | 213 +++++++++++++++++++++++++++---------- > > > docs/block-replication.txt | 28 +++-- > > > 2 files changed, 174 insertions(+), 67 deletions(-) > > > > > > diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index > > > ad24680d13..bc1a0ccb99 100644 > > > --- a/docs/COLO-FT.txt > > > +++ b/docs/COLO-FT.txt > > > @@ -145,35 +145,65 @@ The diagram just shows the main qmp > command, > > > you can get the detail in test procedure. > > > > > > ... > > > > > > +Note: Here we are running both instances on the same Host for > > > +testing, change the IP Addresses if you want to run it on two > > > +Hosts. 
Initially
> > > +127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
> > > +
> > > +== Startup qemu ==
> > > +1. Primary:
> > > +Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all Hosts.
> > > +# imagefolder="/mnt/vms/colo-test-primary"
> > > +
> > > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \
> > > + -device piix3-usb-uhci -device usb-tablet -name primary \
> > > + -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
> > > + -device rtl8139,id=e0,netdev=hn0 \
> > > + -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
> > > + -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \
> >
> > We should change this to host=127.0.0.1, consistent with the expression below.
>
> Hi,
> This (and the IPs below in the QMP commands) needs to be this way,
> because it's a listening port and with 127.0.0.1 it would only listen on the
> loopback IP and wouldn't be reachable from another node, for example. With
> 0.0.0.0 it will listen on all interfaces.

Yes, I know. For this command demo, using 192.168.0.1/192.168.0.2 might be clearer.
>
> > > + -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \
> > > + -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
> > > + -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait \
> > > + -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
> > > + -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> > > + -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
> > > + -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
> > > + -object iothread,id=iothread1 \
> > > + -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
> > > +outdev=compare_out0,iothread=iothread1 \
> > > + -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
> > > +children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
> > > +
> > > +2. Secondary:
> > > +# imagefolder="/mnt/vms/colo-test-secondary"
> > > +# primary_ip=127.0.0.1
> > > +
> > > +# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
> > > +
> > > +# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
> > > +
> >
> > The active disk and hidden disk only need to be created once; we can note that here.
>
> Ok, I will note that. But I will wait until the block changes are reviewed
> before sending the next version.

That's fine with me.
Thanks Zhang Chen > > Regards, > Lukas Straub > > > > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 - > smp > > > 1 -qmp stdio \ > > > + -device piix3-usb-uhci -device usb-tablet -name secondary \ > > > + -netdev > > > + tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper > > > \ > > > + -device rtl8139,id=e0,netdev=hn0 \ > > > + -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 \ > > > + -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 \ > > > + -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \ > > > + -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \ > > > + -object filter-rewriter,id=rew0,netdev=hn0,queue=all \ > > > + -drive > > > if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=q > > > cow > > > 2 \ > > > + -drive > > > +if=none,id=childs0,driver=replication,mode=secondary,file.driver=qc > > > +ow2, > > > +\ > > > +top-id=childs0,file.file.filename=$imagefolder/secondary-active.qco > > > +w2,\ > > > +file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/s > > > +econ > > > +dary-hidden.qcow2,\ > > > +file.backing.backing=parent0 \ > > > + -drive > > > +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold > > > +=1,\ > > > +children.0=childs0 \ > > > + -incoming tcp:0.0.0.0:9998 > > > + > > > + > > > +3. On Secondary VM's QEMU monitor, issue command > > > {'execute':'qmp_capabilities'} > > > -{ 'execute': 'nbd-server-start', > > > - 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': > > > '8889'} } } -} > > > -{'execute': 'nbd-server-add', 'arguments': {'device': > > > 'secondary-disk0', > > > 'writable': true } } > > > +{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': > > > +'inet', > > > +'data': {'host': '0.0.0.0', 'port': '9999'} } } } > > > +{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', > > > +'writable': true } } > > > > > > Note: > > > a. 
The qmp command nbd-server-start and nbd-server-add must be > > > run @@ -182,44 +212,113 @@ Note: > > > same. > > > c. It is better to put active disk and hidden disk in ramdisk. > > > > > > -3. On Primary VM's QEMU monitor, issue command: > > > +4. On Primary VM's QEMU monitor, issue command: > > > {'execute':'qmp_capabilities'} > > > -{ 'execute': 'human-monitor-command', > > > - 'arguments': {'command-line': 'drive_add -n buddy > > > driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.x > > > x,file.p > > > ort=8889,file.export=secondary-disk0,node-name=nbd_client0'}} > > > -{ 'execute':'x-blockdev-change', 'arguments':{'parent': > > > 'primary-disk0', > > > 'node': 'nbd_client0' } } -{ 'execute': 'migrate-set-capabilities', > > > - 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } > > > -{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' > > > } } > > > +{'execute': 'human-monitor-command', 'arguments': {'command-line': > > > +'drive_add -n buddy > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2 > > > +,fil e.port=9999,file.export=parent0,node-name=replication0'}} > > > +{'execute': 'x-blockdev-change', 'arguments':{'parent': > > > +'colo-disk0', > > > +'node': 'replication0' } } > > > +{'execute': 'migrate-set-capabilities', 'arguments': > > > +{'capabilities': [ > > > +{'capability': 'x-colo', 'state': true } ] } } > > > +{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } > > > +} > > > > > > Note: > > > a. There should be only one NBD Client for each primary disk. > > > - b. xx.xx.xx.xx is the secondary physical machine's hostname or IP > > > - c. The qmp command line must be run after running qmp command > > > line in > > > + b. The qmp command line must be run after running qmp command > > > + line in > > > secondary qemu. > > > > > > -4. After the above steps, you will see, whenever you make changes > > > to PVM, SVM will be synced. > > > +5. 
After the above steps, you will see, whenever you make changes > > > +to PVM, > > > SVM will be synced. > > > You can issue command '{ "execute": "migrate-set-parameters" , > > > "arguments":{ "x-checkpoint-delay": 2000 } }' > > > -to change the checkpoint period time > > > +to change the idle checkpoint period time > > > + > > > +6. Failover test > > > +You can kill one of the VMs and Failover on the surviving VM: > > > + > > > +If you killed the Secondary, then follow "Primary Failover". After > > > +that, if you want to resume the replication, follow "Primary resume > > > replication" > > > + > > > +If you killed the Primary, then follow "Secondary Failover". After > > > +that, if you want to resume the replication, follow "Secondary > > > +resume > > > replication" > > > + > > > +== Primary Failover == > > > +The Secondary died, resume on the Primary > > > + > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > +'colo-disk0', > > > +'child': 'children.1'} } > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > > +'drive_del replication0' } } > > > +{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } } > > > +{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } } > > > +{'execute': 'object-del', 'arguments':{ 'id': 'm0' } } > > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } } > > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } } > > > +{'execute': 'x-colo-lost-heartbeat' } > > > + > > > +== Secondary Failover == > > > +The Primary died, resume on the Secondary and prepare to become the > > > new > > > +Primary > > > + > > > +{'execute': 'nbd-server-stop'} > > > +{'execute': 'x-colo-lost-heartbeat'} > > > + > > > +{'execute': 'object-del', 'arguments':{ 'id': 'f2' } } > > > +{'execute': 'object-del', 'arguments':{ 'id': 'f1' } } > > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } } > > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } } > > > + > > > 
+{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend': > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > +'0.0.0.0', 'port': '9003' } }, 'server': true } } } } > > > > Same like I said before. > > > > Others statement looks good for me. > > > > Thanks > > Zhang Chen > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend': > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > +'0.0.0.0', 'port': '9004' } }, 'server': true } } } } > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend': > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > +'127.0.0.1', 'port': '9001' } }, 'server': true } } } } > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend': > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > +'127.0.0.1', 'port': '9001' } }, 'server': false } } } } > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out', > > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', > > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': true } > > > +} } } > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0', > > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', > > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': false > > > +} } } } > > > + > > > +== Primary resume replication == > > > +Resume replication after new Secondary is up. 
> > > + > > > +Start the new Secondary (Steps 2 and 3 above), then on the Primary: > > > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > > > +'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': > > > +'existing', 'format': 'raw', 'sync': 'full'} } > > > + > > > +Wait until disk is synced, then: > > > +{'execute': 'stop'} > > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync'} } > > > + > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > > +'drive_add -n buddy > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2 > > > +,fil e.port=9999,file.export=parent0,node-name=replication0'}} > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > +'colo-disk0', > > > +'node': 'replication0' } } > > > + > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > +'filter-mirror', > > > +'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': > > > +'mirror0' } } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > +'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', > > > +'queue': 'rx', 'indev': 'compare_out' } } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > +'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', > > > +'queue': 'rx', 'outdev': 'compare0' } } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > > +'iothread1' } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > > > + > > > +{'execute': 'migrate-set-capabilities', 'arguments':{ > > > +'capabilities': [ > > > +{'capability': 'x-colo', 'state': true } ] } } > > > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } > > > +} > > > + > > > +Note: > > > +If this Primary previously was a Secondary, then we 
need to insert > > > +the filters before the filter-rewriter by using the > > > +"'insert': 'before', 'position': 'id=rew0'" Options. See below. > > > + > > > +== Secondary resume replication == > > > +Become Primary and resume replication after new Secondary is up. > > > +Note that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary. > > > + > > > +Start the new Secondary (Steps 2 and 3 above, but with > > > +primary_ip=127.0.0.2), then on the old Secondary: > > > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > > > +'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': > > > +'existing', 'format': 'raw', 'sync': 'full'} } > > > + > > > +Wait until disk is synced, then: > > > +{'execute': 'stop'} > > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync' } > > > +} > > > > > > -5. Failover test > > > -You can kill Primary VM and run 'x_colo_lost_heartbeat' in > > > Secondary VM's - monitor at the same time, then SVM will failover > > > and client will not detect this -change. > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > > +'drive_add -n buddy > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1 > > > +,fil e.port=9999,file.export=parent0,node-name=replication0'}} > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > +'colo-disk0', > > > +'node': 'replication0' } } > > > > > > -Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we > > > have to -issue block related command to stop block replication. 
> > > -Primary: > > > - Remove the nbd child from the quorum: > > > - { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', > 'child': > > > 'children.1'}} > > > - { 'execute': 'human-monitor-command','arguments': {'command-line': > > > 'drive_del blk-buddy0'}} > > > - Note: there is no qmp command to remove the blockdev now > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > +'filter-mirror', > > > +'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', > > > +'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > +'filter-redirector', 'id': 'redire0', 'props': { 'insert': > > > +'before', > > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': > > > +'compare_out' } } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > +'filter-redirector', 'id': 'redire1', 'props': { 'insert': > > > +'before', > > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': > > > +'compare0' } } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > > +'iothread1' } } > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > > > > > > -Secondary: > > > - The primary host is down, so we should do the following thing: > > > - { 'execute': 'nbd-server-stop' } > > > +{'execute': 'migrate-set-capabilities', 'arguments':{ > > > +'capabilities': [ > > > +{'capability': 'x-colo', 'state': true } ] } } > > > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } > > > +} > > > > > > == TODO == > > > -1. Support continuous VM replication. > > > -2. Support shared storage. > > > -3. Develop the heartbeat part. > > > -4. Reduce checkpoint VM’s downtime while doing checkpoint. > > > +1. Support shared storage. > > > +2. 
Develop the heartbeat part. > > > +3. Reduce checkpoint VM’s downtime while doing checkpoint.
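The exchange above turns on which addresses a listening chardev is reachable from. As a rough illustration only (the helper name listen_scope is an assumption and is not part of QEMU), Python's standard ipaddress module can classify the host values that appear in the thread:

```python
import ipaddress


def listen_scope(host):
    """Describe from where a socket listening on `host` can be reached."""
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:   # 0.0.0.0: bind to every local interface
        return "all interfaces"
    if addr.is_loopback:      # 127.0.0.0/8: reachable from this host only
        return "loopback only"
    return "single address"   # reachable via that one interface address


for host in ("0.0.0.0", "127.0.0.1", "127.0.0.2", "192.168.0.1"):
    print(host, "->", listen_scope(host))
```

Under this classification, both 127.0.0.1 and 127.0.0.2 are loopback addresses, which is why the documented setup only works as written when both instances run on the same host.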
On Thu, 10 Oct 2019 10:34:15 +0000 "Zhang, Chen" <chen.zhang@intel.com> wrote: > > -----Original Message----- > > From: Lukas Straub <lukasstraub2@web.de> > > Sent: Wednesday, October 9, 2019 11:17 PM > > To: Zhang, Chen <chen.zhang@intel.com> > > Cc: qemu-devel <qemu-devel@nongnu.org>; Jason Wang > > <jasowang@redhat.com>; Wen Congyang <wencongyang2@huawei.com>; > > Xie Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > > <kwolf@redhat.com>; Max Reitz <mreitz@redhat.com>; qemu-block > > <qemu-block@nongnu.org> > > Subject: Re: [PATCH v6 4/4] colo: Update Documentation for continuous > > replication > > > > On Wed, 9 Oct 2019 08:36:52 +0000 > > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > > > -----Original Message----- > > > > From: Lukas Straub <lukasstraub2@web.de> > > > > Sent: Saturday, October 5, 2019 9:06 PM > > > > To: qemu-devel <qemu-devel@nongnu.org> > > > > Cc: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > > > > <jasowang@redhat.com>; Wen Congyang > > <wencongyang2@huawei.com>; Xie > > > > Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > > <kwolf@redhat.com>; > > > > Max Reitz <mreitz@redhat.com>; qemu-block <qemu- > > block@nongnu.org> > > > > Subject: [PATCH v6 4/4] colo: Update Documentation for continuous > > > > replication > > > > > > > > Document the qemu command-line and qmp commands for continuous > > > > replication > > > > > > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > > > > --- > > > > docs/COLO-FT.txt | 213 +++++++++++++++++++++++++++---------- > > > > docs/block-replication.txt | 28 +++-- > > > > 2 files changed, 174 insertions(+), 67 deletions(-) > > > > > > > > diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index > > > > ad24680d13..bc1a0ccb99 100644 > > > > --- a/docs/COLO-FT.txt > > > > +++ b/docs/COLO-FT.txt > > > > @@ -145,35 +145,65 @@ The diagram just shows the main qmp > > command, > > > > you can get the detail in test procedure. > > > > > > > > ... 
> > > >
> > > > +Note: Here we are running both instances on the same Host for testing,
> > > > +change the IP Addresses if you want to run it on two Hosts. Initially
> > > > +127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
> > > > +
> > > > +== Startup qemu ==
> > > > +1. Primary:
> > > > +Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all Hosts.
> > > > +# imagefolder="/mnt/vms/colo-test-primary"
> > > > +
> > > > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \
> > > > + -device piix3-usb-uhci -device usb-tablet -name primary \
> > > > + -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
> > > > + -device rtl8139,id=e0,netdev=hn0 \
> > > > + -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
> > > > + -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \
> > >
> > > We should change this to host=127.0.0.1, consistent with the expression below.
> >
> > Hi,
> > This (and the IPs below in the QMP commands) needs to be this way,
> > because it's a listening port and with 127.0.0.1 it would only listen on the
> > loopback IP and wouldn't be reachable from another node, for example. With
> > 0.0.0.0 it will listen on all interfaces.
>
> Yes, I know. For this command demo, using 192.168.0.1/192.168.0.2 might be clearer.

Hmm, compare0 and compare_out can actually be replaced by unix sockets.
So what do you think about the following?

-chardev socket,id=mirror0,host=127.0.0.1,port=9003,server,nowait \
-chardev socket,id=compare1,host=127.0.0.1,port=9004,server,wait \
-chardev socket,id=compare0,path=/tmp/compare0.sock,server,nowait \
-chardev socket,id=compare0-0,path=/tmp/compare0.sock \
-chardev socket,id=compare_out,path=/tmp/compare_out.sock,server,nowait \
-chardev socket,id=compare_out0,path=/tmp/compare_out.sock \

> >
> > > > + -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \
> > > > + -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
> > > > + -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait \
> > > > + -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
> > > > + -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
> > > > + -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
> > > > + -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
> > > > + -object iothread,id=iothread1 \
> > > > + -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
> > > > +outdev=compare_out0,iothread=iothread1 \
> > > > + -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
> > > > +children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
> > > > +
> > > > +2. Secondary:
> > > > +# imagefolder="/mnt/vms/colo-test-secondary"
> > > > +# primary_ip=127.0.0.1
> > > > +
> > > > +# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
> > > > +
> > > > +# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
> > > > +
> > >
> > > The active disk and hidden disk only need to be created once; we can note that here.
> >
> > Ok, I will note that. But I will wait until the block changes are reviewed
> > before sending the next version.
>
> That's fine with me.
> > Thanks > Zhang Chen > > > > > Regards, > > Lukas Straub > > > > > > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 - > > smp > > > > 1 -qmp stdio \ > > > > + -device piix3-usb-uhci -device usb-tablet -name secondary \ > > > > + -netdev > > > > + tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper > > > > \ > > > > + -device rtl8139,id=e0,netdev=hn0 \ > > > > + -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 \ > > > > + -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 \ > > > > + -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \ > > > > + -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \ > > > > + -object filter-rewriter,id=rew0,netdev=hn0,queue=all \ > > > > + -drive > > > > if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=q > > > > cow > > > > 2 \ > > > > + -drive > > > > +if=none,id=childs0,driver=replication,mode=secondary,file.driver=qc > > > > +ow2, > > > > +\ > > > > +top-id=childs0,file.file.filename=$imagefolder/secondary-active.qco > > > > +w2,\ > > > > +file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/s > > > > +econ > > > > +dary-hidden.qcow2,\ > > > > +file.backing.backing=parent0 \ > > > > + -drive > > > > +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold > > > > +=1,\ > > > > +children.0=childs0 \ > > > > + -incoming tcp:0.0.0.0:9998 > > > > + > > > > + > > > > +3. 
On Secondary VM's QEMU monitor, issue command > > > > {'execute':'qmp_capabilities'} > > > > -{ 'execute': 'nbd-server-start', > > > > - 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': > > > > '8889'} } } -} > > > > -{'execute': 'nbd-server-add', 'arguments': {'device': > > > > 'secondary-disk0', > > > > 'writable': true } } > > > > +{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': > > > > +'inet', > > > > +'data': {'host': '0.0.0.0', 'port': '9999'} } } } > > > > +{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', > > > > +'writable': true } } > > > > > > > > Note: > > > > a. The qmp command nbd-server-start and nbd-server-add must be > > > > run @@ -182,44 +212,113 @@ Note: > > > > same. > > > > c. It is better to put active disk and hidden disk in ramdisk. > > > > > > > > -3. On Primary VM's QEMU monitor, issue command: > > > > +4. On Primary VM's QEMU monitor, issue command: > > > > {'execute':'qmp_capabilities'} > > > > -{ 'execute': 'human-monitor-command', > > > > - 'arguments': {'command-line': 'drive_add -n buddy > > > > driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.x > > > > x,file.p > > > > ort=8889,file.export=secondary-disk0,node-name=nbd_client0'}} > > > > -{ 'execute':'x-blockdev-change', 'arguments':{'parent': > > > > 'primary-disk0', > > > > 'node': 'nbd_client0' } } -{ 'execute': 'migrate-set-capabilities', > > > > - 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } > > > > -{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' > > > > } } > > > > +{'execute': 'human-monitor-command', 'arguments': {'command-line': > > > > +'drive_add -n buddy > > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2 > > > > +,fil e.port=9999,file.export=parent0,node-name=replication0'}} > > > > +{'execute': 'x-blockdev-change', 'arguments':{'parent': > > > > +'colo-disk0', > > > > +'node': 'replication0' } } > > > > 
+{'execute': 'migrate-set-capabilities', 'arguments': > > > > +{'capabilities': [ > > > > +{'capability': 'x-colo', 'state': true } ] } } > > > > +{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } > > > > +} > > > > > > > > Note: > > > > a. There should be only one NBD Client for each primary disk. > > > > - b. xx.xx.xx.xx is the secondary physical machine's hostname or IP > > > > - c. The qmp command line must be run after running qmp command > > > > line in > > > > + b. The qmp command line must be run after running qmp command > > > > + line in > > > > secondary qemu. > > > > > > > > -4. After the above steps, you will see, whenever you make changes > > > > to PVM, SVM will be synced. > > > > +5. After the above steps, you will see, whenever you make changes > > > > +to PVM, > > > > SVM will be synced. > > > > You can issue command '{ "execute": "migrate-set-parameters" , > > > > "arguments":{ "x-checkpoint-delay": 2000 } }' > > > > -to change the checkpoint period time > > > > +to change the idle checkpoint period time > > > > + > > > > +6. Failover test > > > > +You can kill one of the VMs and Failover on the surviving VM: > > > > + > > > > +If you killed the Secondary, then follow "Primary Failover". After > > > > +that, if you want to resume the replication, follow "Primary resume > > > > replication" > > > > + > > > > +If you killed the Primary, then follow "Secondary Failover". 
After > > > > +that, if you want to resume the replication, follow "Secondary > > > > +resume > > > > replication" > > > > + > > > > +== Primary Failover == > > > > +The Secondary died, resume on the Primary > > > > + > > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > > +'colo-disk0', > > > > +'child': 'children.1'} } > > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > > > +'drive_del replication0' } } > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } } > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } } > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'm0' } } > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } } > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } } > > > > +{'execute': 'x-colo-lost-heartbeat' } > > > > + > > > > +== Secondary Failover == > > > > +The Primary died, resume on the Secondary and prepare to become the > > > > new > > > > +Primary > > > > + > > > > +{'execute': 'nbd-server-stop'} > > > > +{'execute': 'x-colo-lost-heartbeat'} > > > > + > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'f2' } } > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'f1' } } > > > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } } > > > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } } > > > > + > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend': > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > +'0.0.0.0', 'port': '9003' } }, 'server': true } } } } > > > > > > Same like I said before. > > > > > > Others statement looks good for me. 
> > > > > > Thanks > > > Zhang Chen > > > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend': > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > +'0.0.0.0', 'port': '9004' } }, 'server': true } } } } > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend': > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > +'127.0.0.1', 'port': '9001' } }, 'server': true } } } } > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend': > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > +'127.0.0.1', 'port': '9001' } }, 'server': false } } } } > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out', > > > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', > > > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': true } > > > > +} } } > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0', > > > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', > > > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': false > > > > +} } } } > > > > + > > > > +== Primary resume replication == > > > > +Resume replication after new Secondary is up. 
> > > > + > > > > +Start the new Secondary (Steps 2 and 3 above), then on the Primary: > > > > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > > > > +'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': > > > > +'existing', 'format': 'raw', 'sync': 'full'} } > > > > + > > > > +Wait until disk is synced, then: > > > > +{'execute': 'stop'} > > > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync'} } > > > > + > > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > > > +'drive_add -n buddy > > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2 > > > > +,fil e.port=9999,file.export=parent0,node-name=replication0'}} > > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > > +'colo-disk0', > > > > +'node': 'replication0' } } > > > > + > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > +'filter-mirror', > > > > +'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': > > > > +'mirror0' } } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > +'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', > > > > +'queue': 'rx', 'indev': 'compare_out' } } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > +'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', > > > > +'queue': 'rx', 'outdev': 'compare0' } } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > > > +'iothread1' } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > > > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > > > > + > > > > +{'execute': 'migrate-set-capabilities', 'arguments':{ > > > > +'capabilities': [ > > > > +{'capability': 'x-colo', 'state': true } ] } } > > > > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } > > > > +} > > 
> > + > > > > +Note: > > > > +If this Primary previously was a Secondary, then we need to insert > > > > +the filters before the filter-rewriter by using the > > > > +"'insert': 'before', 'position': 'id=rew0'" Options. See below. > > > > + > > > > +== Secondary resume replication == > > > > +Become Primary and resume replication after new Secondary is up. > > > > +Note that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary. > > > > + > > > > +Start the new Secondary (Steps 2 and 3 above, but with > > > > +primary_ip=127.0.0.2), then on the old Secondary: > > > > +{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', > > > > +'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': > > > > +'existing', 'format': 'raw', 'sync': 'full'} } > > > > + > > > > +Wait until disk is synced, then: > > > > +{'execute': 'stop'} > > > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync' } > > > > +} > > > > > > > > -5. Failover test > > > > -You can kill Primary VM and run 'x_colo_lost_heartbeat' in > > > > Secondary VM's - monitor at the same time, then SVM will failover > > > > and client will not detect this -change. > > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': > > > > +'drive_add -n buddy > > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1 > > > > +,fil e.port=9999,file.export=parent0,node-name=replication0'}} > > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > > +'colo-disk0', > > > > +'node': 'replication0' } } > > > > > > > > -Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we > > > > have to -issue block related command to stop block replication. 
> > > > -Primary: > > > > - Remove the nbd child from the quorum: > > > > - { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', > > 'child': > > > > 'children.1'}} > > > > - { 'execute': 'human-monitor-command','arguments': {'command-line': > > > > 'drive_del blk-buddy0'}} > > > > - Note: there is no qmp command to remove the blockdev now > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > +'filter-mirror', > > > > +'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', > > > > +'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > +'filter-redirector', 'id': 'redire0', 'props': { 'insert': > > > > +'before', > > > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': > > > > +'compare_out' } } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > +'filter-redirector', 'id': 'redire1', 'props': { 'insert': > > > > +'before', > > > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': > > > > +'compare0' } } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > > > +'iothread1' } } > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', > > > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } > > > > > > > > -Secondary: > > > > - The primary host is down, so we should do the following thing: > > > > - { 'execute': 'nbd-server-stop' } > > > > +{'execute': 'migrate-set-capabilities', 'arguments':{ > > > > +'capabilities': [ > > > > +{'capability': 'x-colo', 'state': true } ] } } > > > > +{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } > > > > +} > > > > > > > > == TODO == > > > > -1. Support continuous VM replication. > > > > -2. Support shared storage. > > > > -3. Develop the heartbeat part. > > > > -4. 
Reduce checkpoint VM’s downtime while doing checkpoint. > > > > +1. Support shared storage. > > > > +2. Develop the heartbeat part. > > > > +3. Reduce checkpoint VM’s downtime while doing checkpoint.
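
For readers following the "Secondary Failover" steps, the six chardev-add commands differ only in id, host, port and the server flag, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the patch: the helper names `chardev_add` and `secondary_failover_chardev_commands` are hypothetical, and the values are copied from the quoted QMP commands. It emits standard double-quoted JSON, which QMP accepts alongside the single-quoted form used in the document.

```python
import json


def chardev_add(chardev_id, host, port, server):
    """Build a QMP chardev-add command for an inet socket chardev.

    Hypothetical helper: it only mirrors the JSON layout of the
    commands quoted in the patch and does not talk to QEMU.
    """
    return {
        "execute": "chardev-add",
        "arguments": {
            "id": chardev_id,
            "backend": {
                "type": "socket",
                "data": {
                    "addr": {
                        "type": "inet",
                        # The patch passes the port as a string.
                        "data": {"host": host, "port": str(port)},
                    },
                    "server": server,
                },
            },
        },
    }


# (id, host, port, server) as listed in the "Secondary Failover" section.
# 0.0.0.0 is used for sockets the other host must reach; 127.0.0.1 is
# used for the local loopback pairs.
SECONDARY_FAILOVER_CHARDEVS = [
    ("mirror0", "0.0.0.0", 9003, True),
    ("compare1", "0.0.0.0", 9004, True),
    ("compare0", "127.0.0.1", 9001, True),
    ("compare0-0", "127.0.0.1", 9001, False),
    ("compare_out", "127.0.0.1", 9005, True),
    ("compare_out0", "127.0.0.1", 9005, False),
]


def secondary_failover_chardev_commands():
    """Return the chardev-add commands in the order the patch lists them."""
    return [chardev_add(*spec) for spec in SECONDARY_FAILOVER_CHARDEVS]


if __name__ == "__main__":
    # One JSON object per line, ready to paste into a '-qmp stdio' session.
    for cmd in secondary_failover_chardev_commands():
        print(json.dumps(cmd))
```

Each listening socket (server: true) is created before the client that connects to the same port, which is the order used in the patch.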
> -----Original Message----- > From: Lukas Straub <lukasstraub2@web.de> > Sent: Saturday, October 12, 2019 12:01 AM > To: Zhang, Chen <chen.zhang@intel.com> > Cc: qemu-devel <qemu-devel@nongnu.org>; Jason Wang > <jasowang@redhat.com>; Wen Congyang <wencongyang2@huawei.com>; > Xie Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > <kwolf@redhat.com>; Max Reitz <mreitz@redhat.com>; qemu-block > <qemu-block@nongnu.org> > Subject: Re: [PATCH v6 4/4] colo: Update Documentation for continuous > replication > > On Thu, 10 Oct 2019 10:34:15 +0000 > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > -----Original Message----- > > > From: Lukas Straub <lukasstraub2@web.de> > > > Sent: Wednesday, October 9, 2019 11:17 PM > > > To: Zhang, Chen <chen.zhang@intel.com> > > > Cc: qemu-devel <qemu-devel@nongnu.org>; Jason Wang > > > <jasowang@redhat.com>; Wen Congyang > <wencongyang2@huawei.com>; Xie > > > Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > <kwolf@redhat.com>; > > > Max Reitz <mreitz@redhat.com>; qemu-block <qemu- > block@nongnu.org> > > > Subject: Re: [PATCH v6 4/4] colo: Update Documentation for > > > continuous replication > > > > > > On Wed, 9 Oct 2019 08:36:52 +0000 > > > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > > > > > -----Original Message----- > > > > > From: Lukas Straub <lukasstraub2@web.de> > > > > > Sent: Saturday, October 5, 2019 9:06 PM > > > > > To: qemu-devel <qemu-devel@nongnu.org> > > > > > Cc: Zhang, Chen <chen.zhang@intel.com>; Jason Wang > > > > > <jasowang@redhat.com>; Wen Congyang > > > <wencongyang2@huawei.com>; Xie > > > > > Changlong <xiechanglong.d@gmail.com>; Kevin Wolf > > > <kwolf@redhat.com>; > > > > > Max Reitz <mreitz@redhat.com>; qemu-block <qemu- > > > block@nongnu.org> > > > > > Subject: [PATCH v6 4/4] colo: Update Documentation for > > > > > continuous replication > > > > > > > > > > Document the qemu command-line and qmp commands for > continuous > > > > > replication > > > > > > > > > > Signed-off-by: Lukas Straub 
<lukasstraub2@web.de> > > > > > --- > > > > > docs/COLO-FT.txt | 213 +++++++++++++++++++++++++++------ > ---- > > > > > docs/block-replication.txt | 28 +++-- > > > > > 2 files changed, 174 insertions(+), 67 deletions(-) > > > > > > > > > > diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index > > > > > ad24680d13..bc1a0ccb99 100644 > > > > > --- a/docs/COLO-FT.txt > > > > > +++ b/docs/COLO-FT.txt > > > > > @@ -145,35 +145,65 @@ The diagram just shows the main qmp > > > command, > > > > > you can get the detail in test procedure. > > > > > > > > > > ... > > > > > > > > > > +Note: Here we are running both instances on the same Host for > > > > > +testing, change the IP Addresses if you want to run it on two > > > > > +Hosts. Initally > > > > > +127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host. > > > > > + > > > > > +== Startup qemu == > > > > > +1. Primary: > > > > > +Note: Initally, $imagefolder/primary.qcow2 needs to be copied > > > > > +to all > > > Hosts. > > > > > +# imagefolder="/mnt/vms/colo-test-primary" > > > > > + > > > > > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m > 512 - > > > smp > > > > > 1 -qmp stdio \ > > > > > + -device piix3-usb-uhci -device usb-tablet -name primary \ > > > > > + -netdev > > > > > + tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper > > > > > \ > > > > > + -device rtl8139,id=e0,netdev=hn0 \ > > > > > + -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \ > > > > > + -chardev > > > > > + socket,id=compare1,host=0.0.0.0,port=9004,server,wait \ > > > > > > > > We should change the host=127.0.0.1 consistent with the expression > below. > > > > > > Hi, > > > This (and the IPs below in the QMP commands) needs to be this way, > > > because it's a listening port and with 127.0.0.1 it would only > > > listen on the loopback ip and wouldn't be reachable from another > > > node for example. With > > > 0.0.0.0 it will listen on all Interfaces. > > > > Yes, I know. 
For this command demo, maybe use 192.168.0.1/192.168.0.2 > are more clear. > > Hmm, > the compare0 and compare_out actually can be replaced by unix sockets. > So what do you think about the following? > > -chardev socket,id=mirror0,host=127.0.0.1,port=9003,server,nowait \ > -chardev socket,id=compare1,host=127.0.0.1,port=9004,server,wait \ > -chardev socket,id=compare0,path=/tmp/compare0.sock,server,nowait \ > -chardev socket,id=compare0-0,path=/tmp/compare0.sock \ > -chardev > socket,id=compare_out,path=/tmp/compare_out.sock,server,nowait \ > -chardev socket,id=compare_out0,path=/tmp/compare_out.sock \

With this approach, the user must create the unix socket node before starting COLO, which makes it very hard to use. I reconsidered the issue, and keeping 0.0.0.0 looks like a relatively reasonable choice. We can add a note here, as you suggested before, to make sure the user knows what 0.0.0.0 means.

Thanks
Zhang Chen

> > > > > > > > > + -chardev > > > > > + socket,id=compare0,host=127.0.0.1,port=9001,server,nowait > > > \ > > > > > + -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \ > > > > > + -chardev > > > > > + socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait > > > > > \ > > > > > + -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \ > > > > > + -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 > \ > > > > > + -object filter- > > > > > redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \ > > > > > + -object filter- > > > > > redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \ > > > > > + -object iothread,id=iothread1 \ > > > > > + -object > > > > > +colo-compare,id=comp0,primary_in=compare0- > > > > > 0,secondary_in=compare1,\ > > > > > +outdev=compare_out0,iothread=iothread1 \ > > > > > + -drive > > > > > +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-thres > > > > > +hold > > > > > +=1,\ > > > > > +children.0.file.filename=$imagefolder/primary.qcow2,children.0.
> > > > > +driv > > > > > +er=q > > > > > +cow2 -S > > > > > + > > > > > +2. Secondary: > > > > > +# imagefolder="/mnt/vms/colo-test-secondary" > > > > > +# primary_ip=127.0.0.1 > > > > > + > > > > > +# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 > > > > > +10G > > > > > + > > > > > +# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 > > > 10G > > > > > + > > > > > > > > The active disk and hidden disk just need create one time, we can > > > > note that > > > here. > > > > > > Ok, I will Note that. But I will wait until the block changes are > > > reviewed before sending the next version. > > > > That's fine for me. > > > > Thanks > > Zhang Chen > > > > > > > > Regards, > > > Lukas Straub > > > > > > > > +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m > 512 - > > > smp > > > > > 1 -qmp stdio \ > > > > > + -device piix3-usb-uhci -device usb-tablet -name secondary \ > > > > > + -netdev > > > > > + tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper > > > > > \ > > > > > + -device rtl8139,id=e0,netdev=hn0 \ > > > > > + -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 > \ > > > > > + -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 > \ > > > > > + -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \ > > > > > + -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 > \ > > > > > + -object filter-rewriter,id=rew0,netdev=hn0,queue=all \ > > > > > + -drive > > > > > if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driv > > > > > er=q > > > > > cow > > > > > 2 \ > > > > > + -drive > > > > > +if=none,id=childs0,driver=replication,mode=secondary,file.drive > > > > > +r=qc > > > > > +ow2, > > > > > +\ > > > > > +top-id=childs0,file.file.filename=$imagefolder/secondary-active > > > > > +.qco > > > > > +w2,\ > > > > > +file.backing.driver=qcow2,file.backing.file.filename=$imagefold > > > > > +er/s > > > > > +econ > > > > > +dary-hidden.qcow2,\ > > > > > 
+file.backing.backing=parent0 \ > > > > > + -drive > > > > > +if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-thres > > > > > +hold > > > > > +=1,\ > > > > > +children.0=childs0 \ > > > > > + -incoming tcp:0.0.0.0:9998 > > > > > + > > > > > + > > > > > +3. On Secondary VM's QEMU monitor, issue command > > > > > {'execute':'qmp_capabilities'} > > > > > -{ 'execute': 'nbd-server-start', > > > > > - 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': > > > > > '8889'} } } -} > > > > > -{'execute': 'nbd-server-add', 'arguments': {'device': > > > > > 'secondary-disk0', > > > > > 'writable': true } } > > > > > +{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': > > > > > +'inet', > > > > > +'data': {'host': '0.0.0.0', 'port': '9999'} } } } > > > > > +{'execute': 'nbd-server-add', 'arguments': {'device': > > > > > +'parent0', > > > > > +'writable': true } } > > > > > > > > > > Note: > > > > > a. The qmp command nbd-server-start and nbd-server-add must > > > > > be run @@ -182,44 +212,113 @@ Note: > > > > > same. > > > > > c. It is better to put active disk and hidden disk in ramdisk. > > > > > > > > > > -3. On Primary VM's QEMU monitor, issue command: > > > > > +4. On Primary VM's QEMU monitor, issue command: > > > > > {'execute':'qmp_capabilities'} > > > > > -{ 'execute': 'human-monitor-command', > > > > > - 'arguments': {'command-line': 'drive_add -n buddy > > > > > driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx. 
> > > > > xx.x > > > > > x,file.p > > > > > ort=8889,file.export=secondary-disk0,node-name=nbd_client0'}} > > > > > -{ 'execute':'x-blockdev-change', 'arguments':{'parent': > > > > > 'primary-disk0', > > > > > 'node': 'nbd_client0' } } -{ 'execute': 'migrate-set-capabilities', > > > > > - 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } > > > > > -{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' > > > > > } } > > > > > +{'execute': 'human-monitor-command', 'arguments': {'command- > line': > > > > > +'drive_add -n buddy > > > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0 > > > > > +.0.2 ,fil > > > > > +e.port=9999,file.export=parent0,node-name=replication0'}} > > > > > +{'execute': 'x-blockdev-change', 'arguments':{'parent': > > > > > +'colo-disk0', > > > > > +'node': 'replication0' } } > > > > > +{'execute': 'migrate-set-capabilities', 'arguments': > > > > > +{'capabilities': [ > > > > > +{'capability': 'x-colo', 'state': true } ] } } > > > > > +{'execute': 'migrate', 'arguments': {'uri': > > > > > +'tcp:127.0.0.2:9998' } } > > > > > > > > > > Note: > > > > > a. There should be only one NBD Client for each primary disk. > > > > > - b. xx.xx.xx.xx is the secondary physical machine's hostname > > > > > or IP > > > > > - c. The qmp command line must be run after running qmp command > > > > > line in > > > > > + b. The qmp command line must be run after running qmp > command > > > > > + line in > > > > > secondary qemu. > > > > > > > > > > -4. After the above steps, you will see, whenever you make > > > > > changes to PVM, SVM will be synced. > > > > > +5. After the above steps, you will see, whenever you make > > > > > +changes to PVM, > > > > > SVM will be synced. 
> > > > > You can issue command '{ "execute": "migrate-set-parameters" , > > > > > "arguments":{ "x-checkpoint-delay": 2000 } }' > > > > > -to change the checkpoint period time > > > > > +to change the idle checkpoint period time > > > > > + > > > > > +6. Failover test > > > > > +You can kill one of the VMs and Failover on the surviving VM: > > > > > + > > > > > +If you killed the Secondary, then follow "Primary Failover". > > > > > +After that, if you want to resume the replication, follow > > > > > +"Primary resume > > > > > replication" > > > > > + > > > > > +If you killed the Primary, then follow "Secondary Failover". > > > > > +After that, if you want to resume the replication, follow > > > > > +"Secondary resume > > > > > replication" > > > > > + > > > > > +== Primary Failover == > > > > > +The Secondary died, resume on the Primary > > > > > + > > > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > > > +'colo-disk0', > > > > > +'child': 'children.1'} } > > > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command- > line': > > > > > +'drive_del replication0' } } > > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } } > > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } } > > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'm0' } } > > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } } > > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } } > > > > > +{'execute': 'x-colo-lost-heartbeat' } > > > > > + > > > > > +== Secondary Failover == > > > > > +The Primary died, resume on the Secondary and prepare to become > > > > > +the > > > > > new > > > > > +Primary > > > > > + > > > > > +{'execute': 'nbd-server-stop'} > > > > > +{'execute': 'x-colo-lost-heartbeat'} > > > > > + > > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'f2' } } > > > > > +{'execute': 'object-del', 'arguments':{ 'id': 'f1' } } > > > > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } } 
> > > > > +{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } } > > > > > + > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend': > > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > > +'0.0.0.0', 'port': '9003' } }, 'server': true } } } } > > > > > > > > Same like I said before. > > > > > > > > Others statement looks good for me. > > > > > > > > Thanks > > > > Zhang Chen > > > > > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend': > > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > > +'0.0.0.0', 'port': '9004' } }, 'server': true } } } } > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend': > > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > > +'127.0.0.1', 'port': '9001' } }, 'server': true } } } } > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend': > > > > > +{'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': > > > > > +'127.0.0.1', 'port': '9001' } }, 'server': false } } } } > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out', > > > > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': > > > > > +'inet', > > > > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': > > > > > +true } } } } > > > > > +{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0', > > > > > +'backend': {'type': 'socket', 'data': {'addr': { 'type': > > > > > +'inet', > > > > > +'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': > > > > > +false } } } } > > > > > + > > > > > +== Primary resume replication == Resume replication after new > > > > > +Secondary is up. 
> > > > > + > > > > > +Start the new Secondary (Steps 2 and 3 above), then on the Primary: > > > > > +{'execute': 'drive-mirror', 'arguments':{ 'device': > > > > > +'colo-disk0', > > > > > +'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': > > > > > +'existing', 'format': 'raw', 'sync': 'full'} } > > > > > + > > > > > +Wait until disk is synced, then: > > > > > +{'execute': 'stop'} > > > > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': > > > > > +'resync'} } > > > > > + > > > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command- > line': > > > > > +'drive_add -n buddy > > > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0 > > > > > +.0.2 ,fil > > > > > +e.port=9999,file.export=parent0,node-name=replication0'}} > > > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > > > +'colo-disk0', > > > > > +'node': 'replication0' } } > > > > > + > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'filter-mirror', > > > > > +'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': > > > > > +'mirror0' } } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'filter-redirector', 'id': 'redire0', 'props': { 'netdev': > > > > > +'hn0', > > > > > +'queue': 'rx', 'indev': 'compare_out' } } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'filter-redirector', 'id': 'redire1', 'props': { 'netdev': > > > > > +'hn0', > > > > > +'queue': 'rx', 'outdev': 'compare0' } } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > > > > +'iothread1' } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'colo-compare', > > > > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > > > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } > > > > > +} } > > > > > + > > > > > +{'execute': 'migrate-set-capabilities', 'arguments':{ > > > > > 
+'capabilities': [ > > > > > +{'capability': 'x-colo', 'state': true } ] } } > > > > > +{'execute': 'migrate', 'arguments':{ 'uri': > > > > > +'tcp:127.0.0.2:9998' } } > > > > > + > > > > > +Note: > > > > > +If this Primary previously was a Secondary, then we need to > > > > > +insert the filters before the filter-rewriter by using the > > > > > +"'insert': 'before', 'position': 'id=rew0'" Options. See below. > > > > > + > > > > > +== Secondary resume replication == Become Primary and resume > > > > > +replication after new Secondary is up. > > > > > +Note that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary. > > > > > + > > > > > +Start the new Secondary (Steps 2 and 3 above, but with > > > > > +primary_ip=127.0.0.2), then on the old Secondary: > > > > > +{'execute': 'drive-mirror', 'arguments':{ 'device': > > > > > +'colo-disk0', > > > > > +'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': > > > > > +'existing', 'format': 'raw', 'sync': 'full'} } > > > > > + > > > > > +Wait until disk is synced, then: > > > > > +{'execute': 'stop'} > > > > > +{'execute': 'block-job-cancel', 'arguments':{ 'device': > > > > > +'resync' } } > > > > > > > > > > -5. Failover test > > > > > -You can kill Primary VM and run 'x_colo_lost_heartbeat' in > > > > > Secondary VM's - monitor at the same time, then SVM will > > > > > failover and client will not detect this -change. > > > > > +{'execute': 'human-monitor-command', 'arguments':{ 'command- > line': > > > > > +'drive_add -n buddy > > > > > +driver=replication,mode=primary,file.driver=nbd,file.host=127.0 > > > > > +.0.1 ,fil > > > > > +e.port=9999,file.export=parent0,node-name=replication0'}} > > > > > +{'execute': 'x-blockdev-change', 'arguments':{ 'parent': > > > > > +'colo-disk0', > > > > > +'node': 'replication0' } } > > > > > > > > > > -Before issuing '{ "execute": "x-colo-lost-heartbeat" }' > > > > > command, we have to -issue block related command to stop block > replication. 
> > > > > -Primary: > > > > > - Remove the nbd child from the quorum: > > > > > - { 'execute': 'x-blockdev-change', 'arguments': {'parent': > > > > > 'colo-disk0', > > > 'child': > > > > > 'children.1'}} > > > > > - { 'execute': 'human-monitor-command','arguments': {'command- > line': > > > > > 'drive_del blk-buddy0'}} > > > > > - Note: there is no qmp command to remove the blockdev now > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'filter-mirror', > > > > > +'id': 'm0', 'props': { 'insert': 'before', 'position': > > > > > +'id=rew0', > > > > > +'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'filter-redirector', 'id': 'redire0', 'props': { 'insert': > > > > > +'before', > > > > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': > > > > > +'compare_out' } } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'filter-redirector', 'id': 'redire1', 'props': { 'insert': > > > > > +'before', > > > > > +'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': > > > > > +'compare0' } } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': > > > > > +'iothread1' } } > > > > > +{'execute': 'object-add', 'arguments':{ 'qom-type': > > > > > +'colo-compare', > > > > > +'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': > > > > > +'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } > > > > > +} } > > > > > > > > > > -Secondary: > > > > > - The primary host is down, so we should do the following thing: > > > > > - { 'execute': 'nbd-server-stop' } > > > > > +{'execute': 'migrate-set-capabilities', 'arguments':{ > > > > > +'capabilities': [ > > > > > +{'capability': 'x-colo', 'state': true } ] } } > > > > > +{'execute': 'migrate', 'arguments':{ 'uri': > > > > > +'tcp:127.0.0.1:9998' } } > > > > > > > > > > == TODO == > > > > > -1. Support continuous VM replication. 
> > > > > -2. Support shared storage. > > > > > -3. Develop the heartbeat part. > > > > > -4. Reduce checkpoint VM’s downtime while doing checkpoint. > > > > > +1. Support shared storage. > > > > > +2. Develop the heartbeat part. > > > > > +3. Reduce checkpoint VM’s downtime while doing checkpoint.
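
The drive_add option string used when (re)attaching the replication node is a single comma-separated argument with no spaces, so it can help to assemble it from its parts. The sketch below is only an illustration under stated assumptions: the function names `replication_drive_add` and `attach_to_quorum` are hypothetical, and the defaults (port 9999, export parent0, node name replication0, parent colo-disk0) are the values used in the patch.

```python
import json


def replication_drive_add(host, port=9999, export="parent0",
                          node_name="replication0"):
    """Build the human-monitor-command that adds the primary-side
    replication node backed by the Secondary's NBD export.

    Hypothetical helper; defaults are the values from the patch.
    """
    opts = [
        "driver=replication",
        "mode=primary",
        "file.driver=nbd",
        f"file.host={host}",
        f"file.port={port}",
        f"file.export={export}",
        f"node-name={node_name}",
    ]
    return {
        "execute": "human-monitor-command",
        # The option list must not contain spaces.
        "arguments": {"command-line": "drive_add -n buddy " + ",".join(opts)},
    }


def attach_to_quorum(parent="colo-disk0", node="replication0"):
    """Build the x-blockdev-change command that adds the node as a quorum child."""
    return {
        "execute": "x-blockdev-change",
        "arguments": {"parent": parent, "node": node},
    }


if __name__ == "__main__":
    # 127.0.0.2 is the Secondary in the initial single-host test setup.
    print(json.dumps(replication_drive_add("127.0.0.2")))
    print(json.dumps(attach_to_quorum()))
```

For the "Secondary resume replication" case, where the roles are swapped, the same helper would be called with host 127.0.0.1.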
diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index ad24680d13..bc1a0ccb99 100644 --- a/docs/COLO-FT.txt +++ b/docs/COLO-FT.txt @@ -145,35 +145,65 @@ The diagram just shows the main qmp command, you can get the detail in test procedure. == Test procedure == -1. Startup qemu -Primary: -# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name primary \ - -device piix3-usb-uhci -vnc :7 \ - -device usb-tablet -netdev tap,id=hn0,vhost=off \ - -device virtio-net-pci,id=net-pci0,netdev=hn0 \ - -drive if=virtio,id=primary-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\ - children.0.file.filename=1.raw,\ - children.0.driver=raw -S -Secondary: -# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name secondary \ - -device piix3-usb-uhci -vnc :7 \ - -device usb-tablet -netdev tap,id=hn0,vhost=off \ - -device virtio-net-pci,id=net-pci0,netdev=hn0 \ - -drive if=none,id=secondary-disk0,file.filename=1.raw,driver=raw,node-name=node0 \ - -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\ - file.driver=qcow2,top-id=active-disk0,\ - file.file.filename=/mnt/ramfs/active_disk.img,\ - file.backing.driver=qcow2,\ - file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\ - file.backing.backing=secondary-disk0 \ - -incoming tcp:0:8888 - -2. On Secondary VM's QEMU monitor, issue command +Note: Here we are running both instances on the same Host for testing, +change the IP Addresses if you want to run it on two Hosts. Initally +127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host. + +== Startup qemu == +1. Primary: +Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all Hosts. 
+# imagefolder="/mnt/vms/colo-test-primary" + +# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \ + -device piix3-usb-uhci -device usb-tablet -name primary \ + -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \ + -device rtl8139,id=e0,netdev=hn0 \ + -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \ + -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \ + -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \ + -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \ + -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait \ + -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \ + -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \ + -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \ + -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \ + -object iothread,id=iothread1 \ + -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\ +outdev=compare_out0,iothread=iothread1 \ + -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\ +children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S + +2. 
Secondary:
+# imagefolder="/mnt/vms/colo-test-secondary"
+# primary_ip=127.0.0.1
+
+# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
+
+# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
+
+# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \
+   -device piix3-usb-uhci -device usb-tablet -name secondary \
+   -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
+   -device rtl8139,id=e0,netdev=hn0 \
+   -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 \
+   -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 \
+   -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
+   -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
+   -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
+   -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
+   -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
+top-id=childs0,file.file.filename=$imagefolder/secondary-active.qcow2,\
+file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
+file.backing.backing=parent0 \
+   -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+children.0=childs0 \
+   -incoming tcp:0.0.0.0:9998
+
+
+3. On Secondary VM's QEMU monitor, issue command
 {'execute':'qmp_capabilities'}
-{ 'execute': 'nbd-server-start',
-  'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } }
-}
-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }
+{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '0.0.0.0', 'port': '9999'} } } }
+{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', 'writable': true } }

 Note:
   a. The qmp command nbd-server-start and nbd-server-add must be run
@@ -182,44 +212,113 @@ Note:
      same.
   c. It is better to put active disk and hidden disk in ramdisk.

-3. On Primary VM's QEMU monitor, issue command:
+4. On Primary VM's QEMU monitor, issue command:
 {'execute':'qmp_capabilities'}
-{ 'execute': 'human-monitor-command',
-  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=secondary-disk0,node-name=nbd_client0'}}
-{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'primary-disk0', 'node': 'nbd_client0' } }
-{ 'execute': 'migrate-set-capabilities',
-      'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
-{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' } }
+{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0'}}
+{'execute': 'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'replication0' } }
+{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
+{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } }

   Note:
   a. There should be only one NBD Client for each primary disk.
-  b. xx.xx.xx.xx is the secondary physical machine's hostname or IP
-  c. The qmp command line must be run after running qmp command line in
+  b. The qmp command line must be run after running qmp command line in
      secondary qemu.

-4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
+5. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
 You can issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
-to change the checkpoint period time
+to change the idle checkpoint period time
+
+6. Failover test
+You can kill one of the VMs and Failover on the surviving VM:
+
+If you killed the Secondary, then follow "Primary Failover". After that,
+if you want to resume the replication, follow "Primary resume replication"
+
+If you killed the Primary, then follow "Secondary Failover". After that,
+if you want to resume the replication, follow "Secondary resume replication"
+
+== Primary Failover ==
+The Secondary died, resume on the Primary
+
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'child': 'children.1'} }
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } }
+{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } }
+{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } }
+{'execute': 'object-del', 'arguments':{ 'id': 'm0' } }
+{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } }
+{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } }
+{'execute': 'x-colo-lost-heartbeat' }
+
+== Secondary Failover ==
+The Primary died, resume on the Secondary and prepare to become the new Primary
+
+{'execute': 'nbd-server-stop'}
+{'execute': 'x-colo-lost-heartbeat'}
+
+{'execute': 'object-del', 'arguments':{ 'id': 'f2' } }
+{'execute': 'object-del', 'arguments':{ 'id': 'f1' } }
+{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } }
+{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } }
+
+{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '0.0.0.0', 'port': '9003' } }, 'server': true } } } }
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '0.0.0.0', 'port': '9004' } }, 'server': true } } } }
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9001' } }, 'server': true } } } }
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9001' } }, 'server': false } } } }
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': true } } } }
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': false } } } }
+
+== Primary resume replication ==
+Resume replication after new Secondary is up.
+
+Start the new Secondary (Steps 2 and 3 above), then on the Primary:
+{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', 'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': 'existing', 'format': 'raw', 'sync': 'full'} }
+
+Wait until disk is synced, then:
+{'execute': 'stop'}
+{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync'} }
+
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0'}}
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } }
+
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } }
+
+{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
+{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } }
+
+Note:
+If this Primary previously was a Secondary, then we need to insert the
+filters before the filter-rewriter by using the
+"'insert': 'before', 'position': 'id=rew0'" Options. See below.
+
+== Secondary resume replication ==
+Become Primary and resume replication after new Secondary is up. Note
+that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
+
+Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
+then on the old Secondary:
+{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', 'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': 'existing', 'format': 'raw', 'sync': 'full'} }
+
+Wait until disk is synced, then:
+{'execute': 'stop'}
+{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync' } }

-5. Failover test
-You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
-monitor at the same time, then SVM will failover and client will not detect this
-change.
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0'}}
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } }

-Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
-issue block related command to stop block replication.
-Primary:
-  Remove the nbd child from the quorum:
-  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
-  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
-  Note: there is no qmp command to remove the blockdev now
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } }
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } }

-Secondary:
-  The primary host is down, so we should do the following thing:
-  { 'execute': 'nbd-server-stop' }
+{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
+{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } }

 == TODO ==
-1. Support continuous VM replication.
-2. Support shared storage.
-3. Develop the heartbeat part.
-4. Reduce checkpoint VM’s downtime while doing checkpoint.
+1. Support shared storage.
+2. Develop the heartbeat part.
+3. Reduce checkpoint VM’s downtime while doing checkpoint.
diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde6737fb..108e9166a8 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -65,12 +65,12 @@ blocks that are already in QEMU.
              ^            ||                            .----------
              |            ||                            | Secondary
         1 Quorum          ||                            '----------
-         /      \         ||
-        /        \        ||
-   Primary    2 filter
-     disk         ^                                                             virtio-blk
-                  |                                                                  ^
-                3 NBD  ------->  3 NBD                                               |
+         /      \         ||                                                           virtio-blk
+        /        \        ||                                                               ^
+   Primary    2 filter                                                                     |
+     disk         ^                                                                   7 Quorum
+                  |                                                                    /
+                3 NBD  ------->  3 NBD                                                /
                 client    ||     server                                          2 filter
                           ||        ^                                                ^
 --------.                 ||        |                                                |
@@ -106,6 +106,10 @@ any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.

+7) The secondary also has a quorum node, so after secondary failover it
+can become the new primary and continue replication.
+
+
 == Failure Handling ==
 There are 7 internal errors when block replication is running:
 1. I/O error on primary disk
@@ -171,16 +175,18 @@ Primary:
      leading whitespace.
   5. The qmp command line must be run after running qmp command line in
      secondary qemu.
-  6. After failover we need remove children.1 (replication driver).
+  6. After primary failover we need remove children.1 (replication driver).

 Secondary:
   -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
-  -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\
+  -drive if=none,id=childs1,driver=replication,mode=secondary,top-id=childs1
          file.file.filename=active_disk.qcow2,\
          file.driver=qcow2,\
          file.backing.file.filename=hidden_disk.qcow2,\
          file.backing.driver=qcow2,\
          file.backing.backing=colo1
+  -drive if=xxx,driver=quorum,read-pattern=fifo,id=top-disk1,\
+         vote-threshold=1,children.0=childs1

   Then run qmp command in secondary qemu:
   { 'execute': 'nbd-server-start',
@@ -234,6 +240,8 @@ Secondary:
   The primary host is down, so we should do the following thing:
   { 'execute': 'nbd-server-stop' }

+Promote Secondary to Primary:
+  see COLO-FT.txt
+
 TODO:
-1. Continuous block replication
-2. Shared disk
+1. Shared disk
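
For anyone who wants to script the failover test instead of pasting commands by hand, here is a rough sketch (not part of the patch) that builds the "Primary Failover" command sequence from COLO-FT.txt in the documented order and prints it one command per line, e.g. for feeding into a monitor started with -qmp stdio. The function names primary_failover_commands and to_qmp_lines are made up for illustration; the object and node ids are the ones used in the patch, and the sketch assumes qmp_capabilities has already been negotiated on that monitor.

```python
import json


def primary_failover_commands(quorum="colo-disk0", child="children.1",
                              replication_node="replication0"):
    """Return the QMP commands of the "Primary Failover" section, in order.

    Hypothetical helper: the default ids mirror the ones used in the patch.
    """
    cmds = [
        # Detach the replication child from the quorum node.
        {"execute": "x-blockdev-change",
         "arguments": {"parent": quorum, "child": child}},
        # Drop the replication drive via the human monitor.
        {"execute": "human-monitor-command",
         "arguments": {"command-line": f"drive_del {replication_node}"}},
    ]
    # Remove the compare/mirror/redirector objects in the documented order.
    for obj_id in ("comp0", "iothread1", "m0", "redire0", "redire1"):
        cmds.append({"execute": "object-del", "arguments": {"id": obj_id}})
    # Finally tell COLO that the peer is gone.
    cmds.append({"execute": "x-colo-lost-heartbeat"})
    return cmds


def to_qmp_lines(cmds):
    """Serialize the commands as one JSON object per line."""
    return "\n".join(json.dumps(c) for c in cmds)


if __name__ == "__main__":
    print(to_qmp_lines(primary_failover_commands()))
```

The sketch only generates the commands; it does not read the QMP replies, so a real test script would still have to check each reply before sending the next command.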