Message ID | 1475138797-9908-17-git-send-email-zhang.zhanghailiang@huawei.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Sep 29, 2016 at 04:46:36PM +0800, zhanghailiang wrote:
> Introduce the design of COLO, and how to test it.

I think this patch could be placed much earlier in the series, so the
purpose of the other patches is clearer, when they are read in order.

> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
[...]
> +COLO Proxy:
> +Delivers packets to Primary and Seconday, and then compare the responses from

"Secondary"

> +both side. Then decide whether to start a checkpoint according to some rules.

"both sides"

Thanks,
Jonathan Neuschäfer
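For readers following the thread, the COLO Proxy behaviour quoted above can be illustrated with a small sketch. This is not QEMU code: the names `Decision` and `compare_responses` are hypothetical, and packets are modelled as plain byte strings. The patch only says that a checkpoint is started "according to some rules"; the single rule assumed here comes from the Background section of the patch, where identical responses are released and a mismatch triggers a checkpoint.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Decision:
    """Outcome of comparing one batch of responses (hypothetical type)."""
    released: List[bytes]    # responses that may be sent to the client
    need_checkpoint: bool    # True if PVM and SVM outputs diverged


def compare_responses(pvm: Sequence[bytes], svm: Sequence[bytes]) -> Decision:
    """Compare PVM and SVM response packets pairwise, in order.

    Identical pairs are released. At the first mismatch the remaining
    output is withheld and a checkpoint is requested. PVM packets that
    have no SVM counterpart yet are simply left unreleased; this sketch
    does not treat them as a divergence.
    """
    released: List[bytes] = []
    for pvm_pkt, svm_pkt in zip(pvm, svm):
        if pvm_pkt != svm_pkt:
            return Decision(released, need_checkpoint=True)
        released.append(pvm_pkt)
    return Decision(released, need_checkpoint=False)


if __name__ == "__main__":
    print(compare_responses([b"ack", b"data-1"], [b"ack", b"data-2"]))
```

Real packet comparison would have to look at protocol headers and payloads separately; byte-for-byte equality is only the simplest possible stand-in.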
On 09/29/2016 03:46 AM, zhanghailiang wrote:
> Introduce the design of COLO, and how to test it.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 190 insertions(+)
>  create mode 100644 docs/COLO-FT.txt
>

> +
> +== Background ==
> +Virtual machine (VM) replication is a well known technique for providing
> +application-agnostic software-implemented hardware fault tolerance
> +"non-stop service".

Do you want s/tolerance/tolerance, also known as/ ?

> +== Architecture ==
> +
> +The architecture of COLO is shown in the bellow diagram.

s/bellow diagram/diagram below/

> +It consists of a pair of networked physical nodes:
> +The primary node running the PVM, and the secondary node running the SVM
> +to maintain a valid replica of the PVM.
> +PVM and SVM execute in parallel and generate output of response packets for
> +client requests according to the application semantics.
> +
> +The incoming packets from the client or external network are received by the
> +primary node, and then forwarded to the secondary node, so that Both the PVM

s/Both/both/

> +and the SVM are stimulated with the same requests.
> +
> +COLO receives the outbound packets from both the PVM and SVM and compares them
> +before allowing the output to be sent to clients.
> +
> +The SVM is qualified as a valid replica of the PVM, as long as it generates
> +identical responses to all client requests. Once the differences in the outputs
> +are detected between the PVM and SVM, COLO withholds transmission of the
> +outbound packets until it has successfully synchronized the PVM state to the SVM.
> +

> +== Components introduction ==
> +
> +You can see there are several components in COLO's diagram of architecture.
> +Their functions are described as bellow.

s/as bellow/below/

> +
> +HeartBeat:
> +Runs on both the primary and secondary nodes, to periodically check platform
> +availability. When the primary node suffers a hardware fail-stop failure,
> +the heartbeat stops responding, the secondary node will trigger a failover
> +as soon as it determines the absence.
> +
> +COLO disk Manager:
> +When primary VM writes data into image, the colo disk manger captures this data
> +and send it to secondary VM’s which makes sure the context of secondary VM's

s/send/sends/

> +image is consentient with the context of primary VM 's image.

s/consentient/consistent/
s/VM 's/VM's/

> +For more details, please refer to docs/block-replication.txt.
> +
> +Checkpoint/Failover Controller:
> +Modifications of save/restore flow to realize continuous migration,
> +to make sure the state of VM in Secondary side always be consistent with VM in

s/always be/is always/

> +Primary side.
> +
> +COLO Proxy:
> +Delivers packets to Primary and Seconday, and then compare the responses from
> +both side. Then decide whether to start a checkpoint according to some rules.
> +
> +Note:
> +  a. HeartBeat is not been realized, so you need to trigger failover process

s/is/has/
s/realized/implemented yet/

Is this note going to be stale once heartbeat is implemented?

> +     by using 'x-colo-lost-heartbeat' command.
> +  b. COLO proxy compents is work-in-process, it only support periodic checkpoint

s/compents is/components are a/

> +     mode now, just as Micro-checkpointing.
> +

> +3. On Primary VM's QEMU monitor, issue command:
> +{'execute':'qmp_capabilities'}
> +{ 'execute': 'human-monitor-command',
> +  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}

It would be really nice if we could get this done through QMP
blockdev-add instead of HMP drive_add.

> +
> +Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
> +issue block related command to stop block replication.
> +Primary:
> +  Remove the nbd child from the quorum:
> +  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
> +  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
> +  Note: there is no qmp command to remove the blockdev now

Don't we have x-blockdev-del?

> +
> +Secondary:
> +  The primary host is down, so we should do the following thing:
> +  { 'execute': 'nbd-server-stop' }
> +
> +== TODO ==
> +1. Support continuously VM replication.

s/continuously/continuous/

> +2. Support shared storage.
> +3. Develop the heartbeat part.
> +4. Reduce checkpoint VM’s downtime while do checkpoint.

s/do/doing/

>
On 2016/10/5 21:37, Eric Blake wrote:
> On 09/29/2016 03:46 AM, zhanghailiang wrote:
>> Introduce the design of COLO, and how to test it.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>  docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 190 insertions(+)
>>  create mode 100644 docs/COLO-FT.txt
>>
>
>> +
>> +== Background ==
>> +Virtual machine (VM) replication is a well known technique for providing
>> +application-agnostic software-implemented hardware fault tolerance
>> +"non-stop service".
>
> Do you want s/tolerance/tolerance, also known as/ ?
>

Yes, that is more appropriate.

>
>> +== Architecture ==
>> +
>> +The architecture of COLO is shown in the bellow diagram.
>
> s/bellow diagram/diagram below/
>
>> +It consists of a pair of networked physical nodes:
>> +The primary node running the PVM, and the secondary node running the SVM
>> +to maintain a valid replica of the PVM.
>> +PVM and SVM execute in parallel and generate output of response packets for
>> +client requests according to the application semantics.
>> +
>> +The incoming packets from the client or external network are received by the
>> +primary node, and then forwarded to the secondary node, so that Both the PVM
>
> s/Both/both/
>
>> +and the SVM are stimulated with the same requests.
>> +
>> +COLO receives the outbound packets from both the PVM and SVM and compares them
>> +before allowing the output to be sent to clients.
>> +
>> +The SVM is qualified as a valid replica of the PVM, as long as it generates
>> +identical responses to all client requests. Once the differences in the outputs
>> +are detected between the PVM and SVM, COLO withholds transmission of the
>> +outbound packets until it has successfully synchronized the PVM state to the SVM.
>> +
>
>> +== Components introduction ==
>> +
>> +You can see there are several components in COLO's diagram of architecture.
>> +Their functions are described as bellow.
>
> s/as bellow/below/
>
>> +
>> +HeartBeat:
>> +Runs on both the primary and secondary nodes, to periodically check platform
>> +availability. When the primary node suffers a hardware fail-stop failure,
>> +the heartbeat stops responding, the secondary node will trigger a failover
>> +as soon as it determines the absence.
>> +
>> +COLO disk Manager:
>> +When primary VM writes data into image, the colo disk manger captures this data
>> +and send it to secondary VM’s which makes sure the context of secondary VM's
>
> s/send/sends/
>
>> +image is consentient with the context of primary VM 's image.
>
> s/consentient/consistent/
> s/VM 's/VM's/
>
>> +For more details, please refer to docs/block-replication.txt.
>> +
>> +Checkpoint/Failover Controller:
>> +Modifications of save/restore flow to realize continuous migration,
>> +to make sure the state of VM in Secondary side always be consistent with VM in
>
> s/always be/is always/
>
>> +Primary side.
>> +
>> +COLO Proxy:
>> +Delivers packets to Primary and Seconday, and then compare the responses from
>> +both side. Then decide whether to start a checkpoint according to some rules.
>> +
>> +Note:
>> +  a. HeartBeat is not been realized, so you need to trigger failover process
>
> s/is/has/
> s/realized/implemented yet/
>
> Is this note going to be stale once heartbeat is implemented?
>

Yes, but we're not sure if it is suitable to implement it in qemu.

>> +     by using 'x-colo-lost-heartbeat' command.
>> +  b. COLO proxy compents is work-in-process, it only support periodic checkpoint
>
> s/compents is/components are a/
>
>> +     mode now, just as Micro-checkpointing.
>> +
>
>> +3. On Primary VM's QEMU monitor, issue command:
>> +{'execute':'qmp_capabilities'}
>> +{ 'execute': 'human-monitor-command',
>> +  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}
>
> It would be really nice if we could get this done through QMP
> blockdev-add instead of HMP drive_add.
>

You are right, but this command doesn't support NBD drives upstream yet.
I saw that Max has sent a patch set to support it. I will update this
after his patches have been merged.

>> +
>> +Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
>> +issue block related command to stop block replication.
>> +Primary:
>> +  Remove the nbd child from the quorum:
>> +  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
>> +  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
>> +  Note: there is no qmp command to remove the blockdev now
>
> Don't we have x-blockdev-del?
>

Yes, we can use this command. I'll fix it in the next version.

>> +
>> +Secondary:
>> +  The primary host is down, so we should do the following thing:
>> +  { 'execute': 'nbd-server-stop' }
>> +
>> +== TODO ==
>> +1. Support continuously VM replication.
>
> s/continuously/continuous/
>
>> +2. Support shared storage.
>> +3. Develop the heartbeat part.
>> +4. Reduce checkpoint VM’s downtime while do checkpoint.
>
> s/do/doing/
>
>>

All the above typos and grammatical mistakes will be fixed in the next
version, thanks!

Hailiang
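The ordering discussed at the end of this review (stop block replication first, then signal the lost heartbeat) can be written down as a small sketch. The helper name `failover_commands` is hypothetical and not part of QEMU. The command dictionaries are copied from the patch text as posted, including the HMP `drive_del` that the review suggests replacing; an `x-blockdev-del` variant is deliberately not shown, because its arguments are not given anywhere in this thread.

```python
import json
from typing import Any, Dict, List


def failover_commands(side: str) -> List[Dict[str, Any]]:
    """Return the QMP commands for one side, in the order the patch describes.

    Block-replication teardown comes first; 'x-colo-lost-heartbeat' is
    always the last command issued.
    """
    if side == "primary":
        block_cmds: List[Dict[str, Any]] = [
            # Remove the NBD child from the quorum.
            {"execute": "x-blockdev-change",
             "arguments": {"parent": "colo-disk0", "child": "children.1"}},
            # As posted in the patch; the review asks for a QMP replacement.
            {"execute": "human-monitor-command",
             "arguments": {"command-line": "drive_del blk-buddy0"}},
        ]
    elif side == "secondary":
        # The primary host is down, so only the NBD server has to stop.
        block_cmds = [{"execute": "nbd-server-stop"}]
    else:
        raise ValueError("side must be 'primary' or 'secondary'")
    return block_cmds + [{"execute": "x-colo-lost-heartbeat"}]


if __name__ == "__main__":
    # Print one JSON command per line, e.g. for pasting into '-qmp stdio'.
    for cmd in failover_commands("secondary"):
        print(json.dumps(cmd))
```

Keeping the sequence in one function makes the "block commands before heartbeat" constraint hard to get wrong when the test is scripted.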
diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt new file mode 100644 index 0000000..f1ba580 --- /dev/null +++ b/docs/COLO-FT.txt @@ -0,0 +1,190 @@ +COarse-grained LOck-stepping Virtual Machines for Non-stop Service +---------------------------------------- +Copyright (c) 2016 Intel Corporation +Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD. +Copyright (c) 2016 Fujitsu, Corp. + +This work is licensed under the terms of the GNU GPL, version 2 or later. +See the COPYING file in the top-level directory. + +This document gives an overview of COLO's design and how to use it. + +== Background == +Virtual machine (VM) replication is a well known technique for providing +application-agnostic software-implemented hardware fault tolerance +"non-stop service". + +COLO (COarse-grained LOck-stepping) is a high availability solution. +Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the +same request from client, and generate response in parallel too. +If the response packets from PVM and SVM are identical, they are released +immediately. Otherwise, a VM checkpoint (on demand) is conducted. + +== Architecture == + +The architecture of COLO is shown in the bellow diagram. +It consists of a pair of networked physical nodes: +The primary node running the PVM, and the secondary node running the SVM +to maintain a valid replica of the PVM. +PVM and SVM execute in parallel and generate output of response packets for +client requests according to the application semantics. + +The incoming packets from the client or external network are received by the +primary node, and then forwarded to the secondary node, so that Both the PVM +and the SVM are stimulated with the same requests. + +COLO receives the outbound packets from both the PVM and SVM and compares them +before allowing the output to be sent to clients. + +The SVM is qualified as a valid replica of the PVM, as long as it generates +identical responses to all client requests. 
Once the differences in the outputs +are detected between the PVM and SVM, COLO withholds transmission of the +outbound packets until it has successfully synchronized the PVM state to the SVM. + + Primary Node Secondary Node + +------------+ +-----------------------+ +------------------------+ +------------+ + | | | HeartBeat |<----->| HeartBeat | | | + | Primary VM | +-----------|-----------+ +-----------|------------+ |Secondary VM| + | | | | | | + | | +-----------|-----------+ +-----------|------------+ | | + | | |QEMU +---v----+ | |QEMU +----v---+ | | | + | | | |Failover| | | |Failover| | | | + | | | +--------+ | | +--------+ | | | + | | | +---------------+ | | +---------------+ | | | + | | | | VM Checkpoint |-------------->| VM Checkpoint | | | | + | | | +---------------+ | | +---------------+ | | | + | | | | | | | | + |Requests<---------------------------^------------------------------------------>Requests| + |Responses----------------------\ /--|--------------\ /------------------------Responses| + | | | | | | | | | | | | | + | | | +-----------+ | | | | | | | +------------+ | | | + | | | | COLO disk | | | | | | | | | COLO disk | | | | + | | | | Manager |-|-|--|--------------|--|->| Manager | | | | + | | | +|----------+ | | | | | | | +-----------|+ | | | + | | | | | | | | | | | | | | | + +------------+ +--|------------|-|--|--+ +---|--|--------------|--+ +------------+ + | | | | | | | + +-------------+ | +----------v-v--|--+ +---|--v-----------+ | +-------------+ + | VM Monitor | | | COLO Proxy | | COLO Proxy | | | VM Monitor | + | | | |(compare packet) | | (adjust sequence)| | | | + +-------------+ | +----------|----^--+ +------------------+ | +-------------+ + | | | | + +------------------|------------|----|--+ +---------------------|------------------+ + | Kernel | | | | | Kernel | | + +------------------|------------|----|--+ +---------------------|------------------+ + | | | | + +--------------v+ +--------v----|--+ +------------------+ +v-------------+ + 
| Storage | |External Network| | External Network | | Storage | + +---------------+ +----------------+ +------------------+ +--------------+ + +== Components introduction == + +You can see there are several components in COLO's diagram of architecture. +Their functions are described as bellow. + +HeartBeat: +Runs on both the primary and secondary nodes, to periodically check platform +availability. When the primary node suffers a hardware fail-stop failure, +the heartbeat stops responding, the secondary node will trigger a failover +as soon as it determines the absence. + +COLO disk Manager: +When primary VM writes data into image, the colo disk manger captures this data +and send it to secondary VM’s which makes sure the context of secondary VM's +image is consentient with the context of primary VM 's image. +For more details, please refer to docs/block-replication.txt. + +Checkpoint/Failover Controller: +Modifications of save/restore flow to realize continuous migration, +to make sure the state of VM in Secondary side always be consistent with VM in +Primary side. + +COLO Proxy: +Delivers packets to Primary and Seconday, and then compare the responses from +both side. Then decide whether to start a checkpoint according to some rules. + +Note: + a. HeartBeat is not been realized, so you need to trigger failover process + by using 'x-colo-lost-heartbeat' command. + b. COLO proxy compents is work-in-process, it only support periodic checkpoint + mode now, just as Micro-checkpointing. + +== Test procedure == +1. 
Startup qemu
+Primary:
+# qemu-kvm -enable-kvm -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary \
+  -device piix3-usb-uhci \
+  -device usb-tablet -netdev tap,id=hn0,vhost=off \
+  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
+  -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw -S
+Secondary:
+# qemu-kvm -enable-kvm -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary \
+  -device piix3-usb-uhci \
+  -device usb-tablet -netdev tap,id=hn0,vhost=off \
+  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
+  -drive if=none,id=colo-disk0,file.filename=1.raw,driver=raw,node-name=node0 \
+  -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+         file.driver=qcow2,top-id=active-disk0,\
+         file.file.filename=/mnt/ramfs/active_disk.img,\
+         file.backing.driver=qcow2,\
+         file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\
+         file.backing.backing=colo-disk0 \
+  -incoming tcp:0:8888
+
+2. On Secondary VM's QEMU monitor, issue command
+{'execute':'qmp_capabilities'}
+{ 'execute': 'nbd-server-start',
+  'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } }
+}
+{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
+
+Note:
+  a. The qmp command nbd-server-start and nbd-server-add must be run
+     before running the qmp command migrate on primary QEMU
+  b. Active disk, hidden disk and nbd target's length should be the
+     same.
+  c. It is better to put active disk and hidden disk in ramdisk.
+
+3. On Primary VM's QEMU monitor, issue command:
+{'execute':'qmp_capabilities'}
+{ 'execute': 'human-monitor-command',
+  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}
+{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
+{ 'execute': 'migrate-set-capabilities',
+  'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
+{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' } }
+
+  Note:
+  a. There should be only one NBD Client for each primary disk.
+  b. xx.xx.xx.xx is the secondary physical machine's hostname or IP
+  c. The qmp command line must be run after running qmp command line in
+     secondary qemu.
+
+4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
+You can by issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
+to change the checkpoint period time
+
+5. Failover test
+You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
+monitor at the same time, then SVM will failover and client will not detect this
+change.
+
+Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
+issue block related command to stop block replication.
+Primary:
+  Remove the nbd child from the quorum:
+  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
+  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
+  Note: there is no qmp command to remove the blockdev now
+
+Secondary:
+  The primary host is down, so we should do the following thing:
+  { 'execute': 'nbd-server-stop' }
+
+== TODO ==
+1. Support continuously VM replication.
+2. Support shared storage.
+3. Develop the heartbeat part.
+4. Reduce checkpoint VM’s downtime while do checkpoint.
Introduce the design of COLO, and how to test it.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)
 create mode 100644 docs/COLO-FT.txt
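
As a reading aid for the test procedure in this patch, the primary-side command sequence from step 3 can be generated rather than typed by hand. The following is only a sketch under stated assumptions: the function name `primary_setup_commands` and its parameters are hypothetical, the default ports 8889 and 8888 are the ones used in the patch, and the address in the usage example is a documentation placeholder standing in for the patch's `xx.xx.xx.xx`. The commands themselves are the ones posted in the patch, so any later change (for instance moving from HMP `drive_add` to a QMP command) would need to be reflected here.

```python
import json
from typing import Any, Dict, List


def primary_setup_commands(secondary_host: str,
                           nbd_port: int = 8889,
                           migrate_port: int = 8888) -> List[Dict[str, Any]]:
    """Build the step-3 QMP commands for the primary, in order.

    These must only be sent after the secondary has run
    nbd-server-start and nbd-server-add (see the note in the patch).
    """
    drive_add = (
        "drive_add -n buddy driver=replication,mode=primary,"
        f"file.driver=nbd,file.host={secondary_host},file.port={nbd_port},"
        "file.export=colo-disk0,node-name=node0"
    )
    return [
        {"execute": "qmp_capabilities"},
        # Attach the NBD client that points at the secondary's disk.
        {"execute": "human-monitor-command",
         "arguments": {"command-line": drive_add}},
        # Add the new node as a child of the quorum disk.
        {"execute": "x-blockdev-change",
         "arguments": {"parent": "colo-disk0", "node": "node0"}},
        # Enable COLO, then start migration towards the secondary.
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "x-colo", "state": True}]}},
        {"execute": "migrate",
         "arguments": {"uri": f"tcp:{secondary_host}:{migrate_port}"}},
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not a value from the patch.
    for cmd in primary_setup_commands("192.0.2.10"):
        print(json.dumps(cmd))
```

The output uses double-quoted JSON, whereas the patch writes the commands with single quotes; both describe the same commands.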