[COLO-Frame,(Base),v20,16/17] docs: Add documentation for COLO feature

Message ID 1475138797-9908-17-git-send-email-zhang.zhanghailiang@huawei.com (mailing list archive)
State New, archived

Commit Message

Zhanghailiang Sept. 29, 2016, 8:46 a.m. UTC
Introduce the design of COLO, and how to test it.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)
 create mode 100644 docs/COLO-FT.txt

Comments

J. Neuschäfer Sept. 29, 2016, 11:45 a.m. UTC | #1
On Thu, Sep 29, 2016 at 04:46:36PM +0800, zhanghailiang wrote:
> Introduce the design of COLO, and how to test it.

I think this patch could be placed much earlier in the series, so the
purpose of the other patches is clearer, when they are read in order.

> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
[...]
> +COLO Proxy:
> +Delivers packets to Primary and Seconday, and then compare the responses from

"Secondary"

> +both side. Then decide whether to start a checkpoint according to some rules.

"both sides"


Thanks,
Jonathan Neuschäfer
Eric Blake Oct. 5, 2016, 1:37 p.m. UTC | #2
On 09/29/2016 03:46 AM, zhanghailiang wrote:
> Introduce the design of COLO, and how to test it.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 190 insertions(+)
>  create mode 100644 docs/COLO-FT.txt
> 

> +
> +== Background ==
> +Virtual machine (VM) replication is a well known technique for providing
> +application-agnostic software-implemented hardware fault tolerance
> +"non-stop service".

Do you want s/tolerance/tolerance, also known as/ ?


> +== Architecture ==
> +
> +The architecture of COLO is shown in the bellow diagram.

s/bellow diagram/diagram below/

> +It consists of a pair of networked physical nodes:
> +The primary node running the PVM, and the secondary node running the SVM
> +to maintain a valid replica of the PVM.
> +PVM and SVM execute in parallel and generate output of response packets for
> +client requests according to the application semantics.
> +
> +The incoming packets from the client or external network are received by the
> +primary node, and then forwarded to the secondary node, so that Both the PVM

s/Both/both/

> +and the SVM are stimulated with the same requests.
> +
> +COLO receives the outbound packets from both the PVM and SVM and compares them
> +before allowing the output to be sent to clients.
> +
> +The SVM is qualified as a valid replica of the PVM, as long as it generates
> +identical responses to all client requests. Once the differences in the outputs
> +are detected between the PVM and SVM, COLO withholds transmission of the
> +outbound packets until it has successfully synchronized the PVM state to the SVM.
> +

> +== Components introduction ==
> +
> +You can see there are several components in COLO's diagram of architecture.
> +Their functions are described as bellow.

s/as bellow/below/

> +
> +HeartBeat:
> +Runs on both the primary and secondary nodes, to periodically check platform
> +availability. When the primary node suffers a hardware fail-stop failure,
> +the heartbeat stops responding, the secondary node will trigger a failover
> +as soon as it determines the absence.
> +
> +COLO disk Manager:
> +When primary VM writes data into image, the colo disk manger captures this data
> +and send it to secondary VM’s which makes sure the context of secondary VM's

s/send/sends/

> +image is consentient with the context of primary VM 's image.

s/consentient/consistent/
s/VM 's/VM's/

> +For more details, please refer to docs/block-replication.txt.
> +
> +Checkpoint/Failover Controller:
> +Modifications of save/restore flow to realize continuous migration,
> +to make sure the state of VM in Secondary side always be consistent with VM in

s/always be/is always/

> +Primary side.
> +
> +COLO Proxy:
> +Delivers packets to Primary and Seconday, and then compare the responses from
> +both side. Then decide whether to start a checkpoint according to some rules.
> +
> +Note:
> + a. HeartBeat is not been realized, so you need to trigger failover process

s/is/has/
s/realized/implemented yet/

Is this note going to be stale once heartbeat is implemented?

> +    by using 'x-colo-lost-heartbeat' command.
> + b. COLO proxy compents is work-in-process, it only support periodic checkpoint

s/compents is/components are a/

> +    mode now, just as Micro-checkpointing.
> +

> +3. On Primary VM's QEMU monitor, issue command:
> +{'execute':'qmp_capabilities'}
> +{ 'execute': 'human-monitor-command',
> +  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}

It would be really nice if we could get this done through QMP
blockdev-add instead of HMP drive_add.

> +
> +Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
> +issue block related command to stop block replication.
> +Primary:
> +  Remove the nbd child from the quorum:
> +  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
> +  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
> +  Note: there is no qmp command to remove the blockdev now

Don't we have x-blockdev-del?

> +
> +Secondary:
> +  The primary host is down, so we should do the following thing:
> +  { 'execute': 'nbd-server-stop' }
> +
> +== TODO ==
> +1. Support continuously VM replication.

s/continuously/continuous/

> +2. Support shared storage.
> +3. Develop the heartbeat part.
> +4. Reduce checkpoint VM’s downtime while do checkpoint.

s/do/doing/

>
Zhanghailiang Oct. 8, 2016, 9:32 a.m. UTC | #3
On 2016/10/5 21:37, Eric Blake wrote:
> On 09/29/2016 03:46 AM, zhanghailiang wrote:
>> Introduce the design of COLO, and how to test it.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   docs/COLO-FT.txt | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 190 insertions(+)
>>   create mode 100644 docs/COLO-FT.txt
>>
>
>> +
>> +== Background ==
>> +Virtual machine (VM) replication is a well known technique for providing
>> +application-agnostic software-implemented hardware fault tolerance
>> +"non-stop service".
>
> Do you want s/tolerance/tolerance, also known as/ ?
>

Yes, that is more appropriate.

>
>> +== Architecture ==
>> +
>> +The architecture of COLO is shown in the bellow diagram.
>
> s/bellow diagram/diagram below/
>

>> +It consists of a pair of networked physical nodes:
>> +The primary node running the PVM, and the secondary node running the SVM
>> +to maintain a valid replica of the PVM.
>> +PVM and SVM execute in parallel and generate output of response packets for
>> +client requests according to the application semantics.
>> +
>> +The incoming packets from the client or external network are received by the
>> +primary node, and then forwarded to the secondary node, so that Both the PVM
>
> s/Both/both/
>

>> +and the SVM are stimulated with the same requests.
>> +
>> +COLO receives the outbound packets from both the PVM and SVM and compares them
>> +before allowing the output to be sent to clients.
>> +
>> +The SVM is qualified as a valid replica of the PVM, as long as it generates
>> +identical responses to all client requests. Once the differences in the outputs
>> +are detected between the PVM and SVM, COLO withholds transmission of the
>> +outbound packets until it has successfully synchronized the PVM state to the SVM.
>> +
>
>> +== Components introduction ==
>> +
>> +You can see there are several components in COLO's diagram of architecture.
>> +Their functions are described as bellow.
>
> s/as bellow/below/
>

>> +
>> +HeartBeat:
>> +Runs on both the primary and secondary nodes, to periodically check platform
>> +availability. When the primary node suffers a hardware fail-stop failure,
>> +the heartbeat stops responding, the secondary node will trigger a failover
>> +as soon as it determines the absence.
>> +
>> +COLO disk Manager:
>> +When primary VM writes data into image, the colo disk manger captures this data
>> +and send it to secondary VM’s which makes sure the context of secondary VM's
>
> s/send/sends/
>

>> +image is consentient with the context of primary VM 's image.
>
> s/consentient/consistent/
> s/VM 's/VM's/
>

>> +For more details, please refer to docs/block-replication.txt.
>> +
>> +Checkpoint/Failover Controller:
>> +Modifications of save/restore flow to realize continuous migration,
>> +to make sure the state of VM in Secondary side always be consistent with VM in
>
> s/always be/is always/
>

>> +Primary side.
>> +
>> +COLO Proxy:
>> +Delivers packets to Primary and Seconday, and then compare the responses from
>> +both side. Then decide whether to start a checkpoint according to some rules.
>> +
>> +Note:
>> + a. HeartBeat is not been realized, so you need to trigger failover process
>
> s/is/has/
> s/realized/implemented yet/
>
> Is this note going to be stale once heartbeat is implemented?
>

Yes, but we're not sure if it is suitable to implement it in qemu.

>> +    by using 'x-colo-lost-heartbeat' command.
>> + b. COLO proxy compents is work-in-process, it only support periodic checkpoint
>
> s/compents is/components are a/
>

>> +    mode now, just as Micro-checkpointing.
>> +
>
>> +3. On Primary VM's QEMU monitor, issue command:
>> +{'execute':'qmp_capabilities'}
>> +{ 'execute': 'human-monitor-command',
>> +  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}
>
> It would be really nice if we could get this done through QMP
> blockdev-add instead of HMP drive_add.
>

You are right, but this command doesn't support nbd drives upstream yet.
I saw that Max has sent a patch set to support it. I will update this after his
patches have been merged.

>> +
>> +Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
>> +issue block related command to stop block replication.
>> +Primary:
>> +  Remove the nbd child from the quorum:
>> +  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
>> +  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
>> +  Note: there is no qmp command to remove the blockdev now
>
> Don't we have x-blockdev-del?
>

Yes, we can use this command; I'll fix it in the next version.

>> +
>> +Secondary:
>> +  The primary host is down, so we should do the following thing:
>> +  { 'execute': 'nbd-server-stop' }
>> +
>> +== TODO ==
>> +1. Support continuously VM replication.
>
> s/continuously/continuous/
>
>> +2. Support shared storage.
>> +3. Develop the heartbeat part.
>> +4. Reduce checkpoint VM’s downtime while do checkpoint.
>
> s/do/doing/
>
>>

All the above typos and grammatical mistakes will be fixed in the next version, thanks!

Hailiang
>
Patch

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
new file mode 100644
index 0000000..f1ba580
--- /dev/null
+++ b/docs/COLO-FT.txt
@@ -0,0 +1,190 @@ 
+COarse-grained LOck-stepping Virtual Machines for Non-stop Service
+----------------------------------------
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO's design and how to use it.
+
+== Background ==
+Virtual machine (VM) replication is a well-known technique for providing
+application-agnostic software-implemented hardware fault tolerance, also known as
+"non-stop service".
+
+COLO (COarse-grained LOck-stepping) is a high availability solution.
+The primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the
+same requests from the client and generate responses in parallel too.
+If the response packets from PVM and SVM are identical, they are released
+immediately. Otherwise, a VM checkpoint (on demand) is conducted.
+
+== Architecture ==
+
+The architecture of COLO is shown in the diagram below.
+It consists of a pair of networked physical nodes:
+The primary node running the PVM, and the secondary node running the SVM
+to maintain a valid replica of the PVM.
+PVM and SVM execute in parallel and generate output of response packets for
+client requests according to the application semantics.
+
+The incoming packets from the client or external network are received by the
+primary node, and then forwarded to the secondary node, so that both the PVM
+and the SVM are stimulated with the same requests.
+
+COLO receives the outbound packets from both the PVM and SVM and compares them
+before allowing the output to be sent to clients.
+
+The SVM is qualified as a valid replica of the PVM, as long as it generates
+identical responses to all client requests. Once differences in the outputs
+are detected between the PVM and SVM, COLO withholds transmission of the
+outbound packets until it has successfully synchronized the PVM state to the SVM.
+
+   Primary Node                                                            Secondary Node
+ +------------+  +-----------------------+       +------------------------+  +------------+
+ |            |  |       HeartBeat       |<----->|       HeartBeat        |  |            |
+ | Primary VM |  +-----------|-----------+       +-----------|------------+  |Secondary VM|
+ |            |              |                               |               |            |
+ |            |  +-----------|-----------+       +-----------|------------+  |            |
+ |            |  |QEMU   +---v----+      |       |QEMU  +----v---+        |  |            |
+ |            |  |       |Failover|      |       |      |Failover|        |  |            |
+ |            |  |       +--------+      |       |      +--------+        |  |            |
+ |            |  |   +---------------+   |       |   +---------------+    |  |            |
+ |            |  |   | VM Checkpoint |-------------->| VM Checkpoint |    |  |            |
+ |            |  |   +---------------+   |       |   +---------------+    |  |            |
+ |            |  |                       |       |                        |  |            |
+ |Requests<---------------------------^------------------------------------------>Requests|
+ |Responses----------------------\ /--|--------------\  /------------------------Responses|
+ |            |  |               | |  |  |       |   |  |                 |  |            |
+ |            |  | +-----------+ | |  |  |       |   |  |  +------------+ |  |            |
+ |            |  | | COLO disk | | |  |  |       |   |  |  | COLO disk  | |  |            |
+ |            |  | |   Manager |-|-|--|--------------|--|->| Manager    | |  |            |
+ |            |  | +|----------+ | |  |  |       |   |  |  +-----------|+ |  |            |
+ |            |  |  |            | |  |  |       |   |  |              |  |  |            |
+ +------------+  +--|------------|-|--|--+       +---|--|--------------|--+  +------------+
+                    |            | |  |              |  |              |
+ +-------------+    | +----------v-v--|--+       +---|--v-----------+  |    +-------------+
+ |  VM Monitor |    | |  COLO Proxy      |       |    COLO Proxy    |  |    | VM Monitor  |
+ |             |    | |(compare packet)  |       | (adjust sequence)|  |    |             |
+ +-------------+    | +----------|----^--+       +------------------+  |    +-------------+
+                    |            |    |                                |
+ +------------------|------------|----|--+       +---------------------|------------------+
+ |   Kernel         |            |    |  |       |   Kernel            |                  |
+ +------------------|------------|----|--+       +---------------------|------------------+
+                    |            |    |                                |
+     +--------------v+  +--------v----|--+       +------------------+ +v-------------+
+     |   Storage     |  |External Network|       | External Network | |   Storage    |
+     +---------------+  +----------------+       +------------------+ +--------------+
+
+== Components introduction ==
+
+The architecture diagram shows several COLO components.
+Their functions are described below.
+
+HeartBeat:
+Runs on both the primary and secondary nodes, to periodically check platform
+availability. When the primary node suffers a hardware fail-stop failure,
+the heartbeat stops responding and the secondary node triggers a failover
+as soon as it determines the absence.
+
+COLO disk Manager:
+When the primary VM writes data to its image, the COLO disk manager captures this
+data and sends it to the secondary VM, which makes sure the content of the
+secondary VM's image is consistent with the content of the primary VM's image.
+For more details, please refer to docs/block-replication.txt.
+
+Checkpoint/Failover Controller:
+Modifies the save/restore flow to realize continuous migration,
+to make sure the state of the VM on the Secondary side is always consistent with
+the VM on the Primary side.
+
+COLO Proxy:
+Delivers packets to Primary and Secondary, and then compares the responses from
+both sides. Then decides whether to start a checkpoint according to some rules.
+
+Note:
+ a. HeartBeat has not been implemented yet, so you need to trigger the failover
+    process by using the 'x-colo-lost-heartbeat' command.
+ b. The COLO proxy components are a work in progress; only periodic checkpoint
+    mode is supported now, just as in Micro-checkpointing.
+
+== Test procedure ==
+1. Start up QEMU
+Primary:
+# qemu-kvm -enable-kvm -m 2048 -smp 2 -qmp stdio -vnc :7 -name primary \
+  -device piix3-usb-uhci \
+  -device usb-tablet -netdev tap,id=hn0,vhost=off \
+  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
+  -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw -S
+Secondary:
+# qemu-kvm -enable-kvm -m 2048 -smp 2 -qmp stdio -vnc :7 -name secondary \
+  -device piix3-usb-uhci \
+  -device usb-tablet -netdev tap,id=hn0,vhost=off \
+  -device virtio-net-pci,id=net-pci0,netdev=hn0 \
+  -drive if=none,id=colo-disk0,file.filename=1.raw,driver=raw,node-name=node0 \
+  -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+         file.driver=qcow2,top-id=active-disk0,\
+         file.file.filename=/mnt/ramfs/active_disk.img,\
+         file.backing.driver=qcow2,\
+         file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\
+         file.backing.backing=colo-disk0 \
+  -incoming tcp:0:8888
+
+2. On Secondary VM's QEMU monitor, issue command
+{'execute':'qmp_capabilities'}
+{ 'execute': 'nbd-server-start',
+  'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } }
+}
+{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
+
+Note:
+  a. The qmp command nbd-server-start and nbd-server-add must be run
+     before running the qmp command migrate on primary QEMU
+  b. Active disk, hidden disk and nbd target's length should be the
+     same.
+  c. It is better to put active disk and hidden disk in ramdisk.
+
+3. On Primary VM's QEMU monitor, issue command:
+{'execute':'qmp_capabilities'}
+{ 'execute': 'human-monitor-command',
+  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=colo-disk0,node-name=node0'}}
+{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
+{ 'execute': 'migrate-set-capabilities',
+      'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
+{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' } }
+
+  Note:
+  a. There should be only one NBD Client for each primary disk.
+  b. xx.xx.xx.xx is the secondary physical machine's hostname or IP
+  c. These qmp commands must be run after running the qmp commands on the
+     secondary qemu.
+
+4. After the above steps, whenever you make changes to the PVM, the SVM will be synced.
+You can issue the command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
+to change the checkpoint period.
+
+5. Failover test
+You can kill the Primary VM and run 'x_colo_lost_heartbeat' in the Secondary VM's
+monitor at the same time; the SVM will then fail over and the client will not
+detect this change.
+
+Before issuing the '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
+issue block-related commands to stop block replication.
+Primary:
+  Remove the nbd child from the quorum:
+  { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
+  { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
+  Note: there is no qmp command to remove the blockdev now
+
+Secondary:
+  The primary host is down, so we should do the following:
+  { 'execute': 'nbd-server-stop' }
+
+== TODO ==
+1. Support continuous VM replication.
+2. Support shared storage.
+3. Develop the heartbeat part.
+4. Reduce the VM's downtime while doing a checkpoint.
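
The output-comparison rule described in the Architecture section of the patch (identical responses are released, a mismatch withholds output until the PVM state has been synchronized to the SVM) can be sketched as a toy model. This is not QEMU code: `ColoProxyModel`, `on_outputs`, `run_demo`, and the `checkpoint` callback are hypothetical names chosen for illustration, and releasing the PVM's packet after the checkpoint is an assumption rather than something the patch states.

```python
from typing import Callable, List


class ColoProxyModel:
    """Toy model of the COLO output-comparison rule (illustrative only)."""

    def __init__(self, checkpoint: Callable[[], None]) -> None:
        # Hypothetical hook standing in for "synchronize PVM state to the SVM".
        self.checkpoint = checkpoint
        self.released: List[bytes] = []  # packets actually sent to the client
        self.checkpoints = 0             # number of on-demand checkpoints taken

    def on_outputs(self, pvm_packet: bytes, svm_packet: bytes) -> None:
        """Compare one response from each VM and decide what to release."""
        if pvm_packet != svm_packet:
            # Outputs diverged: withhold the packet until the SVM has been
            # resynchronized from the PVM.
            self.checkpoint()
            self.checkpoints += 1
        # Assumption: the PVM's output is the one released to the client.
        self.released.append(pvm_packet)


def run_demo() -> ColoProxyModel:
    proxy = ColoProxyModel(checkpoint=lambda: None)
    proxy.on_outputs(b"HTTP/1.1 200 OK", b"HTTP/1.1 200 OK")  # identical
    proxy.on_outputs(b"seq=41", b"seq=42")                    # diverged
    return proxy
```

In this model only the second pair of packets triggers a checkpoint. The periodic checkpoint mode mentioned in the patch's Note would call the same `checkpoint` hook on a timer instead of on a mismatch.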