[RFC,1/7] docs/block-replication: Add description for shared-disk case

Message ID 1476971860-20860-2-git-send-email-zhang.zhanghailiang@huawei.com (mailing list archive)
State New, archived

Commit Message

Zhanghailiang Oct. 20, 2016, 1:57 p.m. UTC
Introduce the scenario of shared-disk block replication
and how to use it.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
 docs/block-replication.txt | 131 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 127 insertions(+), 4 deletions(-)

Comments

Changlong Xie Oct. 25, 2016, 9:03 a.m. UTC | #1
On 10/20/2016 09:57 PM, zhanghailiang wrote:
> Introuduce the scenario of shared-disk block replication
> and how to use it.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
> ---
>   docs/block-replication.txt | 131 +++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 127 insertions(+), 4 deletions(-)
>
> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
> index 6bde673..97fcfc1 100644
> --- a/docs/block-replication.txt
> +++ b/docs/block-replication.txt
> @@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
>   effort during a vmstate checkpoint, the disk modification operations of
>   the Primary disk are asynchronously forwarded to the Secondary node.
>
> -== Workflow ==
> +== Non-shared disk workflow ==
>   The following is the image of block replication workflow:
>
>           +----------------------+            +------------------------+
> @@ -57,7 +57,7 @@ The following is the image of block replication workflow:
>       4) Secondary write requests will be buffered in the Disk buffer and it
>          will overwrite the existing sector content in the buffer.
>
> -== Architecture ==
> +== None-shared disk architecture ==

s/None-shared/Non-shared/g

>   We are going to implement block replication from many basic
>   blocks that are already in QEMU.
>
> @@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
>   of the NBD server into the secondary disk. So before block replication,
>   the primary disk and secondary disk should contain the same data.
>
> +== Shared Disk Mode Workflow ==
> +The following is the image of block replication workflow:
> +
> +        +----------------------+            +------------------------+
> +        |Primary Write Requests|            |Secondary Write Requests|
> +        +----------------------+            +------------------------+
> +                  |                                       |
> +                  |                                      (4)
> +                  |                                       V
> +                  |                              /-------------\
> +                  | (2)Forward and write through |             |
> +                  | +--------------------------> | Disk Buffer |
> +                  | |                            |             |
> +                  | |                            \-------------/
> +                  | |(1)read                           |
> +                  | |                                  |
> +       (3)write   | |                                  | backing file
> +                  V |                                  |
> +                 +-----------------------------+       |
> +                 | Shared Disk                 | <-----+
> +                 +-----------------------------+
> +
> +    1) Primary writes will read original data and forward it to Secondary
> +       QEMU.
> +    2) Before Primary write requests are written to Shared disk, the
> +       original sector content will be read from Shared disk and
> +       forwarded and buffered in the Disk buffer on the secondary site,
> +       but it will not overwrite the existing

extra spaces at the end of line

> +       sector content(it could be from either "Secondary Write Requests" or

Need a space before "(" for better style.

> +       previous COW of "Primary Write Requests") in the Disk buffer.
> +    3) Primary write requests will be written to Shared disk.
> +    4) Secondary write requests will be buffered in the Disk buffer and it
> +       will overwrite the existing sector content in the buffer.
> +
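For readers following the four steps above, here is a minimal Python sketch of the shared-disk workflow. It is only a toy model of the described behaviour, not QEMU code: the names SharedDisk, DiskBuffer and primary_write are hypothetical, and sectors are modelled as dictionary entries.

```python
class SharedDisk:
    """Toy stand-in for the shared disk: maps sector number to content."""

    def __init__(self, sectors):
        self.sectors = dict(sectors)


class DiskBuffer:
    """Toy stand-in for the Secondary's Disk buffer, backed by the shared disk."""

    def __init__(self, backing):
        self.backing = backing
        self.buffered = {}

    def cow_from_primary(self, sector, original):
        # Step 2: buffer the original content, but never overwrite content
        # that is already buffered (from a Secondary write or an earlier COW).
        self.buffered.setdefault(sector, original)

    def secondary_write(self, sector, data):
        # Step 4: Secondary writes always overwrite the buffered content.
        self.buffered[sector] = data

    def read(self, sector):
        # Reads prefer the buffer and fall through to the backing file.
        if sector in self.buffered:
            return self.buffered[sector]
        return self.backing.sectors[sector]


def primary_write(disk, buf, sector, data):
    original = disk.sectors[sector]         # step 1: read the original data
    buf.cow_from_primary(sector, original)  # step 2: forward it to the Secondary
    disk.sectors[sector] = data             # step 3: write to the shared disk


disk = SharedDisk({0: "A0", 1: "B0"})
buf = DiskBuffer(disk)
primary_write(disk, buf, 0, "A1")   # Secondary keeps seeing "A0" for sector 0
buf.secondary_write(1, "S1")        # Secondary's own write to sector 1
primary_write(disk, buf, 1, "B1")   # COW of "B0" must not replace "S1"
```

In this model the Secondary's view stays at its pre-write state while the shared disk itself always holds the Primary's latest data.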
> +== Shared Disk Mode Architecture ==
> +We are going to implement block replication from many basic
> +blocks that are already in QEMU.
> +         virtio-blk                     ||                               .----------
> +             /                          ||                               | Secondary
> +            /                           ||                               '----------
> +           /                            ||                                 virtio-blk
> +          /                             ||                                      |
> +          |                             ||                               replication(5)
> +          |                    NBD  -------->   NBD   (2)                       |
> +          |                  client     ||    server ---> hidden disk <-- active disk(4)
> +          |                     ^       ||                      |
> +          |              replication(1) ||                      |
> +          |                     |       ||                      |
> +          |   +-----------------'       ||                      |
> +         (3)  |drive-backup sync=none   ||                      |
> +--------. |   +-----------------+       ||                      |
> +Primary | |                     |       ||           backing    |
> +--------' |                     |       ||                      |
> +          V                     |                               |
> +       +-------------------------------------------+            |
> +       |               shared disk                 | <----------+
> +       +-------------------------------------------+
> +
> +
> +    1) Primary writes will read original data and forward it to Secondary
> +       QEMU.
> +    2) The hidden-disk buffers the original content that is modified by the
> +       primary VM. It should also be an empty disk, and

extra spaces at end of line

> +       the driver supports bdrv_make_empty() and backing file.
> +    3) Primary write requests will be written to Shared disk.
> +    4) Secondary write requests will be buffered in the active disk and it
> +       will overwrite the existing sector content in the buffer.
> +
>   == Failure Handling ==
>   There are 7 internal errors when block replication is running:
>   1. I/O error on primary disk
> @@ -145,7 +213,7 @@ d. replication_stop_all()
>      things except failover. The caller must hold the I/O mutex lock if it is
>      in migration/checkpoint thread.
>
> -== Usage ==
> +== Non-shared disk usage ==
>   Primary:
>     -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
>            children.0.file.filename=1.raw,\
> @@ -234,6 +302,61 @@ Secondary:
>     The primary host is down, so we should do the following thing:
>     { 'execute': 'nbd-server-stop' }
>
> +== Shared disk usage ==

Keeping the same coding style as the "== Non-shared disk usage ==" part
would be good.

> +Primary:
> + -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
> +
> +Issue qmp command:
> + {'execute': 'human-monitor-command',

two space indentation for the whole "{...}" part

> +    'arguments': {
> +        'command-line': 'drive_add-nbuddydriver=replication,

missing spaces

> +        mode=primary,
> +        file.driver=nbd,
> +        file.host=9.42.3.17,
> +        file.port=9998,
> +        file.export=hidden_disk0,
> +        shared-disk-id=primary_disk0,
> +        shared-disk=on,
> +        node-name=rep'

Keep the whole command after "command-line" on one line, otherwise it
cannot be executed correctly, IIRC.
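
To illustrate the single-line form, here is a small Python sketch that assembles the command from the option values in this patch and serialises it as JSON. The helper name hmp_via_qmp is hypothetical, and the spacing "drive_add -n buddy" is an assumption based on the non-shared usage section of the same document.

```python
import json


def hmp_via_qmp(command_line):
    """Wrap an HMP command line in a QMP 'human-monitor-command' message."""
    if "\n" in command_line:
        raise ValueError("command-line must be a single line")
    return {
        "execute": "human-monitor-command",
        "arguments": {"command-line": command_line},
    }


# Option values are copied from the patch under review.
options = [
    "driver=replication",
    "mode=primary",
    "file.driver=nbd",
    "file.host=9.42.3.17",
    "file.port=9998",
    "file.export=hidden_disk0",
    "shared-disk-id=primary_disk0",
    "shared-disk=on",
    "node-name=rep",
]

# Joining with "," keeps every option on the same line.
drive_add = hmp_via_qmp("drive_add -n buddy " + ",".join(options))
# The failover command from later in the patch, with the missing space added.
drive_del = hmp_via_qmp("drive_del rep")

print(json.dumps(drive_add))
```

Building the option list in code and joining it avoids the line breaks that the multi-line layout in the document would put inside the quoted string.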

> +    }
> + }

Secondary:

> + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
> +        backing.driver=raw,backing.file.filename=1.raw \
> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
> +        file.driver=qcow2,top-id=active-disk0,\
> +        file.file.filename=/mnt/ramfs/active_disk.img,\
> +        file.backing=hidden_disk0,shared-disk=on
> +
> +Issue qmp command:
> +1. {'execute': 'nbd-server-start',
> +    'arguments': {
> +        'addr': {
> +            'type': 'inet',
> +            'data': {
> +                'host': '0',

s/0/9.42.3.17/g, since you use designated ip address above

> +                'port': '9998'
> +            }
> +        }
> +    }
> +   }
> +2. {
> +    'execute': 'nbd-server-add',
> +    'arguments': {
> +        'device': 'hidden_disk0',
> +        'writable': true
> +    }
> +  }
> +
> +After Failover:
> +Primary:
> +{'execute': 'human-monitor-command',
> +    'arguments': {
> +        'command-line': 'drive_delrep'

drive_del rep

> +    }
> +}
> +
> +Secondary:
> +  {'execute': 'nbd-server-stop' }
> +
>   TODO:
>   1. Continuous block replication
> -2. Shared disk
>
Zhanghailiang Nov. 28, 2016, 5:13 a.m. UTC | #2
On 2016/10/25 17:03, Changlong Xie wrote:
> On 10/20/2016 09:57 PM, zhanghailiang wrote:
>> Introuduce the scenario of shared-disk block replication
>> and how to use it.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
>> ---
>>    docs/block-replication.txt | 131 +++++++++++++++++++++++++++++++++++++++++++--
>>    1 file changed, 127 insertions(+), 4 deletions(-)
>>
>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>> index 6bde673..97fcfc1 100644
>> --- a/docs/block-replication.txt
>> +++ b/docs/block-replication.txt
>> @@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
>>    effort during a vmstate checkpoint, the disk modification operations of
>>    the Primary disk are asynchronously forwarded to the Secondary node.
>>
>> -== Workflow ==
>> +== Non-shared disk workflow ==
>>    The following is the image of block replication workflow:
>>
>>            +----------------------+            +------------------------+
>> @@ -57,7 +57,7 @@ The following is the image of block replication workflow:
>>        4) Secondary write requests will be buffered in the Disk buffer and it
>>           will overwrite the existing sector content in the buffer.
>>
>> -== Architecture ==
>> +== None-shared disk architecture ==
>
> s/None-shared/Non-shared/g
>

>>    We are going to implement block replication from many basic
>>    blocks that are already in QEMU.
>>
>> @@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
>>    of the NBD server into the secondary disk. So before block replication,
>>    the primary disk and secondary disk should contain the same data.
>>
>> +== Shared Disk Mode Workflow ==
>> +The following is the image of block replication workflow:
>> +
>> +        +----------------------+            +------------------------+
>> +        |Primary Write Requests|            |Secondary Write Requests|
>> +        +----------------------+            +------------------------+
>> +                  |                                       |
>> +                  |                                      (4)
>> +                  |                                       V
>> +                  |                              /-------------\
>> +                  | (2)Forward and write through |             |
>> +                  | +--------------------------> | Disk Buffer |
>> +                  | |                            |             |
>> +                  | |                            \-------------/
>> +                  | |(1)read                           |
>> +                  | |                                  |
>> +       (3)write   | |                                  | backing file
>> +                  V |                                  |
>> +                 +-----------------------------+       |
>> +                 | Shared Disk                 | <-----+
>> +                 +-----------------------------+
>> +
>> +    1) Primary writes will read original data and forward it to Secondary
>> +       QEMU.
>> +    2) Before Primary write requests are written to Shared disk, the
>> +       original sector content will be read from Shared disk and
>> +       forwarded and buffered in the Disk buffer on the secondary site,
>> +       but it will not overwrite the existing
>
> extra spaces at the end of line
>

>> +       sector content(it could be from either "Secondary Write Requests" or
>
> Need a space before "(" for better style.
>

>> +       previous COW of "Primary Write Requests") in the Disk buffer.
>> +    3) Primary write requests will be written to Shared disk.
>> +    4) Secondary write requests will be buffered in the Disk buffer and it
>> +       will overwrite the existing sector content in the buffer.
>> +
>> +== Shared Disk Mode Architecture ==
>> +We are going to implement block replication from many basic
>> +blocks that are already in QEMU.
>> +         virtio-blk                     ||                               .----------
>> +             /                          ||                               | Secondary
>> +            /                           ||                               '----------
>> +           /                            ||                                 virtio-blk
>> +          /                             ||                                      |
>> +          |                             ||                               replication(5)
>> +          |                    NBD  -------->   NBD   (2)                       |
>> +          |                  client     ||    server ---> hidden disk <-- active disk(4)
>> +          |                     ^       ||                      |
>> +          |              replication(1) ||                      |
>> +          |                     |       ||                      |
>> +          |   +-----------------'       ||                      |
>> +         (3)  |drive-backup sync=none   ||                      |
>> +--------. |   +-----------------+       ||                      |
>> +Primary | |                     |       ||           backing    |
>> +--------' |                     |       ||                      |
>> +          V                     |                               |
>> +       +-------------------------------------------+            |
>> +       |               shared disk                 | <----------+
>> +       +-------------------------------------------+
>> +
>> +
>> +    1) Primary writes will read original data and forward it to Secondary
>> +       QEMU.
>> +    2) The hidden-disk buffers the original content that is modified by the
>> +       primary VM. It should also be an empty disk, and
>
> extra spaces at end of line
>

>> +       the driver supports bdrv_make_empty() and backing file.
>> +    3) Primary write requests will be written to Shared disk.
>> +    4) Secondary write requests will be buffered in the active disk and it
>> +       will overwrite the existing sector content in the buffer.
>> +
>>    == Failure Handling ==
>>    There are 7 internal errors when block replication is running:
>>    1. I/O error on primary disk
>> @@ -145,7 +213,7 @@ d. replication_stop_all()
>>       things except failover. The caller must hold the I/O mutex lock if it is
>>       in migration/checkpoint thread.
>>
>> -== Usage ==
>> +== Non-shared disk usage ==
>>    Primary:
>>      -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
>>             children.0.file.filename=1.raw,\
>> @@ -234,6 +302,61 @@ Secondary:
>>      The primary host is down, so we should do the following thing:
>>      { 'execute': 'nbd-server-stop' }
>>
>> +== Shared disk usage ==
>
> Keep the some coding style with "== Non-shared disk usage ==" part is
> good to me.
>

>> +Primary:
>> + -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
>> +
>> +Issue qmp command:
>> + {'execute': 'human-monitor-command',
>
> two space indentation for the whole "{...}" part
>
>> +    'arguments': {
>> +        'command-line': 'drive_add-nbuddydriver=replication,
>
> missing spaces
>
>> +        mode=primary,
>> +        file.driver=nbd,
>> +        file.host=9.42.3.17,
>> +        file.port=9998,
>> +        file.export=hidden_disk0,
>> +        shared-disk-id=primary_disk0,
>> +        shared-disk=on,
>> +        node-name=rep'
>

> Keep the whole commands after "command-line" in one line, or you can
> execute it correctly. IIRC
>

Hmm, I will change this HMP command to the QMP 'blockdev-add' command in the
next version, because it is supported now, though it is not ready for production.

>> +    }
>> + }
>
> Secondary:
>
>> + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
>> +        backing.driver=raw,backing.file.filename=1.raw \
>> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
>> +        file.driver=qcow2,top-id=active-disk0,\
>> +        file.file.filename=/mnt/ramfs/active_disk.img,\
>> +        file.backing=hidden_disk0,shared-disk=on
>> +
>> +Issue qmp command:
>> +1. {'execute': 'nbd-server-start',
>> +    'arguments': {
>> +        'addr': {
>> +            'type': 'inet',
>> +            'data': {
>> +                'host': '0',
>
> s/0/9.42.3.17/g, since you use designated ip address above
>

>> +                'port': '9998'
>> +            }
>> +        }
>> +    }
>> +   }
>> +2. {
>> +    'execute': 'nbd-server-add',
>> +    'arguments': {
>> +        'device': 'hidden_disk0',
>> +        'writable': true
>> +    }
>> +  }
>> +
>> +After Failover:
>> +Primary:
>> +{'execute': 'human-monitor-command',
>> +    'arguments': {
>> +        'command-line': 'drive_delrep'
>
> drive_del rep
>

I'll use the qmp command instead here.

>> +    }
>> +}
>> +
>> +Secondary:
>> +  {'execute': 'nbd-server-stop' }
>> +
>>    TODO:
>>    1. Continuous block replication
>> -2. Shared disk
>>
>

I will fix all the above problems in the next version, thanks.

Zhanghailiang Nov. 28, 2016, 5:58 a.m. UTC | #3
On 2016/11/28 14:00, Changlong Xie wrote:
> On 11/28/2016 01:13 PM, Hailiang Zhang wrote:
>>
>> On 2016/10/25 17:03, Changlong Xie wrote:
>>> On 10/20/2016 09:57 PM, zhanghailiang wrote:
>>>> Introuduce the scenario of shared-disk block replication
>>>> and how to use it.
>>>>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
>>>> ---
>>>>     docs/block-replication.txt | 131
>>>> +++++++++++++++++++++++++++++++++++++++++++--
>>>>     1 file changed, 127 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>>>> index 6bde673..97fcfc1 100644
>>>> --- a/docs/block-replication.txt
>>>> +++ b/docs/block-replication.txt
>>>> @@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the
>>>> network transportation
>>>>     effort during a vmstate checkpoint, the disk modification
>>>> operations of
>>>>     the Primary disk are asynchronously forwarded to the Secondary node.
>>>>
>>>> -== Workflow ==
>>>> +== Non-shared disk workflow ==
>>>>     The following is the image of block replication workflow:
>>>>
>>>>             +----------------------+
>>>> +------------------------+
>>>> @@ -57,7 +57,7 @@ The following is the image of block replication
>>>> workflow:
>>>>         4) Secondary write requests will be buffered in the Disk
>>>> buffer and it
>>>>            will overwrite the existing sector content in the buffer.
>>>>
>>>> -== Architecture ==
>>>> +== None-shared disk architecture ==
>>>
>>> s/None-shared/Non-shared/g
>>>
>>
>>>>     We are going to implement block replication from many basic
>>>>     blocks that are already in QEMU.
>>>>
>>>> @@ -106,6 +106,74 @@ any state that would otherwise be lost by the
>>>> speculative write-through
>>>>     of the NBD server into the secondary disk. So before block
>>>> replication,
>>>>     the primary disk and secondary disk should contain the same data.
>>>>
>>>> +== Shared Disk Mode Workflow ==
>>>> +The following is the image of block replication workflow:
>>>> +
>>>> +        +----------------------+            +------------------------+
>>>> +        |Primary Write Requests|            |Secondary Write Requests|
>>>> +        +----------------------+            +------------------------+
>>>> +                  |                                       |
>>>> +                  |                                      (4)
>>>> +                  |                                       V
>>>> +                  |                              /-------------\
>>>> +                  | (2)Forward and write through |             |
>>>> +                  | +--------------------------> | Disk Buffer |
>>>> +                  | |                            |             |
>>>> +                  | |                            \-------------/
>>>> +                  | |(1)read                           |
>>>> +                  | |                                  |
>>>> +       (3)write   | |                                  | backing file
>>>> +                  V |                                  |
>>>> +                 +-----------------------------+       |
>>>> +                 | Shared Disk                 | <-----+
>>>> +                 +-----------------------------+
>>>> +
>>>> +    1) Primary writes will read original data and forward it to
>>>> Secondary
>>>> +       QEMU.
>>>> +    2) Before Primary write requests are written to Shared disk, the
>>>> +       original sector content will be read from Shared disk and
>>>> +       forwarded and buffered in the Disk buffer on the secondary site,
>>>> +       but it will not overwrite the existing
>>>
>>> extra spaces at the end of line
>>>
>>
>>>> +       sector content(it could be from either "Secondary Write
>>>> Requests" or
>>>
>>> Need a space before "(" for better style.
>>>
>>
>>>> +       previous COW of "Primary Write Requests") in the Disk buffer.
>>>> +    3) Primary write requests will be written to Shared disk.
>>>> +    4) Secondary write requests will be buffered in the Disk buffer
>>>> and it
>>>> +       will overwrite the existing sector content in the buffer.
>>>> +
>>>> +== Shared Disk Mode Architecture ==
>>>> +We are going to implement block replication from many basic
>>>> +blocks that are already in QEMU.
>>>> +         virtio-blk                     ||                               .----------
>>>> +             /                          ||                               | Secondary
>>>> +            /                           ||                               '----------
>>>> +           /                            ||                                 virtio-blk
>>>> +          /                             ||                                      |
>>>> +          |                             ||                               replication(5)
>>>> +          |                    NBD  -------->   NBD   (2)                       |
>>>> +          |                  client     ||    server ---> hidden disk <-- active disk(4)
>>>> +          |                     ^       ||                      |
>>>> +          |              replication(1) ||                      |
>>>> +          |                     |       ||                      |
>>>> +          |   +-----------------'       ||                      |
>>>> +         (3)  |drive-backup sync=none   ||                      |
>>>> +--------. |   +-----------------+       ||                      |
>>>> +Primary | |                     |       ||           backing    |
>>>> +--------' |                     |       ||                      |
>>>> +          V                     |                               |
>>>> +       +-------------------------------------------+            |
>>>> +       |               shared disk                 | <----------+
>>>> +       +-------------------------------------------+
>>>> +
>>>> +
>>>> +    1) Primary writes will read original data and forward it to
>>>> Secondary
>>>> +       QEMU.
>>>> +    2) The hidden-disk buffers the original content that is modified
>>>> by the
>>>> +       primary VM. It should also be an empty disk, and
>>>
>>> extra spaces at end of line
>>>
>>
>>>> +       the driver supports bdrv_make_empty() and backing file.
>>>> +    3) Primary write requests will be written to Shared disk.
>>>> +    4) Secondary write requests will be buffered in the active disk
>>>> and it
>>>> +       will overwrite the existing sector content in the buffer.
>>>> +
>>>>     == Failure Handling ==
>>>>     There are 7 internal errors when block replication is running:
>>>>     1. I/O error on primary disk
>>>> @@ -145,7 +213,7 @@ d. replication_stop_all()
>>>>        things except failover. The caller must hold the I/O mutex lock
>>>> if it is
>>>>        in migration/checkpoint thread.
>>>>
>>>> -== Usage ==
>>>> +== Non-shared disk usage ==
>>>>     Primary:
>>>>       -drive
>>>> if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
>>>>              children.0.file.filename=1.raw,\
>>>> @@ -234,6 +302,61 @@ Secondary:
>>>>       The primary host is down, so we should do the following thing:
>>>>       { 'execute': 'nbd-server-stop' }
>>>>
>>>> +== Shared disk usage ==
>>>
>>> Keep the some coding style with "== Non-shared disk usage ==" part is
>>> good to me.
>>>
>>
>>>> +Primary:
>>>> + -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
>>>> +
>>>> +Issue qmp command:
>>>> + {'execute': 'human-monitor-command',
>>>
>>> two space indentation for the whole "{...}" part
>>>
>>>> +    'arguments': {
>>>> +        'command-line': 'drive_add-nbuddydriver=replication,
>>>
>>> missing spaces
>>>
>>>> +        mode=primary,
>>>> +        file.driver=nbd,
>>>> +        file.host=9.42.3.17,
>>>> +        file.port=9998,
>>>> +        file.export=hidden_disk0,
>>>> +        shared-disk-id=primary_disk0,
>>>> +        shared-disk=on,
>>>> +        node-name=rep'
>>>
>>
>>> Keep the whole commands after "command-line" in one line, or you can
>>> execute it correctly. IIRC
>>>
>>
>> Hmm, i will change this hmp command to qmp 'blockdev-add' command in next
>> version, because it is supported now, though it is ready for production.
>>
>
> It's a good start, but i'm not sure here.
>
> http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg01062.html
>

Yes, I noticed that, but COLO is not ready for production either,
so I think it is OK to use it here.

> Thanks
> 	-Xie
>>>> +    }
>>>> + }
>>>
>>> Secondary:
>>>
>>>> + -drive
>>>> if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
>>>>
>>>> +        backing.driver=raw,backing.file.filename=1.raw \
>>>> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
>>>> +        file.driver=qcow2,top-id=active-disk0,\
>>>> +        file.file.filename=/mnt/ramfs/active_disk.img,\
>>>> +        file.backing=hidden_disk0,shared-disk=on
>>>> +
>>>> +Issue qmp command:
>>>> +1. {'execute': 'nbd-server-start',
>>>> +    'arguments': {
>>>> +        'addr': {
>>>> +            'type': 'inet',
>>>> +            'data': {
>>>> +                'host': '0',
>>>
>>> s/0/9.42.3.17/g, since you use designated ip address above
>>>
>>
>>>> +                'port': '9998'
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +   }
>>>> +2. {
>>>> +    'execute': 'nbd-server-add',
>>>> +    'arguments': {
>>>> +        'device': 'hidden_disk0',
>>>> +        'writable': true
>>>> +    }
>>>> +  }
>>>> +
>>>> +After Failover:
>>>> +Primary:
>>>> +{'execute': 'human-monitor-command',
>>>> +    'arguments': {
>>>> +        'command-line': 'drive_delrep'
>>>
>>> drive_del rep
>>>
>>
>> I'll use the qmp command instead here.
>>
>>>> +    }
>>>> +}
>>>> +
>>>> +Secondary:
>>>> +  {'execute': 'nbd-server-stop' }
>>>> +
>>>>     TODO:
>>>>     1. Continuous block replication
>>>> -2. Shared disk
>>>>
>>>
>>
>> I will fix all the above problems in next version, thanks.
>>
Changlong Xie Nov. 28, 2016, 6 a.m. UTC | #4
On 11/28/2016 01:13 PM, Hailiang Zhang wrote:
>
> On 2016/10/25 17:03, Changlong Xie wrote:
>> On 10/20/2016 09:57 PM, zhanghailiang wrote:
>>> Introuduce the scenario of shared-disk block replication
>>> and how to use it.
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
>>> ---
>>>    docs/block-replication.txt | 131
>>> +++++++++++++++++++++++++++++++++++++++++++--
>>>    1 file changed, 127 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
>>> index 6bde673..97fcfc1 100644
>>> --- a/docs/block-replication.txt
>>> +++ b/docs/block-replication.txt
>>> @@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the
>>> network transportation
>>>    effort during a vmstate checkpoint, the disk modification
>>> operations of
>>>    the Primary disk are asynchronously forwarded to the Secondary node.
>>>
>>> -== Workflow ==
>>> +== Non-shared disk workflow ==
>>>    The following is the image of block replication workflow:
>>>
>>>            +----------------------+
>>> +------------------------+
>>> @@ -57,7 +57,7 @@ The following is the image of block replication
>>> workflow:
>>>        4) Secondary write requests will be buffered in the Disk
>>> buffer and it
>>>           will overwrite the existing sector content in the buffer.
>>>
>>> -== Architecture ==
>>> +== None-shared disk architecture ==
>>
>> s/None-shared/Non-shared/g
>>
>
>>>    We are going to implement block replication from many basic
>>>    blocks that are already in QEMU.
>>>
>>> @@ -106,6 +106,74 @@ any state that would otherwise be lost by the
>>> speculative write-through
>>>    of the NBD server into the secondary disk. So before block
>>> replication,
>>>    the primary disk and secondary disk should contain the same data.
>>>
>>> +== Shared Disk Mode Workflow ==
>>> +The following is the image of block replication workflow:
>>> +
>>> +        +----------------------+            +------------------------+
>>> +        |Primary Write Requests|            |Secondary Write Requests|
>>> +        +----------------------+            +------------------------+
>>> +                  |                                       |
>>> +                  |                                      (4)
>>> +                  |                                       V
>>> +                  |                              /-------------\
>>> +                  | (2)Forward and write through |             |
>>> +                  | +--------------------------> | Disk Buffer |
>>> +                  | |                            |             |
>>> +                  | |                            \-------------/
>>> +                  | |(1)read                           |
>>> +                  | |                                  |
>>> +       (3)write   | |                                  | backing file
>>> +                  V |                                  |
>>> +                 +-----------------------------+       |
>>> +                 | Shared Disk                 | <-----+
>>> +                 +-----------------------------+
>>> +
>>> +    1) Primary writes will read original data and forward it to
>>> Secondary
>>> +       QEMU.
>>> +    2) Before Primary write requests are written to Shared disk, the
>>> +       original sector content will be read from Shared disk and
>>> +       forwarded and buffered in the Disk buffer on the secondary site,
>>> +       but it will not overwrite the existing
>>
>> extra spaces at the end of line
>>
>
>>> +       sector content(it could be from either "Secondary Write Requests" or
>>
>> Need a space before "(" for better style.
>>
>
>>> +       previous COW of "Primary Write Requests") in the Disk buffer.
>>> +    3) Primary write requests will be written to Shared disk.
>>> +    4) Secondary write requests will be buffered in the Disk buffer and it
>>> +       will overwrite the existing sector content in the buffer.
>>> +
>>> +== Shared Disk Mode Architecture ==
>>> +We are going to implement block replication from many basic
>>> +blocks that are already in QEMU.
>>> +         virtio-blk                     ||                               .----------
>>> +             /                          ||                               | Secondary
>>> +            /                           ||                               '----------
>>> +           /                            ||                                 virtio-blk
>>> +          /                             ||                                      |
>>> +          |                             ||                               replication(5)
>>> +          |                    NBD  -------->   NBD   (2)                       |
>>> +          |                  client     ||    server ---> hidden disk <-- active disk(4)
>>> +          |                     ^       ||                      |
>>> +          |              replication(1) ||                      |
>>> +          |                     |       ||                      |
>>> +          |   +-----------------'       ||                      |
>>> +         (3)  |drive-backup sync=none   ||                      |
>>> +--------. |   +-----------------+       ||                      |
>>> +Primary | |                     |       ||           backing    |
>>> +--------' |                     |       ||                      |
>>> +          V                     |                               |
>>> +       +-------------------------------------------+            |
>>> +       |               shared disk                 | <----------+
>>> +       +-------------------------------------------+
>>> +
>>> +
>>> +    1) Primary writes will read original data and forward it to Secondary
>>> +       QEMU.
>>> +    2) The hidden-disk buffers the original content that is modified by the
>>> +       primary VM. It should also be an empty disk, and
>>
>> extra spaces at end of line
>>
>
>>> +       the driver supports bdrv_make_empty() and backing file.
>>> +    3) Primary write requests will be written to Shared disk.
>>> +    4) Secondary write requests will be buffered in the active disk and it
>>> +       will overwrite the existing sector content in the buffer.
>>> +
>>>    == Failure Handling ==
>>>    There are 7 internal errors when block replication is running:
>>>    1. I/O error on primary disk
>>> @@ -145,7 +213,7 @@ d. replication_stop_all()
>>>       things except failover. The caller must hold the I/O mutex lock if it is
>>>       in migration/checkpoint thread.
>>>
>>> -== Usage ==
>>> +== Non-shared disk usage ==
>>>    Primary:
>>>      -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
>>>             children.0.file.filename=1.raw,\
>>> @@ -234,6 +302,61 @@ Secondary:
>>>      The primary host is down, so we should do the following thing:
>>>      { 'execute': 'nbd-server-stop' }
>>>
>>> +== Shared disk usage ==
>>
>> Keeping the same coding style as the "== Non-shared disk usage ==" part
>> would be good.
>>
>
>>> +Primary:
>>> + -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
>>> +
>>> +Issue qmp command:
>>> + {'execute': 'human-monitor-command',
>>
>> two space indentation for the whole "{...}" part
>>
>>> +    'arguments': {
>>> +        'command-line': 'drive_add-nbuddydriver=replication,
>>
>> missing spaces
>>
>>> +        mode=primary,
>>> +        file.driver=nbd,
>>> +        file.host=9.42.3.17,
>>> +        file.port=9998,
>>> +        file.export=hidden_disk0,
>>> +        shared-disk-id=primary_disk0,
>>> +        shared-disk=on,
>>> +        node-name=rep'
>>
>
>> Keep the whole command after "command-line" on one line, or you can't
>> execute it correctly, IIRC.
>>
>
> Hmm, I will change this HMP command to the QMP 'blockdev-add' command in the
> next version, because it is supported now, though it is not yet ready for production.
>

It's a good start, but I'm not sure here.

http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg01062.html

Thanks
	-Xie
>>> +    }
>>> + }
>>
>> Secondary:
>>
>>> + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
>>> +        backing.driver=raw,backing.file.filename=1.raw \
>>> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
>>> +        file.driver=qcow2,top-id=active-disk0,\
>>> +        file.file.filename=/mnt/ramfs/active_disk.img,\
>>> +        file.backing=hidden_disk0,shared-disk=on
>>> +
>>> +Issue qmp command:
>>> +1. {'execute': 'nbd-server-start',
>>> +    'arguments': {
>>> +        'addr': {
>>> +            'type': 'inet',
>>> +            'data': {
>>> +                'host': '0',
>>
>> s/0/9.42.3.17/g, since you use designated ip address above
>>
>
>>> +                'port': '9998'
>>> +            }
>>> +        }
>>> +    }
>>> +   }
>>> +2. {
>>> +    'execute': 'nbd-server-add',
>>> +    'arguments': {
>>> +        'device': 'hidden_disk0',
>>> +        'writable': true
>>> +    }
>>> +  }
>>> +
>>> +After Failover:
>>> +Primary:
>>> +{'execute': 'human-monitor-command',
>>> +    'arguments': {
>>> +        'command-line': 'drive_delrep'
>>
>> drive_del rep
>>
>
> I'll use the qmp command instead here.
>
>>> +    }
>>> +}
>>> +
>>> +Secondary:
>>> +  {'execute': 'nbd-server-stop' }
>>> +
>>>    TODO:
>>>    1. Continuous block replication
>>> -2. Shared disk
>>>
>>
>
> I will fix all the above problems in next version, thanks.

Patch

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..97fcfc1 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@  only dropped at next checkpoint time. To reduce the network transportation
 effort during a vmstate checkpoint, the disk modification operations of
 the Primary disk are asynchronously forwarded to the Secondary node.
 
-== Workflow ==
+== Non-shared disk workflow ==
 The following is the image of block replication workflow:
 
         +----------------------+            +------------------------+
@@ -57,7 +57,7 @@  The following is the image of block replication workflow:
     4) Secondary write requests will be buffered in the Disk buffer and it
        will overwrite the existing sector content in the buffer.
 
-== Architecture ==
+== None-shared disk architecture ==
 We are going to implement block replication from many basic
 blocks that are already in QEMU.
 
@@ -106,6 +106,74 @@  any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.
 
+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  | (2)Forward and write through |             |
+                  | +--------------------------> | Disk Buffer |
+                  | |                            |             |
+                  | |                            \-------------/
+                  | |(1)read                           |
+                  | |                                  |
+       (3)write   | |                                  | backing file
+                  V |                                  |
+                 +-----------------------------+       |
+                 | Shared Disk                 | <-----+
+                 +-----------------------------+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Shared disk, the
+       original sector content will be read from Shared disk and
+       forwarded and buffered in the Disk buffer on the secondary site,
+       but it will not overwrite the existing
+       sector content(it could be from either "Secondary Write Requests" or
+       previous COW of "Primary Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+         virtio-blk                     ||                               .----------
+             /                          ||                               | Secondary
+            /                           ||                               '----------
+           /                            ||                                 virtio-blk
+          /                             ||                                      |
+          |                             ||                               replication(5)
+          |                    NBD  -------->   NBD   (2)                       |
+          |                  client     ||    server ---> hidden disk <-- active disk(4)
+          |                     ^       ||                      |
+          |              replication(1) ||                      |
+          |                     |       ||                      |
+          |   +-----------------'       ||                      |
+         (3)  |drive-backup sync=none   ||                      |
+--------. |   +-----------------+       ||                      |
+Primary | |                     |       ||           backing    |
+--------' |                     |       ||                      |
+          V                     |                               |
+       +-------------------------------------------+            |
+       |               shared disk                 | <----------+
+       +-------------------------------------------+
+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) The hidden-disk buffers the original content that is modified by the
+       primary VM. It should also be an empty disk, and
+       the driver supports bdrv_make_empty() and backing file.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the active disk and it
+       will overwrite the existing sector content in the buffer.
+
 == Failure Handling ==
 There are 7 internal errors when block replication is running:
 1. I/O error on primary disk
@@ -145,7 +213,7 @@  d. replication_stop_all()
    things except failover. The caller must hold the I/O mutex lock if it is
    in migration/checkpoint thread.
 
-== Usage ==
+== Non-shared disk usage ==
 Primary:
   -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
          children.0.file.filename=1.raw,\
@@ -234,6 +302,61 @@  Secondary:
   The primary host is down, so we should do the following thing:
   { 'execute': 'nbd-server-stop' }
 
+== Shared disk usage ==
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue qmp command:
+ {'execute': 'human-monitor-command',
+    'arguments': {
+        'command-line': 'drive_add-nbuddydriver=replication,
+        mode=primary,
+        file.driver=nbd,
+        file.host=9.42.3.17,
+        file.port=9998,
+        file.export=hidden_disk0,
+        shared-disk-id=primary_disk0,
+        shared-disk=on,
+        node-name=rep'
+    }
+ }
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+        backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+        file.driver=qcow2,top-id=active-disk0,\
+        file.file.filename=/mnt/ramfs/active_disk.img,\
+        file.backing=hidden_disk0,shared-disk=on
+
+Issue qmp command:
+1. {'execute': 'nbd-server-start',
+    'arguments': {
+        'addr': {
+            'type': 'inet',
+            'data': {
+                'host': '0',
+                'port': '9998'
+            }
+        }
+    }
+   }
+2. {
+    'execute': 'nbd-server-add',
+    'arguments': {
+        'device': 'hidden_disk0',
+        'writable': true
+    }
+  }
+
+After Failover:
+Primary:
+{'execute': 'human-monitor-command',
+    'arguments': {
+        'command-line': 'drive_delrep'
+    }
+}
+
+Secondary:
+  {'execute': 'nbd-server-stop' }
+
 TODO:
 1. Continuous block replication
-2. Shared disk
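
The four numbered steps of the "Shared Disk Mode Workflow" in the patch can be
illustrated with a small toy model. This is only a sketch under stated
assumptions: the class name, the per-sector dictionaries and the checkpoint()
helper are hypothetical and are not QEMU APIs; they just mirror the rules that
copy-on-write data from the Primary never overwrites what is already in the
Disk buffer, while Secondary writes always do.

```python
# Toy model of the shared-disk workflow. All names are illustrative
# assumptions; nothing here corresponds to real QEMU code.


class SharedDiskModel:
    def __init__(self, sectors):
        # Shared disk: sector number -> content, visible to both sides.
        self.shared_disk = dict(sectors)
        # Disk buffer on the secondary side, backed by the shared disk.
        self.disk_buffer = {}

    def primary_write(self, sector, data):
        # (1) Read the original content from the shared disk.
        original = self.shared_disk.get(sector)
        # (2) Forward it to the secondary. Existing buffer content (from a
        #     secondary write or an earlier COW) is NOT overwritten.
        self.disk_buffer.setdefault(sector, original)
        # (3) Write the new content to the shared disk.
        self.shared_disk[sector] = data

    def secondary_write(self, sector, data):
        # (4) Secondary writes overwrite whatever the buffer holds.
        self.disk_buffer[sector] = data

    def secondary_read(self, sector):
        # The buffer wins; otherwise fall through to the backing file.
        if sector in self.disk_buffer:
            return self.disk_buffer[sector]
        return self.shared_disk.get(sector)

    def checkpoint(self):
        # Assumption: buffered content is dropped at checkpoint time.
        self.disk_buffer.clear()


if __name__ == "__main__":
    model = SharedDiskModel({0: "A", 1: "B"})

    model.primary_write(0, "A1")
    print(model.secondary_read(0))  # secondary still sees the old content

    model.secondary_write(1, "S")
    model.primary_write(1, "B1")
    print(model.secondary_read(1))  # secondary write is preserved

    model.checkpoint()
    print(model.secondary_read(0))  # after checkpoint, shared disk content
```

The dictionary-backed buffer stands in for the hidden/active disk pair; in the
patch those are qcow2 images whose backing file is the shared disk.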