diff mbox

[1/3] colo-compare: reconstruct the mutex lock usage

Message ID 1485266748-15340-2-git-send-email-zhang.zhanghailiang@huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhanghailiang Jan. 24, 2017, 2:05 p.m. UTC
The original 'timer_check_lock' mutex lock of struct CompareState
is used to protect the 'conn_list' queue and its child queues which
are 'primary_list' and 'secondary_list', which is a little abused
and confusing

To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
which is used to protect 'conn_list' queue, use another 'conn_lock'
to protect 'primary_list' and 'secondary_list'.

Besides, fix some missing places which need these mutex lock.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-compare.c | 33 +++++++++++++++++++++++----------
 net/colo.c         |  2 ++
 net/colo.h         |  2 ++
 3 files changed, 27 insertions(+), 10 deletions(-)

Comments

Jason Wang Feb. 3, 2017, 3:47 a.m. UTC | #1
On 2017年01月24日 22:05, zhanghailiang wrote:
> The original 'timer_check_lock' mutex lock of struct CompareState
> is used to protect the 'conn_list' queue and its child queues which
> are 'primary_list' and 'secondary_list', which is a little abused
> and confusing
>
> To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
> which is used to protect 'conn_list' queue, use another 'conn_lock'
> to protect 'primary_list' and 'secondary_list'.
>
> Besides, fix some missing places which need these mutex lock.
>
> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>

Instead of sticking to such kind of mutex, I think it's time to make 
colo timer run in colo thread (there's a TODO in the code).

Thought?

Thanks
Zhanghailiang Feb. 6, 2017, 8:13 a.m. UTC | #2
On 2017/2/3 11:47, Jason Wang wrote:
>
>
> On 2017年01月24日 22:05, zhanghailiang wrote:
>> The original 'timer_check_lock' mutex lock of struct CompareState
>> is used to protect the 'conn_list' queue and its child queues which
>> are 'primary_list' and 'secondary_list', which is a little abused
>> and confusing
>>
>> To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>> to protect 'primary_list' and 'secondary_list'.
>>
>> Besides, fix some missing places which need these mutex lock.
>>
>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>
> Instead of sticking to such kind of mutex, I think it's time to make
> colo timer run in colo thread (there's a TODO in the code).
>

Er, it seems that, we still need these mutex locks even we make colo
timer and colo thread run in the same thread, because we may access
the connect/primary/secondary list from colo (migratioin) thread
concurrently.

Besides, it seems to be a little complex to make colo timer run in colo
compare thread, and it is not this series' goal. This series is preparing
work for integrating COLO compare with COLO frame and it is prerequisite.

So, we may consider implementing it later in another series.
Zhang Chen, what's your opinion ?


Thanks,
Hailiang

> Thought?
>
> Thanks
>
> .
>
Zhang Chen Feb. 6, 2017, 8:30 a.m. UTC | #3
On 02/06/2017 04:13 PM, Hailiang Zhang wrote:
> On 2017/2/3 11:47, Jason Wang wrote:
>>
>>
>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>> is used to protect the 'conn_list' queue and its child queues which
>>> are 'primary_list' and 'secondary_list', which is a little abused
>>> and confusing
>>>
>>> To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
>>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>>> to protect 'primary_list' and 'secondary_list'.
>>>
>>> Besides, fix some missing places which need these mutex lock.
>>>
>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>
>> Instead of sticking to such kind of mutex, I think it's time to make
>> colo timer run in colo thread (there's a TODO in the code).
>>
>
> Er, it seems that, we still need these mutex locks even we make colo
> timer and colo thread run in the same thread, because we may access
> the connect/primary/secondary list from colo (migratioin) thread
> concurrently.
>
> Besides, it seems to be a little complex to make colo timer run in colo
> compare thread, and it is not this series' goal. This series is preparing
> work for integrating COLO compare with COLO frame and it is prerequisite.
>
> So, we may consider implementing it later in another series.
> Zhang Chen, what's your opinion ?
>

I agree, The current situation of COLO is not a complete HA solution.
So, We should focus on make Qemu COLO run completely, and then do
this "TODO" job.

Thanks
Zhang Chen

>
> Thanks,
> Hailiang
>
>> Thought?
>>
>> Thanks
>>
>> .
>>
>
>
>
> .
>
Jason Wang Feb. 6, 2017, 9:35 a.m. UTC | #4
On 2017年02月06日 16:13, Hailiang Zhang wrote:
> On 2017/2/3 11:47, Jason Wang wrote:
>>
>>
>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>> is used to protect the 'conn_list' queue and its child queues which
>>> are 'primary_list' and 'secondary_list', which is a little abused
>>> and confusing
>>>
>>> To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
>>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>>> to protect 'primary_list' and 'secondary_list'.
>>>
>>> Besides, fix some missing places which need these mutex lock.
>>>
>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>
>> Instead of sticking to such kind of mutex, I think it's time to make
>> colo timer run in colo thread (there's a TODO in the code).
>>
>
> Er, it seems that, we still need these mutex locks even we make colo
> timer and colo thread run in the same thread, because we may access
> the connect/primary/secondary list from colo (migratioin) thread
> concurrently.

Just make sure I understand the issue, why need it access the list?

>
> Besides, it seems to be a little complex to make colo timer run in colo
> compare thread, and it is not this series' goal. 

Seems not by just looking at how it was implemented in main loop, but 
maybe I was wrong.

> This series is preparing
> work for integrating COLO compare with COLO frame and it is prerequisite.
>
> So, we may consider implementing it later in another series.
> Zhang Chen, what's your opinion ?

The problem is this patch make things even worse, it introduces one more 
mutex.

Thanks

>
>
> Thanks,
> Hailiang
>
>> Thought?
>>
>> Thanks
>>
>> .
>>
>
Zhanghailiang Feb. 6, 2017, 11:11 a.m. UTC | #5
On 2017/2/6 17:35, Jason Wang wrote:
>
>
> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>> On 2017/2/3 11:47, Jason Wang wrote:
>>>
>>>
>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>> is used to protect the 'conn_list' queue and its child queues which
>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>> and confusing
>>>>
>>>> To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
>>>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>>>> to protect 'primary_list' and 'secondary_list'.
>>>>
>>>> Besides, fix some missing places which need these mutex lock.
>>>>
>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>
>>> Instead of sticking to such kind of mutex, I think it's time to make
>>> colo timer run in colo thread (there's a TODO in the code).
>>>
>>
>> Er, it seems that, we still need these mutex locks even we make colo
>> timer and colo thread run in the same thread, because we may access
>> the connect/primary/secondary list from colo (migratioin) thread
>> concurrently.
>
> Just make sure I understand the issue, why need it access the list?
>
>>
>> Besides, it seems to be a little complex to make colo timer run in colo
>> compare thread, and it is not this series' goal.
>
> Seems not by just looking at how it was implemented in main loop, but
> maybe I was wrong.
>
>> This series is preparing
>> work for integrating COLO compare with COLO frame and it is prerequisite.
>>
>> So, we may consider implementing it later in another series.
>> Zhang Chen, what's your opinion ?
>
> The problem is this patch make things even worse, it introduces one more
> mutex.
>

Hmm, for most cases, we need to get these two locks at the same time.
We can use one lock to protect conn_list/primary_list/secondary_list,
(We need to get this lock before operate all these lists, as you can see
in patch 2, while do checkpoint, we may operate these lists in
COLO checkpoint thread concurrently.)

But for the original codes, we didn't got timer_check_lock in
packet_enqueue() while operate conn_list/primary_list/secondary_list,
and didn't got this lock in colo_compare_connection while operate
secondary_list either.

So, is it OK to use the conn_lock instead of timer_check_lock, and
add the lock where it is need ?

Thanks.
Hailiang

> Thanks
>
>>
>>
>> Thanks,
>> Hailiang
>>
>>> Thought?
>>>
>>> Thanks
>>>
>>> .
>>>
>>
>
>
> .
>
Jason Wang Feb. 6, 2017, 12:53 p.m. UTC | #6
On 2017年02月06日 19:11, Hailiang Zhang wrote:
> On 2017/2/6 17:35, Jason Wang wrote:
>>
>>
>> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>>> On 2017/2/3 11:47, Jason Wang wrote:
>>>>
>>>>
>>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>>> is used to protect the 'conn_list' queue and its child queues which
>>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>>> and confusing
>>>>>
>>>>> To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
>>>>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>>>>> to protect 'primary_list' and 'secondary_list'.
>>>>>
>>>>> Besides, fix some missing places which need these mutex lock.
>>>>>
>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>
>>>> Instead of sticking to such kind of mutex, I think it's time to make
>>>> colo timer run in colo thread (there's a TODO in the code).
>>>>
>>>
>>> Er, it seems that, we still need these mutex locks even we make colo
>>> timer and colo thread run in the same thread, because we may access
>>> the connect/primary/secondary list from colo (migratioin) thread
>>> concurrently.
>>
>> Just make sure I understand the issue, why need it access the list?
>>
>>>
>>> Besides, it seems to be a little complex to make colo timer run in colo
>>> compare thread, and it is not this series' goal.
>>
>> Seems not by just looking at how it was implemented in main loop, but
>> maybe I was wrong.
>>
>>> This series is preparing
>>> work for integrating COLO compare with COLO frame and it is 
>>> prerequisite.
>>>
>>> So, we may consider implementing it later in another series.
>>> Zhang Chen, what's your opinion ?
>>
>> The problem is this patch make things even worse, it introduces one more
>> mutex.
>>
>
> Hmm, for most cases, we need to get these two locks at the same time.
> We can use one lock to protect conn_list/primary_list/secondary_list,
> (We need to get this lock before operate all these lists, as you can see
> in patch 2, while do checkpoint, we may operate these lists in
> COLO checkpoint thread concurrently.)
>
> But for the original codes, we didn't got timer_check_lock in
> packet_enqueue() while operate conn_list/primary_list/secondary_list,
> and didn't got this lock in colo_compare_connection while operate
> secondary_list either.
>
> So, is it OK to use the conn_lock instead of timer_check_lock, and
> add the lock where it is need ?

I'd like to know if timer were run in colo thread (this looks as simple 
as a g_timeout_source_new() followed by a g_source_attach()), why do we 
still need a mutex. And if we need it now but plan to remove it in the 
future, I'd like to use big lock to simplify the codes.

Thanks

>
> Thanks.
> Hailiang
>
>> Thanks
>>
>>>
>>>
>>> Thanks,
>>> Hailiang
>>>
>>>> Thought?
>>>>
>>>> Thanks
>>>>
>>>> .
>>>>
>>>
>>
>>
>> .
>>
>
Zhanghailiang Feb. 7, 2017, 7:54 a.m. UTC | #7
Hi Jason,

On 2017/2/6 20:53, Jason Wang wrote:
>
>
> On 2017年02月06日 19:11, Hailiang Zhang wrote:
>> On 2017/2/6 17:35, Jason Wang wrote:
>>>
>>>
>>> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>>>> On 2017/2/3 11:47, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>>>> is used to protect the 'conn_list' queue and its child queues which
>>>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>>>> and confusing
>>>>>>
>>>>>> To make it clearer, we rename 'timer_check_lock' to 'conn_list_lock'
>>>>>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>>>>>> to protect 'primary_list' and 'secondary_list'.
>>>>>>
>>>>>> Besides, fix some missing places which need these mutex lock.
>>>>>>
>>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>>
>>>>> Instead of sticking to such kind of mutex, I think it's time to make
>>>>> colo timer run in colo thread (there's a TODO in the code).
>>>>>
>>>>
>>>> Er, it seems that, we still need these mutex locks even we make colo
>>>> timer and colo thread run in the same thread, because we may access
>>>> the connect/primary/secondary list from colo (migratioin) thread
>>>> concurrently.
>>>
>>> Just make sure I understand the issue, why need it access the list?
>>>
>>>>
>>>> Besides, it seems to be a little complex to make colo timer run in colo
>>>> compare thread, and it is not this series' goal.
>>>
>>> Seems not by just looking at how it was implemented in main loop, but
>>> maybe I was wrong.
>>>
>>>> This series is preparing
>>>> work for integrating COLO compare with COLO frame and it is
>>>> prerequisite.
>>>>
>>>> So, we may consider implementing it later in another series.
>>>> Zhang Chen, what's your opinion ?
>>>
>>> The problem is this patch make things even worse, it introduces one more
>>> mutex.
>>>
>>
>> Hmm, for most cases, we need to get these two locks at the same time.
>> We can use one lock to protect conn_list/primary_list/secondary_list,
>> (We need to get this lock before operate all these lists, as you can see
>> in patch 2, while do checkpoint, we may operate these lists in
>> COLO checkpoint thread concurrently.)
>>
>> But for the original codes, we didn't got timer_check_lock in
>> packet_enqueue() while operate conn_list/primary_list/secondary_list,
>> and didn't got this lock in colo_compare_connection while operate
>> secondary_list either.
>>
>> So, is it OK to use the conn_lock instead of timer_check_lock, and
>> add the lock where it is need ?
>
> I'd like to know if timer were run in colo thread (this looks as simple
> as a g_timeout_source_new() followed by a g_source_attach()), why do we
> still need a mutex. And if we need it now but plan to remove it in the
> future, I'd like to use big lock to simplify the codes.
>

After investigated your above suggestion, I think it works by using
g_timeout_source_new() and g_source_attach(), but I'm not sure
if it is a good idea to use the big lock to protect the possible
concurrent cases which seem to only happen between COLO migration
thread and COLO compare thread, any further suggestions ?

Thanks,
Hailiang


> Thanks
>
>>
>> Thanks.
>> Hailiang
>>
>>> Thanks
>>>
>>>>
>>>>
>>>> Thanks,
>>>> Hailiang
>>>>
>>>>> Thought?
>>>>>
>>>>> Thanks
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>
>
> .
>
Jason Wang Feb. 7, 2017, 7:57 a.m. UTC | #8
On 2017年02月07日 15:54, Hailiang Zhang wrote:
> Hi Jason,
>
> On 2017/2/6 20:53, Jason Wang wrote:
>>
>>
>> On 2017年02月06日 19:11, Hailiang Zhang wrote:
>>> On 2017/2/6 17:35, Jason Wang wrote:
>>>>
>>>>
>>>> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>>>>> On 2017/2/3 11:47, Jason Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>>>>> is used to protect the 'conn_list' queue and its child queues which
>>>>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>>>>> and confusing
>>>>>>>
>>>>>>> To make it clearer, we rename 'timer_check_lock' to 
>>>>>>> 'conn_list_lock'
>>>>>>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>>>>>>> to protect 'primary_list' and 'secondary_list'.
>>>>>>>
>>>>>>> Besides, fix some missing places which need these mutex lock.
>>>>>>>
>>>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>>>
>>>>>> Instead of sticking to such kind of mutex, I think it's time to make
>>>>>> colo timer run in colo thread (there's a TODO in the code).
>>>>>>
>>>>>
>>>>> Er, it seems that, we still need these mutex locks even we make colo
>>>>> timer and colo thread run in the same thread, because we may access
>>>>> the connect/primary/secondary list from colo (migratioin) thread
>>>>> concurrently.
>>>>
>>>> Just make sure I understand the issue, why need it access the list?
>>>>
>>>>>
>>>>> Besides, it seems to be a little complex to make colo timer run in 
>>>>> colo
>>>>> compare thread, and it is not this series' goal.
>>>>
>>>> Seems not by just looking at how it was implemented in main loop, but
>>>> maybe I was wrong.
>>>>
>>>>> This series is preparing
>>>>> work for integrating COLO compare with COLO frame and it is
>>>>> prerequisite.
>>>>>
>>>>> So, we may consider implementing it later in another series.
>>>>> Zhang Chen, what's your opinion ?
>>>>
>>>> The problem is this patch make things even worse, it introduces one 
>>>> more
>>>> mutex.
>>>>
>>>
>>> Hmm, for most cases, we need to get these two locks at the same time.
>>> We can use one lock to protect conn_list/primary_list/secondary_list,
>>> (We need to get this lock before operate all these lists, as you can 
>>> see
>>> in patch 2, while do checkpoint, we may operate these lists in
>>> COLO checkpoint thread concurrently.)
>>>
>>> But for the original codes, we didn't got timer_check_lock in
>>> packet_enqueue() while operate conn_list/primary_list/secondary_list,
>>> and didn't got this lock in colo_compare_connection while operate
>>> secondary_list either.
>>>
>>> So, is it OK to use the conn_lock instead of timer_check_lock, and
>>> add the lock where it is need ?
>>
>> I'd like to know if timer were run in colo thread (this looks as simple
>> as a g_timeout_source_new() followed by a g_source_attach()), why do we
>> still need a mutex. And if we need it now but plan to remove it in the
>> future, I'd like to use big lock to simplify the codes.
>>
>
> After investigated your above suggestion, I think it works by using
> g_timeout_source_new() and g_source_attach(), but I'm not sure
> if it is a good idea to use the big lock to protect the possible
> concurrent cases which seem to only happen between COLO migration
> thread and COLO compare thread, any further suggestions ?

I think I need first understand why migration thread need to access the 
list?

Thanks

>
> Thanks,
> Hailiang
>
>
>> Thanks
>>
>>>
>>> Thanks.
>>> Hailiang
>>>
>>>> Thanks
>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Hailiang
>>>>>
>>>>>> Thought?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>
>>
>> .
>>
>
Zhanghailiang Feb. 7, 2017, 8:19 a.m. UTC | #9
On 2017/2/7 15:57, Jason Wang wrote:
>
>
> On 2017年02月07日 15:54, Hailiang Zhang wrote:
>> Hi Jason,
>>
>> On 2017/2/6 20:53, Jason Wang wrote:
>>>
>>>
>>> On 2017年02月06日 19:11, Hailiang Zhang wrote:
>>>> On 2017/2/6 17:35, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>>>>>> On 2017/2/3 11:47, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>>>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>>>>>> is used to protect the 'conn_list' queue and its child queues which
>>>>>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>>>>>> and confusing
>>>>>>>>
>>>>>>>> To make it clearer, we rename 'timer_check_lock' to
>>>>>>>> 'conn_list_lock'
>>>>>>>> which is used to protect 'conn_list' queue, use another 'conn_lock'
>>>>>>>> to protect 'primary_list' and 'secondary_list'.
>>>>>>>>
>>>>>>>> Besides, fix some missing places which need these mutex lock.
>>>>>>>>
>>>>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>>>>
>>>>>>> Instead of sticking to such kind of mutex, I think it's time to make
>>>>>>> colo timer run in colo thread (there's a TODO in the code).
>>>>>>>
>>>>>>
>>>>>> Er, it seems that, we still need these mutex locks even we make colo
>>>>>> timer and colo thread run in the same thread, because we may access
>>>>>> the connect/primary/secondary list from colo (migratioin) thread
>>>>>> concurrently.
>>>>>
>>>>> Just make sure I understand the issue, why need it access the list?
>>>>>
>>>>>>
>>>>>> Besides, it seems to be a little complex to make colo timer run in
>>>>>> colo
>>>>>> compare thread, and it is not this series' goal.
>>>>>
>>>>> Seems not by just looking at how it was implemented in main loop, but
>>>>> maybe I was wrong.
>>>>>
>>>>>> This series is preparing
>>>>>> work for integrating COLO compare with COLO frame and it is
>>>>>> prerequisite.
>>>>>>
>>>>>> So, we may consider implementing it later in another series.
>>>>>> Zhang Chen, what's your opinion ?
>>>>>
>>>>> The problem is this patch make things even worse, it introduces one
>>>>> more
>>>>> mutex.
>>>>>
>>>>
>>>> Hmm, for most cases, we need to get these two locks at the same time.
>>>> We can use one lock to protect conn_list/primary_list/secondary_list,
>>>> (We need to get this lock before operate all these lists, as you can
>>>> see
>>>> in patch 2, while do checkpoint, we may operate these lists in
>>>> COLO checkpoint thread concurrently.)
>>>>
>>>> But for the original codes, we didn't got timer_check_lock in
>>>> packet_enqueue() while operate conn_list/primary_list/secondary_list,
>>>> and didn't got this lock in colo_compare_connection while operate
>>>> secondary_list either.
>>>>
>>>> So, is it OK to use the conn_lock instead of timer_check_lock, and
>>>> add the lock where it is need ?
>>>
>>> I'd like to know if timer were run in colo thread (this looks as simple
>>> as a g_timeout_source_new() followed by a g_source_attach()), why do we
>>> still need a mutex. And if we need it now but plan to remove it in the
>>> future, I'd like to use big lock to simplify the codes.
>>>
>>
>> After investigated your above suggestion, I think it works by using
>> g_timeout_source_new() and g_source_attach(), but I'm not sure
>> if it is a good idea to use the big lock to protect the possible
>> concurrent cases which seem to only happen between COLO migration
>> thread and COLO compare thread, any further suggestions ?
>
> I think I need first understand why migration thread need to access the
> list?
>

Er, sorry to confuse you here, to be exactly, it is COLO checkpoint thread,
we reuse the migration thread to realize checkpoint process for COLO,
Because qemu enters into COLO state after a complete migration, so we reuse it.

While do checkpoint, we need to release all the buffered packets that have not
yet been compared, so we need to access the list in COLO checkpoint thread.

Thanks.

> Thanks
>
>>
>> Thanks,
>> Hailiang
>>
>>
>>> Thanks
>>>
>>>>
>>>> Thanks.
>>>> Hailiang
>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Hailiang
>>>>>>
>>>>>>> Thought?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>
>
> .
>
Jason Wang Feb. 7, 2017, 9:21 a.m. UTC | #10
On 2017年02月07日 16:19, Hailiang Zhang wrote:
> On 2017/2/7 15:57, Jason Wang wrote:
>>
>>
>> On 2017年02月07日 15:54, Hailiang Zhang wrote:
>>> Hi Jason,
>>>
>>> On 2017/2/6 20:53, Jason Wang wrote:
>>>>
>>>>
>>>> On 2017年02月06日 19:11, Hailiang Zhang wrote:
>>>>> On 2017/2/6 17:35, Jason Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>>>>>>> On 2017/2/3 11:47, Jason Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>>>>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>>>>>>> is used to protect the 'conn_list' queue and its child queues 
>>>>>>>>> which
>>>>>>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>>>>>>> and confusing
>>>>>>>>>
>>>>>>>>> To make it clearer, we rename 'timer_check_lock' to
>>>>>>>>> 'conn_list_lock'
>>>>>>>>> which is used to protect 'conn_list' queue, use another 
>>>>>>>>> 'conn_lock'
>>>>>>>>> to protect 'primary_list' and 'secondary_list'.
>>>>>>>>>
>>>>>>>>> Besides, fix some missing places which need these mutex lock.
>>>>>>>>>
>>>>>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>>>>>
>>>>>>>> Instead of sticking to such kind of mutex, I think it's time to 
>>>>>>>> make
>>>>>>>> colo timer run in colo thread (there's a TODO in the code).
>>>>>>>>
>>>>>>>
>>>>>>> Er, it seems that, we still need these mutex locks even we make 
>>>>>>> colo
>>>>>>> timer and colo thread run in the same thread, because we may access
>>>>>>> the connect/primary/secondary list from colo (migratioin) thread
>>>>>>> concurrently.
>>>>>>
>>>>>> Just make sure I understand the issue, why need it access the list?
>>>>>>
>>>>>>>
>>>>>>> Besides, it seems to be a little complex to make colo timer run in
>>>>>>> colo
>>>>>>> compare thread, and it is not this series' goal.
>>>>>>
>>>>>> Seems not by just looking at how it was implemented in main loop, 
>>>>>> but
>>>>>> maybe I was wrong.
>>>>>>
>>>>>>> This series is preparing
>>>>>>> work for integrating COLO compare with COLO frame and it is
>>>>>>> prerequisite.
>>>>>>>
>>>>>>> So, we may consider implementing it later in another series.
>>>>>>> Zhang Chen, what's your opinion ?
>>>>>>
>>>>>> The problem is this patch make things even worse, it introduces one
>>>>>> more
>>>>>> mutex.
>>>>>>
>>>>>
>>>>> Hmm, for most cases, we need to get these two locks at the same time.
>>>>> We can use one lock to protect conn_list/primary_list/secondary_list,
>>>>> (We need to get this lock before operate all these lists, as you can
>>>>> see
>>>>> in patch 2, while do checkpoint, we may operate these lists in
>>>>> COLO checkpoint thread concurrently.)
>>>>>
>>>>> But for the original codes, we didn't got timer_check_lock in
>>>>> packet_enqueue() while operate conn_list/primary_list/secondary_list,
>>>>> and didn't got this lock in colo_compare_connection while operate
>>>>> secondary_list either.
>>>>>
>>>>> So, is it OK to use the conn_lock instead of timer_check_lock, and
>>>>> add the lock where it is need ?
>>>>
>>>> I'd like to know if timer were run in colo thread (this looks as 
>>>> simple
>>>> as a g_timeout_source_new() followed by a g_source_attach()), why 
>>>> do we
>>>> still need a mutex. And if we need it now but plan to remove it in the
>>>> future, I'd like to use big lock to simplify the codes.
>>>>
>>>
>>> After investigated your above suggestion, I think it works by using
>>> g_timeout_source_new() and g_source_attach(), but I'm not sure
>>> if it is a good idea to use the big lock to protect the possible
>>> concurrent cases which seem to only happen between COLO migration
>>> thread and COLO compare thread, any further suggestions ?
>>
>> I think I need first understand why migration thread need to access the
>> list?
>>
>
> Er, sorry to confuse you here, to be exactly, it is COLO checkpoint 
> thread,
> we reuse the migration thread to realize checkpoint process for COLO,
> Because qemu enters into COLO state after a complete migration, so we 
> reuse it.
>
> While do checkpoint, we need to release all the buffered packets that 
> have not
> yet been compared, so we need to access the list in COLO checkpoint 
> thread.

I think the better way is notify the comparing thread and let it do the 
releasing. You probably need similar mechanism to notify from comparing 
thread to checkpoint thread.

Thanks

>
> Thanks.
Zhanghailiang Feb. 7, 2017, 9:30 a.m. UTC | #11
On 2017/2/7 17:21, Jason Wang wrote:
>
>
> On 2017年02月07日 16:19, Hailiang Zhang wrote:
>> On 2017/2/7 15:57, Jason Wang wrote:
>>>
>>>
>>> On 2017年02月07日 15:54, Hailiang Zhang wrote:
>>>> Hi Jason,
>>>>
>>>> On 2017/2/6 20:53, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 2017年02月06日 19:11, Hailiang Zhang wrote:
>>>>>> On 2017/2/6 17:35, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>>>>>>>> On 2017/2/3 11:47, Jason Wang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>>>>>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>>>>>>>> is used to protect the 'conn_list' queue and its child queues
>>>>>>>>>> which
>>>>>>>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>>>>>>>> and confusing
>>>>>>>>>>
>>>>>>>>>> To make it clearer, we rename 'timer_check_lock' to
>>>>>>>>>> 'conn_list_lock'
>>>>>>>>>> which is used to protect 'conn_list' queue, use another
>>>>>>>>>> 'conn_lock'
>>>>>>>>>> to protect 'primary_list' and 'secondary_list'.
>>>>>>>>>>
>>>>>>>>>> Besides, fix some missing places which need these mutex lock.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>>>>>>
>>>>>>>>> Instead of sticking to such kind of mutex, I think it's time to
>>>>>>>>> make
>>>>>>>>> colo timer run in colo thread (there's a TODO in the code).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Er, it seems that, we still need these mutex locks even we make
>>>>>>>> colo
>>>>>>>> timer and colo thread run in the same thread, because we may access
>>>>>>>> the connect/primary/secondary list from colo (migratioin) thread
>>>>>>>> concurrently.
>>>>>>>
>>>>>>> Just make sure I understand the issue, why need it access the list?
>>>>>>>
>>>>>>>>
>>>>>>>> Besides, it seems to be a little complex to make colo timer run in
>>>>>>>> colo
>>>>>>>> compare thread, and it is not this series' goal.
>>>>>>>
>>>>>>> Seems not by just looking at how it was implemented in main loop,
>>>>>>> but
>>>>>>> maybe I was wrong.
>>>>>>>
>>>>>>>> This series is preparing
>>>>>>>> work for integrating COLO compare with COLO frame and it is
>>>>>>>> prerequisite.
>>>>>>>>
>>>>>>>> So, we may consider implementing it later in another series.
>>>>>>>> Zhang Chen, what's your opinion ?
>>>>>>>
>>>>>>> The problem is this patch make things even worse, it introduces one
>>>>>>> more
>>>>>>> mutex.
>>>>>>>
>>>>>>
>>>>>> Hmm, for most cases, we need to get these two locks at the same time.
>>>>>> We can use one lock to protect conn_list/primary_list/secondary_list,
>>>>>> (We need to get this lock before operate all these lists, as you can
>>>>>> see
>>>>>> in patch 2, while do checkpoint, we may operate these lists in
>>>>>> COLO checkpoint thread concurrently.)
>>>>>>
>>>>>> But for the original codes, we didn't got timer_check_lock in
>>>>>> packet_enqueue() while operate conn_list/primary_list/secondary_list,
>>>>>> and didn't got this lock in colo_compare_connection while operate
>>>>>> secondary_list either.
>>>>>>
>>>>>> So, is it OK to use the conn_lock instead of timer_check_lock, and
>>>>>> add the lock where it is need ?
>>>>>
>>>>> I'd like to know if timer were run in colo thread (this looks as
>>>>> simple
>>>>> as a g_timeout_source_new() followed by a g_source_attach()), why
>>>>> do we
>>>>> still need a mutex. And if we need it now but plan to remove it in the
>>>>> future, I'd like to use big lock to simplify the codes.
>>>>>
>>>>
>>>> After investigated your above suggestion, I think it works by using
>>>> g_timeout_source_new() and g_source_attach(), but I'm not sure
>>>> if it is a good idea to use the big lock to protect the possible
>>>> concurrent cases which seem to only happen between COLO migration
>>>> thread and COLO compare thread, any further suggestions ?
>>>
>>> I think I need first understand why migration thread need to access the
>>> list?
>>>
>>
>> Er, sorry to confuse you here, to be exactly, it is COLO checkpoint
>> thread,
>> we reuse the migration thread to realize checkpoint process for COLO,
>> Because qemu enters into COLO state after a complete migration, so we
>> reuse it.
>>
>> While do checkpoint, we need to release all the buffered packets that
>> have not
>> yet been compared, so we need to access the list in COLO checkpoint
>> thread.
>
> I think the better way is notify the comparing thread and let it do the
> releasing. You probably need similar mechanism to notify from comparing
> thread to checkpoint thread.
>

Hmm, that's really a good idea, it can avoid using the mutex lock,
I'll try to implement it in next version, thanks very much. :)

> Thanks
>
>>
>> Thanks.
>
>
> .
>
Zhanghailiang Feb. 14, 2017, 2:32 a.m. UTC | #12
On 2017/2/7 17:21, Jason Wang wrote:
>
>
> On 2017年02月07日 16:19, Hailiang Zhang wrote:
>> On 2017/2/7 15:57, Jason Wang wrote:
>>>
>>>
>>> On 2017年02月07日 15:54, Hailiang Zhang wrote:
>>>> Hi Jason,
>>>>
>>>> On 2017/2/6 20:53, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 2017年02月06日 19:11, Hailiang Zhang wrote:
>>>>>> On 2017/2/6 17:35, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017年02月06日 16:13, Hailiang Zhang wrote:
>>>>>>>> On 2017/2/3 11:47, Jason Wang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2017年01月24日 22:05, zhanghailiang wrote:
>>>>>>>>>> The original 'timer_check_lock' mutex lock of struct CompareState
>>>>>>>>>> is used to protect the 'conn_list' queue and its child queues
>>>>>>>>>> which
>>>>>>>>>> are 'primary_list' and 'secondary_list', which is a little abused
>>>>>>>>>> and confusing
>>>>>>>>>>
>>>>>>>>>> To make it clearer, we rename 'timer_check_lock' to
>>>>>>>>>> 'conn_list_lock'
>>>>>>>>>> which is used to protect 'conn_list' queue, use another
>>>>>>>>>> 'conn_lock'
>>>>>>>>>> to protect 'primary_list' and 'secondary_list'.
>>>>>>>>>>
>>>>>>>>>> Besides, fix some missing places which need these mutex lock.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>>>>>>
>>>>>>>>> Instead of sticking to such kind of mutex, I think it's time to
>>>>>>>>> make
>>>>>>>>> colo timer run in colo thread (there's a TODO in the code).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Er, it seems that, we still need these mutex locks even we make
>>>>>>>> colo
>>>>>>>> timer and colo thread run in the same thread, because we may access
>>>>>>>> the connect/primary/secondary list from colo (migratioin) thread
>>>>>>>> concurrently.
>>>>>>>
>>>>>>> Just make sure I understand the issue, why need it access the list?
>>>>>>>
>>>>>>>>
>>>>>>>> Besides, it seems to be a little complex to make colo timer run in
>>>>>>>> colo
>>>>>>>> compare thread, and it is not this series' goal.
>>>>>>>
>>>>>>> Seems not by just looking at how it was implemented in main loop,
>>>>>>> but
>>>>>>> maybe I was wrong.
>>>>>>>
>>>>>>>> This series is preparing
>>>>>>>> work for integrating COLO compare with COLO frame and it is
>>>>>>>> prerequisite.
>>>>>>>>
>>>>>>>> So, we may consider implementing it later in another series.
>>>>>>>> Zhang Chen, what's your opinion ?
>>>>>>>
>>>>>>> The problem is this patch make things even worse, it introduces one
>>>>>>> more
>>>>>>> mutex.
>>>>>>>
>>>>>>
>>>>>> Hmm, for most cases, we need to get these two locks at the same time.
>>>>>> We can use one lock to protect conn_list/primary_list/secondary_list,
>>>>>> (We need to get this lock before operate all these lists, as you can
>>>>>> see
>>>>>> in patch 2, while do checkpoint, we may operate these lists in
>>>>>> COLO checkpoint thread concurrently.)
>>>>>>
>>>>>> But for the original codes, we didn't got timer_check_lock in
>>>>>> packet_enqueue() while operate conn_list/primary_list/secondary_list,
>>>>>> and didn't got this lock in colo_compare_connection while operate
>>>>>> secondary_list either.
>>>>>>
>>>>>> So, is it OK to use the conn_lock instead of timer_check_lock, and
>>>>>> add the lock where it is need ?
>>>>>
>>>>> I'd like to know if timer were run in colo thread (this looks as
>>>>> simple
>>>>> as a g_timeout_source_new() followed by a g_source_attach()), why
>>>>> do we
>>>>> still need a mutex. And if we need it now but plan to remove it in the
>>>>> future, I'd like to use big lock to simplify the codes.
>>>>>
>>>>
>>>> After investigated your above suggestion, I think it works by using
>>>> g_timeout_source_new() and g_source_attach(), but I'm not sure
>>>> if it is a good idea to use the big lock to protect the possible
>>>> concurrent cases which seem to only happen between COLO migration
>>>> thread and COLO compare thread, any further suggestions ?
>>>
>>> I think I need first understand why migration thread need to access the
>>> list?
>>>
>>
>> Er, sorry to confuse you here, to be exactly, it is COLO checkpoint
>> thread,
>> we reuse the migration thread to realize checkpoint process for COLO,
>> Because qemu enters into COLO state after a complete migration, so we
>> reuse it.
>>
>> While do checkpoint, we need to release all the buffered packets that
>> have not
>> yet been compared, so we need to access the list in COLO checkpoint
>> thread.
>

Hi Jason,

> I think the better way is notify the comparing thread and let it do the
> releasing. You probably need similar mechanism to notify from comparing
> thread to checkpoint thread.
>

It seems that there is no available APIs in glib to notify a thread which
are run coroutine to do something (idle source ?). What about using anonymous pipe
as the GPollFD to communicate between colo comparing thread and colo thread ?

Any ideas ?

Hailiang


> Thanks
>
>>
>> Thanks.
>
>
> .
>
Jason Wang Feb. 14, 2017, 4:08 a.m. UTC | #13
On 2017年02月14日 10:32, Hailiang Zhang wrote:
>>>
>>
>
> Hi Jason,
>
>> I think the better way is notify the comparing thread and let it do the
>> releasing. You probably need similar mechanism to notify from comparing
>> thread to checkpoint thread.
>>
>
> It seems that there is no available APIs in glib to notify a thread which
> are run coroutine to do something (idle source ?). What about using 
> anonymous pipe
> as the GPollFD to communicate between colo comparing thread and colo 
> thread ?
>
> Any ideas ?
>
> Hailiang

Haven't thought this deeply. But I think you can try event notifier 
first which was designed for such kind of notification.

Thanks
Zhanghailiang Feb. 14, 2017, 7:33 a.m. UTC | #14
On 2017/2/14 12:08, Jason Wang wrote:
>
>
> On 2017年02月14日 10:32, Hailiang Zhang wrote:
>>>>
>>>
>>
>> Hi Jason,
>>
>>> I think the better way is notify the comparing thread and let it do the
>>> releasing. You probably need similar mechanism to notify from comparing
>>> thread to checkpoint thread.
>>>
>>
>> It seems that there is no available APIs in glib to notify a thread which
>> are run coroutine to do something (idle source ?). What about using
>> anonymous pipe
>> as the GPollFD to communicate between colo comparing thread and colo
>> thread ?
>>
>> Any ideas ?
>>
>> Hailiang
>
> Haven't thought this deeply. But I think you can try event notifier
> first which was designed for such kind of notification.
>

Hmm, It is quite same with what i said above except it uses the
return value of eventfd as the GPollFD.
The things here are more complex especially if there are more than
one vNIC, which each is corresponding to one compare thread.

How about using g_source_attach(), g_source_attach(), g_source_remove(),
g_source_set_callback() and g_source_set_priority() to control the process
of releasing packets dynamically ? Which we call g_source_attach()
to add the coroutine of releasing packets when do checkpoint, and remove
it once finishing the checkpoint process.

About colo comparing thread notifies colo thread, i think we can use
qemu_cond_wait(),qemu_cond_broadcast(), just like what we do in
pause_all_vcpus().

Thanks.
Hailiang

> Thanks
>
> .
>
Jason Wang Feb. 15, 2017, 3:16 a.m. UTC | #15
On 2017年02月14日 15:33, Hailiang Zhang wrote:
> On 2017/2/14 12:08, Jason Wang wrote:
>>
>>
>> On 2017年02月14日 10:32, Hailiang Zhang wrote:
>>>>>
>>>>
>>>
>>> Hi Jason,
>>>
>>>> I think the better way is notify the comparing thread and let it do 
>>>> the
>>>> releasing. You probably need similar mechanism to notify from 
>>>> comparing
>>>> thread to checkpoint thread.
>>>>
>>>
>>> It seems that there is no available APIs in glib to notify a thread 
>>> which
>>> are run coroutine to do something (idle source ?). What about using
>>> anonymous pipe
>>> as the GPollFD to communicate between colo comparing thread and colo
>>> thread ?
>>>
>>> Any ideas ?
>>>
>>> Hailiang
>>
>> Haven't thought this deeply. But I think you can try event notifier
>> first which was designed for such kind of notification.
>>
>
> Hmm, It is quite same with what i said above except it uses the
> return value of eventfd as the GPollFD.
> The things here are more complex especially if there are more than
> one vNIC, which each is corresponding to one compare thread.
>
> How about using g_source_attach(), g_source_attach(), g_source_remove(),
> g_source_set_callback() and g_source_set_priority() to control the 
> process
> of releasing packets dynamically ? Which we call g_source_attach()
> to add the coroutine of releasing packets when do checkpoint, and remove
> it once finishing the checkpoint process.

I'm not sure I get the idea, maybe you can post patch for early review.

>
> About colo comparing thread notifies colo thread, i think we can use
> qemu_cond_wait(),qemu_cond_broadcast(), just like what we do in
> pause_all_vcpus().

Yes, agree.

Thanks

>
> Thanks.
> Hailiang
>
>> Thanks
>>
>> .
>>
>
>
diff mbox

Patch

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 5a4f335..9bea62a 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -79,13 +79,15 @@  typedef struct CompareState {
      * element type: Connection
      */
     GQueue conn_list;
+    QemuMutex conn_list_lock;
     /* hashtable to save connection */
     GHashTable *connection_track_table;
+
     /* compare thread, a thread for each NIC */
     QemuThread thread;
+
     /* Timer used on the primary to find packets that are never matched */
     QEMUTimer *timer;
-    QemuMutex timer_check_lock;
 } CompareState;
 
 typedef struct CompareClass {
@@ -133,6 +135,7 @@  static int packet_enqueue(CompareState *s, int mode)
     }
     fill_connection_key(pkt, &key);
 
+    qemu_mutex_lock(&s->conn_list_lock);
     conn = connection_get(s->connection_track_table,
                           &key,
                           &s->conn_list);
@@ -141,16 +144,19 @@  static int packet_enqueue(CompareState *s, int mode)
         g_queue_push_tail(&s->conn_list, conn);
         conn->processing = true;
     }
+    qemu_mutex_unlock(&s->conn_list_lock);
 
     if (mode == PRIMARY_IN) {
         if (g_queue_get_length(&conn->primary_list) <=
                                MAX_QUEUE_SIZE) {
+            qemu_mutex_lock(&conn->conn_lock);
             g_queue_push_tail(&conn->primary_list, pkt);
             if (conn->ip_proto == IPPROTO_TCP) {
                 g_queue_sort(&conn->primary_list,
                              (GCompareDataFunc)seq_sorter,
                              NULL);
             }
+            qemu_mutex_unlock(&conn->conn_lock);
         } else {
             error_report("colo compare primary queue size too big,"
                          "drop packet");
@@ -158,12 +164,14 @@  static int packet_enqueue(CompareState *s, int mode)
     } else {
         if (g_queue_get_length(&conn->secondary_list) <=
                                MAX_QUEUE_SIZE) {
+            qemu_mutex_lock(&conn->conn_lock);
             g_queue_push_tail(&conn->secondary_list, pkt);
             if (conn->ip_proto == IPPROTO_TCP) {
                 g_queue_sort(&conn->secondary_list,
                              (GCompareDataFunc)seq_sorter,
                              NULL);
             }
+            qemu_mutex_unlock(&conn->conn_lock);
         } else {
             error_report("colo compare secondary queue size too big,"
                          "drop packet");
@@ -338,10 +346,11 @@  static void colo_old_packet_check_one_conn(void *opaque,
     GList *result = NULL;
     int64_t check_time = REGULAR_PACKET_CHECK_MS;
 
+    qemu_mutex_lock(&conn->conn_lock);
     result = g_queue_find_custom(&conn->primary_list,
                                  &check_time,
                                  (GCompareFunc)colo_old_packet_check_one);
-
+    qemu_mutex_unlock(&conn->conn_lock);
     if (result) {
         /* do checkpoint will flush old packet */
         /* TODO: colo_notify_checkpoint();*/
@@ -357,7 +366,9 @@  static void colo_old_packet_check(void *opaque)
 {
     CompareState *s = opaque;
 
+    qemu_mutex_lock(&s->conn_list_lock);
     g_queue_foreach(&s->conn_list, colo_old_packet_check_one_conn, NULL);
+    qemu_mutex_unlock(&s->conn_list_lock);
 }
 
 /*
@@ -372,11 +383,10 @@  static void colo_compare_connection(void *opaque, void *user_data)
     GList *result = NULL;
     int ret;
 
+    qemu_mutex_lock(&conn->conn_lock);
     while (!g_queue_is_empty(&conn->primary_list) &&
            !g_queue_is_empty(&conn->secondary_list)) {
-        qemu_mutex_lock(&s->timer_check_lock);
         pkt = g_queue_pop_tail(&conn->primary_list);
-        qemu_mutex_unlock(&s->timer_check_lock);
         switch (conn->ip_proto) {
         case IPPROTO_TCP:
             result = g_queue_find_custom(&conn->secondary_list,
@@ -411,13 +421,12 @@  static void colo_compare_connection(void *opaque, void *user_data)
              * until next comparison.
              */
             trace_colo_compare_main("packet different");
-            qemu_mutex_lock(&s->timer_check_lock);
             g_queue_push_tail(&conn->primary_list, pkt);
-            qemu_mutex_unlock(&s->timer_check_lock);
             /* TODO: colo_notify_checkpoint();*/
             break;
         }
     }
+    qemu_mutex_unlock(&conn->conn_lock);
 }
 
 static int compare_chr_send(CharBackend *out,
@@ -561,8 +570,10 @@  static void compare_pri_rs_finalize(SocketReadState *pri_rs)
         trace_colo_compare_main("primary: unsupported packet in");
         compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len);
     } else {
+        qemu_mutex_lock(&s->conn_list_lock);
         /* compare connection */
         g_queue_foreach(&s->conn_list, colo_compare_connection, s);
+        qemu_mutex_unlock(&s->conn_list_lock);
     }
 }
 
@@ -573,8 +584,10 @@  static void compare_sec_rs_finalize(SocketReadState *sec_rs)
     if (packet_enqueue(s, SECONDARY_IN)) {
         trace_colo_compare_main("secondary: unsupported packet in");
     } else {
+        qemu_mutex_lock(&s->conn_list_lock);
         /* compare connection */
         g_queue_foreach(&s->conn_list, colo_compare_connection, s);
+        qemu_mutex_unlock(&s->conn_list_lock);
     }
 }
 
@@ -618,9 +631,7 @@  static void check_old_packet_regular(void *opaque)
      * TODO: Make timer handler run in compare thread
      * like qemu_chr_add_handlers_full.
      */
-    qemu_mutex_lock(&s->timer_check_lock);
     colo_old_packet_check(s);
-    qemu_mutex_unlock(&s->timer_check_lock);
 }
 
 /*
@@ -665,7 +676,7 @@  static void colo_compare_complete(UserCreatable *uc, Error **errp)
     net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
 
     g_queue_init(&s->conn_list);
-    qemu_mutex_init(&s->timer_check_lock);
+    qemu_mutex_init(&s->conn_list_lock);
 
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
@@ -718,8 +729,10 @@  static void colo_compare_finalize(Object *obj)
     g_queue_free(&s->conn_list);
 
     if (qemu_thread_is_self(&s->thread)) {
+        qemu_mutex_lock(&s->conn_list_lock);
         /* compare connection */
         g_queue_foreach(&s->conn_list, colo_compare_connection, s);
+        qemu_mutex_unlock(&s->conn_list_lock);
         qemu_thread_join(&s->thread);
     }
 
@@ -727,7 +740,7 @@  static void colo_compare_finalize(Object *obj)
         timer_del(s->timer);
     }
 
-    qemu_mutex_destroy(&s->timer_check_lock);
+    qemu_mutex_destroy(&s->conn_list_lock);
 
     g_free(s->pri_indev);
     g_free(s->sec_indev);
diff --git a/net/colo.c b/net/colo.c
index 6a6eacd..267f29c 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -138,6 +138,7 @@  Connection *connection_new(ConnectionKey *key)
     conn->syn_flag = 0;
     g_queue_init(&conn->primary_list);
     g_queue_init(&conn->secondary_list);
+    qemu_mutex_init(&conn->conn_lock);
 
     return conn;
 }
@@ -151,6 +152,7 @@  void connection_destroy(void *opaque)
     g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
     g_queue_free(&conn->secondary_list);
     g_slice_free(Connection, conn);
+    qemu_mutex_destroy(&conn->conn_lock);
 }
 
 Packet *packet_new(const void *data, int size)
diff --git a/net/colo.h b/net/colo.h
index 7c524f3..2d5f9be 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -61,6 +61,8 @@  typedef struct Connection {
     GQueue secondary_list;
     /* flag to enqueue unprocessed_connections */
     bool processing;
+    /* Protect the access of primary_list or secondary list */
+    QemuMutex conn_lock;
     uint8_t ip_proto;
     /* offset = secondary_seq - primary_seq */
     tcp_seq  offset;