diff mbox

[1/3] blockjob: fix dead pointer in txn list

Message ID 1469616590-38683-2-git-send-email-vsementsov@virtuozzo.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vladimir Sementsov-Ogievskiy July 27, 2016, 10:49 a.m. UTC
Job may be freed in block_job_unref and in this case this would break
transaction QLIST.

Fix this by removing job from this list before unref.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockjob.c | 1 +
 1 file changed, 1 insertion(+)

Comments

John Snow Aug. 1, 2016, 10:39 p.m. UTC | #1
On 07/27/2016 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
> Job may be freed in block_job_unref and in this case this would break
> transaction QLIST.
>
> Fix this by removing job from this list before unref.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  blockjob.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/blockjob.c b/blockjob.c
> index a5ba3be..e045091 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -216,6 +216,7 @@ static void block_job_completed_single(BlockJob *job)
>      }
>      job->cb(job->opaque, job->ret);
>      if (job->txn) {
> +        QLIST_REMOVE(job, txn_list);
>          block_job_txn_unref(job->txn);
>      }
>      block_job_unref(job);
>

Has this caused actual problems for you?

This function is only ever called in a transactional context if the 
transaction is over -- so we're not likely to use the pointers ever 
again anyway.

Still, it's good practice, and the caller uses a safe iteration of the 
list, so I think this should be safe.

But I don't think this SHOULD fix an actual bug. If it does, I think 
something else is wrong.

Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Vladimir Sementsov-Ogievskiy Aug. 2, 2016, 11:05 a.m. UTC | #2
On 02.08.2016 01:39, John Snow wrote:
> On 07/27/2016 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Job may be freed in block_job_unref and in this case this would break
>> transaction QLIST.
>>
>> Fix this by removing job from this list before unref.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  blockjob.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index a5ba3be..e045091 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -216,6 +216,7 @@ static void block_job_completed_single(BlockJob 
>> *job)
>>      }
>>      job->cb(job->opaque, job->ret);
>>      if (job->txn) {
>> +        QLIST_REMOVE(job, txn_list);
>>          block_job_txn_unref(job->txn);
>>      }
>>      block_job_unref(job);
>>
>
> Has this caused actual problems for you?

Yes, with the same changed test 124 (my parallel thread). Backup job can 
finish  too early (if dirty bitmap is empty) and then we use this 
transaction job list with dead pointer.

>
> This function is only ever called in a transactional context if the 
> transaction is over -- so we're not likely to use the pointers ever 
> again anyway.

Backup job may finish even earlier than all jobs are added to the list. 
(same case, empty dirty bitmap for one of drives).

>
> Still, it's good practice, and the caller uses a safe iteration of the 
> list, so I think this should be safe.
>
> But I don't think this SHOULD fix an actual bug. If it does, I think 
> something else is wrong.
>
> Tested-by: John Snow <jsnow@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
John Snow Aug. 2, 2016, 4:50 p.m. UTC | #3
On 08/02/2016 07:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.08.2016 01:39, John Snow wrote:
>> On 07/27/2016 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Job may be freed in block_job_unref and in this case this would break
>>> transaction QLIST.
>>>
>>> Fix this by removing job from this list before unref.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>  blockjob.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/blockjob.c b/blockjob.c
>>> index a5ba3be..e045091 100644
>>> --- a/blockjob.c
>>> +++ b/blockjob.c
>>> @@ -216,6 +216,7 @@ static void block_job_completed_single(BlockJob
>>> *job)
>>>      }
>>>      job->cb(job->opaque, job->ret);
>>>      if (job->txn) {
>>> +        QLIST_REMOVE(job, txn_list);
>>>          block_job_txn_unref(job->txn);
>>>      }
>>>      block_job_unref(job);
>>>
>>
>> Has this caused actual problems for you?
>
> Yes, with the same changed test 124 (my parallel thread). Backup job can
> finish  too early (if dirty bitmap is empty) and then we use this
> transaction job list with dead pointer.
>
>>
>> This function is only ever called in a transactional context if the
>> transaction is over -- so we're not likely to use the pointers ever
>> again anyway.
>
> Backup job may finish even earlier than all jobs are added to the list.
> (same case, empty dirty bitmap for one of drives).
>

AHA, I get it now.

I think the right solution will be a general mechanism at the 
transactional level, not backup-specific hacks, but thank you for 
explaining this to me.

>>
>> Still, it's good practice, and the caller uses a safe iteration of the
>> list, so I think this should be safe.
>>
>> But I don't think this SHOULD fix an actual bug. If it does, I think
>> something else is wrong.
>>
>> Tested-by: John Snow <jsnow@redhat.com>
>> Reviewed-by: John Snow <jsnow@redhat.com>
>
>
diff mbox

Patch

diff --git a/blockjob.c b/blockjob.c
index a5ba3be..e045091 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -216,6 +216,7 @@  static void block_job_completed_single(BlockJob *job)
     }
     job->cb(job->opaque, job->ret);
     if (job->txn) {
+        QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
     }
     block_job_unref(job);