[v2,10/10] iotests/030: Unthrottle parallel jobs in reverse

Series

block: Attempt on fixing 030-reported errors

[v2,00/10] block: Attempt on fixing 030-reported errors
[v2,01/10] stream: Traverse graph after modification
[v2,02/10] block: Manipulate children list in .attach/.detach
[v2,03/10] block: Unite remove_empty_child and child_free
[v2,04/10] block: Drop detached child from ignore list
[v2,05/10] block: Pass BdrvChild ** to replace_child_noperm
[v2,06/10] block: Restructure remove_file_or_backing_child()
[v2,07/10] transactions: Invoke clean() after everything else
[v2,08/10] block: Let replace_child_tran keep indirect pointer
[v2,09/10] block: Let replace_child_noperm free children
[v2,10/10] iotests/030: Unthrottle parallel jobs in reverse

Message ID

20211111120829.81329-11-hreitz@redhat.com (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B0E8F61268
From: Hanna Reitz <hreitz@redhat.com>
To: qemu-block@nongnu.org
Subject: [PATCH v2 10/10] iotests/030: Unthrottle parallel jobs in reverse
Date: Thu, 11 Nov 2021 13:08:29 +0100
Message-Id: <20211111120829.81329-11-hreitz@redhat.com>
In-Reply-To: <20211111120829.81329-1-hreitz@redhat.com>
References: <20211111120829.81329-1-hreitz@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="US-ASCII"
Received-SPF: pass client-ip=170.10.133.124; envelope-from=hreitz@redhat.com;
 helo=us-smtp-delivery-124.mimecast.com
X-Spam_score_int: -34
X-Spam_score: -3.5
X-Spam_bar: ---
X-Spam_report: (-3.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.7,
 DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
 Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
 qemu-devel@nongnu.org
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: "Qemu-devel"
 <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>

Comments

Vladimir Sementsov-Ogievskiy Nov. 12, 2021, 4:25 p.m. UTC | #1

11.11.2021 15:08, Hanna Reitz wrote:
> See the comment for why this is necessary.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>   tests/qemu-iotests/030 | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index 5fb65b4bef..567bf1da67 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -251,7 +251,16 @@ class TestParallelOps(iotests.QMPTestCase):
>                                    speed=1024)
>               self.assert_qmp(result, 'return', {})
>   
> -        for job in pending_jobs:
> +        # Do this in reverse: After unthrottling them, some jobs may finish
> +        # before we have unthrottled all of them.  This will drain their
> +        # subgraph, and this will make jobs above them advance (despite those
> +        # jobs on top being throttled).  In the worst case, all jobs below the
> +        # top one are finished before we can unthrottle it, and this makes it
> +        # advance so far that it completes before we can unthrottle it - which
> +        # results in an error.
> +        # Starting from the top (i.e. in reverse) does not have this problem:
> +        # When a job finishes, the ones below it are not advanced.

Hmm, it's interesting that only jobs above the finished job may advance in this situation.

It looks like something may change and this workaround will stop working.

Isn't it better to just handle the error and not care whether the job has just finished?

Something like

if 'error' in result:
    # Job was finished during drain caused by finish of already unthrottled job
    self.assert_qmp(result, 'error/class', 'DeviceNotActive')

Next thing in the test case is checking for completion events, so we'll get all events anyway.
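
A minimal, self-contained sketch of that idea follows. The helper name `unthrottle`, the `fake_qmp` stand-in, and the job names are hypothetical and only serve the illustration; in the test itself the call would be `self.vm.qmp(...)`. A QMP error reply carries an `error` member instead of `return`, so the sketch checks for that key.

```python
def unthrottle(qmp, jobs):
    """Unthrottle every job; tolerate jobs that have already finished.

    `qmp` is any callable shaped like iotests' `self.vm.qmp`: it takes a
    command name plus keyword arguments and returns the reply dict.
    Returns the list of jobs that were already gone.
    """
    already_finished = []
    for job in jobs:
        result = qmp('block-job-set-speed', device=job, speed=0)
        if 'error' in result:
            # Only "job no longer exists" is acceptable here.
            assert result['error']['class'] == 'DeviceNotActive', result
            already_finished.append(job)
        else:
            assert result == {'return': {}}, result
    return already_finished


# Hypothetical stand-in for a VM in which one job completed early.
def fake_qmp(cmd, **args):
    if args['device'] == 'stream-node4':
        return {'error': {'class': 'DeviceNotActive', 'desc': 'job not found'}}
    return {'return': {}}


gone = unthrottle(fake_qmp, ['stream-node2', 'stream-node4', 'stream-node6'])
print(gone)
```

Any error class other than DeviceNotActive still fails the assertion, so unrelated failures are not silently ignored.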


> +        for job in reversed(pending_jobs):
>               result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
>               self.assert_qmp(result, 'return', {})
>   
>

Hanna Czenczek Nov. 15, 2021, 1:56 p.m. UTC | #2

On 12.11.21 17:25, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2021 15:08, Hanna Reitz wrote:
>> See the comment for why this is necessary.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/030 | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
>> index 5fb65b4bef..567bf1da67 100755
>> --- a/tests/qemu-iotests/030
>> +++ b/tests/qemu-iotests/030
>> @@ -251,7 +251,16 @@ class TestParallelOps(iotests.QMPTestCase):
>>                                    speed=1024)
>>               self.assert_qmp(result, 'return', {})
>>
>> -        for job in pending_jobs:
>> +        # Do this in reverse: After unthrottling them, some jobs may finish
>> +        # before we have unthrottled all of them.  This will drain their
>> +        # subgraph, and this will make jobs above them advance (despite those
>> +        # jobs on top being throttled).  In the worst case, all jobs below the
>> +        # top one are finished before we can unthrottle it, and this makes it
>> +        # advance so far that it completes before we can unthrottle it - which
>> +        # results in an error.
>> +        # Starting from the top (i.e. in reverse) does not have this problem:
>> +        # When a job finishes, the ones below it are not advanced.
>
> Hmm, it's interesting that only jobs above the finished job may advance in this situation.
>
> It looks like something may change and this workaround will stop working.
>
> Isn't it better to just handle the error and not care whether the job has just finished?
>
> Something like
>
> if 'error' in result:
>     # Job was finished during drain caused by finish of already unthrottled job
>     self.assert_qmp(result, 'error/class', 'DeviceNotActive')

Well.  My explanation (excuse) is that I felt like this was the hack-ish 
solution that I could have gone for from the start without understanding 
what the issue is (and in fact it was the solution I used while 
debugging the other problems).  I went with `reversed()`, because that 
really addresses the problem.

You’re right in that it only addresses the problem for now and there’s a 
chance it might reappear.  If we want to go with ignoring 
DeviceNotActive errors, then I think we should at least query all block 
jobs before the unthrottle loop and see that at least at one point they 
were all running simultaneously.
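
Such a check could look roughly like the sketch below. It assumes the reply of `query-block-jobs` is passed in as a dict whose `return` list holds one entry per job with the job ID in `device`; the helper name `assert_all_running`, the job names, and the sample reply are made up for illustration.

```python
def assert_all_running(query_reply, pending_jobs):
    """Check that every expected job shows up in a query-block-jobs reply."""
    running = {info['device'] for info in query_reply['return']}
    missing = [job for job in pending_jobs if job not in running]
    assert not missing, f'jobs not running simultaneously: {missing}'


# Hypothetical reply; in the test this would come from
# self.vm.qmp('query-block-jobs') right before the unthrottle loop.
pending_jobs = ['stream-node2', 'stream-node4']
reply = {'return': [
    {'device': 'stream-node2', 'type': 'stream', 'speed': 1024},
    {'device': 'stream-node4', 'type': 'stream', 'speed': 1024},
]}
assert_all_running(reply, pending_jobs)
```

Running this once before the loop would keep the test meaningful even if individual DeviceNotActive errors are tolerated afterwards.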

I don’t really have a strong opinion.  We can exchange this patch now 
(though I’d rather not hold up the rest of the series for it), or have a 
patch on top later, or, well, just keep it for now.  I think the least 
stressful option would be to just fix it up later.

Hanna

Vladimir Sementsov-Ogievskiy Nov. 16, 2021, 8:20 a.m. UTC | #3

15.11.2021 16:56, Hanna Reitz wrote:
> On 12.11.21 17:25, Vladimir Sementsov-Ogievskiy wrote:
>> 11.11.2021 15:08, Hanna Reitz wrote:
>>> See the comment for why this is necessary.
>>>
>>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>>> ---
>>>   tests/qemu-iotests/030 | 11 ++++++++++-
>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
>>> index 5fb65b4bef..567bf1da67 100755
>>> --- a/tests/qemu-iotests/030
>>> +++ b/tests/qemu-iotests/030
>>> @@ -251,7 +251,16 @@ class TestParallelOps(iotests.QMPTestCase):
>>>                                    speed=1024)
>>>               self.assert_qmp(result, 'return', {})
>>>
>>> -        for job in pending_jobs:
>>> +        # Do this in reverse: After unthrottling them, some jobs may finish
>>> +        # before we have unthrottled all of them.  This will drain their
>>> +        # subgraph, and this will make jobs above them advance (despite those
>>> +        # jobs on top being throttled).  In the worst case, all jobs below the
>>> +        # top one are finished before we can unthrottle it, and this makes it
>>> +        # advance so far that it completes before we can unthrottle it - which
>>> +        # results in an error.
>>> +        # Starting from the top (i.e. in reverse) does not have this problem:
>>> +        # When a job finishes, the ones below it are not advanced.
>>
>> Hmm, it's interesting that only jobs above the finished job may advance in this situation.
>>
>> It looks like something may change and this workaround will stop working.
>>
>> Isn't it better to just handle the error and not care whether the job has just finished?
>>
>> Something like
>>
>> if 'error' in result:
>>     # Job was finished during drain caused by finish of already unthrottled job
>>     self.assert_qmp(result, 'error/class', 'DeviceNotActive')
> 
> Well.  My explanation (excuse) is that I felt like this was the hack-ish solution that I could have gone for from the start without understanding what the issue is (and in fact it was the solution I used while debugging the other problems).  I went with `reversed()`, because that really addresses the problem.
> 
> You’re right in that it only addresses the problem for now and there’s a chance it might reappear.  If we want to go with ignoring DeviceNotActive errors, then I think we should at least query all block jobs before the unthrottle loop and see that at least at one point they were all running simultaneously.
> 
> I don’t really have a strong opinion.  We can exchange this patch now (though I’d rather not hold up the rest of the series for it), or have a patch on top later, or, well, just keep it for now.  I think the least stressful option would be to just fix it up later.
> 

OK, I agree

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 5fb65b4bef..567bf1da67 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -251,7 +251,16 @@  class TestParallelOps(iotests.QMPTestCase):
                                  speed=1024)
             self.assert_qmp(result, 'return', {})
 
-        for job in pending_jobs:
+        # Do this in reverse: After unthrottling them, some jobs may finish
+        # before we have unthrottled all of them.  This will drain their
+        # subgraph, and this will make jobs above them advance (despite those
+        # jobs on top being throttled).  In the worst case, all jobs below the
+        # top one are finished before we can unthrottle it, and this makes it
+        # advance so far that it completes before we can unthrottle it - which
+        # results in an error.
+        # Starting from the top (i.e. in reverse) does not have this problem:
+        # When a job finishes, the ones below it are not advanced.
+        for job in reversed(pending_jobs):
             result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
             self.assert_qmp(result, 'return', {})
