[v2,3/5] automation: Add the expect script with test case for FVP

Message ID: 20231208054637.1973424-4-Henry.Wang@arm.com
State: New, archived
Series: automation: Support running FVP Dom0 smoke test for Arm

Commit Message

Henry Wang Dec. 8, 2023, 5:46 a.m. UTC
To interact with the FVP (for example, entering the U-Boot shell
and transferring files over TFTP), we first need to connect to the
corresponding port via telnet. Use an expect script to simplify
automating the whole "interacting with FVP" flow.

The expect script first detects the IP of the host, then connects
to the telnet port of the FVP, sets `serverip` and `ipaddr` for the
TFTP service in the U-Boot shell, and finally boots Xen from U-Boot
and waits for the expected output from Xen, Dom0 and DomU.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v2:
- No change.
---
 .../expect/fvp-base-smoke-dom0-arm64.exp      | 73 +++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100755 automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp

Comments

Michal Orzel Dec. 8, 2023, 8:57 a.m. UTC | #1
Hi Henry,

On 08/12/2023 06:46, Henry Wang wrote:
> 
> 
> To interact with the FVP (for example entering the U-Boot shell
> and transferring the files by TFTP), we need to connect the
> corresponding port by the telnet first. Use an expect script to
> simplify the automation of the whole "interacting with FVP" stuff.
> 
> The expect script will firstly detect the IP of the host, then
> connect to the telnet port of the FVP, set the `serverip` and `ipaddr`
> for the TFTP service in the U-Boot shell, and finally boot Xen from
> U-Boot and wait for the expected results by Xen, Dom0 and DomU.
> 
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

with 1 question...

> ---
> v2:
> - No change.
> ---
>  .../expect/fvp-base-smoke-dom0-arm64.exp      | 73 +++++++++++++++++++
>  1 file changed, 73 insertions(+)
>  create mode 100755 automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
> 
> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
> new file mode 100755
> index 0000000000..25d9a5f81c
> --- /dev/null
> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
> @@ -0,0 +1,73 @@
> +#!/usr/bin/expect
> +
> +set timeout 2000
Do we really need such a big timeout (~30 min)?
Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
+ send_slow slows things down)

~Michal
Henry Wang Dec. 8, 2023, 9:05 a.m. UTC | #2
Hi Michal,

> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
> 
> Hi Henry,
> 
> On 08/12/2023 06:46, Henry Wang wrote:
>> 
>> 
>> To interact with the FVP (for example entering the U-Boot shell
>> and transferring the files by TFTP), we need to connect the
>> corresponding port by the telnet first. Use an expect script to
>> simplify the automation of the whole "interacting with FVP" stuff.
>> 
>> The expect script will firstly detect the IP of the host, then
>> connect to the telnet port of the FVP, set the `serverip` and `ipaddr`
>> for the TFTP service in the U-Boot shell, and finally boot Xen from
>> U-Boot and wait for the expected results by Xen, Dom0 and DomU.
>> 
>> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
> Reviewed-by: Michal Orzel <michal.orzel@amd.com>

Thanks!

> with 1 question...
> 
>> ---
>> v2:
>> - No change.
>> ---
>> .../expect/fvp-base-smoke-dom0-arm64.exp      | 73 +++++++++++++++++++
>> 1 file changed, 73 insertions(+)
>> create mode 100755 automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>> 
>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>> new file mode 100755
>> index 0000000000..25d9a5f81c
>> --- /dev/null
>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>> @@ -0,0 +1,73 @@
>> +#!/usr/bin/expect
>> +
>> +set timeout 2000
> Do we really need such a big timeout (~30 min)?
> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
> + send_slow slows things down)

This is a really good question. I did have the same question while working on
the negative test today. The timeout 2000 indeed will fail the job at about 30min,
and waiting for it is indeed not really pleasant.

But on second thought, from my observation the overall time currently
varies between 15 and 20 min, and a 10 min margin is not that crazy
given that we will probably do more testing in this job in the future, and if the
GitLab Arm worker is heavily loaded, FVP will probably become slower. And normally
we don’t even hit the timeout, as the job normally passes. So I decided
to keep this.

Mind sharing your thoughts about the better value of the timeout? Probably 25min?

Kind regards,
Henry

> 
> ~Michal
>
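
For reference on the numbers discussed above: expect's `timeout` is expressed in
seconds and applies to each `expect` command separately, so `set timeout 2000` is
about 33 minutes, while 25 and 30 minutes would be 1500 and 1800. A minimal sketch
of deriving the value from minutes, where `timeout_min` is a hypothetical variable
that is not part of the patch:

```tcl
#!/usr/bin/expect

# Hypothetical variable: per-expect budget in minutes.
set timeout_min 30

# expect reads "timeout" in seconds; it bounds each expect command,
# not the run time of the whole script.
set timeout [expr {$timeout_min * 60}]

puts "timeout is $timeout seconds"
```

Keeping the budget in minutes would make it easier to compare against the
15-20 min job duration quoted in the thread.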
Michal Orzel Dec. 8, 2023, 9:11 a.m. UTC | #3
On 08/12/2023 10:05, Henry Wang wrote:
> 
> 
> Hi Michal,
> 
>> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
>>
>> Hi Henry,
>>
>> On 08/12/2023 06:46, Henry Wang wrote:
>>>
>>>
>>> To interact with the FVP (for example entering the U-Boot shell
>>> and transferring the files by TFTP), we need to connect the
>>> corresponding port by the telnet first. Use an expect script to
>>> simplify the automation of the whole "interacting with FVP" stuff.
>>>
>>> The expect script will firstly detect the IP of the host, then
>>> connect to the telnet port of the FVP, set the `serverip` and `ipaddr`
>>> for the TFTP service in the U-Boot shell, and finally boot Xen from
>>> U-Boot and wait for the expected results by Xen, Dom0 and DomU.
>>>
>>> Signed-off-by: Henry Wang <Henry.Wang@arm.com>
>> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
> 
> Thanks!
> 
>> with 1 question...
>>
>>> ---
>>> v2:
>>> - No change.
>>> ---
>>> .../expect/fvp-base-smoke-dom0-arm64.exp      | 73 +++++++++++++++++++
>>> 1 file changed, 73 insertions(+)
>>> create mode 100755 automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>
>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>> new file mode 100755
>>> index 0000000000..25d9a5f81c
>>> --- /dev/null
>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>> @@ -0,0 +1,73 @@
>>> +#!/usr/bin/expect
>>> +
>>> +set timeout 2000
>> Do we really need such a big timeout (~30 min)?
>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>> + send_slow slows things down)
> 
> This is a really good question. I did have the same question while working on
> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
> and waiting for it is indeed not really pleasant.
> 
> But my second thought would be - from my observation, the overall time now
> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
> given that we probably will do more testing from the job in the future, and if the
> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
> we don’t even trigger the timeout as the job will normally pass. So I decided
> to keep this.
> 
> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
Given what you said, that the average is 15-20 min, I think we can leave it at about 30 min.
But I wonder if we can do something to decrease the average time. ~20 min is a lot
even for FVP :) Have you tried setting send_slow to something lower than 100ms?
That said, we don't send many chars to FVP, so I doubt it would play a major role
in the overall time.

I use FVP quite rarely these days, so you should know better if this can be perceived as
usual/normal behavior.

~Michal
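
To put the send_slow remark into numbers: in expect, `send_slow` is a two-element
list (bytes sent per chunk, pause in seconds between chunks) that `send -s` honours.
The sketch below assumes one character per chunk and the 100 ms interval mentioned
above; the U-Boot command string is a hypothetical example and is not taken from
the patch.

```tcl
#!/usr/bin/expect

# {chunk_size interval}: 1 character every 0.1 s.
# The chunk size of 1 is an assumption; the thread only mentions 100 ms.
set send_slow {1 0.1}

# Hypothetical U-Boot command, used only to estimate the cost.
set cmd "tftpb 0x80000000 xen\r"

# Approximate time spent sending slowly: characters * interval.
set cost [expr {[string length $cmd] * [lindex $send_slow 1]}]
puts "sending [string length $cmd] chars takes about $cost s"

# Against a spawned session the slow send itself would be:
#   send -s $cmd
```

With only a handful of such commands per run, the total is a matter of seconds,
which matches the doubt expressed above that send_slow plays a major role.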
Henry Wang Dec. 8, 2023, 9:21 a.m. UTC | #4
Hi Michal,

> On Dec 8, 2023, at 17:11, Michal Orzel <michal.orzel@amd.com> wrote:
> On 08/12/2023 10:05, Henry Wang wrote:
>> 
>> Hi Michal,
>> 
>>> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
>>> 
>>> Hi Henry,
>>> 
>>> On 08/12/2023 06:46, Henry Wang wrote:
>>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>> new file mode 100755
>>>> index 0000000000..25d9a5f81c
>>>> --- /dev/null
>>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>> @@ -0,0 +1,73 @@
>>>> +#!/usr/bin/expect
>>>> +
>>>> +set timeout 2000
>>> Do we really need such a big timeout (~30 min)?
>>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>>> + send_slow slows things down)
>> 
>> This is a really good question. I did have the same question while working on
>> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
>> and waiting for it is indeed not really pleasant.
>> 
>> But my second thought would be - from my observation, the overall time now
>> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
>> given that we probably will do more testing from the job in the future, and if the
>> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
>> we don’t even trigger the timeout as the job will normally pass. So I decided
>> to keep this.
>> 
>> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
> From what you said that the average is 15-20, I think we can leave it set to 30.
> But I wonder if we can do something to decrease the average time. ~20 min is a lot
> even for FVP :) Have you tried setting send_slow to something lower than 100ms?
> That said, we don't send too many chars to FVP, so I doubt it would play a major role
> in the overall time.

I agree with the send_slow part. Actually I have the same concern. Here is my current
understanding, and I think your knowledge will definitely help:
If you check the full log of Dom0 booting, for example [1], you will find that we waste so
much time starting the OS services (modloop, udev-settle, etc). All of these services
are retried many times but in the end they are still not up, and from my understanding they
won’t affect the actual test(?) If we can somehow get rid of these services from the rootfs, I think
we can save a lot of time.

And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same problem and it also
takes around 15min to finish. So if we manage to trim the services from the filesystem, we
can save a lot of time.

But I found it difficult to do the trimming properly. Any suggestions?

[1] https://gitlab.com/xen-project/people/henryw/xen/-/jobs/5708557850/artifacts/file/smoke.serial

Kind regards,
Henry

> I use FVP quite rarely these days, so you should know better if this can be perceived as
> usual/normal behavior.
> 
> ~Michal
>
Michal Orzel Dec. 8, 2023, 9:50 a.m. UTC | #5
On 08/12/2023 10:21, Henry Wang wrote:
> 
> 
> Hi Michal,
> 
>> On Dec 8, 2023, at 17:11, Michal Orzel <michal.orzel@amd.com> wrote:
>> On 08/12/2023 10:05, Henry Wang wrote:
>>>
>>> Hi Michal,
>>>
>>>> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
>>>>
>>>> Hi Henry,
>>>>
>>>> On 08/12/2023 06:46, Henry Wang wrote:
>>>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>> new file mode 100755
>>>>> index 0000000000..25d9a5f81c
>>>>> --- /dev/null
>>>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>> @@ -0,0 +1,73 @@
>>>>> +#!/usr/bin/expect
>>>>> +
>>>>> +set timeout 2000
>>>> Do we really need such a big timeout (~30 min)?
>>>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>>>> + send_slow slows things down)
>>>
>>> This is a really good question. I did have the same question while working on
>>> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
>>> and waiting for it is indeed not really pleasant.
>>>
>>> But my second thought would be - from my observation, the overall time now
>>> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
>>> given that we probably will do more testing from the job in the future, and if the
>>> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
>>> we don’t even trigger the timeout as the job will normally pass. So I decided
>>> to keep this.
>>>
>>> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
>> From what you said that the average is 15-20, I think we can leave it set to 30.
>> But I wonder if we can do something to decrease the average time. ~20 min is a lot
>> even for FVP :) Have you tried setting send_slow to something lower than 100ms?
>> That said, we don't send too many chars to FVP, so I doubt it would play a major role
>> in the overall time.
> 
> I agree with the send_slow part. Actually I do have the same concern, here are my current
> understanding and I think you will definitely help with your knowledge:
> If you check the full log of Dom0 booting, for example [1], you will find that we wasted so
> much time in starting the services of the OS (modloop, udev-settle, etc). All of these services
> are retried many times but in the end they are still not up, and from my understanding they
> won’t affect the actual test(?) If we can somehow get rid of these services from rootfs, I think
> we can save a lot of time.
> 
> And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same problem and it also
> takes around 15min to finish. So if we managed to tailor the services from the filesystem, we
> can save a lot of time.
That is not true. Qemu runs the tests relatively fast, within a few minutes. The reason you see e.g. 12 mins
for some Qemu jobs is the timeout we set in the Qemu scripts. We don't yet have a solution (we could
do the same as the Qubes script) to detect test success early and exit before the timeout. That is why currently
the only way for Qemu tests to finish is by reaching the timeout.

So the problem is not with the rootfs and services (the improvement would not be significant) but with
the simulation being slow. That said, this is something we all know, and I expect FVP to only be used in scenarios
which cannot be tested using Qemu or real HW.

~Michal
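
One possible shape of "detect success early and exit before the timeout", written
in expect like the FVP script in this patch. This is only a sketch and not
necessarily what the Qubes script does: the `tail` on `smoke.serial` stands in for
the real console session, and the success marker string is hypothetical.

```tcl
#!/usr/bin/expect

# Upper bound in seconds for the single expect command below.
set timeout 1800

# Stand-in for the real serial console: follow a log file from its start.
spawn tail -n +1 -f smoke.serial

# Exit as soon as the (hypothetical) success marker shows up;
# fail on timeout or if the console goes away.
expect {
    "smoke test passed" { puts "PASS"; exit 0 }
    timeout             { puts "FAIL: timed out"; exit 1 }
    eof                 { puts "FAIL: console closed"; exit 1 }
}
```

With this pattern the timeout only matters for failing runs; a passing run ends
as soon as the marker is seen.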
Henry Wang Dec. 8, 2023, 9:56 a.m. UTC | #6
Hi Michal,

> On Dec 8, 2023, at 17:50, Michal Orzel <michal.orzel@amd.com> wrote:
> On 08/12/2023 10:21, Henry Wang wrote:
>> 
>> 
>> Hi Michal,
>> 
>>> On Dec 8, 2023, at 17:11, Michal Orzel <michal.orzel@amd.com> wrote:
>>> On 08/12/2023 10:05, Henry Wang wrote:
>>>> 
>>>> Hi Michal,
>>>> 
>>>>> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
>>>>> 
>>>>> Hi Henry,
>>>>> 
>>>>> On 08/12/2023 06:46, Henry Wang wrote:
>>>>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>> new file mode 100755
>>>>>> index 0000000000..25d9a5f81c
>>>>>> --- /dev/null
>>>>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>> @@ -0,0 +1,73 @@
>>>>>> +#!/usr/bin/expect
>>>>>> +
>>>>>> +set timeout 2000
>>>>> Do we really need such a big timeout (~30 min)?
>>>>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>>>>> + send_slow slows things down)
>>>> 
>>>> This is a really good question. I did have the same question while working on
>>>> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
>>>> and waiting for it is indeed not really pleasant.
>>>> 
>>>> But my second thought would be - from my observation, the overall time now
>>>> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
>>>> given that we probably will do more testing from the job in the future, and if the
>>>> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
>>>> we don’t even trigger the timeout as the job will normally pass. So I decided
>>>> to keep this.
>>>> 
>>>> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
>>> From what you said that the average is 15-20, I think we can leave it set to 30.
>>> But I wonder if we can do something to decrease the average time. ~20 min is a lot
>>> even for FVP :) Have you tried setting send_slow to something lower than 100ms?
>>> That said, we don't send too many chars to FVP, so I doubt it would play a major role
>>> in the overall time.
>> 
>> I agree with the send_slow part. Actually I do have the same concern, here are my current
>> understanding and I think you will definitely help with your knowledge:
>> If you check the full log of Dom0 booting, for example [1], you will find that we wasted so
>> much time in starting the services of the OS (modloop, udev-settle, etc). All of these services
>> are retried many times but in the end they are still not up, and from my understanding they
>> won’t affect the actual test(?) If we can somehow get rid of these services from rootfs, I think
>> we can save a lot of time.
>> 
>> And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same problem and it also
>> takes around 15min to finish. So if we managed to tailor the services from the filesystem, we
>> can save a lot of time.
> That is not true. Qemu runs the tests relatively fast within few minutes. The reason you see e.g. 12 mins
> for some Qemu jobs comes from the timeout we set in Qemu scripts. We don't have yet the solution (we could
> do the same as Qubes script) to detect the test success early and exit before timeout. That is why currently
> the only way for Qemu tests to finish is by reaching the timeout.
> 
> So the problem is not with the rootfs and services (the improvement would not be significant) but with
> the simulation being slow. That said, this is something we all know and I expect FVP to only be used in scenarios
> which cannot be tested using Qemu or real HW.

Ok, you have a point. Let me do some experiments to see if I can improve this. Otherwise maybe
we can live with this until there is a better solution.

Kind regards,
Henry

> 
> ~Michal
Wei Chen Dec. 8, 2023, 11:13 a.m. UTC | #7
Hi Henry and Michal,

On 2023/12/8 17:56, Henry Wang wrote:
> Hi Michal,
> 
>> On Dec 8, 2023, at 17:50, Michal Orzel <michal.orzel@amd.com> wrote:
>> On 08/12/2023 10:21, Henry Wang wrote:
>>>
>>>
>>> Hi Michal,
>>>
>>>> On Dec 8, 2023, at 17:11, Michal Orzel <michal.orzel@amd.com> wrote:
>>>> On 08/12/2023 10:05, Henry Wang wrote:
>>>>>
>>>>> Hi Michal,
>>>>>
>>>>>> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
>>>>>>
>>>>>> Hi Henry,
>>>>>>
>>>>>> On 08/12/2023 06:46, Henry Wang wrote:
>>>>>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>>> new file mode 100755
>>>>>>> index 0000000000..25d9a5f81c
>>>>>>> --- /dev/null
>>>>>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>>> @@ -0,0 +1,73 @@
>>>>>>> +#!/usr/bin/expect
>>>>>>> +
>>>>>>> +set timeout 2000
>>>>>> Do we really need such a big timeout (~30 min)?
>>>>>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>>>>>> + send_slow slows things down)
>>>>>
>>>>> This is a really good question. I did have the same question while working on
>>>>> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
>>>>> and waiting for it is indeed not really pleasant.
>>>>>
>>>>> But my second thought would be - from my observation, the overall time now
>>>>> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
>>>>> given that we probably will do more testing from the job in the future, and if the
>>>>> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
>>>>> we don’t even trigger the timeout as the job will normally pass. So I decided
>>>>> to keep this.
>>>>>
>>>>> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
>>>>  From what you said that the average is 15-20, I think we can leave it set to 30.
>>>> But I wonder if we can do something to decrease the average time. ~20 min is a lot
>>>> even for FVP :) Have you tried setting send_slow to something lower than 100ms?
>>>> That said, we don't send too many chars to FVP, so I doubt it would play a major role
>>>> in the overall time.
>>>
>>> I agree with the send_slow part. Actually I do have the same concern, here are my current
>>> understanding and I think you will definitely help with your knowledge:
>>> If you check the full log of Dom0 booting, for example [1], you will find that we wasted so
>>> much time in starting the services of the OS (modloop, udev-settle, etc). All of these services
>>> are retried many times but in the end they are still not up, and from my understanding they
>>> won’t affect the actual test(?) If we can somehow get rid of these services from rootfs, I think
>>> we can save a lot of time.
>>>
>>> And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same problem and it also
>>> takes around 15min to finish. So if we managed to tailor the services from the filesystem, we
>>> can save a lot of time.
>> That is not true. Qemu runs the tests relatively fast within few minutes. The reason you see e.g. 12 mins
>> for some Qemu jobs comes from the timeout we set in Qemu scripts. We don't have yet the solution (we could
>> do the same as Qubes script) to detect the test success early and exit before timeout. That is why currently
>> the only way for Qemu tests to finish is by reaching the timeout.
>>
>> So the problem is not with the rootfs and services (the improvement would not be significant) but with
>> the simulation being slow. That said, this is something we all know and I expect FVP to only be used in scenarios
>> which cannot be tested using Qemu or real HW.
> 
> Ok, you made a point. Let me do some experiments to see if I can improve. Otherwise maybe
> we can live with this until a better solution.
> 
> Kind regards,
> Henry
> 

QEMU works like FVP with the use_real_time flag enabled. How about enabling
the use_real_time flag in CI for most test cases, but disabling it for
some time-sensitive test cases? Normally, enabling use_real_time
gives a several-fold improvement in FVP performance.

Cheers,
Wei Chen

>>
>> ~Michal
>
Julien Grall Dec. 8, 2023, 12:05 p.m. UTC | #8
Hi,

On 08/12/2023 09:50, Michal Orzel wrote:
> On 08/12/2023 10:21, Henry Wang wrote:
>>> On Dec 8, 2023, at 17:11, Michal Orzel <michal.orzel@amd.com> wrote:
>>> On 08/12/2023 10:05, Henry Wang wrote:
>>>>
>>>> Hi Michal,
>>>>
>>>>> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
>>>>>
>>>>> Hi Henry,
>>>>>
>>>>> On 08/12/2023 06:46, Henry Wang wrote:
>>>>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>> new file mode 100755
>>>>>> index 0000000000..25d9a5f81c
>>>>>> --- /dev/null
>>>>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>> @@ -0,0 +1,73 @@
>>>>>> +#!/usr/bin/expect
>>>>>> +
>>>>>> +set timeout 2000
>>>>> Do we really need such a big timeout (~30 min)?
>>>>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>>>>> + send_slow slows things down)
>>>>
>>>> This is a really good question. I did have the same question while working on
>>>> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
>>>> and waiting for it is indeed not really pleasant.
>>>>
>>>> But my second thought would be - from my observation, the overall time now
>>>> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
>>>> given that we probably will do more testing from the job in the future, and if the
>>>> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
>>>> we don’t even trigger the timeout as the job will normally pass. So I decided
>>>> to keep this.
>>>>
>>>> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
>>>  From what you said that the average is 15-20, I think we can leave it set to 30.
>>> But I wonder if we can do something to decrease the average time. ~20 min is a lot
>>> even for FVP :) Have you tried setting send_slow to something lower than 100ms?
>>> That said, we don't send too many chars to FVP, so I doubt it would play a major role
>>> in the overall time.
>>
>> I agree with the send_slow part. Actually I do have the same concern, here are my current
>> understanding and I think you will definitely help with your knowledge:
>> If you check the full log of Dom0 booting, for example [1], you will find that we wasted so
>> much time in starting the services of the OS (modloop, udev-settle, etc). All of these services
>> are retried many times but in the end they are still not up, and from my understanding they
>> won’t affect the actual test(?) If we can somehow get rid of these services from rootfs, I think
>> we can save a lot of time.
>>
>> And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same problem and it also
>> takes around 15min to finish. So if we managed to tailor the services from the filesystem, we
>> can save a lot of time.
> That is not true. Qemu runs the tests relatively fast within few minutes. The reason you see e.g. 12 mins
> for some Qemu jobs comes from the timeout we set in Qemu scripts. We don't have yet the solution (we could
> do the same as Qubes script) to detect the test success early and exit before timeout. That is why currently
> the only way for Qemu tests to finish is by reaching the timeout.
> 
> So the problem is not with the rootfs and services (the improvement would not be significant) but with
> the simulation being slow.

From my experience with the FVP, the improvement would be significant. A
normal distribution boot will start a lot of services. I ended up writing
my own initscript doing the bare minimum for creating a guest. This
saves me a lot of time every time I need to test on FVP.

I think we can do the same for the gitlab. Maybe not to the point of
writing your own initscript, but cutting down anything unnecessary.

This will prevent the FVP test from becoming the bottleneck in the gitlab CI.

Cheers,
Henry Wang Dec. 8, 2023, 12:17 p.m. UTC | #9
Hi Julien,

> On Dec 8, 2023, at 20:05, Julien Grall <julien@xen.org> wrote:
> 
> Hi,
> 
> On 08/12/2023 09:50, Michal Orzel wrote:
>> On 08/12/2023 10:21, Henry Wang wrote:
>>>> On Dec 8, 2023, at 17:11, Michal Orzel <michal.orzel@amd.com> wrote:
>>>> On 08/12/2023 10:05, Henry Wang wrote:
>>>>> 
>>>>> Hi Michal,
>>>>> 
>>>>>> On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
>>>>>> 
>>>>>> Hi Henry,
>>>>>> 
>>>>>> On 08/12/2023 06:46, Henry Wang wrote:
>>>>>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>>> new file mode 100755
>>>>>>> index 0000000000..25d9a5f81c
>>>>>>> --- /dev/null
>>>>>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>>> @@ -0,0 +1,73 @@
>>>>>>> +#!/usr/bin/expect
>>>>>>> +
>>>>>>> +set timeout 2000
>>>>>> Do we really need such a big timeout (~30 min)?
>>>>>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>>>>>> + send_slow slows things down)
>>>>> 
>>>>> This is a really good question. I did have the same question while working on
>>>>> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
>>>>> and waiting for it is indeed not really pleasant.
>>>>> 
>>>>> But my second thought would be - from my observation, the overall time now
>>>>> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
>>>>> given that we probably will do more testing from the job in the future, and if the
>>>>> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
>>>>> we don’t even trigger the timeout as the job will normally pass. So I decided
>>>>> to keep this.
>>>>> 
>>>>> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
>>>> From what you said that the average is 15-20, I think we can leave it set to 30.
>>>> But I wonder if we can do something to decrease the average time. ~20 min is a lot
>>>> even for FVP :) Have you tried setting send_slow to something lower than 100ms?
>>>> That said, we don't send too many chars to FVP, so I doubt it would play a major role
>>>> in the overall time.
>>> 
>>> I agree with the send_slow part. Actually I do have the same concern, here are my current
>>> understanding and I think you will definitely help with your knowledge:
>>> If you check the full log of Dom0 booting, for example [1], you will find that we wasted so
>>> much time in starting the services of the OS (modloop, udev-settle, etc). All of these services
>>> are retried many times but in the end they are still not up, and from my understanding they
>>> won’t affect the actual test(?) If we can somehow get rid of these services from rootfs, I think
>>> we can save a lot of time.
>>> 
>>> And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same problem and it also
>>> takes around 15min to finish. So if we managed to tailor the services from the filesystem, we
>>> can save a lot of time.
>> That is not true. Qemu runs the tests relatively fast within few minutes. The reason you see e.g. 12 mins
>> for some Qemu jobs comes from the timeout we set in Qemu scripts. We don't have yet the solution (we could
>> do the same as Qubes script) to detect the test success early and exit before timeout. That is why currently
>> the only way for Qemu tests to finish is by reaching the timeout.
>> So the problem is not with the rootfs and services (the improvement would not be significant) but with
>> the simulation being slow.
> 
> From my experience with the FVP improvement would be significant. A normal boot distribution will start a lot of services. I end up to write my own initscript doing the bare minimum for creating a guest. This saves me a lot of time everytime I needed to test on FVP.

+1, I feel the same, although I've never measured the time.

> I think we can do the same for the gitlab. Maybe not to the point of writing your initscript but cutting down anything unnecessary.

Yeah I can try to play with removing some of the unnecessary services when preparing the rootfs
for Dom0 (see patch 4).

Kind regards,
Henry

> This will avoid the FVP test to become the bottlneck in the gitlab CI.
> 
> Chers,
> 
> -- 
> Julien Grall
Henry Wang Dec. 8, 2023, 12:27 p.m. UTC | #10
Hi Wei, Michal,

> On Dec 8, 2023, at 19:13, Wei Chen <Wei.Chen@arm.com> wrote:
> 
> Hi Henry and Michal,
> 
>>>>>>> On 08/12/2023 06:46, Henry Wang wrote:
>>>>>>>> diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>>>> new file mode 100755
>>>>>>>> index 0000000000..25d9a5f81c
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
>>>>>>>> @@ -0,0 +1,73 @@
>>>>>>>> +#!/usr/bin/expect
>>>>>>>> +
>>>>>>>> +set timeout 2000
>>>>>>> Do we really need such a big timeout (~30 min)?
>>>>>>> Looking at your test job, it took 16 mins (quite a lot but I know FVP is slow
>>>>>>> + send_slow slows things down)
>>>>>> 
>>>>>> This is a really good question. I did have the same question while working on
>>>>>> the negative test today. The timeout 2000 indeed will fail the job at about 30min,
>>>>>> and waiting for it is indeed not really pleasant.
>>>>>> 
>>>>>> But my second thought would be - from my observation, the overall time now
>>>>>> would vary between 15min ~ 20min, and having a 10min margin is not that crazy
>>>>>> given that we probably will do more testing from the job in the future, and if the
>>>>>> GitLab Arm worker is high loaded, FVP will probably become slower. And normally
>>>>>> we don’t even trigger the timeout as the job will normally pass. So I decided
>>>>>> to keep this.
>>>>>> 
>>>>>> Mind sharing your thoughts about the better value of the timeout? Probably 25min?
>>>>> From what you said that the average is 15-20, I think we can leave it set to 30.
>>>>> But I wonder if we can do something to decrease the average time. ~20 min is a lot
>>>>> even for FVP :) Have you tried setting send_slow to something lower than 100ms?
>>>>> That said, we don't send too many chars to FVP, so I doubt it would play a major role
>>>>> in the overall time.
>>>> 
>>>> I agree with the send_slow part. Actually I do have the same concern, here are my current
>>>> understanding and I think you will definitely help with your knowledge:
>>>> If you check the full log of Dom0 booting, for example [1], you will find that we wasted so
>>>> much time in starting the services of the OS (modloop, udev-settle, etc). All of these services
>>>> are retried many times but in the end they are still not up, and from my understanding they
>>>> won’t affect the actual test(?) If we can somehow get rid of these services from rootfs, I think
>>>> we can save a lot of time.
>>>> 
>>>> And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same problem and it also
>>>> takes around 15min to finish. So if we managed to tailor the services from the filesystem, we
>>>> can save a lot of time.
>>> That is not true. Qemu runs the tests relatively fast within few minutes. The reason you see e.g. 12 mins
>>> for some Qemu jobs comes from the timeout we set in Qemu scripts. We don't have yet the solution (we could
>>> do the same as Qubes script) to detect the test success early and exit before timeout. That is why currently
>>> the only way for Qemu tests to finish is by reaching the timeout.
>>> 
>>> So the problem is not with the rootfs and services (the improvement would not be significant) but with
>>> the simulation being slow. That said, this is something we all know and I expect FVP to only be used in scenarios
>>> which cannot be tested using Qemu or real HW.
>> Ok, you made a point. Let me do some experiments to see if I can improve. Otherwise maybe
>> we can live with this until a better solution.
>> Kind regards,
>> Henry
> 
> QEMU works like FVP with the use_real_time flag enabled. How about enabling the use_real_time flag in CI for most test cases, but disabling it for
> some time-sensitive test cases? Normally, enabling use_real_time
> gives a severalfold improvement in FVP performance.

I see the following in the parameter list of the FVP version we are currently using. The word "Deprecated" worries
me a bit (the older FVP version does not mark the parameter as deprecated, though).
```
bp.refcounter.use_real_time=0                         # (bool  , init-time) default = '0'      : **Deprecated, this parameter will be removed in future versions** Update the Generic Timer counter at a real-time base frequency instead of simulator time
```
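
For reference, a sketch of how the flag could be toggled per job. Fast Models accept parameters on the command line as `-C name=value`; the `FVP_REAL_TIME` variable and the helper name below are assumptions for illustration, and the resulting argument is only echoed here, not passed to a model.

```shell
#!/bin/sh
# Sketch only: build the extra FVP argument for bp.refcounter.use_real_time.
# FVP_REAL_TIME is a hypothetical per-job switch (1 = enable, anything else = off).
fvp_real_time_arg() {
    if [ "${1:-0}" = "1" ]; then
        echo "-C bp.refcounter.use_real_time=1"
    else
        echo "-C bp.refcounter.use_real_time=0"
    fi
}

# Time-sensitive jobs would leave FVP_REAL_TIME unset and keep simulator time.
extra_args=$(fvp_real_time_arg "${FVP_REAL_TIME:-0}")
echo "would append to the FVP command line: $extra_args"
```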

Also, in my testing in the GitLab pipeline, I was not able to see a significant time improvement. So I guess
I will instead try what Julien suggests to see if things get better.

Kind regards,
Henry 

> 
> Cheers,
> Wei Chen
> 
>>> 
>>> ~Michal
Stefano Stabellini Dec. 8, 2023, 9:30 p.m. UTC | #11
On Fri, 8 Dec 2023, Julien Grall wrote:
> On 08/12/2023 09:50, Michal Orzel wrote:
> > On 08/12/2023 10:21, Henry Wang wrote:
> > > > On Dec 8, 2023, at 17:11, Michal Orzel <michal.orzel@amd.com> wrote:
> > > > On 08/12/2023 10:05, Henry Wang wrote:
> > > > > 
> > > > > Hi Michal,
> > > > > 
> > > > > > On Dec 8, 2023, at 16:57, Michal Orzel <michal.orzel@amd.com> wrote:
> > > > > > 
> > > > > > Hi Henry,
> > > > > > 
> > > > > > On 08/12/2023 06:46, Henry Wang wrote:
> > > > > > > diff --git
> > > > > > > a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
> > > > > > > b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
> > > > > > > new file mode 100755
> > > > > > > index 0000000000..25d9a5f81c
> > > > > > > --- /dev/null
> > > > > > > +++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
> > > > > > > @@ -0,0 +1,73 @@
> > > > > > > +#!/usr/bin/expect
> > > > > > > +
> > > > > > > +set timeout 2000
> > > > > > Do we really need such a big timeout (~30 min)?
> > > > > > Looking at your test job, it took 16 mins (quite a lot but I know
> > > > > > FVP is slow
> > > > > > + send_slow slows things down)
> > > > > 
> > > > > This is a really good question. I did have the same question while
> > > > > working on
> > > > > the negative test today. The timeout 2000 indeed will fail the job at
> > > > > about 30min,
> > > > > and waiting for it is indeed not really pleasant.
> > > > > 
> > > > > But my second thought would be - from my observation, the overall time
> > > > > now
> > > > > would vary between 15min ~ 20min, and having a 10min margin is not
> > > > > that crazy
> > > > > given that we probably will do more testing from the job in the
> > > > > future, and if the
> > > > > GitLab Arm worker is high loaded, FVP will probably become slower. And
> > > > > normally
> > > > > we don’t even trigger the timeout as the job will normally pass. So I
> > > > > decided
> > > > > to keep this.
> > > > > 
> > > > > Mind sharing your thoughts about the better value of the timeout?
> > > > > Probably 25min?
> > > >  From what you said that the average is 15-20, I think we can leave it
> > > > set to 30.
> > > > But I wonder if we can do something to decrease the average time. ~20
> > > > min is a lot
> > > > even for FVP :) Have you tried setting send_slow to something lower than
> > > > 100ms?
> > > > That said, we don't send too many chars to FVP, so I doubt it would play
> > > > a major role
> > > > in the overall time.
> > > 
> > > I agree with the send_slow part. Actually I do have the same concern, here
> > > are my current
> > > understanding and I think you will definitely help with your knowledge:
> > > If you check the full log of Dom0 booting, for example [1], you will find
> > > that we wasted so
> > > much time in starting the services of the OS (modloop, udev-settle, etc).
> > > All of these services
> > > are retried many times but in the end they are still not up, and from my
> > > understanding they
> > > won’t affect the actual test(?) If we can somehow get rid of these
> > > services from rootfs, I think
> > > we can save a lot of time.
> > > 
> > > And honestly, I noticed that qemu-alpine-arm64-gcc suffers from the same
> > > problem and it also
> > > takes around 15min to finish. So if we managed to tailor the services from
> > > the filesystem, we
> > > can save a lot of time.
> > That is not true. Qemu runs the tests relatively fast within few minutes.
> > The reason you see e.g. 12 mins
> > for some Qemu jobs comes from the timeout we set in Qemu scripts. We don't
> > have yet the solution (we could
> > do the same as Qubes script) to detect the test success early and exit
> > before timeout. That is why currently
> > the only way for Qemu tests to finish is by reaching the timeout.
> > 
> > So the problem is not with the rootfs and services (the improvement would
> > not be significant) but with
> > the simulation being slow.
> 
> From my experience with the FVP, the improvement would be significant. A normal
> distribution boot will start a lot of services. I ended up writing my own
> initscript doing the bare minimum for creating a guest. This saves me a lot of
> time every time I need to test on FVP.
> 
> I think we can do the same for the gitlab. Maybe not to the point of writing
> your initscript but cutting down anything unnecessary.
> 
> This will keep the FVP test from becoming the bottleneck in the gitlab CI.

Along the same lines, another idea would be to use busybox alone (no
Alpine Linux) as the Dom0 rootfs. That's going to be faster, but you
cannot really use xl to create DomUs due to libraries and other
dependencies. You can, however, create additional guests using
Dom0less; see for instance
automation/scripts/qemu-smoke-dom0less-arm64.sh

So if you have trouble improving the boot times of Dom0 + xl create, an
alternative would be to create two Linux dom0less DomUs, both of them
with only busybox as ramdisk.

Patch

diff --git a/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
new file mode 100755
index 0000000000..25d9a5f81c
--- /dev/null
+++ b/automation/scripts/expect/fvp-base-smoke-dom0-arm64.exp
@@ -0,0 +1,73 @@ 
+#!/usr/bin/expect
+
+set timeout 2000
+
+# Command to use to run must be given as first argument
+# if options are required, quotes must be used:
+# xxx.exp "cmd opt1 opt2"
+set runcmd [lindex $argv 0]
+
+# Maximum number of lines to be printed; this can be used to prevent runs from
+# going on forever on errors when Xen is rebooting
+set maxline 1000
+
+# Configure slow parameters used with send -s
+# This allows us to slow down console writes to prevent issues with slow
+# emulators or targets.
+# Format here is: {NUM TIME} which reads as wait TIME seconds after every NUM
+# characters; here we send 1 char every 100ms
+set send_slow {1 .1}
+
+proc test_boot {{maxline} {host_ip}} {
+    expect_after {
+        -re "(.*)\r" {
+            if {$maxline != 0} {
+                set maxline [expr {$maxline - 1}]
+                if {$maxline <= 0} {
+                    send_user "ERROR-Toomuch!\n"
+                    exit 2
+                }
+            }
+            exp_continue
+        }
+        timeout {send_user "ERROR-Timeout!\n"; exit 3}
+        eof {send_user "ERROR-EOF!\n"; exit 4}
+    }
+
+    # Extract the telnet port numbers from FVP output, because the telnet ports
+    # are not guaranteed to be fixed numbers.
+    expect -re {terminal_0: Listening for serial connection on port [0-9]+}
+    set terminal_0 $expect_out(0,string)
+    if {[regexp {port (\d+)} $terminal_0 match port_0]} {
+        puts "terminal_0 port is $port_0"
+    } else {
+        puts "terminal_0 port not found"
+        exit 5
+    }
+
+    spawn bash -c "telnet localhost $port_0"
+    expect -re "Hit any key to stop autoboot.*"
+    send -s "  \r"
+    send -s "setenv serverip $host_ip; setenv ipaddr $host_ip; tftpb 0x80200000 boot.scr; source 0x80200000\r"
+
+    # Initial Xen boot
+    expect -re "\(XEN\).*Freed .* init memory."
+
+    # Dom0 and DomU
+    expect -re "Domain-0.*"
+    expect -re "BusyBox.*"
+    expect -re "/ #.*"
+}
+
+# Get host IP
+spawn bash -c "hostname -I | awk '{print \$1}'"
+expect -re {(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})}
+set host_ip $expect_out(0,string)
+
+# Start the FVP and run the test
+spawn bash -c "$runcmd"
+
+test_boot 2000 "$host_ip"
+
+send_user "\nExecution with SUCCESS\n"
+exit 0