mbox series

[v2,0/6] python/machine: use socketpair() for console socket

Message ID 20230725180337.2937292-1-jsnow@redhat.com (mailing list archive)
Headers show
Series python/machine: use socketpair() for console socket | expand

Message

John Snow July 25, 2023, 6:03 p.m. UTC
Like we did for the QMP socket, use socketpair() for the console socket
so that hopefully there isn't a race condition during early boot where
data might get dropped on the floor.

May or may not help with various race conditions where early console
output is not showing up in the logs and/or potentially being missed by
wait_for_console_pattern.

V2:
  - Fixed some Socket ownership/garbage collection problems
  - Fixed callers of now-dropped VM arguments/properties
  - added a dedicated sock_fd arg to ConsoleSocket()
  - now using socketpair() for qtest console, too.
  - dropped sock_dir arg from *all* machine.py classes
  - Tested quite a bit more thoroughly ...

CI: https://gitlab.com/jsnow/qemu/-/pipelines/945067498

John Snow (6):
  python/machine: move socket setup out of _base_args property
  python/machine: close sock_pair in cleanup path
  python/console_socket: accept existing FD in initializer
  python/machine: use socketpair() for console connections
  python/machine: use socketpair() for qtest connection
  python/machine: remove unused sock_dir argument

 python/qemu/machine/console_socket.py      | 29 ++++++++---
 python/qemu/machine/machine.py             | 58 +++++++++++++---------
 python/qemu/machine/qtest.py               | 54 +++++++++++++++-----
 tests/avocado/acpi-bits.py                 |  5 +-
 tests/avocado/avocado_qemu/__init__.py     |  2 +-
 tests/avocado/machine_aspeed.py            |  5 +-
 tests/qemu-iotests/iotests.py              |  2 +-
 tests/qemu-iotests/tests/copy-before-write |  3 +-
 8 files changed, 104 insertions(+), 54 deletions(-)

Comments

Peter Maydell July 27, 2023, 1:05 p.m. UTC | #1
On Tue, 25 Jul 2023 at 19:04, John Snow <jsnow@redhat.com> wrote:
>
> Like we did for the QMP socket, use socketpair() for the console socket
> so that hopefully there isn't a race condition during early boot where
> data might get dropped on the floor.
>
> May or may not help with various race conditions where early console
> output is not showing up in the logs and/or potentially being missed by
> wait_for_console_pattern.
>

This seems to improve the test I was having trouble with on s390
(machine_aarch64_sbsaref.py:Aarch64SbsarefMachine.test_sbsaref_edk2_firmware)

However it still fails sometimes, apparently we get the first line
of output from the guest but lose a couple of subsequent lines.
Here's a failing log:

2023-07-27 12:55:29,870 protocol         L0481 DEBUG| Transitioning
from 'Runstate.CONNECTING' to 'Runstate.RUNNING'.
2023-07-27 12:55:29,930 __init__         L0153 DEBUG| NOTICE:  Booting
Trusted Firmware
2023-07-27 12:55:29,955 __init__         L0153 DEBUG| NOTICE:  BL1: Booting BL2
2023-07-27 12:55:29,955 __init__         L0153 DEBUG| NOTICE:  BL2:
v2.9(release):v2.9.0-51-gc0d8ee386
2023-07-27 12:55:29,955 __init__         L0153 DEBUG| NOTICE:  BL2:
Built : 16:44:16, May 30 2023
2023-07-27 12:55:30,092 __init__         L0153 DEBUG| NOTICE:  BL1: Booting BL31
2023-07-27 12:55:30,092 __init__         L0153 DEBUG| NOTICE:  BL31:
v2.9(release):v2.9.0-51-gc0d8ee386
2023-07-27 12:55:30,092 __init__         L0153 DEBUG| NOTICE:  BL31:
Built : 16:44:16, May 30 2023
2023-07-27 12:55:30,092 __init__         L0153 DEBUG| UEFI firmware
(version 1.0 built at 17:14:57 on Mar 21 2023)
2023-07-27 12:55:45,281 __init__         L0153 DEBUG|
ESC[2JESC[04DESC[=3hESC[2JESC[09DESC[2JESC[04DESC[0mESC[30mESC[40m
2023-07-27 12:55:45,538 __init__         L0153 DEBUG|
ESC[01;01HESC[0mESC[34mESC[47m
2023-07-27 12:55:45,642 __init__         L0153 DEBUG|
ESC[01;01HESC[02;02HQEMU SBSA-REF MachineESC[03;02Harm-virtESC[44C2.00
GHzESC[04;02H1.0ESC[49C1024 MB
RAMESC[05;02HESC[52CESC[06;02HESC[52CESC[0mESC[37mESC[40mESC[21;01H
2023-07-27 12:55:45,840 __init__         L0153 DEBUG|
ESC[21;01HESC[0mESC[30mESC[40mESC[25;53H  ESC[01D  ESC[01D  ESC[01D
ESC[01D  ESC[01D  ESC[01D  ESC[01D  ESC[01D  ESC[01D  ESC[01D  ESC[01D
 ESC[01D  ESC[01D  ESC[01D  ESC[01D  ESC[01D  ESC[01D  ESC[01D
ESC[01D  ESC[01D  ESC[0mESC[30mESC[47mESC[07;01H
2023-07-27 12:55:45,926 __init__         L0153 DEBUG| ESC[07;01H
2023-07-27 12:55:46,032 __init__         L0153 DEBUG|
ESC[0mESC[37mESC[40mESC[30C<Standard English>ESC[0mESC[30mESC[47m
   ESC[57D   Select Language            ESC[0mESC[34mESC[47mESC[09;01H

ESC[0mESC[30mESC[47mESC[10;01H   ESC[02D>ESC[01CDevice Manager
                               ESC[11;01H   ESC[02D>ESC[01CBoot
Manager                                          ESC[12;01H
ESC[02D>ESC[01CBoot Maintenance Manager
ESC[0mESC[34mESC[47mESC[13;01H
                ESC[0mESC[30mESC[47mESC[14;01H   Continue
                                ESC[15;01H   Reset
                            ESC[16;01H
                        ESC[17;01H
                    ESC[18;01H
                ESC[19;01H
            ESC[20;01H
2023-07-27 12:55:46,067 __init__         L0153 DEBUG|
ESC[0mESC[37mESC[40mESC[23;02H ESC[22;02H ESC[50C
   ESC[51D                          ESC[23;53H
  ESC[77D^v=Move Highlight       ESC[22;03H
ESC[23;27H<Enter>=Select Entry      ESC[0mESC[34mESC[47mESC[08;58HThis
is the option
2023-07-27 12:55:46,076 __init__         L0153 DEBUG| ESC[57Cone
adjusts to change
2023-07-27 12:55:46,076 __init__         L0153 DEBUG| ESC[57Cthe
language for the
2023-07-27 12:55:46,080 __init__         L0153 DEBUG| ESC[57Ccurrent system
2023-07-27 12:55:46,085 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:55:46,085 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:55:46,087 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:55:46,090 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:55:46,094 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:55:46,094 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:55:46,100 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:55:46,101 __init__         L0153 DEBUG| ESC[57C
2023-07-27 12:58:25,379 stacktrace       L0039 ERROR|
2023-07-27 12:58:25,380 stacktrace       L0041 ERROR| Reproduced
traceback from:
/home/linux1/qemu/build/aarch64/tests/venv/lib/python3.8/site-packages/avocado/core/test.py:770
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR| Traceback (most
recent call last):
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|   File
"/home/linux1/qemu/build/aarch64/tests/venv/lib/python3.8/site-packages/avocado/core/decorators.py",
line 90, in wrapper
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|     return
function(obj, *args, **kwargs)
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|   File
"/home/linux1/qemu/build/aarch64/tests/avocado/machine_aarch64_sbsaref.py",
line 96, in test_sbsaref_edk2_firmware
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|
wait_for_console_pattern(self, "BL1: v2.9(release):v2.9")
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|   File
"/home/linux1/qemu/build/aarch64/tests/avocado/avocado_qemu/__init__.py",
line 199, in wait_for_console_pattern
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|
_console_interaction(test, success_message, failure_message, None,
vm=vm)
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|   File
"/home/linux1/qemu/build/aarch64/tests/avocado/avocado_qemu/__init__.py",
line 148, in _console_interaction
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|     msg =
console.readline().decode().strip()
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|   File
"/usr/lib/python3.8/socket.py", line 669, in readinto
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|     return
self._sock.recv_into(b)
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|   File
"/home/linux1/qemu/build/aarch64/tests/venv/lib/python3.8/site-packages/avocado/plugins/runner.py",
line 77, in sigterm_handler
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR|     raise
RuntimeError("Test interrupted by SIGTERM")
2023-07-27 12:58:25,381 stacktrace       L0045 ERROR| RuntimeError:
Test interrupted by SIGTERM

Compared to the output logs for a passing run, you can see that
although we captured the first line of output from the guest
("Booting Trusted Firmware") we missed lines 2 and 3 (the
BL1 version line and the BL1 Built: line).

2023-07-27 12:54:07,676 protocol         L0481 DEBUG| Transitioning
from 'Runstate.CONNECTING' to 'Runstate.RUNNING'.
2023-07-27 12:54:07,690 __init__         L0153 DEBUG| NOTICE:  Booting
Trusted Firmware
2023-07-27 12:54:07,691 __init__         L0153 DEBUG| NOTICE:  BL1:
v2.9(release):v2.9.0-51-gc0d8ee386
2023-07-27 12:54:07,691 __init__         L0153 DEBUG| BL1: Built :
16:44:16, May 30 2023
2023-07-27 12:54:07,754 __init__         L0153 DEBUG| NOTICE:  BL1: Booting BL2
2023-07-27 12:54:07,828 __init__         L0153 DEBUG| NOTICE:  BL2:
v2.9(release):v2.9.0-51-gc0d8ee386
2023-07-27 12:54:07,828 __init__         L0153 DEBUG| NOTICE:  BL2:
Built : 16:44:16, May 30 2023
2023-07-27 12:54:07,855 __init__         L0153 DEBUG| NOTICE:  BL1: Booting BL31
2023-07-27 12:54:07,877 __init__         L0153 DEBUG| NOTICE:  BL31:
v2.9(release):v2.9.0-51-gc0d8ee386
2023-07-27 12:54:07,883 __init__         L0153 DEBUG| NOTICE:  BL31:
Built : 16:44:16, May 30 2023
2023-07-27 12:54:07,891 __init__         L0153 DEBUG| UEFI firmware
(version 1.0 built at 17:14:57 on Mar 21 2023)
2023-07-27 12:54:18,354 __init__         L0153 DEBUG|
ESC[2JESC[04DESC[=3hESC[2JESC[09DESC[2JESC[04DESC[0mESC[30mESC[40m
2023-07-27 12:54:18,539 __init__         L0153 DEBUG|
ESC[01;01HESC[0mESC[34mESC[47m
2023-07-27 12:54:18,631 __init__         L0153 DEBUG|
ESC[01;01HESC[02;02HQEMU SBSA-REF MachineESC[03;02Harm-virtESC[44C2.00
GHzESC[04;02H1.0ESC[49C1024 MB
RAMESC[05;02HESC[52CESC[06;02HESC[52CESC[0mESC[37mESC[40mESC[21;01H
2023-07-27 12:54:18,639 machine          L0627 DEBUG| Shutting down VM
appliance; timeout=30
2023-07-27 12:54:18,639 machine          L0551 DEBUG| Attempting
graceful termination
2023-07-27 12:54:18,639 machine          L0518 DEBUG| Closing console socket
2023-07-27 12:54:18,640 machine          L0561 DEBUG| Politely asking
QEMU to terminate


thanks
-- PMM